Computer Science

Total of 657 entries

[1] arXiv:2405.03695 [pdf, ps, other]: Title: Evaluating Large Language Models for Material Selection

Daniele Grandi, Yash Patawari Jain, Allin Groom, Brandon Cramer, Christopher McComb

Comments: arXiv admin note: text overlap with arXiv:2307.03109 by other authors

Subjects: Computation and Language (cs.CL)

Material selection is a crucial step in conceptual design due to its significant impact on the functionality, aesthetics, manufacturability, and sustainability of the final product. This study investigates the use of Large Language Models (LLMs) for material selection in the product design process and compares the performance of LLMs against expert choices for various design scenarios. By collecting a dataset of expert material preferences, the study provides a basis for evaluating how well LLMs can align with expert recommendations through prompt engineering and hyperparameter tuning. The divergence between LLM and expert recommendations is measured across different model configurations, prompt strategies, and temperature settings. This approach allows for a detailed analysis of factors influencing the LLMs' effectiveness in recommending materials. The results from this study highlight two failure modes and identify parallel prompting as a useful prompt-engineering method when using LLMs for material selection. The findings further suggest that, while LLMs can provide valuable assistance, their recommendations often vary significantly from those of human experts. This discrepancy underscores the need for further research into how LLMs can be better tailored to replicate expert decision-making in material selection. This work contributes to the growing body of knowledge on how LLMs can be integrated into the design process, offering insights into their current limitations and potential for future improvements.
[2] arXiv:2405.03697 [pdf, ps, html, other]: Title: GeoViz: A Multi-View Visualization Platform for Spatio-temporal Knowledge Graph

Jianping Zhou, Junhao Li, Guanjie Zheng, Yunqiang Zhu, Xinbing Wang, Chenghu Zhou

Comments: 4 pages, 2 figures

Subjects: Human-Computer Interaction (cs.HC)

In this paper, we propose a multi-view visualization technology for a spatio-temporal knowledge graph (STKG), which utilizes three distinct perspectives (knowledge tree, knowledge net, and knowledge map) to facilitate a comprehensive analysis of the STKG. The knowledge tree enables the visualization of hierarchical interrelations within the STKG, while the knowledge net elucidates semantic relationships among knowledge entities. Additionally, the knowledge map displays spatial and temporal distributions via spatial maps and time axes, respectively. Our visualization technology addresses the limitations inherent in single-view approaches and the lack of interaction in spatio-temporal perspectives evident in existing visualization methods. Moreover, we have encapsulated this technology within an integrated, open-source platform named GeoViz. A demo video of GeoViz can be accessed at https://anonymous.4open.science/r/GeoViz.
[3] arXiv:2405.03699 [pdf, ps, html, other]: Title: HCC Is All You Need: Alignment-The Sensible Kind Anyway-Is Just Human-Centered Computing

Eric Gilbert

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

This article argues that AI Alignment is a type of Human-Centered Computing.
[4] arXiv:2405.03702 [pdf, ps, html, other]: Title: Leafy Spurge Dataset: Real-world Weed Classification Within Aerial Drone Imagery

Kyle Doherty, Max Gurinas, Erik Samsoe, Charles Casper, Beau Larkin, Philip Ramsey, Brandon Trabucco, Ruslan Salakhutdinov

Comments: Official Dataset Technical Report. Used in DA-Fusion (arXiv:2302.07944)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Invasive plant species are detrimental to the ecology of both agricultural and wildland areas. Euphorbia esula, or leafy spurge, is one such plant that has spread through much of North America from Eastern Europe. When paired with contemporary computer vision systems, unmanned aerial vehicles, or drones, offer the means to track expansion of problem plants, such as leafy spurge, and improve chances of controlling these weeds. We gathered a dataset of leafy spurge presence and absence in grasslands of western Montana, USA, then surveyed these areas with a commercial drone. We trained image classifiers on these data, and our best performing model, a pre-trained DINOv2 vision transformer, identified leafy spurge with 0.84 accuracy (test set). This result indicates that classification of leafy spurge is tractable, but not solved. We release this unique dataset of labelled and unlabelled, aerial drone imagery for the machine learning community to explore. Improving classification performance of leafy spurge would benefit the fields of ecology, conservation, and remote sensing alike. Code and data are available at our website: this http URL.
[5] arXiv:2405.03706 [pdf, ps, html, other]: Title: Improving Graph Machine Learning Performance Through Feature Augmentation Based on Network Control Theory

Anwar Said, Obaid Ullah Ahmad, Waseem Abbas, Mudassir Shabbir, Xenofon Koutsoukos

Comments: Conference paper

Subjects: Machine Learning (cs.LG)

Network control theory (NCT) offers a robust analytical framework for understanding the influence of network topology on dynamic behaviors, enabling researchers to decipher how certain patterns of external control measures can steer system dynamics towards desired states. Distinguished from other structure-function methodologies, NCT's predictive capabilities can be coupled with deploying Graph Neural Networks (GNNs), which have demonstrated exceptional utility in various network-based learning tasks. However, the performance of GNNs heavily relies on the expressiveness of node features, and the lack of node features can greatly degrade their performance. Furthermore, many real-world systems may lack node-level information, posing a challenge for GNNs. To tackle this challenge, we introduce a novel approach, NCT-based Enhanced Feature Augmentation (NCT-EFA), that assimilates average controllability, along with other centrality indices, into the feature augmentation pipeline to enhance GNN performance. Our evaluation of NCT-EFA on six benchmark GNN models across two experimental settings (solely employing average controllability, and in combination with additional centrality metrics) showcases an improved performance reaching as high as 11%. Our results demonstrate that incorporating NCT into feature enrichment can substantively extend the applicability and heighten the performance of GNNs in scenarios where node-level information is unavailable.
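The feature-augmentation idea can be illustrated with a minimal sketch. It assumes one common definition of average controllability (the trace of a finite-horizon controllability Gramian with a single-node input) and uses node degree as a stand-in for the "other centrality indices". The function names, the horizon, and the 0.5 edge scaling are hypothetical choices for illustration and are not taken from the paper.

```python
def mat_vec(A, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def average_controllability(A, horizon):
    """Per-node trace of the finite-horizon controllability Gramian.

    Assumption: for a single input at node i the score is
    sum_{t=0}^{horizon-1} ||A^t e_i||^2, where e_i is the i-th unit vector.
    """
    n = len(A)
    scores = []
    for i in range(n):
        x = [1.0 if j == i else 0.0 for j in range(n)]  # e_i
        total = 0.0
        for _ in range(horizon):
            total += sum(v * v for v in x)  # ||A^t e_i||^2
            x = mat_vec(A, x)               # advance to A^(t+1) e_i
        scores.append(total)
    return scores


def augment_features(A, horizon):
    """Build a node-feature matrix [degree, average controllability]."""
    degrees = [sum(1 for w in row if w != 0) for row in A]
    ac = average_controllability(A, horizon)
    return [[float(d), c] for d, c in zip(degrees, ac)]


# Path graph on 3 nodes; edge weights scaled to 0.5 (illustrative choice).
A = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]
features = augment_features(A, horizon=3)
```

In this toy graph the middle node receives the highest score, and the resulting rows could serve as input features for a GNN when no node attributes are available.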
[6] arXiv:2405.03708 [pdf, ps, other]: Title: Delta Tensor: Efficient Vector and Tensor Storage in Delta Lake

Zhiwei Bao, Liu Liao-Liao, Zhiyu Wu, Yifan Zhou, Dan Fan, Michal Aibin, Yvonne Coady

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Machine Learning (cs.LG)

The exponential growth of artificial intelligence (AI) and machine learning (ML) applications has necessitated the development of efficient storage solutions for vector and tensor data. This paper presents a novel approach for tensor storage in a Lakehouse architecture using Delta Lake. The approach applies the multidimensional array storage strategy from array databases, together with sparse encoding methods, to Delta Lake tables. Experiments show notable improvements in both space and time efficiency compared to traditional serialization of tensors. These results provide valuable insights for the development and implementation of optimized vector and tensor storage solutions in data-intensive applications, contributing to the evolution of efficient data management practices in AI and ML domains in cloud-native environments.
[7] arXiv:2405.03709 [pdf, ps, html, other]: Title: Generating Probabilistic Scenario Programs from Natural Language

Karim Elmaaroufi, Devan Shankar, Ana Cismaru, Marcell Vazquez-Chanlatte, Alberto Sangiovanni-Vincentelli, Matei Zaharia, Sanjit A. Seshia

Comments: 17 pages, 2 figures

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)

For cyber-physical systems (CPS), including robotics and autonomous vehicles, mass deployment has been hindered by fatal errors that occur when operating in rare events. To replicate rare events such as vehicle crashes, many companies have created logging systems and employed crash reconstruction experts to meticulously recreate these valuable events in simulation. However, in these methods, "what if" questions are not easily formulated and answered. We present ScenarioNL, an AI system for creating scenario programs from natural language. Specifically, we generate these programs from police crash reports. Reports normally contain uncertainty about the exact details of the incidents, which we represent through a Probabilistic Programming Language (PPL), Scenic. By using Scenic, we can clearly and concisely represent uncertainty and variation over CPS behaviors, properties, and interactions. We demonstrate how commonplace prompting techniques with the best Large Language Models (LLMs) are incapable of reasoning about probabilistic scenario programs and generating code for low-resource languages such as Scenic. Our system comprises several LLMs chained together with several kinds of prompting strategies, a compiler, and a simulator. We evaluate our system on publicly available autonomous vehicle crash reports in California from the last five years and share insights into how we generate code that is both semantically meaningful and syntactically correct.
[8] arXiv:2405.03710 [pdf, ps, html, other]: Title: Automating the Enterprise with Foundation Models

Michael Wornow, Avanika Narayan, Krista Opsahl-Ong, Quinn McIntyre, Nigam H. Shah, Christopher Re

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Automating enterprise workflows could unlock $4 trillion/year in productivity gains. Despite being of interest to the data management community for decades, the ultimate vision of end-to-end workflow automation has remained elusive. Current solutions rely on process mining and robotic process automation (RPA), in which a bot is hard-coded to follow a set of predefined rules for completing a workflow. Through case studies of a hospital and large B2B enterprise, we find that the adoption of RPA has been inhibited by high set-up costs (12-18 months), unreliable execution (60% initial accuracy), and burdensome maintenance (requiring multiple FTEs). Multimodal foundation models (FMs) such as GPT-4 offer a promising new approach for end-to-end workflow automation given their generalized reasoning and planning abilities. To study these capabilities we propose ECLAIR, a system to automate enterprise workflows with minimal human supervision. We conduct initial experiments showing that multimodal FMs can address the limitations of traditional RPA with (1) near-human-level understanding of workflows (93% accuracy on a workflow understanding task) and (2) instant set-up with minimal technical barrier (based solely on a natural language description of a workflow, ECLAIR achieves end-to-end completion rates of 40%). We identify human-AI collaboration, validation, and self-improvement as open challenges, and suggest ways they can be solved with data management techniques. Code is available at: this https URL
[9] arXiv:2405.03711 [pdf, ps, html, other]: Title: Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

Xiao Hu, Tianshu Wang, Min Gong, Shaoshi Yang

Comments: 13 pages, 13 figures, accepted to appear in IEEE Access, Mar. 2024

Journal-ref: IEEE Access, vol. 12, pp. 48210-48222, Mar. 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)

Guidance commands of flight vehicles are a series of data sets with fixed time intervals, thus guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. For the EFV, the objective of the guidance design entails progressively maximizing the residual velocity, subject to the constraint imposed by the given evasion distance. Thus an irregular dynamic max-min problem of extremely large scale is formulated, where the time instant when the optimal solution can be attained is uncertain and the optimum solution depends on all the intermediate guidance commands generated before. For solving this problem, a two-step strategy is conceived. In the first step, we use the proximal policy optimization (PPO) algorithm to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, despite the fact that the reward function, the neural network parameters and the learning rate are designed elaborately. Therefore, in the second step, we propose to invoke the evolution strategy (ES) based algorithm, which uses the result of PPO as the initial value, to further improve the quality of the solution by searching in the local space. Simulation results demonstrate that the proposed guidance design method based on the PPO algorithm is capable of achieving a residual velocity of 67.24 m/s, higher than the residual velocities achieved by the benchmark soft actor-critic and deep deterministic policy gradient algorithms. Furthermore, the proposed ES-enhanced PPO algorithm outperforms the PPO algorithm by 2.7%, achieving a residual velocity of 69.04 m/s.
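As a quick consistency check on the two residual velocities reported in this abstract, the relative gain of the ES-enhanced result over plain PPO can be worked out directly (assuming the percentage is taken relative to the PPO value):

```latex
\[
\frac{69.04 - 67.24}{67.24} = \frac{1.80}{67.24} \approx 0.0268 \approx 2.7\%
\]
```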
[10] arXiv:2405.03712 [pdf, ps, html, other]: Title: Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition

Xiaoyan Su, Yinghao Zhu, Run Li

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

In the past, research on a single low-dimensional activation function in networks has led to internal covariate shift and gradient deviation problems. A relatively small research area is how to use function combinations to provide property completion for a single activation function application. We propose a network adversarial method to address the aforementioned challenges. This is the first method to use different activation functions in a network. Based on the existing activation functions in the current network, an adversarial function with opposite derivative image properties is constructed, and the two are alternately used as activation functions for different network layers. For complex situations, we propose a method of high-dimensional function graph decomposition (HD-FGD), which divides the function into different parts and then passes them through a linear layer. After integrating the inverse of the partial derivatives of each decomposed term, we obtain its adversarial function by referring to the computational rules of the decomposition process. The use of network adversarial methods or the use of HD-FGD alone can effectively replace the traditional MLP+activation function mode. Through the above methods, we have achieved a substantial improvement over standard activation functions regarding both training efficiency and predictive accuracy. The article addresses the adversarial issues associated with several prevalent activation functions, presenting alternatives that can be seamlessly integrated into existing models without any adverse effects. We will release the code as open source after the conference review process is completed.
[11] arXiv:2405.03714 [pdf, ps, html, other]: Title: UniDEC: Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification

Siddhant Kharbanda, Devaansh Gupta, Gururaj K, Pankaj Malhotra, Cho-Jui Hsieh, Rohit Babbar

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space, given an input query and labels with textual features. Models developed for this problem have conventionally used a modular approach with (i) a Dual Encoder (DE) to embed the queries and label texts, and (ii) a One-vs-All (OvA) classifier to rerank the shortlisted labels mined through meta-classifier training. While such methods have shown empirical success, we observe two key uncharted aspects: (i) DE training typically uses only a single positive relation even for datasets which offer more, and (ii) existing approaches fixate on using only the OvA reduction of the multi-label problem. This work aims to explore these aspects by proposing UniDEC, a novel end-to-end trainable framework which trains the dual encoder and classifier together in a unified fashion using a multi-class loss. For the choice of multi-class loss, the work proposes a novel pick-some-label (PSL) reduction of the multi-label problem which leverages multiple (in some cases, all) positives. The proposed framework achieves state-of-the-art results on a single GPU, while achieving on-par results with respect to multi-GPU SOTA methods on various XML benchmark datasets, all while using 4-16x less compute and being practically scalable even beyond million-label-scale datasets.
[12] arXiv:2405.03715 [pdf, ps, html, other]: Title: Iterative Filter Pruning for Concatenation-based CNN Architectures

Svetlana Pavlitska, Oliver Bagge, Federico Peccia, Toghrul Mammadov, J. Marius Zöllner

Comments: Accepted for publication at IJCNN 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Model compression and hardware acceleration are essential for the resource-efficient deployment of deep neural networks. Modern object detectors have highly interconnected convolutional layers with concatenations. In this work, we study how pruning can be applied to such architectures, using YOLOv7 as an example. We propose a method to handle concatenation layers, based on the connectivity graph of convolutional layers. By automating iterative sensitivity analysis, pruning, and subsequent model fine-tuning, we can significantly reduce model size both in terms of the number of parameters and FLOPs, while keeping comparable model accuracy. Finally, we deploy pruned models to FPGA and NVIDIA Jetson Xavier AGX. Pruned models demonstrate a 2x speedup for the convolutional layers in comparison to the unpruned counterparts and reach real-time capability with 14 FPS on FPGA. Our code is available at this https URL.
[13] arXiv:2405.03716 [pdf, ps, other]: Title: Predicting the usability of mobile applications using AI tools: the rise of large user interface models, opportunities, and challenges

Abdallah Namoun, Ahmed Alrehaili, Zaib Un Nisa, Hani Almoamari, Ali Tufail

Comments: 12 pages, 3 figures, 4 tables, The 7th International Conference on Emerging Data and Industry (EDI40)

Journal-ref: 2024; Procedia Computer Science

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

This article proposes the so-called large user interface models (LUIMs) to enable the generation of user interfaces and prediction of usability using artificial intelligence in the context of mobile applications.
[14] arXiv:2405.03718 [pdf, ps, html, other]: Title: A Single Online Agent Can Efficiently Learn Mean Field Games

Chenyu Zhang, Xu Chen, Xuan Di

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems. However, solving MFGs can be challenging due to the coupling of forward population evolution and backward agent dynamics. Typically, obtaining mean field Nash equilibria (MFNE) involves an iterative approach where the forward and backward processes are solved alternately, known as fixed-point iteration (FPI). This method requires fully observed population propagation and agent dynamics over the entire spatial domain, which could be impractical in some real-world scenarios. To overcome this limitation, this paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn MFNE using online samples, without prior knowledge of the state-action space, reward function, or transition dynamics. Specifically, the agent updates its policy through the value function (Q), while simultaneously evaluating the mean field state (M), using the same batch of observations. We develop two variants of this learning scheme: off-policy and on-policy QM iteration. We prove that they efficiently approximate FPI, and a sample complexity guarantee is provided. The efficacy of our methods is confirmed by numerical experiments.
[15] arXiv:2405.03720 [pdf, ps, html, other]: Title: Spatial Transfer Learning with Simple MLP

Hongjian Yang

Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

A first step toward investigating the potential of transfer learning applied to the field of spatial statistics.
[16] arXiv:2405.03722 [pdf, ps, html, other]: Title: Class-relevant Patch Embedding Selection for Few-Shot Image Classification

Weihao Jiang, Haoyang Cui, Kun He

Comments: arXiv admin note: text overlap with arXiv:2405.03109

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Effective image classification hinges on discerning relevant features from both foreground and background elements, with the foreground typically holding the critical information. While humans adeptly classify images with limited exposure, artificial neural networks often struggle with feature selection from rare samples. To address this challenge, we propose a novel method for selecting class-relevant patch embeddings. Our approach involves splitting support and query images into patches and encoding them using a pre-trained Vision Transformer (ViT) to obtain class embeddings and patch embeddings, respectively. Subsequently, we filter patch embeddings using class embeddings to retain only the class-relevant ones. For each image, we calculate the similarity between the class embedding and each patch embedding, sort the similarity sequence in descending order, and only retain top-ranked patch embeddings. By prioritizing similarity between the class embedding and patch embeddings, we select top-ranked patch embeddings to be fused with the class embedding to form a comprehensive image representation, enhancing pattern recognition across instances. Our strategy effectively mitigates the impact of class-irrelevant patch embeddings, yielding improved performance in pre-trained models. Extensive experiments on popular few-shot classification benchmarks demonstrate the simplicity, efficacy, and computational efficiency of our approach, outperforming state-of-the-art baselines under both 5-shot and 1-shot scenarios.
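The selection step described in this abstract (score each patch against the class embedding, sort in descending order, keep the top-ranked patches, then fuse) can be sketched roughly as follows. This is only an illustration: cosine similarity and mean-pooling fusion are assumptions, the abstract does not specify the similarity measure or the fusion rule, and the helper names `select_patches` and `fuse` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_patches(class_emb, patch_embs, k):
    """Indices of the k patches most similar to the class embedding,
    ordered by descending similarity."""
    sims = [cosine(class_emb, p) for p in patch_embs]
    order = sorted(range(len(patch_embs)), key=lambda i: sims[i], reverse=True)
    return order[:k]


def fuse(class_emb, patch_embs, keep):
    """Assumed fusion rule: average the class embedding with the kept patches."""
    vectors = [class_emb] + [patch_embs[i] for i in keep]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Toy 2-D embeddings standing in for ViT outputs.
class_emb = [1.0, 0.0]
patch_embs = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]

keep = select_patches(class_emb, patch_embs, k=2)
image_repr = fuse(class_emb, patch_embs, keep)
```

With these toy vectors the two patches pointing in nearly the same direction as the class embedding are kept, while the orthogonal and opposite patches are discarded before fusion.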
[17] arXiv:2405.03723 [pdf, ps, html, other]: Title: Generative adversarial learning with optimal input dimension and its adaptive generator architecture

Zhiyao Tan, Ling Zhou, Huazhen Lin

Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

We investigate the impact of the input dimension on the generalization error in generative adversarial networks (GANs). In particular, we first provide both theoretical and practical evidence to validate the existence of an optimal input dimension (OID) that minimizes the generalization error. Then, to identify the OID, we introduce a novel framework called generalized GANs (G-GANs), which includes existing GANs as a special case. By incorporating the group penalty and the architecture penalty developed in the paper, G-GANs have several intriguing features. First, our framework offers adaptive dimensionality reduction from the initial dimension to a dimension necessary for generating the target distribution. Second, this reduction in dimensionality also shrinks the required size of the generator network architecture, which is automatically identified by the proposed architecture penalty. Both reductions in dimensionality and the generator network significantly improve the stability and the accuracy of the estimation and prediction. Theoretical support for the consistent selection of the input dimension and the generator network is provided. Third, the proposed algorithm involves an end-to-end training process, and the algorithm allows for dynamic adjustments between the input dimension and the generator network during training, further enhancing the overall performance of G-GANs. Extensive experiments conducted with simulated and benchmark data demonstrate the superior performance of G-GANs. In particular, compared to that of off-the-shelf methods, G-GANs achieves an average improvement of 45.68% in the CT slice dataset, 43.22% in the MNIST dataset and 46.94% in the FashionMNIST dataset in terms of the maximum mean discrepancy or Frechet inception distance. Moreover, the features generated based on the input dimensions identified by G-GANs align with visually significant features.
[18] arXiv:2405.03724 [pdf, ps, html, other]: Title: GraphSL: An Open-Source Library for Graph Source Localization Approaches and Benchmark Datasets

Junxiang Wang, Liang Zhao

Subjects: Machine Learning (cs.LG)

We present GraphSL, a novel library designed for investigating the graph source localization problem. Our library facilitates the exploration of various graph diffusion models for simulating information spread and enables the evaluation of cutting-edge source localization approaches on established benchmark datasets. The source code of GraphSL is made available at this https URL. Bug reports and feedback can be directed to the GitHub issues page (this https URL).
[19] arXiv:2405.03725 [pdf, ps, html, other]: Title: Deep Oscillatory Neural Network

Nurani Rajagopal Rohan, Vigneswaran C, Sayan Ghosh, Kishore Rajendran, Gaurav A, V Srinivasa Chakravarthy

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We propose a novel, brain-inspired deep neural network model known as the Deep Oscillatory Neural Network (DONN). Deep neural networks like the Recurrent Neural Networks indeed possess sequence processing capabilities but the internal states of the network are not designed to exhibit brain-like oscillatory activity. With this motivation, the DONN is designed to have oscillatory internal dynamics. Neurons of the DONN are either nonlinear neural oscillators or traditional neurons with sigmoidal or ReLU activation. The neural oscillator used in the model is the Hopf oscillator, with the dynamics described in the complex domain. Input can be presented to the neural oscillator in three possible modes. The sigmoid and ReLU neurons also use complex-valued extensions. All the weight stages are also complex-valued. Training follows the general principle of weight change by minimizing the output error and therefore has an overall resemblance to complex backpropagation. A generalization of DONN to convolutional networks known as the Oscillatory Convolutional Neural Network is also proposed. The two proposed oscillatory networks are applied to a variety of benchmark problems in signal and image/video processing. The performance of the proposed models is either comparable or superior to published results on the same data sets.
[20] arXiv:2405.03727 [pdf, ps, html, other]: Title: Large Language Models Synergize with Automated Machine Learning

Jinglue Xu, Zhen Liu, Nagar Anthel Venkatesh Suryanarayanan, Hitoshi Iba

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)

Recently, code generation driven by large language models (LLMs) has become increasingly popular. However, automatically generating code for machine learning (ML) tasks still poses significant challenges. This paper explores the limits of program synthesis for ML by combining LLMs and automated machine learning (autoML). Specifically, our goal is to fully automate the code generation process for the entire ML workflow, from data preparation to modeling and post-processing, utilizing only textual descriptions of the ML tasks. To manage the length and diversity of ML programs, we propose to break each ML program into smaller, manageable parts. Each part is generated separately by the LLM, with careful consideration of their compatibilities. To implement the approach, we design a testing technique for ML programs. Furthermore, our approach enables integration with autoML. In our approach, autoML serves to numerically assess and optimize the ML programs generated by LLMs. LLMs, in turn, help to bridge the gap between theoretical, algorithm-centered autoML and practical autoML applications. This mutual enhancement underscores the synergy between LLMs and autoML in program synthesis for ML. In experiments across various ML tasks, our method outperforms existing methods in 10 out of 12 tasks for generating ML programs. In addition, autoML significantly improves the performance of the generated ML programs. In the experiments, our method, Text-to-ML, achieves fully automated synthesis of the entire ML pipeline based solely on textual descriptions of the ML tasks.
[21] arXiv:2405.03728 [pdf, ps, html, other]: Title: GLHF: General Learned Evolutionary Algorithm Via Hyper Functions

Xiaobin Li, Kai Wu, Yujian Betterest Li, Xiaoyu Zhang, Handing Wang, Jing Liu

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

Pretrained Optimization Models (POMs) leverage knowledge gained from optimizing various tasks, providing efficient solutions for new optimization challenges through direct usage or fine-tuning. However, current POMs suffer from inefficiency and limited generalization ability. Our proposed model, the general pre-trained optimization model (GPOM), addresses these shortcomings. GPOM constructs a population-based pretrained Black-Box Optimization (BBO) model tailored for continuous optimization. Evaluation on the BBOB benchmark and two robot control tasks demonstrates that GPOM significantly outperforms other pretrained BBO models, especially for high-dimensional tasks. Its direct optimization performance exceeds that of state-of-the-art evolutionary algorithms and POMs. Furthermore, GPOM exhibits robust generalization capabilities across diverse task distributions, dimensions, population sizes, and optimization horizons.
[22] arXiv:2405.03730 [pdf, ps, html, other]: Title: Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers

Johann Schmidt, Sebastian Stober

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks are applied in more and more areas of everyday life. However, they still lack essential abilities, such as robustly dealing with spatially transformed input signals. Approaches to mitigate this severe robustness issue are limited to two pathways: Either models are implicitly regularised by increased sample variability (data augmentation) or explicitly constrained by hard-coded inductive biases. The limiting factor of the former is the size of the data space, which renders sufficient sample coverage intractable. The latter is limited by the engineering effort required to develop such inductive biases for every possible scenario. Instead, we take inspiration from human behaviour, where percepts are modified by mental or physical actions during inference. We propose a novel technique to emulate such an inference process for neural nets. This is achieved by traversing a sparsified inverse transformation tree during inference using parallel energy-based evaluations. Our proposed inference algorithm, called Inverse Transformation Search (ITS), is model-agnostic and equips the model with zero-shot pseudo-invariance to spatially transformed inputs. We evaluated our method on several benchmark datasets, including a synthesised ImageNet test set. ITS outperforms the utilised baselines on all zero-shot test scenarios.
[23] arXiv:2405.03734 [pdf, ps, html, other]: Title: FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt Engineering

Silan Hu, Xiaoning Wang

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Applications (stat.AP)

Integrating large language models (LLMs) and knowledge graphs (KGs) holds great promise for revolutionizing intelligent education, but challenges remain in achieving personalization, interactivity, and explainability. We propose FOKE, a Forest Of Knowledge and Education framework that synergizes foundation models, knowledge graphs, and prompt engineering to address these challenges. FOKE introduces three key innovations: (1) a hierarchical knowledge forest for structured domain knowledge representation; (2) a multi-dimensional user profiling mechanism for comprehensive learner modeling; and (3) an interactive prompt engineering scheme for generating precise and tailored learning guidance.
We showcase FOKE's application in programming education, homework assessment, and learning path planning, demonstrating its effectiveness and practicality. Additionally, we implement Scholar Hero, a real-world instantiation of FOKE. Our research highlights the potential of integrating foundation models, knowledge graphs, and prompt engineering to revolutionize intelligent education practices, ultimately benefiting learners worldwide. FOKE provides a principled and unified approach to harnessing cutting-edge AI technologies for personalized, interactive, and explainable educational services, paving the way for further research and development in this critical direction.
[24] arXiv:2405.03735 [pdf, ps, html, other]: Title: Select to Perfect: Imitating desired behavior from large multi-agent data

Tim Franzmeyer, Edith Elkind, Philip Torr, Jakob Foerster, Joao Henriques

Comments: ICLR 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

AI agents are commonly trained with large datasets of demonstrations of human behavior. However, not all behaviors are equally safe or desirable. Desired characteristics for an AI agent can be expressed by assigning desirability scores, which we assume are not assigned to individual behaviors but to collective trajectories. For example, in a dataset of vehicle interactions, these scores might relate to the number of incidents that occurred. We first assess the effect of each individual agent's behavior on the collective desirability score, e.g., assessing how likely an agent is to cause incidents. This allows us to selectively imitate agents with a positive effect, e.g., only imitating agents that are unlikely to cause incidents. To enable this, we propose the concept of an agent's Exchange Value, which quantifies an individual agent's contribution to the collective desirability score. The Exchange Value is the expected change in desirability score when substituting the agent for a randomly selected agent. We propose additional methods for estimating Exchange Values from real-world datasets, enabling us to learn desired imitation policies that outperform relevant baselines. The project website can be found at this https URL.
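The Exchange Value definition above can be illustrated with a small exact computation. This is a minimal sketch, not the authors' estimator: the agent names, the additive toy `score` function, the fixed group size, and the sign convention (positive means the agent raises the score relative to a random replacement) are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean

def exchange_value(agent, agents, group_size, score):
    """Average score change from having `agent` in a group instead of a
    uniformly chosen substitute, enumerated over all possible co-players."""
    others = [a for a in agents if a != agent]
    diffs = []
    for rest in combinations(others, group_size - 1):
        for substitute in others:
            if substitute in rest:
                continue  # the substitute must be an agent outside the group
            diffs.append(score(rest + (agent,)) - score(rest + (substitute,)))
    return mean(diffs)

# Hypothetical per-agent incident rates; higher means less desirable.
incident_rate = {"a": 0.0, "b": 0.1, "c": 0.5, "d": 0.3}

def score(group):
    # Toy collective desirability score: fewer expected incidents is better.
    return -sum(incident_rate[x] for x in group)

values = {a: exchange_value(a, list(incident_rate), 2, score) for a in incident_rate}
# Agents whose presence improves the collective score are the ones to imitate.
selected = [a for a, v in values.items() if v > 0]
```

With real data the score is only observed for recorded trajectories, so the enumeration here would have to be replaced by an estimate.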
[25] arXiv:2405.03764 [pdf, ps, html, other]: Title: GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation

Wenjie Zhou, Zhenxin Ding, Xiaodong Zhang, Haibo Shi, Junfeng Wang, Dawei Yin

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. For practical deployment, it is critical to carry out knowledge distillation to preserve high performance under computational constraints. In this paper, we address a key question: given the importance of unsupervised distillation for student performance, how does one effectively ensemble knowledge from multiple teachers at this stage without the guidance of ground-truth labels? We propose a novel algorithm, GOVERN, to tackle this issue. GOVERN has demonstrated significant improvements in both offline and online experiments. The proposed algorithm has been successfully deployed in a real-world commercial question-answering system.
[26] arXiv:2405.03770 [pdf, ps, html, other]: Title: Foundation Models for Video Understanding: A Survey

Neelu Madan, Andreas Moegelmose, Rajat Modi, Yogesh S. Rawat, Thomas B. Moeslund

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs achieve this by capturing robust and generic features from video data. This survey analyzes over 200 video foundational models, offering a comprehensive overview of benchmarks and evaluation metrics across 14 distinct video tasks categorized into 3 main categories. Additionally, we offer an in-depth performance analysis of these models for the 6 most common video tasks. We categorize ViFMs into three categories: 1) Image-based ViFMs, which adapt existing image models for video tasks, 2) Video-Based ViFMs, which utilize video-specific encoding methods, and 3) Universal Foundational Models (UFMs), which combine multiple modalities (image, video, audio, text, etc.) within a single framework. By comparing the performance of various ViFMs on different tasks, this survey offers valuable insights into their strengths and weaknesses, guiding future advancements in video understanding. Our analysis surprisingly reveals that image-based foundation models consistently outperform video-based models on most video understanding tasks. Additionally, UFMs, which leverage diverse modalities, demonstrate superior performance on video tasks. We share the comprehensive list of ViFMs studied in this work at: this https URL
[27] arXiv:2405.03775 [pdf, ps, html, other]: Title: Secure Inference for Vertically Partitioned Data Using Multiparty Homomorphic Encryption

Shuangyi Chen, Yue Ju, Zhongwen Zhu, Ashish Khisti

Subjects: Cryptography and Security (cs.CR)

We propose a secure inference protocol for a distributed setting involving a single server node and multiple client nodes. We assume that the observed data vector is partitioned across multiple client nodes while the deep learning model is located at the server node. Each client node is required to encrypt its portion of the data vector and transmit the resulting ciphertext to the server node. The server node is required to collect the ciphertexts and perform inference in the encrypted domain. We demonstrate an application of multi-party homomorphic encryption (MPHE) to satisfy these requirements. We propose a packing scheme that enables the server to form the ciphertext of the complete data by aggregating the ciphertexts of data subsets encrypted using MPHE. While our proposed protocol builds upon a prior horizontal federated training protocol \cite{sav2020poseidon}, we focus on inference for vertically partitioned data and avoid the transmission of (encrypted) model weights from the server node to the client nodes.
[28] arXiv:2405.03777 [pdf, ps, html, other]: Title: Is ReLU Adversarially Robust?

Korn Sooksatra, Greg Hamerly, Pablo Rivas

Comments: 10 pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The efficacy of deep learning models has been called into question by the presence of adversarial examples. Addressing the vulnerability of deep learning models to adversarial examples is crucial for ensuring their continued development and deployment. In this work, we focus on the role of rectified linear unit (ReLU) activation functions in the generation of adversarial examples. ReLU functions are commonly used in deep learning models because they facilitate the training process. However, our empirical analysis demonstrates that ReLU functions are not robust against adversarial examples. We propose a modified version of the ReLU function, which improves robustness against adversarial examples. Our results are supported by an experiment, which confirms the effectiveness of our proposed modification. Additionally, we demonstrate that applying adversarial training to our customized model further enhances its robustness compared to a general model.
[29] arXiv:2405.03782 [pdf, ps, html, other]: Title: Interpretable Data Fusion for Distributed Learning: A Representative Approach via Gradient Matching

Mengchen Fan, Baocheng Geng, Keren Li, Xueqian Wang, Pramod K. Varshney

Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)

This paper introduces a representative-based approach for distributed learning that transforms multiple raw data points into a virtual representation. Unlike traditional distributed learning methods such as Federated Learning, which do not offer human interpretability, our method makes complex machine learning processes accessible and comprehensible. It achieves this by condensing extensive datasets into digestible formats, thus fostering intuitive human-machine interactions. Additionally, this approach maintains privacy and communication efficiency, and it matches the training performance of models using raw data. Simulation results show that our approach is competitive with or outperforms traditional Federated Learning in accuracy and convergence, especially in scenarios with complex models and a higher number of clients. This framework marks a step forward in integrating human intuition with machine intelligence, which potentially enhances human-machine learning interfaces and collaborative efforts.
[30] arXiv:2405.03783 [pdf, ps, html, other]: Title: Merging Parameter Estimation and Classification Using LASSO

Le Wang, Ying Wang, Yu Qiu, Mian Li, Håkan Hjalmarsson

Subjects: Systems and Control (eess.SY)

Soft sensing is a way to indirectly obtain information about signals for which direct sensing is difficult or prohibitively expensive. It may not a priori be evident which sensors provide useful information about the target signal. There may be sensors irrelevant for the estimation as well as sensors for which the information is very poor. It is often required that the soft sensor should cover a wide range of operating points. This means that some sensors may be useful in certain operating conditions while irrelevant in others, while others may have no bearing on the target signal whatsoever. However, this type of structural information is typically not available but has to be deduced from data. A further compounding issue is that multiple operating conditions may be described by the same model, but which ones is not known in advance either. In this contribution, we provide a systematic method to construct a soft sensor that can deal with these issues. While different models can be used, we adopt multi-input single-output finite impulse response models since they are linear in the parameters. We propose a single estimation criterion, where the objectives are encoded in terms of model fit, model sparsity (reducing the number of different models), and model parameter coefficient sparsity (to exclude irrelevant sensors). A post-processing model clustering step is also included. As proof of concept, the method is tested on field test datasets from a prototype vehicle.
[31] arXiv:2405.03786 [pdf, ps, html, other]: Title: TOGLL: Correct and Strong Test Oracle Generation with LLMs

Soneya Binta Hossain, Matthew Dwyer

Subjects: Software Engineering (cs.SE)

Test oracles play a crucial role in software testing, enabling effective bug detection. Despite initial promise, neural-based methods for automated test oracle generation often result in a large number of false positives and weaker test oracles. While LLMs have demonstrated impressive effectiveness in various software engineering tasks, including code generation, test case creation, and bug fixing, there remains a notable absence of large-scale studies exploring their effectiveness in test oracle generation. The question of whether LLMs can address the challenges in effective oracle generation is both compelling and requires thorough investigation.
In this research, we present the first comprehensive study to investigate the capabilities of LLMs in generating correct, diverse, and strong test oracles capable of effectively identifying a large number of unique bugs. To this end, we fine-tuned seven code LLMs using six distinct prompts on the SF110 dataset. Utilizing the most effective fine-tuned LLM and prompt pair, we introduce TOGLL, a novel LLM-based method for test oracle generation. To investigate the generalizability of TOGLL, we conduct studies on 25 large-scale Java projects. Besides assessing the correctness, we also assess the diversity and strength of the generated oracles. We compare the results against EvoSuite and the state-of-the-art neural method, TOGA. Our findings reveal that TOGLL can produce 3.8 times more correct assertion oracles and 4.9 times more exception oracles. Moreover, our findings demonstrate that TOGLL is capable of generating significantly diverse test oracles. It can detect 1,023 unique bugs that EvoSuite cannot, which is ten times more than what the previous SOTA neural-based method, TOGA, can detect.
[32] arXiv:2405.03789 [pdf, ps, html, other]: Title: On Adversarial Examples for Text Classification by Perturbing Latent Representations

Korn Sooksatra, Bikram Khanal, Pablo Rivas

Comments: 7 pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recently, with the advancement of deep learning, several applications in text classification have advanced significantly. However, this improvement comes at a cost because deep learning is vulnerable to adversarial examples. This weakness indicates that deep learning is not very robust. Fortunately, the input of a text classifier is discrete. Hence, this discreteness can protect the classifier from state-of-the-art attacks. Nonetheless, previous works have generated black-box attacks that successfully manipulate the discrete values of the input to find adversarial examples. Therefore, instead of changing the discrete values, we transform the input into its embedding vector containing real values to perform the state-of-the-art white-box attacks. Then, we convert the perturbed embedding vector back into a text and name it an adversarial example. In summary, we create a framework that measures the robustness of a text classifier by using the gradients of the classifier.
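The pipeline described above (embed, perturb along the gradient, map back to text) can be sketched on a toy model. Everything here is an assumption for illustration: the 2-D embedding table, the linear classifier `W`, the single gradient-sign step, and the nearest-neighbour projection back to tokens; the abstract does not specify which attack or decoding step the authors use.

```python
# Toy 2-D embedding table (hypothetical values, not from the paper).
EMBED = {
    "good": (1.0, 0.2), "fine": (0.3, 0.1), "poor": (-0.4, 0.1),
    "bad": (-1.0, 0.0), "movie": (0.0, 1.0),
}
W = (1.0, 0.0)  # weights of a toy linear classifier; score > 0 means "positive"

def score(tokens):
    """Classifier output: W applied to the mean token embedding."""
    vecs = [EMBED[t] for t in tokens]
    mean_vec = [sum(v[k] for v in vecs) / len(vecs) for k in range(2)]
    return sum(w * m for w, m in zip(W, mean_vec))

def sign(x):
    return (x > 0) - (x < 0)

def nearest_token(vec):
    """Project a continuous vector back to the closest vocabulary entry."""
    return min(EMBED, key=lambda t: sum((a - b) ** 2 for a, b in zip(EMBED[t], vec)))

def attack(tokens, eps):
    # d(score)/d(embedding) equals W / len(tokens) for every token,
    # so the sign of the gradient is simply sign(W).
    step = [eps * sign(w) for w in W]
    perturbed = [[e - s for e, s in zip(EMBED[t], step)] for t in tokens]
    return [nearest_token(v) for v in perturbed]

original = ["good", "movie"]
adversarial = attack(original, eps=1.5)
```

A real attack would bound the perturbation so that the decoded text stays close in meaning to the original; this sketch only shows the mechanics.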
[33] arXiv:2405.03792 [pdf, ps, html, other]: Title: Prize-Collecting Steiner Tree: A 1.79 Approximation

Ali Ahmadi, Iman Gholami, MohammadTaghi Hajiaghayi, Peyman Jabbarzade, Mohammad Mahdavi

Subjects: Data Structures and Algorithms (cs.DS)

Prize-Collecting Steiner Tree (PCST) is a generalization of the Steiner Tree problem, a fundamental problem in computer science. In the classic Steiner Tree problem, we aim to connect a set of vertices known as terminals using the minimum-weight tree in a given weighted graph. In this generalized version, each vertex has a penalty, and there is flexibility to decide whether to connect each vertex or pay its associated penalty, making the problem more realistic and practical.
Both the Steiner Tree problem and its Prize-Collecting version had long-standing $2$-approximation algorithms, matching the integrality gap of the natural LP formulations for both. This barrier for both problems has been surpassed, with algorithms achieving approximation factors below $2$. While research on the Steiner Tree problem has led to a series of reductions in the approximation ratio below $2$, culminating in a $\ln(4)+\epsilon$ approximation by Byrka, Grandoni, Rothvoß, and Sanità, the Prize-Collecting version has not seen improvements in the past 15 years since the work of Archer, Bateni, Hajiaghayi, and Karloff, which reduced the approximation factor for this problem from $2$ to $1.9672$. Interestingly, even the Prize-Collecting TSP approximation, which was first improved below $2$ in the same paper, has seen several advancements since then.
In this paper, we reduce the approximation factor for the PCST problem substantially to 1.7994 via a novel iterative approach.
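To make the objective concrete, the following minimal sketch evaluates the Prize-Collecting Steiner Tree cost of a candidate solution: the weight of the chosen tree plus the penalties of all vertices left unconnected. The three-vertex instance and the function name `pcst_cost` are hypothetical, and the sketch only scores solutions; it is not the paper's approximation algorithm.

```python
def pcst_cost(edge_weights, penalties, tree_vertices, tree_edges):
    """Objective value of one candidate solution.

    The caller is responsible for passing edges that form a tree on
    tree_vertices; this function does not verify connectivity.
    """
    connection_cost = sum(edge_weights[e] for e in tree_edges)
    penalty_cost = sum(p for v, p in penalties.items() if v not in tree_vertices)
    return connection_cost + penalty_cost

# Hypothetical instance: a path a - b - c with one expensive edge.
edge_weights = {("a", "b"): 1, ("b", "c"): 5}
penalties = {"a": 3, "b": 3, "c": 2}

span_all = pcst_cost(edge_weights, penalties, {"a", "b", "c"}, [("a", "b"), ("b", "c")])
skip_c = pcst_cost(edge_weights, penalties, {"a", "b"}, [("a", "b")])
only_a = pcst_cost(edge_weights, penalties, {"a"}, [])
```

In this instance paying the penalty for `c` is cheaper than buying the edge that reaches it, which is exactly the trade-off the problem formalizes.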
[34] arXiv:2405.03794 [pdf, ps, html, other]: Title: Detecting Anti-Semitic Hate Speech using Transformer-based Large Language Models

Dengyi Liu, Minghao Wang, Andrew G. Catlin

Subjects: Computation and Language (cs.CL)

Academic researchers and social media entities grappling with the identification of hate speech face significant challenges, primarily due to the vast scale of data and the dynamic nature of hate speech. Given the ethical and practical limitations of large predictive models like ChatGPT in directly addressing such sensitive issues, our research has explored alternative advanced transformer-based and generative AI technologies since 2019. Specifically, we developed a new data labeling technique and established a proof of concept targeting anti-Semitic hate speech, utilizing a variety of transformer models such as BERT (arXiv:1810.04805), DistilBERT (arXiv:1910.01108), RoBERTa (arXiv:1907.11692), and LLaMA-2 (arXiv:2307.09288), complemented by the LoRA fine-tuning approach (arXiv:2106.09685). This paper delineates and evaluates the comparative efficacy of these cutting-edge methods in tackling the intricacies of hate speech detection, highlighting the need for responsible and carefully managed AI applications within sensitive contexts.
[35] arXiv:2405.03798 [pdf, ps, html, other]: Title: Update Rate, Accuracy, and Age of Information in a Wireless Sensor Network

Xinlu Dai, Cyril Leung

Subjects: Information Theory (cs.IT)

Age of Information (AoI), namely the time that has elapsed since the most recently delivered packet was generated, is receiving increasing attention with the emergence of many real-time applications that rely on the exchange of time-sensitive information. AoI captures the freshness of the information from the perspective of the destination. The term "accuracy of information" is used to assess how close the estimate at the destination is to the parameter value measured by the sensor. In this paper, the mean square error (MSE) is used to evaluate the accuracy of information. We focus on a single sensor that monitors a time-sensitive physical process, which is modelled as a random walk. Whenever the state of the random walk changes by more than a specified threshold, the sensor generates a status update packet and transmits it to the destination. When no update packet is received, the destination assumes that the state of the process has not changed. We study the problem of finding the minimum update rate under AoI and accuracy of information constraints. More specifically, we derive analytical expressions for the update rate, the AoI, and the MSE.
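The threshold-based update policy can be explored with a short simulation. This is a sketch under assumptions that the abstract does not state: a discrete-time symmetric ±1 random walk, instantaneous and loss-free delivery, and time-averaged metrics. It estimates the three quantities empirically rather than reproducing the paper's analytical expressions.

```python
import random

def simulate(threshold, steps, seed=0):
    """Return (update rate, average AoI, MSE) for one simulated run."""
    rng = random.Random(seed)
    state = 0      # true process value seen by the sensor
    estimate = 0   # last value delivered to the destination
    age = 0        # slots since the last delivered packet was generated
    updates = 0
    age_sum = 0
    sq_err_sum = 0
    for _ in range(steps):
        state += rng.choice((-1, 1))
        age += 1
        # Send an update once the state has drifted by more than the threshold.
        if abs(state - estimate) > threshold:
            estimate = state
            age = 0
            updates += 1
        age_sum += age
        sq_err_sum += (state - estimate) ** 2
    return updates / steps, age_sum / steps, sq_err_sum / steps

rate, avg_age, mse = simulate(threshold=3, steps=10_000)
```

Raising the threshold lowers the update rate at the price of a larger age and error, which is the trade-off behind the constrained minimization studied in the paper.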
[36] arXiv:2405.03799 [pdf, ps, html, other]: Title: Synthetic Data from Diffusion Models Improve Drug Discovery Prediction

Bing Hu, Ashish Saragadam, Anita Layton, Helen Chen

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Artificial intelligence (AI) is increasingly used in every stage of drug development. Continuing breakthroughs in AI-based methods for drug discovery require the creation, improvement, and refinement of drug discovery data. We posit a new data challenge that slows the advancement of drug discovery AI: datasets are often collected independently from each other, often with little overlap, creating data sparsity. Data sparsity makes data curation difficult for researchers looking to answer key research questions requiring values posed across multiple datasets. We propose a novel diffusion GNN model Syngand capable of generating ligand and pharmacokinetic data end-to-end. We show and provide a methodology for sampling pharmacokinetic data for existing ligands using our Syngand model. We show the initial promising results on the efficacy of the Syngand-generated synthetic target property data on downstream regression tasks with AqSolDB, LD50, and hERG central. Using our proposed model and methodology, researchers can easily generate synthetic ligand data to help them explore research questions that require data spanning multiple datasets.
[37] arXiv:2405.03801 [pdf, ps, html, other]: Title: Finding Most Shattering Minimum Vertex Cuts of Polylogarithmic Size in Near-Linear Time

Kevin Hua, Daniel Li, Jaewoo Park, Thatchaphol Saranurak

Comments: Accepted to ICALP 2024

Subjects: Data Structures and Algorithms (cs.DS)

We show the first near-linear time randomized algorithms for listing all minimum vertex cuts of polylogarithmic size that separate the graph into at least three connected components (also known as shredders) and for finding the most shattering one, i.e., the one maximizing the number of connected components. Our algorithms break the quadratic time bound by Cheriyan and Thurimella (STOC'96) for both problems that has been unimproved for more than two decades. Our work also removes an important bottleneck to near-linear time algorithms for the vertex connectivity augmentation problem (Jordan '95) and finding an even-length cycle in a directed graph, a problem shown to be equivalent to many other fundamental problems (Vazirani and Yannakakis '90, Robertson et al. '99). Note that it is necessary to list only minimum vertex cuts that separate the graph into at least three components because there can be an exponential number of minimum vertex cuts in general.
To obtain near-linear time algorithms, we have extended techniques in local flow algorithms developed by Forster et al. (SODA'20) to list shredders on a local scale. We also exploit fast queries to a pairwise vertex connectivity oracle subject to vertex failures (Long and Saranurak FOCS'22, Kosinas ESA'23). This is the first application of connectivity oracles subject to vertex failures to speed up a static graph algorithm.
[38] arXiv:2405.03803 [pdf, ps, html, other]: Title: MoDiPO: text-to-motion alignment via AI-feedback-driven Direct Preference Optimization

Massimiliano Pappa, Luca Collorone, Giovanni Ficarra, Indro Spinelli, Fabio Galasso

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion Models have revolutionized the field of human motion generation by offering exceptional generation quality and fine-grained controllability through natural language conditioning. Their inherent stochasticity, that is, the ability to generate various outputs from a single input, is key to their success. However, this diversity should not be unrestricted, as it may lead to unlikely generations. Instead, it should be confined within the boundaries of text-aligned and realistic generations. To address this issue, we propose MoDiPO (Motion Diffusion DPO), a novel methodology that leverages Direct Preference Optimization (DPO) to align text-to-motion models. We streamline the laborious and expensive process of gathering human preferences needed in DPO by leveraging AI feedback instead. This enables us to experiment with novel DPO strategies, using both online and offline generated motion-preference pairs. To foster future research, we contribute a motion-preference dataset, which we dub Pick-a-Move. We demonstrate, both qualitatively and quantitatively, that our proposed method yields significantly more realistic motions. In particular, MoDiPO substantially improves Fréchet Inception Distance (FID) while retaining the same R-Precision and Multi-Modality performance.
[39] arXiv:2405.03806 [pdf, ps, html, other]: Title: In Situ AI Prototyping: Infusing Multimodal Prompts into Mobile Settings with MobileMaker

Savvas Petridis, Michael Xieyang Liu, Alexander J. Fiannaca, Vivian Tsai, Michael Terry, Carrie J. Cai

Subjects: Human-Computer Interaction (cs.HC)

Recent advances in multimodal large language models (LLMs) have lowered the barriers to rapidly prototyping AI-powered features via prompting, especially for mobile-intended use cases. Despite the value of situated user feedback, the process of soliciting early, mobile-situated user feedback on AI prototypes remains challenging. The broad scope and flexibility of LLMs mean that, for a given use-case-specific prototype, there is a crucial need to understand the wide range of in-the-wild input likely to be provided by the user, as well as their in-context expectations of the AI's behavior. To explore the concept of in situ AI prototyping and testing, we created MobileMaker: an AI prototyping tool that enables designers to rapidly create mobile AI prototypes that can be tested on-device, and enables testers to make on-device, in-the-field revisions of the prototype through natural language. In an exploratory study with 16 users, we explored how user feedback on prototypes created with MobileMaker compares to that of existing prototyping tools (e.g., Figma, prompt editors). We found that MobileMaker prototypes enabled more serendipitous discovery of: model input edge cases, discrepancies between AI's and user's in-context interpretation of the task, and contextual signals missed by the AI. Furthermore, we learned that while the ability to make in-the-wild revisions led users to feel more fulfilled as active participants in the design process, it might also constrain their feedback to the subset of changes perceived as more actionable or implementable by the prototyping tool.
[40] arXiv:2405.03807 [pdf, ps, html, other]: Title: UniGen: Unified Modeling of Initial Agent States and Trajectories for Generating Autonomous Driving Scenarios

Reza Mahjourian, Rongbing Mu, Valerii Likhosherstov, Paul Mougin, Xiukun Huang, Joao Messias, Shimon Whiteson

Comments: Accepted at ICRA 2024

Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

This paper introduces UniGen, a novel approach to generating new traffic scenarios for evaluating and improving autonomous driving software through simulation. Our approach models all driving scenario elements in a unified model: the position of new agents, their initial state, and their future motion trajectories. By predicting the distributions of all these variables from a shared global scenario embedding, we ensure that the final generated scenario is fully conditioned on all available context in the existing scene. Our unified modeling approach, combined with autoregressive agent injection, conditions the placement and motion trajectory of every new agent on all existing agents and their trajectories, leading to realistic scenarios with low collision rates. Our experimental results show that UniGen outperforms prior state of the art on the Waymo Open Motion Dataset.
[41] arXiv:2405.03808 [pdf, ps, other]: Title: The Future of Office and Administrative Support Occupations in the Era of Artificial Intelligence: A Bibliometric Analysis

Priyadarshini R. Pennathur, Valerie Boksa, Arunkumar Pennathur, Andrew Kusiak, Beth Livingston

Comments: This work is being submitted to the IEEE for possible publication

Subjects: Computers and Society (cs.CY)

The U.S. Bureau of Labor Statistics projects that by the year 2029, the United States will lose a million jobs in the office and administrative support occupations because technology, automation, and artificial intelligence (AI) have the potential to substitute or replace the office and administrative functions performed by office workers. Despite the potential impact AI will have on office work and the important role office workers play in the American economy, we have limited knowledge of the state-of-the-art research in office work at the intersection of emerging artificial intelligence technologies. In this study, we conducted a bibliometric analysis of the scholarly literature at the intersection of office work and artificial intelligence. We extracted literature sources from Compendex and Scopus databases and used VOSviewer for visualizing and quantifying our bibliometric analyses. Our findings from keywords analysis indicate that office automation, humans, human-computer interaction, and artificial intelligence occurred more frequently in the scholarly literature and had high link strengths. Keyword clusters from co-occurrence analysis indicate that intelligent buildings, robotics, and the internet of things are emerging topics in the office work domain. The two clusters related to ergonomics, worker characteristics, human performance, and safety indicate the types of human factors concerns that are more widely studied in office work settings. In summary, our findings on the state-of-the-art research in office work indicate that more studies have been conducted on smart buildings, robotics, and technology development for office work, compared to studies on office workers and their professional development.
[42] arXiv:2405.03809 [pdf, ps, html, other]: Title: SocialFormer: Social Interaction Modeling with Edge-enhanced Heterogeneous Graph Transformers for Trajectory Prediction

Zixu Wang, Zhigang Sun, Juergen Luettin, Lavdim Halilaj

Subjects: Artificial Intelligence (cs.AI)

Accurate trajectory prediction is crucial for ensuring safe and efficient autonomous driving. However, most existing methods overlook complex interactions between traffic participants that often govern their future trajectories. In this paper, we propose SocialFormer, an agent interaction-aware trajectory prediction method that leverages the semantic relationship between the target vehicle and surrounding vehicles by making use of the road topology. We also introduce an edge-enhanced heterogeneous graph transformer (EHGT) as the aggregator in a graph neural network (GNN) to encode the semantic and spatial agent interaction information. Additionally, we introduce a temporal encoder based on gated recurrent units (GRU) to model the temporal social behavior of agent movements. Finally, we present an information fusion framework that integrates agent encoding, lane encoding, and agent interaction encoding for a holistic representation of the traffic scene. We evaluate SocialFormer for the trajectory prediction task on the popular nuScenes benchmark and achieve state-of-the-art performance.
[43] arXiv:2405.03813 [pdf, ps, other]: Title: Large Language Models as Instruments of Power: New Regimes of Autonomous Manipulation and Control

Yaqub Chaudhary, Jonnie Penn

Comments: 21 pages

Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)

Large language models (LLMs) can reproduce a wide variety of rhetorical styles and generate text that expresses a broad spectrum of sentiments. This capacity, now available at low cost, makes them powerful tools for manipulation and control. In this paper, we consider a set of underestimated societal harms made possible by the rapid and largely unregulated adoption of LLMs. Rather than consider LLMs as isolated digital artefacts used to displace this or that area of work, we focus on the large-scale computational infrastructure upon which they are instrumentalised across domains. We begin with discussion on how LLMs may be used to both pollute and uniformize information environments and how these modalities may be leveraged as mechanisms of control. We then draw attention to several areas of emerging research, each of which compounds the capabilities of LLMs as instruments of power. These include (i) persuasion through the real-time design of choice architectures in conversational interfaces (e.g., via "AI personas"), (ii) the use of LLM-agents as computational models of human agents (e.g., "silicon subjects"), (iii) the use of LLM-agents as computational models of human agent populations (e.g., "silicon societies") and finally, (iv) the combination of LLMs with reinforcement learning to produce controllable and steerable strategic dialogue models. We draw these strands together to discuss how these areas may be combined to build LLM-based systems that serve as powerful instruments of individual, social and political control via the simulation and disingenuous "prediction" of human behaviour, intent, and action.
[44] arXiv:2405.03816 [pdf, ps, html, other]: Title: On the invariance of the Kolmogorov complexity of $\beta$-expansions

Valentin Abadie, Helmut Boelcskei

Subjects: Information Theory (cs.IT)

Measuring the complexity of real numbers is of major importance in computer science, for the purpose of knowing which computations are allowed. Consider a non-computable real number $s$, i.e. a real number which cannot be stored on a computer. We can store only an approximation of $s$, for instance by considering a finite bitstring representing a finite prefix of its binary expansion. For a fixed approximation error $\varepsilon>0$, the size of this finite bitstring is dependent on the \textit{algorithmic complexity} of the finite prefixes of the binary expansion of $s$. The \textit{algorithmic complexity} of a binary sequence $x$, often referred to as \textit{Kolmogorov complexity}, is the length of the smallest binary sequence $x'$, for which there exists an algorithm, such that when presented with $x'$ as input, it outputs $x$. The algorithmic complexity of the binary expansion of real numbers is widely studied, but the algorithmic complexity of other ways of representing real numbers remains poorly reported. However, knowing the algorithmic complexity of different representations may allow us to define new and more efficient strategies to represent real numbers. Several papers have established an equivalence between the algorithmic complexity of the $q$-ary expansions, with $q \in \mathbb{N}$, $q \geq 2$, i.e. representations of real numbers in any integer base. In this paper, we study the algorithmic complexity of the so-called $\beta$-expansions, which are representations of real numbers in a base $\beta \in (1,2)$ that display a much more complex behavior as compared to the $q$-ary expansion. We show that for a given real number $s$, the binary expansion is a minimizer of algorithmic complexity, and that for every given $\beta \in (1,2)$, there exists a $\beta$-expansion of $s$ which achieves the lower bound of algorithmic complexity displayed by the binary expansion of $s$.
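To make the notion of a $\beta$-expansion concrete, here is a minimal sketch of the standard greedy digit algorithm for a number in $[0,1)$ and a base $\beta \in (1,2)$. The function names are hypothetical and the code is not taken from the paper. It produces only the greedy expansion, whereas a number generally admits other $\beta$-expansions as well. Floating-point arithmetic limits it to short prefixes.

```python
def greedy_beta_expansion(s, beta, n_digits):
    """Return the first n_digits greedy digits d_1..d_n of s in base beta.

    The digits satisfy s ~= sum(d_i * beta**(-i)), each d_i in {0, 1}.
    """
    if not (1.0 < beta < 2.0):
        raise ValueError("beta must lie in (1, 2)")
    if not (0.0 <= s < 1.0):
        raise ValueError("s must lie in [0, 1)")
    digits = []
    remainder = s
    for _ in range(n_digits):
        remainder *= beta          # shift one digit position to the left
        if remainder >= 1.0:       # greedy rule: take digit 1 whenever possible
            digits.append(1)
            remainder -= 1.0
        else:
            digits.append(0)
    return digits


def evaluate_expansion(digits, beta):
    """Evaluate the truncated expansion sum(d_i * beta**(-i))."""
    return sum(d * beta ** -(i + 1) for i, d in enumerate(digits))


if __name__ == "__main__":
    beta = 1.5
    digits = greedy_beta_expansion(0.5, beta, 40)
    print(digits[:10], evaluate_expansion(digits, beta))
```

After $n$ digits the truncation error of the greedy expansion is below $\beta^{-n}$, which is the sense in which a finite prefix approximates $s$.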
[45] arXiv:2405.03820 [pdf, ps, html, other]: Title: False Sense of Security in Explainable Artificial Intelligence (XAI)

Neo Christopher Chung, Hongkyou Chung, Hearim Lee, Hongbeom Chung, Lennart Brocki, George Dyer

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

A cautious interpretation of AI regulations and policy in the EU and the USA places explainability as a central deliverable of compliant AI systems. However, from a technical perspective, explainable AI (XAI) remains an elusive and complex target where even state-of-the-art methods often reach erroneous, misleading, and incomplete explanations. "Explainability" has multiple meanings which are often used interchangeably, and there is an even greater number of XAI methods - none of which presents a clear edge. Indeed, there are multiple failure modes for each XAI method, which require application-specific development and continuous evaluation. In this paper, we analyze legislative and policy developments in the United States and the European Union, such as the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, the AI Act, the AI Liability Directive, and the General Data Protection Regulation (GDPR) from a right to explanation perspective. We argue that these AI regulations and current market conditions threaten effective AI governance and safety because the objective of trustworthy, accountable, and transparent AI is intrinsically linked to the questionable ability of AI operators to provide meaningful explanations. Unless governments explicitly tackle the issue of explainability through clear legislative and policy statements that take into account technical realities, AI governance risks becoming a vacuous "box-ticking" exercise where scientific standards are replaced with legalistic thresholds, providing only a false sense of security in XAI.
[46] arXiv:2405.03821 [pdf, ps, html, other]: Title: Thoughtful Things: Building Human-Centric Smart Devices with Small Language Models

Evan King, Haoxiang Yu, Sahil Vartak, Jenna Jacob, Sangsu Lee, Christine Julien

Comments: 24 pages (3 pages of references)

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Everyday devices like light bulbs and kitchen appliances are now embedded with so many features and automated behaviors that they have become complicated to actually use. While such "smart" capabilities can better support users' goals, the task of learning the "ins and outs" of different devices is daunting. Voice assistants aim to solve this problem by providing a natural language interface to devices, yet such assistants cannot understand loosely-constrained commands, they lack the ability to reason about and explain devices' behaviors to users, and they rely on connectivity to intrusive cloud infrastructure. Toward addressing these issues, we propose thoughtful things: devices that leverage lightweight, on-device language models to take actions and explain their behaviors in response to unconstrained user commands. We propose an end-to-end framework that leverages formal modeling, automated training data synthesis, and generative language models to create devices that are both capable and thoughtful in the presence of unconstrained user goals and inquiries. Our framework requires no labeled data and can be deployed on-device, with no cloud dependency. We implement two thoughtful things (a lamp and a thermostat) and deploy them on real hardware, evaluating their practical performance.
[47] arXiv:2405.03824 [pdf, ps, html, other]: Title: Breaking Barriers: Investigating the Sense of Belonging Among Women and Non-Binary Students in Software Engineering

Lina Boman, Jonatan Andersson, Francisco Gomes de Oliveira Neto

Subjects: Software Engineering (cs.SE)

Women in computing were among the first programmers in the early 20th century and were substantial contributors to the industry. Today, men dominate the software engineering industry. Research and data show that women are far less likely to pursue a career in this industry, and those who do are less likely than men to stay in it. Reasons for women and other underrepresented minorities to leave the industry are a lack of opportunities for growth and advancement, unfair treatment and workplace culture. This research explores how the potential to cultivate or uphold an industry unfavourable to women and non-binary individuals manifests in software engineering education at the university level. For this purpose, the study includes surveys and interviews. We use gender name perception as a survey instrument, and the results show small differences in perceptions of software engineering students based on their gender. Particularly, the survey respondents anchor the values of the male software engineer (Hans) to a variety of technical and non-technical skills, while the same description for a female software engineer (Hanna) is anchored mainly by her managerial skills. Through interviews with women and non-binary students, we gain insight into the main barriers to their sense of ambient belonging. The collected data shows that some known barriers from the literature, such as tokenism and stereotype threat, still exist. However, we find positive factors such as role models and encouragement that strengthen the sense of belonging among these students.
[48] arXiv:2405.03825 [pdf, ps, html, other]: Title: Organizing a Society of Language Models: Structures and Mechanisms for Enhanced Collective Intelligence

Silvan Ferreira, Ivanovitch Silva, Allan Martins

Subjects: Artificial Intelligence (cs.AI)

Recent developments in Large Language Models (LLMs) have significantly expanded their applications across various domains. However, the effectiveness of LLMs is often constrained when operating individually in complex environments. This paper introduces a transformative approach by organizing LLMs into community-based structures, aimed at enhancing their collective intelligence and problem-solving capabilities. We investigate different organizational models (hierarchical, flat, dynamic, and federated), each presenting unique benefits and challenges for collaborative AI systems. Within these structured communities, LLMs are designed to specialize in distinct cognitive tasks, employ advanced interaction mechanisms such as direct communication, voting systems, and market-based approaches, and dynamically adjust their governance structures to meet changing demands. The implementation of such communities holds substantial promise for improving problem-solving capabilities in AI, prompting an in-depth examination of their ethical considerations, management strategies, and scalability potential. This position paper seeks to lay the groundwork for future research, advocating a paradigm shift from isolated to synergistic operational frameworks in AI research and application.
[49] arXiv:2405.03827 [pdf, ps, html, other]: Title: Direct learning of home vector direction for insect-inspired robot navigation

Michiel Firlefyn, Jesse Hagenaars, Guido de Croon

Comments: Published at ICRA 2024, project webpage at this https URL

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Insects have long been recognized for their ability to navigate and return home using visual cues from their nest's environment. However, the precise mechanism underlying this remarkable homing skill remains a subject of ongoing investigation. Drawing inspiration from the learning flights of honey bees and wasps, we propose a robot navigation method that directly learns the home vector direction from visual percepts during a learning flight in the vicinity of the nest. After learning, the robot will travel away from the nest, come back by means of odometry, and eliminate the resultant drift by inferring the home vector orientation from the currently experienced view. Using a compact convolutional neural network, we demonstrate successful learning in both simulated and real forest environments, as well as successful homing control of a simulated quadrotor. The average errors of the inferred home vectors in general stay well below the 90° required for successful homing, and below 24° if all images contain sufficient texture and illumination. Moreover, we show that the trajectory followed during the initial learning flight has a pronounced impact on the network's performance. A higher density of sample points in proximity to the nest results in a more consistent return. Code and data are available at this https URL .
[50] arXiv:2405.03828 [pdf, ps, other]: Title: The Trajectory of Romance Scams in the U.S

LD Herrera, John Hastings

Comments: 6 pages, 3 figures, 12th International Symposium on Digital Forensics and Security

Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

Romance scams (RS) inflict financial and emotional damage by defrauding victims under the guise of meaningful relationships. This research study examines RS trends in the U.S. through a quantitative analysis of web searches, news articles, research publications, and government reports from 2004 to 2023. This is the first study to use multiple sources for RS trend analysis. Results reveal increasing public interest and media coverage contrasted by a recent decrease in incidents reported to authorities. The frequency of research dedicated to RS has steadily grown but focuses predominantly on documenting the problem rather than developing solutions. Overall, findings suggest RS escalation despite declining official reports, which are likely obscured by low victim reporting rates. This highlights the need for greater awareness to encourage reporting, enabling accurate, data-driven policy responses. Additionally, more research must focus on techniques to counter these crimes. With improved awareness and prevention, along with responses informed by more accurate data, the rising RS threat can perhaps be mitigated.
[51] arXiv:2405.03830 [pdf, ps, html, other]: Title: Cloud Storage Integrity at Scale: A Case for Dynamic Hash Trees

Quinn Burke, Ryan Sheatsley, Rachel King, Michael Swift, Patrick McDaniel

Subjects: Cryptography and Security (cs.CR)

Merkle hash trees are the state-of-the-art method to protect the integrity of storage systems. However, using a hash tree can severely degrade performance, and prior works optimizing them have yet to yield a concrete understanding of the scalability of certain designs in the context of large-scale cloud storage systems. In this paper, we take a first-principles approach to analyzing hash tree performance for storage by introducing a definition of an optimal hash tree and a principled methodology for evaluating hash tree designs. We show that state-of-the-art designs are not scalable; they incur up to 40.1X slowdowns over an insecure baseline and deliver <50% of optimal performance across various experiments. We then exploit the characteristics of optimal hash trees to design Dynamic Hash Trees (DHTs), hash trees that can adapt to workload patterns on-the-fly, delivering >95% of optimal read and write performance and up to 4.2X speedups over the state-of-the-art. Our novel methodology and DHT design provide a new foundation in the search for integrity mechanisms that can operate efficiently at scale.
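For readers unfamiliar with the baseline structure, the following is a minimal sketch of a plain, static binary Merkle tree over storage blocks. It is not the Dynamic Hash Tree design from the paper. The function names, the SHA-256 choice, the one-byte leaf/node prefixes, and the duplicate-last-node rule for odd levels are all illustrative assumptions.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """SHA-256 digest of data (hash choice is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Compute the root hash of a binary Merkle tree over a list of byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    # Leaf hashes; the 0x00 prefix separates leaves from internal nodes.
    level = [_h(b"\x00" + block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last node
        # Each parent hashes the concatenation of its two children.
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


if __name__ == "__main__":
    blocks = [b"block-0", b"block-1", b"block-2"]
    trusted_root = merkle_root(blocks)
    blocks[1] = b"tampered"
    print(merkle_root(blocks) == trusted_root)  # integrity check fails after tampering
```

Recomputing the whole tree on every check, as done here, is the simplest possible approach. A real storage system would update only the path from the modified leaf to the root, and that per-access hashing cost is the overhead the abstract refers to.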
[52] arXiv:2405.03831 [pdf, ps, html, other]: Title: Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning

Issa Saba, Eishi Arima, Dai Liu, Martin Schulz

Journal-ref: Architecture of Computing Systems. ARCS 2022. Lecture Notes in Computer Science, vol 13642. Springer, Cham

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

CPU-GPU heterogeneous architectures are now commonly used in a wide variety of computing systems from mobile devices to supercomputers. Maximizing the throughput for multi-programmed workloads on such systems is indispensable as one single program typically cannot fully exploit all available resources. At the same time, power consumption is a key issue and often requires optimizing power allocations to the CPU and GPU while enforcing a total power constraint, in particular when the power/thermal requirements are strict. The result is a system-wide optimization problem with several knobs. In particular, we focus on (1) co-scheduling decisions, i.e., selecting programs to co-locate in a space sharing manner; (2) resource partitioning on both CPUs and GPUs; and (3) power capping on both CPUs and GPUs. We solve this problem with machine-learning-based predictive performance modeling in order to coordinately optimize the above knob setups. Our experimental results using a real system show that our approach achieves up to a 67% speedup compared to a time-sharing-based scheduling with a naive power capping that evenly distributes power budgets across components.
[53] arXiv:2405.03832 [pdf, ps, html, other]: Title: Guylingo: The Republic of Guyana Creole Corpora

Christopher Clarke, Roland Daynauth, Charlene Wilkinson, Hubert Devonish, Jason Mars

Comments: Accepted to NAACL 2024 Main Conference Special Theme Track: Languages of Latin America

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

While major languages often enjoy substantial attention and resources, the linguistic diversity across the globe encompasses a multitude of smaller, indigenous, and regional languages that lack the same level of computational support. One such region is the Caribbean. While commonly labeled as "English speaking", the ex-British Caribbean region consists of a myriad of Creole languages thriving alongside English. In this paper, we present Guylingo: a comprehensive corpus designed for advancing NLP research in the domain of Creolese (Guyanese English-lexicon Creole), the most widely spoken language in the culturally rich nation of Guyana. We first outline our framework for gathering and digitizing this diverse corpus, inclusive of colloquial expressions, idioms, and regional variations in a low-resource language. We then demonstrate the challenges of training and evaluating NLP models for machine translation in Creole. Lastly, we discuss the unique opportunities presented by recent NLP advancements for accelerating the formal adoption of Creole languages as official languages in the Caribbean.
[54] arXiv:2405.03838 [pdf, ps, html, other]: Title: Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

Eishi Arima, Minjoon Kang, Issa Saba, Josef Weidendorfer, Carsten Trinitis, Martin Schulz

Journal-ref: ICPP Workshops '22: Workshop Proceedings of the 51st International Conference on Parallel Processing, August 2022, Article No.: 9

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy-efficiency of such systems is still one of the most critical issues. As one single program typically cannot fully utilize all resources within a node/chip, co-scheduling (or co-locating) multiple programs with complementary resource requirements is a promising solution. Meanwhile, as power consumption has become the first-class design constraint for HPC systems, such co-scheduling techniques should be well-tailored for power-constrained environments. To this end, the industry recently started supporting hardware-level resource partitioning features on modern GPUs for realizing efficient co-scheduling, which can operate with existing power capping features. For example, NVIDIA's MIG (Multi-Instance GPU) partitions one single GPU into multiple instances at the granularity of a GPC (Graphics Processing Cluster). In this paper, we explicitly target the combination of hardware-level GPU partitioning features and power capping for power-constrained HPC systems. We provide a systematic methodology to optimize the combination of chip partitioning, job allocations, as well as power capping based on our scalability/interference modeling while taking a variety of aspects into account, such as compute/memory intensity and utilization in heterogeneous computational resources (e.g., Tensor Cores). The experimental results indicate that our approach is successful in selecting a near-optimal combination across multiple different workloads.
[55] arXiv:2405.03840 [pdf, ps, html, other]: Title: End-to-End Autoencoder for Drill String Acoustic Communications

Iurii Lezhenin, Aleksandr Sidnev, Vladimir Tsygan, Igor Malyshev

Subjects: Machine Learning (cs.LG)

Drill string communications are important for drilling efficiency and safety. The design of a low latency drill string communication system with high throughput and reliability remains an open challenge. In this paper, a deep learning autoencoder (AE) based end-to-end communication system, where the transmitter and receiver are implemented as feed forward neural networks, is proposed for acoustic drill string communications. Simulation shows that the AE system is able to outperform a baseline non-contiguous OFDM system in terms of BER and PAPR, operating with lower latency.
[56] arXiv:2405.03842 [pdf, ps, html, other]: Title: A Novel Cross-band CSI Prediction Scheme for Multi-band Fingerprint based Localization

Yuan Ruihao, Huang Kaixuan, Zhang Shunqing

Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

Because of its advantages in computational complexity over traditional localization algorithms, fingerprint based localization is in increasing demand. Expanding the fingerprint database from the frequency domain by channel reconstruction can improve localization accuracy. However, in a mobility environment, the channel reconstruction accuracy is limited by the time-varying parameters. In this paper, we propose a system that extracts the time-varying parameters using the space-alternating generalized expectation maximization (SAGE) algorithm and then uses a variational auto-encoder (VAE) to reconstruct the channel state information on another channel. The proposed scheme is tested on data generated by the deep-MIMO channel model. A mathematical analysis of the viability of our system is also presented in this paper.
[57] arXiv:2405.03844 [pdf, ps, html, other]: Title: Perception in Pixels: Understanding Avatar Representation in Video-Mediated Collaborative Interactions

Pitch Sinlapanuntakul, Mark Zachry

Comments: 25 pages, 5 figures, 3 tables

Subjects: Human-Computer Interaction (cs.HC)

Despite the abundance of research concerning virtual reality (VR) avatars, the impact of screen-based or augmented reality (AR) avatars for real-world applications remains relatively unexplored. Notably, there is a lack of research examining video-mediated collaborative interaction experiences using AR avatars for goal-directed group activities. This study bridges this gap with a mixed-methods, quasi-experimental user study that investigates video-based small-group interactions when employing AR avatars as opposed to traditional video for user representation. We found that the use of avatars positively influenced self-esteem and video-based collaboration satisfaction. In addition, our group interview findings highlight experiences and perceptions regarding the dynamic use of avatars in video-mediated collaborative interactions, including benefits, challenges, and factors that would influence a decision to use avatars. This study contributes an empirical understanding of avatar representation in mediating video-based collaborative interactions, implications and perceptions surrounding the adoption of AR avatars, and a comprehensive comparison of key characteristics between user representations.
[58] arXiv:2405.03845 [pdf, ps, html, other]: Title: Self-Improving Customer Review Response Generation Based on LLMs

Guy Azov, Tatiana Pelc, Adi Fledel Alon, Gila Kamhi

Comments: 18 pages, 4 figures, 8 figures in Appendix, accepted to LREC-COLING 2024 workshop

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Previous studies have demonstrated that proactive interaction with user reviews has a positive impact on the perception of app users and encourages them to submit revised ratings. Nevertheless, developers encounter challenges in managing a high volume of reviews, particularly in the case of popular apps with a substantial influx of daily reviews. Consequently, there is a demand for automated solutions aimed at streamlining the process of responding to user reviews. To address this, we have developed a new system for generating automatic responses by leveraging user-contributed documents with the help of retrieval-augmented generation (RAG) and advanced Large Language Models (LLMs). Our solution, named SCRABLE, represents an adaptive customer review response automation that enhances itself with self-optimizing prompts and a judging mechanism based on LLMs. Additionally, we introduce an automatic scoring mechanism that mimics the role of a human evaluator to assess the quality of responses generated in customer review domains. Extensive experiments and analyses conducted on real-world datasets reveal that our method is effective in producing high-quality responses, yielding an improvement of more than 8.5% compared to the baseline. Further validation through manual examination of the generated responses underscores the efficacy of our proposed system.
[59] arXiv:2405.03846 [pdf, ps, html, other]: Title: Enhancing Apparent Personality Trait Analysis with Cross-Modal Embeddings

Ádám Fodor, Rachid R. Saboundji, András Lőrincz

Comments: 14 pages, 4 figures

Journal-ref: Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae. Sectio Computatorica, MaCS Special Issue, 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

Automatic personality trait assessment is essential for high-quality human-machine interactions. Systems capable of human behavior analysis could be used for self-driving cars, medical research, and surveillance, among many others. We present a multimodal deep neural network with a Siamese extension for apparent personality trait prediction trained on short video recordings and exploiting modality invariant embeddings. Acoustic, visual, and textual information are utilized to reach high-performance solutions in this task. Due to the highly centralized target distribution of the analyzed dataset, the changes in the third digit are relevant. Our proposed method addresses the challenge of under-represented extreme values, achieves 0.0033 MAE average improvement, and shows a clear advantage over the baseline multimodal DNN without the introduced module.
[60] arXiv:2405.03847 [pdf, ps, html, other]: Title: On the Convergence of Malleability and the HPC PowerStack: Exploiting Dynamism in Over-Provisioned and Power-Constrained HPC Systems

Eishi Arima, Isaías A. Comprés, Martin Schulz

Journal-ref: High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Recent High-Performance Computing (HPC) systems face important challenges, such as massive power consumption combined with significantly under-utilized system resources. Given the power consumption trends, future systems will be deployed in an over-provisioned manner where more resources are installed than they can afford to power simultaneously. In such a scenario, maximizing resource utilization and energy efficiency, while keeping a given power constraint, is pivotal. Driven by this observation, in this position paper we first highlight the recent trends of resource management techniques, with a particular focus on malleability support (i.e., dynamically scaling resource allocations/requirements for a job), co-scheduling (i.e., co-locating multiple jobs within a node), and power management. Second, we consider putting them together, assess their relationships/synergies, and discuss the functionality requirements in each software component for future over-provisioned and power-constrained HPC systems. Third, we briefly introduce our ongoing efforts on the integration of software tools, which will ultimately lead to the convergence of malleability and power management, as it is designed in the HPC PowerStack initiative.
[61] arXiv:2405.03848 [pdf, ps, html, other]: Title: CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities

Kingsley Nweye, Kathryn Kaspar, Giacomo Buscemi, Tiago Fonseca, Giuseppe Pinto, Dipanjan Ghose, Satvik Duddukuru, Pavani Pratapa, Han Li, Javad Mohammadi, Luis Lino Ferreira, Tianzhen Hong, Mohamed Ouf, Alfonso Capozzoli, Zoltan Nagy

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

As more distributed energy resources become part of the demand-side infrastructure, it is important to quantify the energy flexibility they provide on a community scale, particularly to understand the impact of geographic, climatic, and occupant behavioral differences on their effectiveness, as well as identify the best control strategies to accelerate their real-world adoption. CityLearn provides an environment for benchmarking simple and advanced distributed energy resource control algorithms including rule-based, model-predictive, and reinforcement learning control. CityLearn v2 presented here extends CityLearn v1 by providing a simulation environment that leverages the End-Use Load Profiles for the U.S. Building Stock dataset to create virtual grid-interactive communities for resilient, multi-agent distributed energy resources and objective control with dynamic occupant feedback. This work details the v2 environment design and provides application examples that utilize reinforcement learning to manage battery energy storage system charging/discharging cycles, vehicle-to-grid control, and thermal comfort during heat pump power modulation.
[62] arXiv:2405.03851 [pdf, ps, html, other]: Title: Upper Bounds for Complexity of Asymptotically Optimal Learned Indexes

Luis Croquevielle, Guang Yang, Liang Lian, Ali Hadian, Thomas Heinis

Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS)

Learned indexes leverage machine learning models to accelerate query answering in databases, showing impressive practical performance. However, theoretical understanding of these methods remains incomplete. Existing research suggests that learned indexes have superior asymptotic complexity compared to their non-learned counterparts, but these findings have been established under restrictive probabilistic assumptions. Specifically, for a sorted array with $n$ elements, it has been shown that learned indexes can find a key in $O(\log(\log n))$ expected time using at most linear space, compared with $O(\log n)$ for non-learned methods.
In this work, we prove $O(1)$ expected time can be achieved with at most linear space, thereby establishing the tightest upper bound so far for the time complexity of an asymptotically optimal learned index. Notably, we use weaker probabilistic assumptions than prior work, meaning our results generalize previous efforts. Furthermore, we introduce a new measure of statistical complexity for data. This metric exhibits an information-theoretical interpretation and can be estimated in practice. This characterization provides further theoretical understanding of learned indexes, by helping to explain why some datasets seem to be particularly challenging for these methods.
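The general idea of a learned index over a sorted array can be sketched as follows: a model predicts the position of a key, and a bounded local search corrects the prediction. This is a generic single-model illustration, not the construction analysed in the paper. The class name, the min/max linear fit, and the stored worst-case error bound are assumptions made for the example.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: linear position model plus error-bounded search."""

    def __init__(self, keys):
        self.keys = list(keys)  # assumed non-empty and sorted ascending
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Linear model mapping the smallest key to 0 and the largest to n - 1.
        self.slope = (n - 1) / (hi - lo) if hi > lo else 0.0
        self.intercept = -self.slope * lo
        # Worst-case prediction error over the stored keys bounds the search window.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)  # clamp into the array

    def lookup(self, key):
        """Return the index of key, or None if it is not stored."""
        p = self._predict(key)
        lo = max(p - self.max_err, 0)
        hi = min(p + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else None


if __name__ == "__main__":
    index = LinearLearnedIndex([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])
    print(index.lookup(11), index.lookup(4), index.max_err)
```

The lookup cost is governed by the size of the search window rather than by $n$, which is why assumptions about how the data is distributed matter for the asymptotic bounds discussed above.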
[63] arXiv:2405.03852 [pdf, ps, html, other]: Title: VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images

Anna Penzkofer, Lei Shi, Andreas Bulling

Comments: To be published in the Proceedings of the Annual Meeting of the Cognitive Science Society (CogSci'24)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

While Vector Symbolic Architectures (VSAs) are promising for modelling spatial cognition, their application is currently limited to artificially generated images and simple spatial queries. We propose VSA4VQA - a novel 4D implementation of VSAs that implements a mental representation of natural images for the challenging task of Visual Question Answering (VQA). VSA4VQA is the first model to scale a VSA to complex spatial queries. Our method is based on the Semantic Pointer Architecture (SPA) to encode objects in a hyperdimensional vector space. To encode natural images, we extend the SPA to include dimensions for objects' width and height in addition to their spatial location. To perform spatial queries, we further introduce learned spatial query masks and integrate a pre-trained vision-language model for answering attribute-related questions. We evaluate our method on the GQA benchmark dataset and show that it can effectively encode natural images, achieving competitive performance to state-of-the-art deep learning methods for zero-shot VQA.
[64] arXiv:2405.03853 [pdf, ps, other]: Title: Lifting Directional Fields to Minimal Sections

David Palmer, Albert Chern, Justin Solomon

Comments: 20 pages, 21 figures; to appear at SIGGRAPH 2024

Subjects: Graphics (cs.GR)

Directional fields, including unit vector, line, and cross fields, are essential tools in the geometry processing toolkit. The topology of directional fields is characterized by their singularities. While singularities play an important role in downstream applications such as meshing, existing methods for computing directional fields either require them to be specified in advance, ignore them altogether, or treat them as zeros of a relaxed field. While fields are ill-defined at their singularities, the graphs of directional fields with singularities are well-defined surfaces in a circle bundle. By lifting optimization of fields to optimization over their graphs, we can exploit a natural convex relaxation to a minimal section problem over the space of currents in the bundle. This relaxation treats singularities as first-class citizens, expressing the relationship between fields and singularities as an explicit boundary condition. As curvature frustrates finite element discretization of the bundle, we devise a hybrid spectral method for representing and optimizing minimal sections. Our method supports field optimization on both flat and curved domains and enables more precise control over singularity placement.
[65] arXiv:2405.03855 [pdf, ps, html, other]: Title: Strategies for Increasing Corporate Responsible AI Prioritization

Angelina Wang, Teresa Datta, John P. Dickerson

Subjects: Computers and Society (cs.CY)

Responsible artificial intelligence (RAI) is increasingly recognized as a critical concern. However, the level of corporate RAI prioritization has not kept pace. In this work, we conduct 16 semi-structured interviews with practitioners to investigate what has historically motivated companies to increase the prioritization of RAI. What emerges is a complex story of conflicting and varied factors, but we bring structure to the narrative by highlighting the different strategies available to employ, and point to the actors with access to each. While there are no guaranteed steps for increasing RAI prioritization, we paint the current landscape of motivators so that practitioners can learn from each other, and put forth our own selection of promising directions forward.
[66] arXiv:2405.03856 [pdf, ps, html, other]: Title: Finding perfect matchings in bridgeless cubic multigraphs without dynamic (2-)connectivity

Paweł Gawrychowski, Mateusz Wasylkiewicz

Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

Petersen's theorem, one of the earliest results in graph theory, states that any bridgeless cubic multigraph contains a perfect matching. While the original proof was neither constructive nor algorithmic, Biedl, Bose, Demaine, and Lubiw [J. Algorithms 38(1)] showed how to implement a later constructive proof by Frink in $\mathcal{O}(n\log^{4}n)$ time using a fully dynamic 2-edge-connectivity structure. Then, Diks and Stańczyk [SOFSEM 2010] described a faster approach that only needs a fully dynamic connectivity structure and works in $\mathcal{O}(n\log^{2}n)$ time. Both algorithms, while reasonably simple, utilize non-trivial (2-edge-)connectivity structures. We show that this is not necessary, and in fact a structure for maintaining a dynamic tree, e.g. link-cut trees, suffices to obtain a simple $\mathcal{O}(n\log n)$ time algorithm.
[67] arXiv:2405.03862 [pdf, ps, html, other]: Title: Conformity, Confabulation, and Impersonation: Persona Inconstancy in Multi-Agent LLM Collaboration

Razan Baltaji, Babak Hemmatian, Lav R. Varshney

Comments: 16 pages, 8 figures, 3 tables

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This study explores the sources of instability in maintaining cultural personas and opinions within multi-agent LLM systems. Drawing on simulations of inter-cultural collaboration and debate, we analyze agents' pre- and post-discussion private responses alongside chat transcripts to assess the stability of cultural personas and the impact of opinion diversity on group outcomes. Our findings suggest that multi-agent discussions can encourage collective decisions that reflect diverse perspectives, yet this benefit is tempered by the agents' susceptibility to conformity due to perceived peer pressure and challenges in maintaining consistent personas and opinions. Counterintuitively, instructions that encourage debate in support of one's opinions increase the rate of inconstancy. Without addressing the factors we identify, the full potential of multi-agent frameworks for producing more culturally diverse AI outputs will remain untapped.
[68] arXiv:2405.03863 [pdf, ps, html, other]: Title: Resource Optimization in UAV-assisted IoT Networks: The Role of Generative AI

Sana Sharif, Sherali Zeadally, Waleed Ejaz

Comments: Accepted - IEEE Internet of Things Magazine

Subjects: Systems and Control (eess.SY)

We investigate how generative Artificial Intelligence (AI) can be used to optimize resources in Unmanned Aerial Vehicle (UAV)-assisted Internet of Things (IoT) networks. In particular, generative AI models for real-time decision-making have been used in public safety scenarios. This work describes how generative AI models can improve resource management within UAV-assisted networks. Furthermore, this work presents generative AI in UAV-assisted networks to demonstrate its practical applications and highlight its broader capabilities. We present a real-life case study for public safety, demonstrating how generative AI can enhance real-time decision-making and improve training datasets. By leveraging generative AI in UAV-assisted networks, we can design more intelligent, adaptive, and efficient ecosystems to meet the evolving demands of wireless networks and diverse applications. Finally, we discuss challenges and future research directions associated with generative AI for resource optimization in UAV-assisted networks.
[69] arXiv:2405.03864 [pdf, ps, html, other]: Title: Learning Planning Abstractions from Language

Weiyu Liu, Geng Chen, Joy Hsu, Jiayuan Mao, Jiajun Wu

Comments: The first two authors contributed equally. The last two authors provide equal advising. Project website: this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

This paper presents a framework for learning state and action abstractions in sequential decision-making domains. Our framework, planning abstraction from language (PARL), utilizes language-annotated demonstrations to automatically discover a symbolic and abstract action space and induce a latent state abstraction based on it. PARL consists of three stages: 1) recovering object-level and action concepts, 2) learning state abstractions, abstract action feasibility, and transition models, and 3) applying low-level policies for abstract actions. During inference, given the task description, PARL first makes abstract action plans using the latent transition and feasibility functions, then refines the high-level plan using low-level policies. PARL generalizes across scenarios involving novel object instances and environments, unseen concept compositions, and tasks that require longer planning horizons than settings it is trained on.
[70] arXiv:2405.03865 [pdf, ps, html, other]: Title: Information-driven Affordance Discovery for Efficient Robotic Manipulation

Pietro Mazzaglia, Taco Cohen, Daniel Dijkman

Comments: arXiv admin note: substantial text overlap with arXiv:2308.14915

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Robotic affordances, providing information about what actions can be taken in a given situation, can aid robotic manipulation. However, learning about affordances requires expensive large annotated datasets of interactions or demonstrations. In this work, we argue that well-directed interactions with the environment can mitigate this problem and propose an information-based measure to augment the agent's objective and accelerate the affordance discovery process. We provide a theoretical justification of our approach and we empirically validate the approach both in simulation and real-world tasks. Our method, which we dub IDA, enables the efficient discovery of visual affordances for several action primitives, such as grasping, stacking objects, or opening drawers, strongly improving data efficiency in simulation, and it allows us to learn grasping affordances in a small number of interactions, on a real-world setup with a UFACTORY XArm 6 robot arm.
[71] arXiv:2405.03869 [pdf, ps, html, other]: Title: Outlier Gradient Analysis: Efficiently Improving Deep Learning Model Performance via Hessian-Free Influence Functions

Anshuman Chhabra, Bo Li, Jian Chen, Prasant Mohapatra, Hongfu Liu

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Influence functions offer a robust framework for assessing the impact of each training data sample on model predictions, serving as a prominent tool in data-centric learning. Despite their widespread use in various tasks, the strong convexity assumption on the model and the computational cost associated with calculating the inverse of the Hessian matrix pose constraints, particularly when analyzing large deep models. This paper focuses on a classical data-centric scenario--trimming detrimental samples--and addresses both challenges within a unified framework. Specifically, we establish an equivalence transformation between identifying detrimental training samples via influence functions and outlier gradient detection. This transformation not only presents a straightforward and Hessian-free formulation but also provides profound insights into the role of the gradient in sample impact. Moreover, it relaxes the convexity assumption of influence functions, extending their applicability to non-convex deep models. Through systematic empirical evaluations, we first validate the correctness of our proposed outlier gradient analysis on synthetic datasets and then demonstrate its effectiveness in detecting mislabeled samples in vision models, selecting data samples for improving performance of transformer models for natural language processing, and identifying influential samples for fine-tuned Large Language Models.
[72] arXiv:2405.03870 [pdf, ps, other]: Title: AI-Driven Frameworks for Enhancing Data Quality in Big Data Ecosystems: Error_Detection, Correction, and Metadata Integration

Widad Elouataoui

Comments: Doctoral thesis

Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)

The widespread adoption of big data has ushered in a new era of data-driven decision-making, transforming numerous industries and sectors. However, the efficacy of these decisions hinges on the quality of the underlying data. Poor data quality can result in inaccurate analyses and deceptive conclusions. Managing the vast volume, velocity, and variety of data sources presents significant challenges, heightening the importance of addressing big data quality issues. While there has been increased attention from both academia and industry, current approaches often lack comprehensiveness and universality. They tend to focus on limited metrics, neglecting other dimensions of data quality. Moreover, existing methods are often context-specific, limiting their applicability across different domains. There is a clear need for intelligent, automated approaches leveraging artificial intelligence (AI) for advanced data quality corrections.
To bridge these gaps, this Ph.D. thesis proposes a novel set of interconnected frameworks aimed at enhancing big data quality comprehensively. Firstly, we introduce new quality metrics and a weighted scoring system for precise data quality assessment. Secondly, we present a generic framework for detecting various quality anomalies using AI models. Thirdly, we propose an innovative framework for correcting detected anomalies through predictive modeling. Additionally, we address metadata quality enhancement within big data ecosystems. These frameworks are rigorously tested on diverse datasets, demonstrating their efficacy in improving big data quality. Finally, the thesis concludes with insights and suggestions for future research directions.
[73] arXiv:2405.03873 [pdf, ps, html, other]: Title: Investigating Personalized Driving Behaviors in Dilemma Zones: Analysis and Prediction of Stop-or-Go Decisions

Ziye Qin, Siyan Li, Guoyuan Wu, Matthew J. Barth, Amr Abdelraouf, Rohit Gupta, Kyungtae Han

Subjects: Artificial Intelligence (cs.AI)

Dilemma zones at signalized intersections present a commonly occurring but unsolved challenge for both drivers and traffic operators. Onsets of the yellow lights prompt varied responses from different drivers: some may brake abruptly, compromising the ride comfort, while others may accelerate, increasing the risk of red-light violations and potential safety hazards. Such diversity in drivers' stop-or-go decisions may result from not only surrounding traffic conditions, but also personalized driving behaviors. To this end, identifying personalized driving behaviors and integrating them into advanced driver assistance systems (ADAS) to mitigate the dilemma zone problem presents an intriguing scientific question. In this study, we employ a game engine-based (i.e., CARLA-enabled) driving simulator to collect high-resolution vehicle trajectories, incoming traffic signal phase and timing information, and stop-or-go decisions from four subject drivers in various scenarios. This approach allows us to analyze personalized driving behaviors in dilemma zones and develop a Personalized Transformer Encoder to predict individual drivers' stop-or-go decisions. The results show that the Personalized Transformer Encoder improves the accuracy of predicting driver decision-making in the dilemma zone by 3.7% to 12.6% compared to the Generic Transformer Encoder, and by 16.8% to 21.6% over the binary logistic regression model.
[74] arXiv:2405.03875 [pdf, ps, html, other]: Title: Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

Jiachen T. Wang, Tianji Yang, James Zou, Yongchan Kwon, Ruoxi Jia

Comments: ICML 2024

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has been shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis testing framework and show that Data Shapley's performance can be no better than random selection without specific constraints on utility functions. We identify a class of utility functions, monotonically transformed modular functions, within which Data Shapley optimally selects data. Based on this insight, we propose a heuristic for predicting Data Shapley's effectiveness in data selection tasks. Our experiments corroborate these findings, adding new insights into when Data Shapley may or may not succeed.
[75] arXiv:2405.03878 [pdf, ps, html, other]: Title: Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning

Aditya A. Ramesh, Kenny Young, Louis Kirsch, Jürgen Schmidhuber

Subjects: Machine Learning (cs.LG)

Temporal credit assignment in reinforcement learning is challenging due to delayed and stochastic outcomes. Monte Carlo targets can bridge long delays between action and consequence but lead to high-variance targets due to stochasticity. Temporal difference (TD) learning uses bootstrapping to overcome variance but introduces a bias that can only be corrected through many iterations. TD($\lambda$) provides a mechanism to navigate this bias-variance tradeoff smoothly. Appropriately selecting $\lambda$ can significantly improve performance. Here, we propose Chunked-TD, which uses predicted probabilities of transitions from a model for computing $\lambda$-return targets. Unlike other model-based solutions to credit assignment, Chunked-TD is less vulnerable to model inaccuracies. Our approach is motivated by the principle of history compression and 'chunks' trajectories for conventional TD learning. Chunking with learned world models compresses near-deterministic regions of the environment-policy interaction to speed up credit assignment while still bootstrapping when necessary. We propose algorithms that can be implemented online and show that they solve some problems much faster than conventional TD($\lambda$).
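For readers unfamiliar with $\lambda$-returns, the following is a minimal sketch of the generic recursion $G_t = r_{t+1} + \gamma\,[(1-\lambda_{t+1})\,V(s_{t+1}) + \lambda_{t+1}\,G_{t+1}]$ with a separate $\lambda$ per step. It is not the authors' Chunked-TD implementation: the function name `lambda_returns` and its arguments are hypothetical, and the per-step `lambdas` are simply passed in, whereas the abstract describes obtaining them from a model's predicted transition probabilities.

```python
def lambda_returns(rewards, next_values, lambdas, gamma=0.99):
    """Compute lambda-return targets with a separate lambda per step.

    rewards[t]     : reward received after step t
    next_values[t] : value estimate of the state reached after step t
                     (use 0.0 if that state is terminal)
    lambdas[t]     : weight in [0, 1] on continuing the return past that state
                     (1.0 = keep following the trajectory, 0.0 = bootstrap)
    """
    n = len(rewards)
    targets = [0.0] * n
    # Beyond the end of the trajectory we can only bootstrap.
    g = next_values[-1]
    for t in reversed(range(n)):
        lam = lambdas[t]
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        targets[t] = g
    return targets
```

Setting every entry of `lambdas` to 1 recovers Monte Carlo targets, and setting every entry to 0 recovers one-step TD targets, which are the two ends of the bias-variance tradeoff the abstract refers to.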
[76] arXiv:2405.03882 [pdf, ps, html, other]: Title: Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer

Huihong Shi, Haikuo Shao, Wendong Mao, Zhongfeng Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks. However, their huge model sizes and intensive computations hinder ViTs' deployment on embedded devices, calling for effective model compression methods, such as quantization. Unfortunately, due to the existence of hardware-unfriendly and quantization-sensitive non-linear operations, particularly Softmax, it is non-trivial to completely quantize all operations in ViTs, yielding either significant accuracy drops or non-negligible hardware costs. In response to challenges associated with standard ViTs, we focus our attention towards the quantization and acceleration for efficient ViTs, which not only eliminate the troublesome Softmax but also integrate linear attention with low computational complexity, and propose Trio-ViT accordingly. Specifically, at the algorithm level, we develop a tailored post-training quantization engine taking the unique activation distributions of Softmax-free efficient ViTs into full consideration, aiming to boost quantization accuracy. Furthermore, at the hardware level, we build an accelerator dedicated to the specific Convolution-Transformer hybrid architecture of efficient ViTs, thereby enhancing hardware efficiency. Extensive experimental results consistently prove the effectiveness of our Trio-ViT framework. Particularly, we can gain up to $7.2\times$ and $14.6\times$ FPS under comparable accuracy over state-of-the-art ViT accelerators, as well as $5.9\times$ and $2.0\times$ DSP efficiency. Codes will be released publicly upon acceptance.
[77] arXiv:2405.03883 [pdf, ps, other]: Title: sqlelf: a SQL-centric Approach to ELF Analysis

Farid Zakaria, Zheyuan Chen, Andrew Quinn, Thomas R. W. Scogland

Subjects: Software Engineering (cs.SE); Databases (cs.DB); Operating Systems (cs.OS)

The exploration and understanding of Executable and Linkable Format (ELF) objects underpin various critical activities in computer systems, from debugging to reverse engineering. Traditional UNIX tooling like readelf, nm, and objdump has served the community reliably over the years. However, as the complexity and scale of software projects have grown, there arises a need for more intuitive, flexible, and powerful methods to investigate ELF objects. In this paper, we introduce sqlelf, an innovative tool that empowers users to probe ELF objects through the expressive power of SQL. By modeling ELF objects as relational databases, sqlelf offers the following advantages over conventional methods.
Our evaluations demonstrate that sqlelf not only provides more nuanced and comprehensive insights into ELF objects but also significantly reduces the effort and time traditionally required for ELF exploration tasks.
[78] arXiv:2405.03884 [pdf, ps, html, other]: Title: BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection

Saket S. Chaturvedi, Lan Zhang, Wenbin Zhang, Pan He, Xiaoyong Yuan

Comments: Accepted at IJCAI 2024 Conference

Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D object detection plays an important role in autonomous driving; however, its vulnerability to backdoor attacks has become evident. By injecting "triggers" to poison the training dataset, backdoor attacks manipulate the detector's prediction for inputs containing these triggers. Existing backdoor attacks against 3D object detection primarily poison 3D LiDAR signals, where large-sized 3D triggers are injected to ensure their visibility within the sparse 3D space, rendering them easy to detect and impractical in real-world scenarios.
In this paper, we delve into the robustness of 3D object detection, exploring a new backdoor attack surface through 2D cameras. Given the prevalent adoption of camera and LiDAR signal fusion for high-fidelity 3D perception, we investigate the latent potential of camera signals to disrupt the process. Although the dense nature of camera signals enables the use of nearly imperceptible small-sized triggers to mislead 2D object detection, realizing 2D-oriented backdoor attacks against 3D object detection is non-trivial. The primary challenge emerges from the fusion process that transforms camera signals into a 3D space, compromising the association between the 2D trigger and the target output. To tackle this issue, we propose an innovative 2D-oriented backdoor attack against LiDAR-camera fusion methods for 3D object detection, named BadFusion, for preserving trigger effectiveness throughout the entire fusion process. The evaluation demonstrates the effectiveness of BadFusion, achieving a significantly higher attack success rate compared to existing 2D-oriented attacks.
[79] arXiv:2405.03885 [pdf, ps, other]: Title: Playing Games with your PET: Extending the Partial Exploration Tool to Stochastic Games

Tobias Meggendorfer, Maximilian Weininger

Subjects: Systems and Control (eess.SY)

We present version 2.0 of the Partial Exploration Tool (PET), a tool for verification of probabilistic systems. We extend the previous version by adding support for stochastic games (SGs), based on a recent unified framework for sound value iteration algorithms. Thereby, PET2 is the first tool implementing a sound and efficient approach for solving stochastic games with objectives of the type reachability/safety and mean payoff. We complement this approach by developing and implementing a partial-exploration based variant for all three objectives. Our experimental evaluation shows that PET2 offers the most efficient partial-exploration based algorithm and is the most viable tool on SGs, even outperforming unsound tools.
[80] arXiv:2405.03889 [pdf, ps, html, other]: Title: ContextQ: Generated Questions to Support Meaningful Parent-Child Dialogue While Co-Reading

Griffin Dietz Smith, Siddhartha Prasad, Matt J. Davidson, Leah Findlater, R. Benjamin Shapiro

Comments: ACM Interaction Design and Children (IDC) 2024

Subjects: Human-Computer Interaction (cs.HC)

Much of early literacy education happens at home with caretakers reading books to young children. Prior research demonstrates how having dialogue with children during co-reading can develop critical reading readiness skills, but most adult readers are unsure if and how to lead effective conversations. We present ContextQ, a tablet-based reading application to unobtrusively present auto-generated dialogic questions to caretakers to support this dialogic reading practice. An ablation study demonstrates how our method of encoding educator expertise into the question generation pipeline can produce high-quality output; and through a user study with 12 parent-child dyads (child age: 4-6), we demonstrate that this system can serve as a guide for parents in leading contextually meaningful dialogue, leading to significantly more conversational turns from both the parent and the child and deeper conversations with connections to the child's everyday life.
[81] arXiv:2405.03890 [pdf, ps, html, other]: Title: SVan: A Mobile Hub as a Field Robotics Development and Deployment Platform

Alexander Moortgat-Pick, Anna Adamczyk, Daniel A Duecker, Sami Haddadin

Comments: 7 pages, 10 figures, accepted paper at IEEE ICRA Workshop on Field Robotics 2024

Subjects: Robotics (cs.RO)

As robotics becomes increasingly vital for environmental protection, there is a growing need for effective deployment methods that match the pace of robotics innovation. Current strategies often fall short, leaving a gap between the potential of robotics and their practical application in the field. Addressing this challenge, we introduce a mobile hub concept designed to provide the necessary infrastructure and support for deploying a diverse, multi-domain robot team effectively. This paper presents the development and insights into 'SVAN' (Synchronous Team-Robot Van), a prototype of our mobile hub concept. We delve into the mechanical construction and software setup of SVAN, offering a comprehensive overview of its capabilities and design considerations. Further, we discuss the hardware specifications and share valuable lessons learned during the prototype's development and deployment. In addition to this paper, an accepted video complements our exploration by depicting SVAN in its envisioned role as an environmental guardian, highlighting its potential in ecological monitoring and preservation. Furthermore, our discussion is enriched by referencing a previously accepted paper detailing a novel methodology for continuous UAV mission cycling enabled by a mobile hub like SVAN. These accompanying works underscore our contribution towards addressing the existing gaps in robot deployment strategies, presenting a scalable and efficient framework to overcome operational challenges in environmental robotics.
[82] arXiv:2405.03891 [pdf, ps, other]: Title: Enhancing O-RAN Security: Evasion Attacks and Robust Defenses for Graph Reinforcement Learning-based Connection Management

Ravikumar Balakrishnan, Marius Arvinte, Nageen Himayat, Hosein Nikopour, Hassnaa Moustafa

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Adversarial machine learning, focused on studying various attacks and defenses on machine learning (ML) models, is rapidly gaining importance as ML is increasingly being adopted for optimizing wireless systems such as Open Radio Access Networks (O-RAN). A comprehensive modeling of the security threats and the demonstration of adversarial attacks and defenses on practical AI-based O-RAN systems is still in its nascent stages. We begin by conducting threat modeling to pinpoint attack surfaces in O-RAN using an ML-based Connection management application (xApp) as an example. The xApp uses a Graph Neural Network trained using Deep Reinforcement Learning and achieves on average 54% improvement in the coverage rate measured as the 5th percentile user data rates. We then formulate and demonstrate evasion attacks that degrade the coverage rates by as much as 50% through injecting bounded noise at different threat surfaces including the open wireless medium itself. Crucially, we also compare and contrast the effectiveness of such attacks on the ML-based xApp and a non-ML-based heuristic. We finally develop and demonstrate robust training-based defenses against the challenging physical/jamming-based attacks and show a 15% improvement in the coverage rates when compared to employing no defense over a range of noise budgets.
[83] arXiv:2405.03892 [pdf, ps, html, other]: Title: Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows

Minjae Cho, Jonathan P. How, Chuangchuang Sun

Comments: Submitted for review at IEEE: Neural Networks and Learning Systems

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Despite notable successes of Reinforcement Learning (RL), the prevalent use of an online learning paradigm prevents its widespread adoption, especially in hazardous or costly scenarios. Offline RL has emerged as an alternative solution, learning from pre-collected static datasets. However, this offline learning introduces a new challenge known as distributional shift, degrading the performance when the policy is evaluated on scenarios that are Out-Of-Distribution (OOD) from the training dataset. Most existing offline RL methods resolve this issue by regularizing policy learning within the information supported by the given dataset. However, such regularization overlooks the potential for high-reward regions that may exist beyond the dataset. This motivates exploring novel offline learning techniques that can make improvements beyond the data support without compromising policy performance, potentially by learning causation (cause-and-effect) instead of correlation from the dataset. In this paper, we propose the MOOD-CRL (Model-based Offline OOD-Adapting Causal RL) algorithm, which aims to address the challenge of extrapolation for offline policy training through causal inference instead of policy-regularizing methods. Specifically, Causal Normalizing Flow (CNF) is developed to learn the transition and reward functions for data generation and augmentation in offline policy evaluation and training. Based on the data-invariant, physics-based qualitative causal graph and the observational data, we develop a novel learning scheme for CNF to learn the quantitative structural causal model. As a result, CNF gains predictive and counterfactual reasoning capabilities for sequential decision-making tasks, revealing a high potential for OOD adaptation. Our CNF-based offline RL approach is validated through empirical evaluations, outperforming model-free and model-based methods by a significant margin.
[84] arXiv:2405.03894 [pdf, ps, html, other]: Title: MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View

Emmanuelle Bourigault, Pauline Bourigault

Comments: CVPRW: Generative Models for Computer Vision

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Generating consistent multiple views for 3D reconstruction tasks is still a challenge to existing image-to-3D diffusion models. Generally, incorporating 3D representations into diffusion models decreases the model's speed as well as generalizability and quality. This paper proposes a general framework to generate consistent multi-view images from a single image by leveraging a scene representation transformer and a view-conditioned diffusion model. In the model, we introduce epipolar geometry constraints and multi-view attention to enforce 3D consistency. From as few as one image input, our model is able to generate 3D meshes surpassing baseline methods in evaluation metrics, including PSNR, SSIM and LPIPS.
[85] arXiv:2405.03898 [pdf, ps, html, other]: Title: With or Without Permission: Site-Specific Augmented Reality for Social Justice

Rafael M.L. Silva, Ana María Cárdenas Gasca, Joshua A. Fisher, Erica Principe Cruz, Cinthya Jauregui, Amy Lueck, Fannie Liu, Andrés Monroy-Hernández, Kai Lukoff

Comments: Presented at CHI 2024 (arXiv:2404.05889)

Subjects: Human-Computer Interaction (cs.HC)

Movements for social change are often tied to a particular locale. This makes Augmented Reality (AR), which changes how people perceive their surroundings, a promising technology for social justice. Site-specific AR empowers activists to re-tell the story of a place, with or without permission of its owner. It has been used, for example, to reveal hidden histories, re-imagine problematic monuments, and celebrate minority cultures. However, challenges remain concerning technological ownership and accessibility, scalability, sustainability, and navigating collaborations with marginalized communities and across disciplinary boundaries. This half-day workshop at CHI 2024 seeks to bring together an interdisciplinary group of activists, computer scientists, designers, media scholars, and more to identify opportunities and challenges across domains. To anchor the discussion, participants will each share one example of an artifact used in speculating, designing, and/or delivering site-specific AR experiences. This collection of artifacts will inaugurate an interactive database that can inspire a new wave of activists to leverage AR for social justice.
[86] arXiv:2405.03901 [pdf, ps, html, other]: Title: OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs

Jiahao Nick Li, Yan Xu, Tovi Grossman, Stephanie Santosa, Michelle Li

Comments: Paper accepted to the 2024 CHI Conference on Human Factors in Computing Systems (CHI 2024)

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

The progression to "Pervasive Augmented Reality" envisions easy access to multimodal information continuously. However, in many everyday scenarios, users are occupied physically, cognitively or socially. This may increase the friction to act upon the multimodal information that users encounter in the world. To reduce such friction, future interactive interfaces should intelligently provide quick access to digital actions based on users' context. To explore the range of possible digital actions, we conducted a diary study that required participants to capture and share the media that they intended to perform actions on (e.g., images or audio), along with their desired actions and other contextual information. Using this data, we generated a holistic design space of digital follow-up actions that could be performed in response to different types of multimodal sensory inputs. We then designed OmniActions, a pipeline powered by large language models (LLMs) that processes multimodal sensory inputs and predicts follow-up actions on the target information grounded in the derived design space. Using the empirical data collected in the diary study, we performed quantitative evaluations on three variations of LLM techniques (intent classification, in-context learning and finetuning) and identified the most effective technique for our task. Additionally, as an instantiation of the pipeline, we developed an interactive prototype and reported preliminary user feedback about how people perceive and react to the action predictions and their errors.
[87] arXiv:2405.03903 [pdf, ps, html, other]: Title: Unified Locational Differential Privacy Framework

Aman Priyanshu, Yash Maurya, Suriya Ganesh, Vy Tran

Comments: 10 pages, 7 figures

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Aggregating statistics over geographical regions is important for many applications, such as analyzing income, election results, and disease spread. However, the sensitive nature of this data necessitates strong privacy protections to safeguard individuals. In this work, we present a unified locational differential privacy (DP) framework to enable private aggregation of various data types, including one-hot encoded, boolean, float, and integer arrays, over geographical regions. Our framework employs local DP mechanisms such as randomized response, the exponential mechanism, and the Gaussian mechanism. We evaluate our approach on four datasets representing significant location data aggregation scenarios. Results demonstrate the utility of our framework in providing formal DP guarantees while enabling geographical data analysis.
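As a rough illustration of one mechanism named in the abstract above, the following is a minimal sketch of randomized response for a single boolean value under local DP, together with the usual debiasing step for the aggregate. It is not the authors' implementation; the function names and the choice of reporting probability e^eps / (1 + e^eps) are assumptions made for this example.

```python
import math
import random
from typing import Sequence


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-local DP."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability p, otherwise report its flip."""
    return bit if rng.random() < truth_probability(epsilon) else not bit


def estimate_true_fraction(reports: Sequence[bool], epsilon: float) -> float:
    """Debias the observed fraction of True reports.

    E[observed] = f * p + (1 - f) * (1 - p), so f = (observed - (1 - p)) / (2p - 1).
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical region with 30% of residents holding the sensitive attribute.
    truth = [i < 6000 for i in range(20000)]
    reports = [randomized_response(b, 1.0, rng) for b in truth]
    print(estimate_true_fraction(reports, 1.0))
```

Each individual report is noisy, but the aggregate estimate over a region converges to the true fraction as the number of reports grows, which is the property regional aggregation relies on.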
[88] arXiv:2405.03904 [pdf, ps, html, other]: Title: Transformer models classify random numbers

Rishabh Goel, YiZi Xiao, Ramin Ramezani

Comments: 13 pages, 5 figures

Subjects: Machine Learning (cs.LG)

Random numbers are incredibly important in a variety of fields, and the need for their validation remains important. A Quantum Random Number Generator (QRNG) can theoretically generate truly random numbers; however, this does not remove the need to thoroughly test their randomness. Generally, the task of validating random numbers has been delegated to different statistical tests, such as those from the NIST Statistical Test Suite (STS), which are often slow and only perform one task at a time. Our work presents a deep learning model that utilizes the transformer architecture to encode some of the tests from the NIST STS in a single model that also runs much faster. This model performs multi-label classification on these tests and outputs the probability of passing each statistical test that it encodes. We perform a thorough hyper-parameter optimization to converge on the best possible model and, as a result, achieve a high degree of accuracy with a sample F1 score above 0.9.
[89] arXiv:2405.03905 [pdf, ps, html, other]: Title: A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

Subjects: Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)

This paper introduces, to the best of the authors' knowledge, the first fine-grained temporal sparsity-aware keyword spotting (KWS) IC leveraging temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses. This KWS IC, featuring a bio-inspired delta-gated recurrent neural network ({\Delta}RNN) classifier, achieves an 11-class Google Speech Command Dataset (GSCD) KWS accuracy of 90.5% and energy consumption of 36nJ/decision. At 87% temporal sparsity, computing latency and energy per inference are reduced by 2.4$\times$/3.4$\times$, respectively. The 65nm design occupies 0.78mm$^2$ and features two additional blocks, a compact 0.084mm$^2$ digital infinite-impulse-response (IIR)-based band-pass filter (BPF) audio feature extractor (FEx) and a 24kB 0.6V near-Vth weight SRAM with 6.6$\times$ lower read power compared to the standard SRAM.
[90] arXiv:2405.03908 [pdf, ps, html, other]: Title: Deterministic Expander Routing: Faster and More Versatile

Yi-Jun Chang, Shang-En Huang, Hsin-Hao Su

Comments: Accepted to PODC 2024

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We consider the expander routing problem formulated by Ghaffari, Kuhn, and Su (PODC 2017), where the goal is to route all the tokens to their destinations given that each vertex is the source and the destination of at most $\deg(v)$ tokens. They developed $\textit{randomized algorithms}$ that solve this problem in $\text{poly}(\phi^{-1}) \cdot 2^{O(\sqrt{\log n \log \log n})}$ rounds in the $\textsf{CONGEST}$ model, where $\phi$ is the conductance of the graph. Later, Ghaffari and Li (DISC 2018) gave an improved algorithm. However, both algorithms are randomized, which means that all the resulting applications are also randomized. Recently, Chang and Saranurak (FOCS 2020) gave a deterministic algorithm that solves an expander routing instance in $2^{O(\log^{2/3} n \cdot \log^{1/3} \log n)}$ rounds. The deterministic algorithm is less efficient and does not allow preprocessing/query tradeoffs, which precludes the de-randomization of algorithms that require this feature, such as the $k$-clique enumeration algorithm in general graphs.
The main contribution of our work is a new deterministic expander routing algorithm that not only matches the randomized bound of [GKS 2017] but also allows preprocessing/query tradeoffs. Our algorithm solves a single instance of routing query in $2^{{O}(\sqrt{\log n \cdot \log \log n})}$ rounds. Our algorithm achieves the following preprocessing and query tradeoffs: For $0 < \epsilon < 1$, we can answer every routing query in $\log^{O(1/\epsilon)} n$ rounds at the cost of a $(n^{O(\epsilon)} + \log^{O(1/\epsilon)} n)$-round preprocessing procedure. Combining this with the approach of Censor-Hillel, Leitersdorf, and Vulakh (PODC 2022), we obtain a near-optimal $\tilde{O}(n^{1-2/k})$-round deterministic algorithm for $k$-clique enumeration in general graphs, improving the previous state-of-the-art $n^{1-2/k+o(1)}$.
[91] arXiv:2405.03911 [pdf, ps, html, other]: Title: Federated Graph Condensation with Information Bottleneck Principles

Bo Yan

Comments: 13 pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Graph condensation, which reduces the size of a large-scale graph by synthesizing a small-scale condensed graph as its substitution, has immediately benefited various graph learning tasks. However, existing graph condensation methods rely on centralized data storage, which is infeasible for real-world decentralized data distribution, and overlook data holders' privacy-preserving requirements. To bridge the gap, we propose and study the novel problem of federated graph condensation for graph neural networks (GNNs). Specifically, we first propose a general framework for federated graph condensation, in which we decouple the typical gradient matching process for graph condensation into client-side gradient calculation and server-side gradient matching. In this way, the burdensome computation cost on the client side is largely alleviated. Besides, our empirical studies show that under the federated setting, the condensed graph will consistently leak data membership privacy, i.e., the condensed graph during the federated training can be utilized to steal the training data under the membership inference attacks (MIA). To tackle this issue, we innovatively incorporate information bottleneck principles into the federated graph condensation, which only needs to extract partial node features in one local pre-training step and utilize the features during federated training. Extensive experiments on real-world datasets demonstrate that our framework can consistently protect membership privacy during training. Meanwhile, it also achieves comparable and even superior performance against existing centralized graph condensation and federated graph learning methods.
[92] arXiv:2405.03915 [pdf, ps, html, other]: Title: Motivating Users to Attend to Privacy: A Theory-Driven Design Study

Varun Shiri, Maggie Xiong, Jinghui Cheng, Jin L.C. Guo

Comments: 18 pages, 2 figures, DIS 2024

Subjects: Human-Computer Interaction (cs.HC)

In modern technology environments, raising users' privacy awareness is crucial. Existing efforts largely focused on privacy policy presentation and failed to systematically address a radical challenge of user motivation for initiating privacy awareness. Leveraging the Protection Motivation Theory (PMT), we proposed design ideas and categories dedicated to motivating users to engage with privacy-related information. Using these design ideas, we created a conceptual prototype, enhancing the current App Store product page. Results from an online experiment and follow-up interviews showed that our design effectively motivated participants to attend to privacy issues, raising both the threat appraisal and coping appraisal, two main factors in PMT. Our work indicated that effective design should consider combining PMT components, calibrating information content, and integrating other design elements, such as visual cues and user familiarity. Overall, our study contributes valuable design considerations driven by the PMT to amplify the motivational aspect of privacy communication.
[93] arXiv:2405.03917 [pdf, ps, html, other]: Title: KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization

Tianyi Zhang, Jonah Yi, Zhaozhuo Xu, Anshumali Shrivastava

Subjects: Machine Learning (cs.LG)

Efficient deployment of Large Language Models (LLMs) requires batching multiple requests together to improve throughput. As the batch size, context length, or model size increases, the size of the key and value (KV) cache can quickly become the main contributor to GPU memory usage and the bottleneck of inference latency. Quantization has emerged as an effective technique for KV cache compression, but existing methods still fail at very low bit widths. We observe that distinct channels of a key/value activation embedding are highly inter-dependent, and the joint entropy of multiple channels grows at a slower rate than the sum of their marginal entropies. Based on this insight, we propose Coupled Quantization (CQ), which couples multiple key/value channels together to exploit their inter-dependency and encode the activations in a more information-efficient manner. Extensive experiments reveal that CQ outperforms or is competitive with existing baselines in preserving model quality. Furthermore, we demonstrate that CQ can preserve model quality with KV cache quantized down to 1-bit.
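The entropy observation in the abstract above can be illustrated with a toy calculation. The sketch below is not taken from the paper: the two 1-bit "channels" are made-up data, and `entropy` is a hypothetical helper. It only shows that when channels are dependent, the joint entropy of the pair is smaller than the sum of the marginal entropies, which is the slack that coding channels jointly can exploit.

```python
import math
from collections import Counter


def entropy(symbols):
    """Empirical Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Two made-up channels, each already quantized to 1 bit and strongly correlated.
channel_a = [0, 0, 1, 1, 0, 1, 0, 1]
channel_b = [0, 0, 1, 1, 0, 1, 1, 0]

marginal_sum = entropy(channel_a) + entropy(channel_b)
joint = entropy(list(zip(channel_a, channel_b)))

if __name__ == "__main__":
    print(f"sum of marginal entropies: {marginal_sum:.3f} bits")
    print(f"joint entropy of the pair: {joint:.3f} bits")
```

Quantizing each channel separately needs a budget matching the marginal sum, whereas a codebook over the coupled pair only needs to cover the joint entropy.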
[94] arXiv:2405.03918 [pdf, ps, other]: Title: Unlearning Backdoor Attacks through Gradient-Based Model Pruning

Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak

Subjects: Machine Learning (cs.LG)

In the era of increasing concerns over cybersecurity threats, defending against backdoor attacks is paramount in ensuring the integrity and reliability of machine learning models. However, many existing approaches require substantial amounts of data for effective mitigation, posing significant challenges in practical deployment. To address this, we propose a novel approach to counter backdoor attacks by treating their mitigation as an unlearning task. We tackle this challenge through a targeted model pruning strategy, leveraging unlearning loss gradients to identify and eliminate backdoor elements within the model. Built on solid theoretical insights, our approach offers simplicity and effectiveness, rendering it well-suited for scenarios with limited data availability. Our methodology includes formulating a suitable unlearning loss and devising a model-pruning technique tailored for convolutional neural networks. Comprehensive evaluations demonstrate the efficacy of our proposed approach compared to state-of-the-art approaches, particularly in realistic data settings.
[95] arXiv:2405.03920 [pdf, ps, html, other]: Title: A Roadmap for Multilingual, Multimodal Domain Independent Deception Detection

Dainis Boumber, Rakesh M. Verma, Fatima Zahra Qachfar

Comments: 6 pages, 1 figure, shorter version in SIAM International Conference on Data Mining (SDM) 2024

Journal-ref: Proc. SDM 2024, 396-399

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

Deception, a prevalent aspect of human communication, has undergone a significant transformation in the digital age. With the globalization of online interactions, individuals are communicating in multiple languages and mixing languages on social media, with varied data becoming available in each language and dialect. At the same time, the techniques for detecting deception are similar across the board. Recent studies have shown the possibility of the existence of universal linguistic cues to deception across domains within the English language; however, the existence of such cues in other languages remains unknown. Furthermore, the practical task of deception detection in low-resource languages is not a well-studied problem due to the lack of labeled data. Another dimension of deception is multimodality. For example, a picture with an altered caption in fake news or disinformation may exist. This paper calls for a comprehensive investigation into the complexities of deceptive language across linguistic boundaries and modalities within the realm of computer security and natural language processing and the possibility of using multilingual transformer models and labeled data in various languages to universally address the task of deception detection.
[96] arXiv:2405.03923 [pdf, ps, other]: Title: An Electronically Tunable 28-34 GHz 2-D Steerable Leaky Wave Antenna

Mahdi Alesheikh, Md Hedayatullah Maktoomi, Soheil Saadat, Hamidreza Aghasi

Comments: 2 pages, 4 figures, Accepted in 2024 IEEE International Symposium on Antennas and Propagation and ITNC-USNC-URSI Radio Science Meeting (USNC-URSI), Florence, Italy

Subjects: Systems and Control (eess.SY)

In this paper, a 2-D beam steering mm-wave antenna based on the leaky wave configuration is presented. Microstrip leaky wave antennas are known to exhibit beam rotation by changing the frequency. In this work, the microstrip leaky wave antenna is adopted and co-integrated with electronically tunable board components that periodically load the antenna. By independent control of variable capacitors and diodes, single-frequency 2-D beam steering across the bandwidth is achieved. The proposed antenna is fabricated in Rogers printed circuit board technologies and the simulation results exhibit a peak realized gain of 8 dBi, radiation bandwidth of 28-34 GHz, radiation efficiency of more than 80%, and more than 90$^\circ$ and 70$^\circ$ of beam rotation in the $\phi$ and $\theta$ directions.
[97] arXiv:2405.03924 [pdf, ps, html, other]: Title: NeurDB: An AI-powered Autonomous Data System

Beng Chin Ooi, Shaofeng Cai, Gang Chen, Kian Lee Tan, Yuncheng Wu, Xiaokui Xiao, Naili Xing, Cong Yue, Lingze Zeng, Meihui Zhang, Zhanhao Zhao

Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AIxDB) promises a new generation of data systems, which will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics, self-driving capabilities for improved system performance, etc. In this paper, we explore the evolution of data systems with a focus on deepening the fusion of AI and DB. We present NeurDB, our next-generation data system designed to fully embrace AI design in each major system component and provide in-database AI-powered analytics. We outline the conceptual and architectural overview of NeurDB, discuss its design choices and key components, and report its current development and future plan.
[98] arXiv:2405.03927 [pdf, ps, other]: Title: Codexity: Secure AI-assisted Code Generation

Sung Yong Kim, Zhiyu Fan, Yannic Noller, Abhik Roychoudhury

Subjects: Software Engineering (cs.SE)

Despite the impressive performance of Large Language Models (LLMs) in software development activities, recent studies raise the concern that AI programming assistants (e.g., Copilot, CodeWhisperer) introduce vulnerabilities into software codebases. In this work, we present Codexity, a security-focused code generation framework integrated with five LLMs. Codexity leverages the feedback of static analysis tools such as Infer and CppCheck to mitigate security vulnerabilities in LLM-generated programs. Our evaluation on a real-world benchmark with 751 automatically generated vulnerable subjects demonstrates that Codexity can prevent 60% of the vulnerabilities from being exposed to the software developer.
[99] arXiv:2405.03929 [pdf, ps, other]: Title: Unicorn: U-Net for Sea Ice Forecasting with Convolutional Neural Ordinary Differential Equations

Jaesung Park, Sungchul Hong, Yoonseo Cho, Jong-June Jeon

Subjects: Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)

Sea ice at the North Pole is vital to global climate dynamics. However, accurately forecasting sea ice poses a significant challenge due to the intricate interaction among multiple variables. Because neural networks can seamlessly integrate multiple inputs and deliver strong performance, many studies have turned to them for sea ice forecasting. This paper introduces a novel deep architecture named Unicorn, designed to forecast weekly sea ice. Our model integrates multiple time series images within its architecture to enhance its forecasting performance. Moreover, we incorporate a bottleneck layer within the U-Net architecture, serving as neural ordinary differential equations with convolution operations, to capture the spatiotemporal dynamics of latent variables. Through real data analysis with datasets spanning from 1998 to 2021, our proposed model demonstrates significant improvements over state-of-the-art models in the sea ice concentration forecasting task. It achieves an average MAE improvement of 12% compared to benchmark models. Additionally, our method outperforms existing approaches in sea ice extent forecasting, achieving a classification performance improvement of approximately 18%. These experimental results show the superiority of our proposed model.
[100] arXiv:2405.03932 [pdf, ps, other]: Title: CleanGraph: Human-in-the-loop Knowledge Graph Refinement and Completion

Tyler Bikaun, Michael Stewart, Wei Liu

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This paper presents CleanGraph, an interactive web-based tool designed to facilitate the refinement and completion of knowledge graphs. Maintaining the reliability of knowledge graphs, which are grounded in high-quality and error-free facts, is crucial for real-world applications such as question-answering and information retrieval systems. These graphs are often automatically assembled from textual sources by extracting semantic triples via information extraction. However, assuring the quality of these extracted triples, especially when dealing with large or low-quality datasets, can pose a significant challenge and adversely affect the performance of downstream applications. CleanGraph allows users to perform Create, Read, Update, and Delete (CRUD) operations on their graphs, as well as apply models in the form of plugins for graph refinement and completion tasks. These functionalities enable users to enhance the integrity and reliability of their graph data. A demonstration of CleanGraph and its source code can be accessed at this https URL under the MIT License.
[101] arXiv:2405.03935 [pdf, ps, other]: Title: Roadside Units Assisted Localized Automated Vehicle Maneuvering: An Offline Reinforcement Learning Approach

Kui Wang, Changyang She, Zongdian Li, Tao Yu, Yonghui Li, Kei Sakaguchi

Comments: 6 pages, 6 figures

Subjects: Systems and Control (eess.SY)

Traffic intersections present significant challenges for the safe and efficient maneuvering of connected and automated vehicles (CAVs). This research proposes an innovative roadside unit (RSU)-assisted cooperative maneuvering system aimed at enhancing road safety and traveling efficiency at intersections for CAVs. We utilize RSUs for real-time traffic data acquisition and train an offline reinforcement learning (RL) algorithm based on human driving data. Evaluation results obtained from hardware-in-loop autonomous driving simulations show that our approach employing the twin delayed deep deterministic policy gradient and behavior cloning (TD3+BC), achieves performance comparable to state-of-the-art autonomous driving systems in terms of safety measures while significantly enhancing travel efficiency by up to 17.38% in intersection areas. This paper makes a pivotal contribution to the field of intelligent transportation systems, presenting a breakthrough solution for improving urban traffic flow and safety at intersections.
[102] arXiv:2405.03939 [pdf, ps, html, other]: Title: Long Context Alignment with Short Instructions and Synthesized Positions

Wenhao Wu, Yizhong Wang, Yao Fu, Xiang Yue, Dawei Zhu, Sujian Li

Comments: preview

Subjects: Computation and Language (cs.CL)

Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources. This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of LLMs in the phase of alignment without the need for additional efforts beyond training with original data length. SkipAlign is developed on the premise that long-range dependencies are fundamental to enhancing an LLM's capacity for long context. Departing from merely expanding the length of input samples, SkipAlign synthesizes long-range dependencies from the aspect of position indices. This is achieved by the strategic insertion of skipped positions within instruction-following samples, which utilizes the semantic structure of the data to effectively expand the context. Through extensive experiments on base models with a variety of context window sizes, SkipAlign demonstrates its effectiveness across a spectrum of long-context tasks. Particularly noteworthy is that with a careful selection of the base model and alignment datasets, SkipAlign with only 6B parameters achieves its best performance, comparable with strong baselines like GPT-3.5-Turbo-16K on LongBench.
[103] arXiv:2405.03942 [pdf, ps, other]: Title: Collaborative Intelligence in Sequential Experiments: A Human-in-the-Loop Framework for Drug Discovery

Jinghai He, Cheng Hua, Yingfei Wang, Zeyu Zheng

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Drug discovery is a complex process that involves sequentially screening and examining a vast array of molecules to identify those with the target properties. This process, also referred to as sequential experimentation, faces challenges due to the vast search space, the rarity of target molecules, and constraints imposed by limited data and experimental budgets. To address these challenges, we introduce a human-in-the-loop framework for sequential experiments in drug discovery. This collaborative approach combines human expert knowledge with deep learning algorithms, enhancing the discovery of target molecules within a specified experimental budget. The proposed algorithm processes experimental data to recommend both promising molecules and those that could improve its performance to human experts. Human experts retain the final decision-making authority based on these recommendations and their domain expertise, including the ability to override algorithmic recommendations. We applied our method to drug discovery tasks using real-world data and found that it consistently outperforms all baseline methods, including those which rely solely on human or algorithmic input. This demonstrates the complementarity between human experts and the algorithm. Our results provide key insights into the levels of humans' domain knowledge, the importance of meta-knowledge, and effective work delegation strategies. Our findings suggest that such a framework can significantly accelerate the development of new vaccines and drugs by leveraging the best of both human and artificial intelligence.
[104] arXiv:2405.03943 [pdf, ps, other]: Title: Predictive Modeling with Temporal Graphical Representation on Electronic Health Records

Jiayuan Chen, Changchang Yin, Yuanlong Wang, Ping Zhang

Comments: IJCAI 2024 main track

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Deep learning-based predictive models, leveraging Electronic Health Records (EHR), are receiving increasing attention in healthcare. An effective representation of a patient's EHR should hierarchically encompass both the temporal relationships between historical visits and medical events, and the inherent structural information within these elements. Existing patient representation methods can be roughly categorized into sequential representation and graphical representation. The sequential representation methods focus only on the temporal relationships among longitudinal visits. On the other hand, the graphical representation approaches, while adept at extracting the graph-structured relationships between various medical events, fall short in effectively integrating temporal information. To capture both types of information, we model a patient's EHR as a novel temporal heterogeneous graph. This graph includes historical visit nodes and medical event nodes. It propagates structured information from medical event nodes to visit nodes and utilizes time-aware visit nodes to capture changes in the patient's health status. Furthermore, we introduce a novel temporal graph transformer (TRANS) that integrates temporal edge features, global positional encoding, and local structural encoding into heterogeneous graph convolution, capturing both temporal and structural information. We validate the effectiveness of TRANS through extensive experiments on three real-world datasets. The results show that our proposed approach achieves state-of-the-art performance.
[105] arXiv:2405.03945 [pdf, ps, html, other]: Title: Role of Sensing and Computer Vision in 6G Wireless Communications

Seungnyun Kim, Jihoon Moon, Jinhong Kim, Yongjun Ahn, Donghoon Kim, Sunwoo Kim, Kyuhong Shim, Byonghyo Shim

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, we have witnessed remarkable progress and widespread adoption of sensing technologies in autonomous driving, robotics, and the metaverse. Considering the rapid advancement of computer vision (CV) technology to analyze the sensing information, we anticipate a proliferation of wireless applications exploiting the sensing and CV technologies in 6G. In this article, we provide a holistic overview of the sensing and CV-aided wireless communications (SVWC) framework for 6G. By analyzing the high-resolution sensing information through the powerful CV techniques, SVWC can quickly and accurately understand the wireless environments and then perform the wireless tasks. To demonstrate the efficacy of SVWC, we design the whole process of SVWC including the sensing dataset collection, DL model training, and execution of realistic wireless tasks. From the numerical evaluations on 6G communication scenarios, we show that SVWC achieves considerable performance gains over the conventional 5G systems in terms of positioning accuracy, data rate, and access latency.
[106] arXiv:2405.03946 [pdf, ps, other]: Title: Association between centrality and flourishing trait: analyzing student co-occurrence networks drawn from dining activities

Yi Cao, Shimin Cai, Xiaorong Shen, Tao Zhou

Comments: 14 pages, 2 figures, 1 Table

Subjects: Social and Information Networks (cs.SI)

Comprehending the association between social capabilities and individual psychological traits is paramount for educational administrators. Presently, many studies heavily depend on online questionnaires and self-reported data, while analysis of the connection between offline social networks and mental health status remains scarce. By leveraging a public dataset encompassing on-campus dining activities over 21 weeks, we establish student co-occurrence networks and closely observe the changes in network topology over time. Empirical analysis shows that the node centralities of the student co-occurrence networks exhibit significantly positive correlation with the enhancement of the flourishing trait within the field of mental well-being. Our findings offer potential guidance for assisting students in maintaining a positive mental health status.
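To make the notion of a co-occurrence network in the abstract above concrete, here is a minimal sketch that links two students whenever they appear at the same place in the same time slot, then computes normalized degree centrality. The record layout `(student, place, slot)`, the function names, and the choice of degree centrality are assumptions for illustration; the abstract does not specify which centrality measures or linking rule the authors use.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_edges(records):
    """Build undirected edges between students sharing a (place, slot) pair."""
    groups = defaultdict(set)
    for student, place, slot in records:
        groups[(place, slot)].add(student)
    edges = set()
    for members in groups.values():
        # Sorting gives each undirected edge one canonical orientation.
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges


def degree_centrality(edges, nodes):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    denom = max(len(nodes) - 1, 1)
    return {n: d / denom for n, d in degree.items()}


if __name__ == "__main__":
    # Hypothetical dining records: (student, dining hall, time slot).
    records = [
        ("s1", "hall", 1), ("s2", "hall", 1), ("s3", "hall", 1),
        ("s1", "cafe", 2), ("s4", "cafe", 2),
    ]
    edges = cooccurrence_edges(records)
    print(degree_centrality(edges, ["s1", "s2", "s3", "s4"]))
```

Centrality scores computed per week in this way could then be correlated with a questionnaire-based trait score, which is the kind of association the abstract reports.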
[107] arXiv:2405.03948 [pdf, ps, html, other]: Title: The Fault in Our Recommendations: On the Perils of Optimizing the Measurable

Omar Besbes, Yash Kanoria, Akshit Kumar

Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)

Recommendation systems are widespread, and through customized recommendations, promise to match users with options they will like. To that end, data on engagement is collected and used. Most recommendation systems are ranking-based, where they rank and recommend items based on their predicted engagement. However, the engagement signals are often only a crude proxy for utility, as data on the latter is rarely collected or available. This paper explores the following question: By optimizing for measurable proxies, are recommendation systems at risk of significantly under-delivering on utility? If so, how can one improve utility, which is seldom measured? To study these questions, we introduce a model of repeated user consumption in which, at each interaction, users select between an outside option and the best option from a recommendation set. Our model accounts for user heterogeneity, with the majority preferring ``popular'' content, and a minority favoring ``niche'' content. The system initially lacks knowledge of individual user preferences but can learn them through observations of users' choices over time. Our theoretical and numerical analyses demonstrate that optimizing for engagement can lead to significant utility losses. Instead, we propose a utility-aware policy that initially recommends a mix of popular and niche content. As the platform becomes more forward-looking, our utility-aware policy achieves the best of both worlds: near-optimal utility and near-optimal engagement simultaneously. Our study elucidates an important feature of recommendation systems: given the ability to suggest multiple items, one can perform significant exploration without incurring significant reductions in engagement. By recommending high-risk, high-reward items alongside popular items, systems can enhance discovery of high utility items without significantly affecting engagement.
[108] arXiv:2405.03949 [pdf, ps, html, other]: Title: FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data

Shusen Jing, Anlan Yu, Shuai Zhang, Songyang Zhang

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Recent efforts have been made to integrate self-supervised learning (SSL) with the framework of federated learning (FL). One unique challenge of federated self-supervised learning (FedSSL) is that the global objective of FedSSL usually does not equal the weighted sum of local SSL objectives. Consequently, conventional approaches, such as federated averaging (FedAvg), fail to precisely minimize the FedSSL global objective, often resulting in suboptimal performance, especially when data is non-i.i.d. To fill this gap, we propose a provable FedSSL algorithm, named FedSC, based on the spectral contrastive objective. In FedSC, clients share correlation matrices of data representations in addition to model weights periodically, which enables inter-client contrast of data samples in addition to intra-client contrast and contraction, resulting in improved quality of data representations. Differential privacy (DP) protection is deployed to control the additional privacy leakage on local datasets when correlation matrices are shared. We also provide theoretical analysis on the convergence and extra privacy leakage. The experimental results validate the effectiveness of our proposed algorithm.
[109] arXiv:2405.03950 [pdf, ps, html, other]: Title: Relating-Up: Advancing Graph Neural Networks through Inter-Graph Relationships

Qi Zou, Na Yu, Daoliang Zhang, Wei Zhang, Rui Gao

Comments: 16 pages, 6 figures, 9 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Graph Neural Networks (GNNs) have excelled in learning from graph-structured data, especially in understanding the relationships within a single graph, i.e., intra-graph relationships. Despite their successes, GNNs are limited by neglecting the context of relationships across graphs, i.e., inter-graph relationships. Recognizing the potential to extend this capability, we introduce Relating-Up, a plug-and-play module that enhances GNNs by exploiting inter-graph relationships. This module incorporates a relation-aware encoder and a feedback training strategy. The former enables GNNs to capture relationships across graphs, enriching relation-aware graph representation through collective context. The latter utilizes a feedback loop mechanism for the recursive refinement of these representations, leveraging insights from the refined inter-graph dynamics to drive the feedback loop. The synergy between these two innovations results in a robust and versatile module. Relating-Up enhances the expressiveness of GNNs, enabling them to encapsulate a wider spectrum of graph relationships with greater precision. Our evaluations across 16 benchmark datasets demonstrate that integrating Relating-Up into GNN architectures substantially improves performance, positioning Relating-Up as a formidable choice for a broad spectrum of graph representation learning tasks.
[110] arXiv:2405.03952 [pdf, ps, other]: Title: HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech

Zhongren Dong, Zixing Zhang, Weixiang Xu, Jing Han, Jianjun Ou, Björn W. Schuller

Journal-ref: published at ICASSP 2024

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis. Recent approaches rely heavily on Transformer architectures due to their efficiency in modelling long-range context dependencies. However, the quadratic increase in computational complexity associated with self-attention and the length of audio poses a challenge when deploying such models on edge devices. In this context, we construct a novel framework, namely Hierarchical Attention-Free Transformer (HAFFormer), to better deal with long speech for AD detection. Specifically, we employ an attention-free module of Multi-Scale Depthwise Convolution to replace the self-attention and thus avoid the expensive computation, and a GELU-based Gated Linear Unit to replace the feedforward layer, aiming to automatically filter out the redundant information. Moreover, we design a hierarchical structure to force it to learn a variety of information grains, from the frame level to the dialogue level. In extensive experiments on the ADReSS-M dataset, the introduced HAFFormer achieves results (82.6% accuracy) that are competitive with other recent work, but with significantly reduced computational complexity and model size compared to the standard Transformer. This shows the efficiency of HAFFormer in dealing with long audio for AD detection.
[111] arXiv:2405.03953 [pdf, ps, other]: Title: Intelligent Cardiac Auscultation for Murmur Detection via Parallel-Attentive Models with Uncertainty Estimation

Zixing Zhang, Tao Pang, Jing Han, Björn W. Schuller

Journal-ref: published at ICASSP 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Heart murmurs are a common manifestation of cardiovascular diseases and can provide crucial clues to early cardiac abnormalities. While most current research methods primarily focus on the accuracy of models, they often overlook other important aspects such as the interpretability of machine learning algorithms and the uncertainty of predictions. This paper introduces a heart murmur detection method based on a parallel-attentive model, which consists of two branches: One is based on a self-attention module and the other one is based on a convolutional network. Unlike traditional approaches, this structure is better equipped to handle long-term dependencies in sequential data, and thus effectively captures the local and global features of heart murmurs. Additionally, we acknowledge the significance of understanding the uncertainty of model predictions in the medical field for clinical decision-making. Therefore, we have incorporated an effective uncertainty estimation method based on Monte Carlo Dropout into our model. Furthermore, we have employed temperature scaling to calibrate the predictions of our probabilistic model, enhancing its reliability. In experiments conducted on the CirCor Digiscope dataset for heart murmur detection, our proposed method achieves a weighted accuracy of 79.8% and an F1 of 65.1%, representing state-of-the-art results.
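As a rough illustration of the two uncertainty tools named above, the following sketch combines Monte Carlo Dropout (averaging several stochastic forward passes) with temperature scaling (dividing logits by a scalar before the softmax). It is not the paper's parallel-attentive model: the single linear layer, its weights, the input features, and the dropout placement are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 softens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_forward(features, weights, drop_prob, rng):
    """One stochastic pass: drop each input feature with probability drop_prob."""
    keep = 1.0 - drop_prob
    masked = [f / keep if rng.random() < keep else 0.0 for f in features]
    return [sum(w * x for w, x in zip(row, masked)) for row in weights]


def mc_dropout_predict(features, weights, drop_prob=0.5, passes=200,
                       temperature=1.0, seed=0):
    """Average class probabilities over stochastic passes; report predictive entropy."""
    rng = random.Random(seed)
    mean = [0.0] * len(weights)
    for _ in range(passes):
        logits = noisy_forward(features, weights, drop_prob, rng)
        probs = softmax(logits, temperature)
        mean = [m + p / passes for m, p in zip(mean, probs)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


# Hypothetical two-class head (e.g. murmur absent / present) on three features.
toy_weights = [[2.0, -1.0, 0.5], [-1.0, 1.5, 0.0]]
toy_features = [1.0, 2.0, 0.5]
mean_probs, predictive_entropy = mc_dropout_predict(toy_features, toy_weights)
print(mean_probs, predictive_entropy)
```

A higher predictive entropy flags inputs on which the stochastic passes disagree. In practice the temperature would be fitted on held-out data rather than fixed by hand.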
[112] arXiv:2405.03955 [pdf, ps, other]: Title: IPFed: Identity protected federated learning for user authentication

Yosuke Kaga, Yusei Suzuki, Kenta Takahashi

Journal-ref: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

With the development of laws and regulations related to privacy preservation, it has become difficult to collect personal data to perform machine learning. In this context, federated learning, which is distributed learning without sharing personal data, has been proposed. In this paper, we focus on federated learning for user authentication. We show that it is difficult to achieve both privacy preservation and high accuracy with existing methods. To address these challenges, we propose IPFed, a privacy-preserving federated learning method that uses random projection for class embedding. Furthermore, we prove that IPFed is capable of learning equivalently to the state-of-the-art method. Experiments on face image datasets show that IPFed can protect the privacy of personal data while maintaining the accuracy of the state-of-the-art method.
[113] arXiv:2405.03956 [pdf, ps, other]: Title: Adaptive Speech Emotion Representation Learning Based On Dynamic Graph

Yingxue Gao, Huan Zhao, Zixing Zhang

Journal-ref: published at ICASSP 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Graph representation learning has become a hot research topic due to its powerful nonlinear fitting capability in extracting representative node embeddings. However, for sequential data such as speech signals, most traditional methods merely focus on the static graph created within a sequence, and largely overlook the intrinsic evolving patterns of these data. This may reduce the efficiency of graph representation learning for sequential data. For this reason, we propose an adaptive graph representation learning method based on dynamically evolved graphs, which are consecutively constructed on a series of subsequences segmented by a sliding window. Doing so better captures local and global context information within a long sequence. Moreover, we introduce a weighted approach to update the node representation rather than the conventional average one, where the weights are calculated by a novel matrix computation based on the degree of neighboring nodes. Finally, we construct a learnable graph convolutional layer that combines the graph structure loss and classification loss to optimize the graph structure. To verify the effectiveness of the proposed method, we conducted experiments for speech emotion recognition on the IEMOCAP and RAVDESS datasets. Experimental results show that the proposed method outperforms the latest (non-)graph-based models.
[114] arXiv:2405.03958 [pdf, ps, html, other]: Title: Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to the EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.
[115] arXiv:2405.03959 [pdf, ps, other]: Title: Joint Estimation of Identity Verification and Relative Pose for Partial Fingerprints

Xiongjun Guan, Zhiyu Pan, Jianjiang Feng, Jie Zhou

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Currently, portable electronic devices are becoming more and more popular. For lightweight considerations, their fingerprint recognition modules usually use limited-size sensors. However, partial fingerprints have few matchable features, especially when there are differences in finger pressing posture or image quality, which makes partial fingerprint verification challenging. Most existing methods regard fingerprint position rectification and identity verification as independent tasks, ignoring the coupling relationship between them -- relative pose estimation typically relies on paired features as anchors, and authentication accuracy tends to improve with more precise pose alignment. Consequently, in this paper we propose a method that jointly estimates identity verification and relative pose for partial fingerprints, aiming to leverage their inherent correlation to improve each other. To achieve this, we propose a multi-task CNN (Convolutional Neural Network)-Transformer hybrid network, and design a pre-training task to enhance the feature extraction capability. Experiments on multiple public datasets (NIST SD14, FVC2002 DB1A & DB3A, FVC2004 DB1A & DB2A, FVC2006 DB1A) and an in-house dataset show that our method achieves state-of-the-art performance in both partial fingerprint verification and relative pose estimation, while being more efficient than previous methods.
[116] arXiv:2405.03960 [pdf, ps, other]: Title: ESIHGNN: Event-State Interactions Infused Heterogeneous Graph Neural Network for Conversational Emotion Recognition

Xupeng Zha, Huan Zhao, Zixing Zhang

Journal-ref: published at ICASSP 2024

Subjects: Computation and Language (cs.CL)

Conversational Emotion Recognition (CER) aims to predict the emotion expressed by an utterance (referred to as an "event") during a conversation. Existing graph-based methods mainly focus on event interactions to comprehend the conversational context, while overlooking the direct influence of the speaker's emotional state on the events. In addition, real-time modeling of the conversation is crucial for real-world applications but is rarely considered. Toward this end, we propose a novel graph-based approach, namely Event-State Interactions infused Heterogeneous Graph Neural Network (ESIHGNN), which incorporates the speaker's emotional state and constructs a heterogeneous event-state interaction graph to model the conversation. Specifically, a heterogeneous directed acyclic graph neural network is employed to dynamically update and enhance the representations of events and emotional states at each turn, thereby improving conversational coherence and consistency. Furthermore, to improve the performance of CER, we enrich the graph's edges with external knowledge. Experimental results on four publicly available CER datasets show the superiority of our approach and the effectiveness of the introduced heterogeneous event-state interaction graph.
[117] arXiv:2405.03961 [pdf, ps, html, other]: Title: Structure-based drug design by denoising voxel grids

Pedro O. Pinheiro, Arian Jamasb, Omar Mahmood, Vishnu Sresht, Saeed Saremi

Subjects: Machine Learning (cs.LG)

We present VoxBind, a new score-based generative model for 3D molecules conditioned on protein structures. Our approach represents molecules as 3D atomic density grids and leverages a 3D voxel-denoising network for learning and generation. We extend the neural empirical Bayes formalism (Saremi & Hyvarinen, 2019) to the conditional setting and generate structure-conditioned molecules with a two-step procedure: (i) sample noisy molecules from the Gaussian-smoothed conditional distribution with underdamped Langevin MCMC using the learned score function and (ii) estimate clean molecules from the noisy samples with single-step denoising. Compared to the current state of the art, our model is simpler to train, significantly faster to sample from, and achieves better results on extensive in silico benchmarks -- the generated molecules are more diverse, exhibit fewer steric clashes, and bind with higher affinity to protein pockets.
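Step (ii) above, single-step denoising, can be checked on a toy problem where the score is known in closed form. The sketch below assumes the standard empirical Bayes estimator x_hat = y + sigma^2 * score(y) and a one-dimensional Gaussian prior. Both are simplifications for illustration: in the paper the score is learned by a 3D voxel-denoising network and the data are atomic density grids.

```python
def gaussian_score(y, mu, var_x, sigma):
    """Score d/dy log p(y) when Y = X + N(0, sigma^2) and X ~ N(mu, var_x)."""
    return -(y - mu) / (var_x + sigma ** 2)


def single_step_denoise(y, score, sigma):
    """Empirical Bayes estimate of the clean sample: y + sigma^2 * score(y)."""
    return y + sigma ** 2 * score(y)


# Toy parameters (illustrative values only).
mu, var_x, sigma = 2.0, 1.0, 0.5
y = 3.0  # a noisy observation

x_hat = single_step_denoise(y, lambda t: gaussian_score(t, mu, var_x, sigma), sigma)

# For a Gaussian prior the posterior mean E[X | Y = y] is known exactly,
# so it can be used to check the estimator.
posterior_mean = (var_x * y + sigma ** 2 * mu) / (var_x + sigma ** 2)
print(x_hat, posterior_mean)
```

Both quantities agree in this Gaussian case, which is why a single application of the score suffices to map a noisy sample back to a clean estimate once the noisy distribution has been sampled.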
[118] arXiv:2405.03962 [pdf, ps, other]: Title: AdsorbDiff: Adsorbate Placement via Conditional Denoising Diffusion

Adeesh Kolluru, John R Kitchin

Comments: 8 pages, 7 figures, ICML 2024

Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)

Determining the optimal configuration of adsorbates on a slab (adslab) is pivotal in the exploration of novel catalysts across diverse applications. Traditionally, the quest for the lowest energy adslab configuration involves placing the adsorbate onto the slab followed by an optimization process. Prior methodologies have relied on heuristics, problem-specific intuitions, or brute-force approaches to guide adsorbate placement. In this work, we propose a novel framework for adsorbate placement using denoising diffusion. The model is designed to predict the optimal adsorbate site and orientation corresponding to the lowest energy configuration. Further, we have an end-to-end evaluation framework where diffusion-predicted adslab configuration is optimized with a pretrained machine learning force field and finally evaluated with Density Functional Theory (DFT). Our findings demonstrate an acceleration of up to 5x or 3.5x improvement in accuracy compared to the previous best approach. Given the novelty of this framework and application, we provide insights into the impact of pre-training, model architectures, and conduct extensive experiments to underscore the significance of this approach.
[119] arXiv:2405.03963 [pdf, ps, html, other]: Title: ERATTA: Extreme RAG for Table To Answers with Large Language Models

Sohini Roychowdhury, Marko Krema, Anvar Mahammad, Brian Moore, Arijit Mukherjee, Punit Prakashchandra

Comments: 5 pages, 3 tables, Asilomar SSC Conference, 2024

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs) with retrieval-augmented generation (RAG) have been the optimal choice for scalable generative AI solutions in the recent past. However, the choice of use-cases that incorporate RAG with LLMs has been either generic or extremely domain specific, thereby questioning the scalability and generalizability of RAG-LLM approaches. In this work, we propose a unique LLM-based system where multiple LLMs can be invoked to enable data authentication, user query routing, data retrieval and custom prompting for question answering capabilities from data tables that are highly varying and large in size. Our system is tuned to extract information from Enterprise-level data products and furnish real time responses under 10 seconds. One prompt manages user-to-data authentication, followed by three prompts to route, fetch data and generate customizable prompt-based natural language responses. Additionally, we propose a five metric scoring module that detects and reports hallucinations in the LLM responses. Our proposed system and scoring metrics achieve >90% confidence scores across hundreds of user queries in the sustainability, financial health and social media domains. Extensions to the proposed extreme RAG architectures can enable heterogeneous source querying using LLMs.
[120] arXiv:2405.03967 [pdf, ps, other]: Title: SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems

Kailash Gogineni, Sai Santosh Dayapule, Juan Gómez-Luna, Karthikeya Gogineni, Peng Wei, Tian Lan, Mohammad Sadrosadati, Onur Mutlu, Guru Venkataramani

Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)

Reinforcement Learning (RL) trains agents to learn optimal behavior by maximizing reward signals from experience datasets. However, RL training often faces memory limitations, leading to execution latencies and prolonged training times. To overcome this, SwiftRL explores Processing-In-Memory (PIM) architectures to accelerate RL workloads. We achieve near-linear performance scaling by implementing RL algorithms like Tabular Q-learning and SARSA on UPMEM PIM systems and optimizing for hardware. Our experiments on OpenAI GYM environments using UPMEM hardware demonstrate superior performance compared to CPU and GPU implementations.
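For readers unfamiliar with the two algorithms named above, the sketch below shows the textbook update rules for Tabular Q-learning and SARSA on a plain Python table. It says nothing about the UPMEM PIM implementation or its hardware optimizations; the table contents, learning rate, and discount factor are illustrative assumptions.

```python
def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy action in the next state."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


# Two states, two actions; q[state][action] holds the current value estimates.
q_table_ql = [[0.0, 0.0], [1.0, 2.0]]
q_table_sarsa = [[0.0, 0.0], [1.0, 2.0]]

# Same transition (state 0, action 1, reward 1.0, next state 1) for both rules.
q_learning_update(q_table_ql, s=0, a=1, r=1.0, s_next=1)
sarsa_update(q_table_sarsa, s=0, a=1, r=1.0, s_next=1, a_next=0)

print(q_table_ql[0][1], q_table_sarsa[0][1])
```

The only difference between the two rules is the bootstrap term: Q-learning uses the best next action, while SARSA uses the action the policy actually chose.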
[121] arXiv:2405.03969 [pdf, ps, other]: Title: Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform

Zhijian Qiao, Haoming Huang, Chuhao Liu, Shaojie Shen, Fumin Zhang, Huan Yin

Comments: 12 pages, 10 figures

Subjects: Robotics (cs.RO)

The construction and robotic sensing data originate from disparate sources and are associated with distinct frames of reference. The primary objective of this study is to align LiDAR point clouds with building information modeling (BIM) using a global point cloud registration approach, aimed at establishing a shared understanding between the two modalities, i.e., "speak the same language". To achieve this, we design a cross-modality registration method, spanning from the front end to the back end. At the front end, we extract descriptors by identifying walls and capturing the intersected corners. Subsequently, at the back end, we employ the Hough transform to estimate multiple pose candidates. The final pose is verified by wall-pixel correlation. To evaluate the effectiveness of our method, we conducted real-world multi-session experiments in a large-scale university building, involving two different types of LiDAR sensors. We also report our findings and plan to open-source our collected dataset.
[122] arXiv:2405.03971 [pdf, ps, html, other]: Title: Unified End-to-End V2X Cooperative Autonomous Driving

Zhiwei Li, Bozhen Zhang, Lei Yang, Tianyu Shen, Nuo Xu, Ruosen Hao, Weiting Li, Tao Yan, Huaping Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)

V2X cooperation, through the integration of sensor data from both vehicles and infrastructure, is considered a pivotal approach to advancing autonomous driving technology. Current research primarily focuses on enhancing perception accuracy, often overlooking the systematic improvement of accident prediction accuracy through end-to-end learning, leading to insufficient attention to the safety issues of autonomous driving. To address this challenge, this paper introduces the UniE2EV2X framework, a V2X-integrated end-to-end autonomous driving system that consolidates key driving modules within a unified network. The framework employs a deformable attention-based data fusion strategy, effectively facilitating cooperation between vehicles and infrastructure. The main advantages include: 1) significantly enhancing agents' perception and motion prediction capabilities, thereby improving the accuracy of accident predictions; 2) ensuring high reliability in the data fusion process; 3) superior end-to-end perception compared to modular approaches. Furthermore, we implement the UniE2EV2X framework on the challenging DeepAccident, a simulation dataset designed for V2X cooperative driving.
[123] arXiv:2405.03972 [pdf, ps, html, other]: Title: Contextualization with SPLADE for High Recall Retrieval

Eugene Yang

Comments: 5 pages, 1 figure, accepted at SIGIR 2024 as short paper

Subjects: Information Retrieval (cs.IR)

High Recall Retrieval (HRR), such as eDiscovery and medical systematic review, is a search problem that optimizes the cost of retrieving most relevant documents in a given collection. Iterative approaches, such as iterative relevance feedback and uncertainty sampling, are shown to be effective under various operational scenarios. Despite neural models demonstrating success in other text-related tasks, linear models such as logistic regression, in general, are still more effective and efficient in HRR since the model is trained and retrieves documents from the same fixed collection. In this work, we leverage SPLADE, an efficient retrieval model that transforms documents into contextualized sparse vectors, for HRR. Our approach combines the best of both worlds, leveraging both the contextualization from pretrained language models and the efficiency of linear models. It reduces 10% and 18% of the review cost in two HRR evaluation collections under a one-phase review workflow with a target recall of 80%. The experiment is implemented with TARexp and is available at this https URL.
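The review-cost metric mentioned above can be made concrete with a small calculation. The sketch below counts how many documents must be reviewed, in ranked order, before a target recall is reached. It is a simplification: it uses a fixed ranking rather than the iterative retraining loop of a one-phase workflow, it relies on known relevance labels (so it is an evaluation-time measure only), and the two rankings are made-up examples. It does not use TARexp.

```python
def review_cost_at_recall(ranked_labels, target_recall=0.8):
    """Number of documents reviewed, in ranked order, to reach the target recall.

    ranked_labels: 1 for relevant, 0 for non-relevant, in review order.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0
    found = 0
    for reviewed, label in enumerate(ranked_labels, start=1):
        found += label
        if found / total_relevant >= target_recall:
            return reviewed
    return len(ranked_labels)


# Hypothetical rankings of the same 10-document collection (5 relevant documents).
baseline_ranking = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
improved_ranking = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

baseline_cost = review_cost_at_recall(baseline_ranking)
improved_cost = review_cost_at_recall(improved_ranking)
cost_reduction = (baseline_cost - improved_cost) / baseline_cost
print(baseline_cost, improved_cost, cost_reduction)
```

A ranking that surfaces relevant documents earlier reaches the recall target after fewer reviews, which is the sense in which a better model reduces review cost.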
[124] arXiv:2405.03974 [pdf, ps, other]: Title: TBNet: A Neural Architectural Defense Framework Facilitating DNN Model Protection in Trusted Execution Environments

Ziyu Liu, Tong Zhou, Yukui Luo, Xiaolin Xu

Journal-ref: DAC2024

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Trusted Execution Environments (TEEs) have become a promising solution to secure DNN models on edge devices. However, the existing solutions either provide inadequate protection or introduce large performance overhead. Taking both security and performance into consideration, this paper presents TBNet, a TEE-based defense framework that protects DNN models from a neural architectural perspective. Specifically, TBNet generates a novel Two-Branch substitution model to exploit (1) the computational resources in the untrusted Rich Execution Environment (REE) for latency reduction and (2) the physically isolated TEE for model protection. Experimental results on a Raspberry Pi across diverse DNN model architectures and datasets demonstrate that TBNet achieves efficient model protection at a low cost.
[125] arXiv:2405.03977 [pdf, ps, html, other]: Title: Can citations tell us about a paper's reproducibility? A case study of machine learning papers

Rochana R. Obadage, Sarah M. Rajtmajer, Jian Wu

Comments: 9 pages, 4 figures

Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The iterative character of work in machine learning (ML) and artificial intelligence (AI) and reliance on comparisons against benchmark datasets emphasize the importance of reproducibility in that literature. Yet, resource constraints and inadequate documentation can make running replications particularly challenging. Our work explores the potential of using downstream citation contexts as a signal of reproducibility. We introduce a sentiment analysis framework applied to citation contexts from papers involved in Machine Learning Reproducibility Challenges in order to interpret the positive or negative outcomes of reproduction attempts. Our contributions include training classifiers for reproducibility-related contexts and sentiment analysis, and exploring correlations between citation context sentiment and reproducibility scores. Study data, software, and an artifact appendix are publicly available at this https URL .
[126] arXiv:2405.03978 [pdf, ps, html, other]: Title: VMambaCC: A Visual State Space Model for Crowd Counting

Hao-Yuan Ma, Li Zhang, Shuai Shi

Subjects: Computer Vision and Pattern Recognition (cs.CV)

As a deep learning model, Visual Mamba (VMamba) has a low computational complexity and a global receptive field, and has been successfully applied to image classification and detection. To extend its applications, we apply VMamba to crowd counting and propose a novel VMambaCC (VMamba Crowd Counting) model. Naturally, VMambaCC inherits the merits of VMamba, namely global modeling for images and low computational cost. Additionally, we design a Multi-head High-level Feature (MHF) attention mechanism for VMambaCC. MHF is a new attention mechanism that leverages high-level semantic features to augment low-level semantic features, thereby enhancing spatial feature representation with greater precision. Building upon MHF, we further present a High-level Semantic Supervised Feature Pyramid Network (HS2PFN) that progressively integrates and enhances high-level semantic information with low-level semantic information. Extensive experimental results on five public datasets validate the efficacy of our approach. For example, our method achieves a mean absolute error of 51.87 and a mean squared error of 81.3 on the ShangHaiTech_PartA dataset. Our code is coming soon.
[127] arXiv:2405.03981 [pdf, ps, html, other]: Title: Predicting Lung Disease Severity via Image-Based AQI Analysis using Deep Learning Techniques

Anvita Mahajan, Sayali Mate, Chinmayee Kulkarni, Suraj Sawant

Comments: 11 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Air pollution is a significant health concern worldwide, contributing to various respiratory diseases. Advances in air quality mapping, driven by the emergence of smart cities and the proliferation of Internet-of-Things sensor devices, have led to an increase in available data, fueling momentum in air pollution forecasting. The objective of this study is to devise an integrated approach for predicting air quality using image data and subsequently assessing lung disease severity based on Air Quality Index (AQI). The aim is to implement an integrated approach by refining existing techniques to improve accuracy in predicting AQI and lung disease severity. The study aims to forecast AQI and additional atmospheric pollutants such as PM10, O3, CO, SO2, and NO2, in addition to PM2.5 levels. Additionally, the study aims to compare the proposed approach with existing methods to show its effectiveness. The approach used in this paper uses the VGG16 model for feature extraction in images and a neural network for predicting AQI. For predicting lung disease severity, Support Vector Classifier (SVC) and K-Nearest Neighbors (KNN) algorithms are utilized. The neural network model for predicting AQI achieved a training accuracy of 88.54% and a testing accuracy of 87.44%, which was measured using a loss function, while the KNN model used for predicting lung disease severity achieved a training accuracy of 98.4% and a testing accuracy of 97.5%. In conclusion, the integrated approach presented in this study forecasts air quality and evaluates lung disease severity, achieving high testing accuracies of 87.44% for AQI and 97.5% for lung disease severity using neural network, KNN, and SVC models. The future scope involves implementing transfer learning and advanced deep learning modules to enhance prediction capabilities. While the current study focuses on India, the objective is to expand its scope to encompass global coverage.
[128] arXiv:2405.03986 [pdf, ps, other]: Title: Factors Influencing User Willingness To Use SORA

Gustave Florentin Nkoulou Mvondo, Ben Niu

Comments: 27 pages, 3 figures, 7 tables, 2 authors; first author* corresponding author

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Sora promises to redefine the way visual content is created. Despite its numerous forecasted benefits, the drivers of user willingness to use the text-to-video (T2V) model are unknown. This study extends the extended unified theory of acceptance and use of technology (UTAUT2) with perceived realism and novelty value. Using a purposive sampling method, we collected data from 940 respondents in the US and analyzed the sample using covariance-based structural equation modeling and fuzzy set qualitative comparative analysis (fsQCA). The findings reveal that all hypothesized relationships are supported, with perceived realism emerging as the most influential driver, followed by novelty value. Moreover, fsQCA identifies five configurations leading to high and low willingness to use, and the model demonstrates high predictive validity, contributing to theory advancement. Our study provides valuable insights for developers and marketers, offering guidance for strategic decisions to promote the widespread adoption of T2V models.
[129] arXiv:2405.03987 [pdf, ps, other]: Title: Navigating Chemical Space with Latent Flows

Guanghao Wei, Yining Huang, Chenru Duan, Yue Song, Yuanqi Du

Comments: Preprint of NeurIPS 2024 submission

Subjects: Machine Learning (cs.LG)

Recent progress of deep generative models in the vision and language domain has stimulated significant interest in more structured data generation such as molecules. However, beyond generating new random molecules, efficient exploration and a comprehensive understanding of the vast chemical space are of great importance to molecular science and applications in drug design and materials discovery. In this paper, we propose a new framework, ChemFlow, to traverse chemical space through navigating the latent space learned by molecule generative models through flows. We introduce a dynamical system perspective that formulates the problem as learning a vector field that transports the mass of the molecular distribution to the region with desired molecular properties or structure diversity. Under this framework, we unify previous approaches on molecule latent space traversal and optimization and propose alternative competing methods incorporating different physical priors. We validate the efficacy of ChemFlow on molecule manipulation and single- and multi-objective molecule optimization tasks under both supervised and unsupervised molecular discovery settings. Codes and demos are publicly available on GitHub at this https URL.
[130] arXiv:2405.03988 [pdf, ps, other]: Title: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

Jian Jia, Yipei Wang, Yan Li, Honggang Chen, Xuehan Bai, Zhaocheng Liu, Jian Liang, Quan Chen, Han Li, Peng Jiang, Kun Gai

Comments: 11 pages, 6 figures

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Contemporary recommender systems predominantly rely on collaborative filtering techniques, employing ID-embedding to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within textual descriptions of items, leading to suboptimal performance in cold-start scenarios and long-tail user recommendations. Leveraging the capabilities of Large Language Models (LLMs) pretrained on massive text corpora presents a promising avenue for enhancing recommender systems by integrating open-world domain knowledge. In this paper, we propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge. We address computational complexity concerns by utilizing pretrained LLMs as item encoders and freezing LLM parameters to avoid catastrophic forgetting and preserve open-world knowledge. To bridge the gap between the open-world and collaborative domains, we design a twin-tower structure supervised by the recommendation task and tailored for practical industrial application. Through offline experiments on a large-scale industrial dataset and online A/B tests, we demonstrate the efficacy of our approach.
[131] arXiv:2405.03989 [pdf, ps, other]: Title: A Method for Parsing and Vectorization of Semi-structured Data used in Retrieval Augmented Generation

Hang Yang, Jing Guo, Jianchuan Qi, Jinliang Xie, Si Zhang, Siqi Yang, Nan Li, Ming Xu

Comments: 20 pages, 4 figures, 5 tables

Subjects: Databases (cs.DB)

This paper presents a novel method for parsing and vectorizing semi-structured data to enhance the functionality of Retrieval-Augmented Generation (RAG) within Large Language Models (LLMs). We developed a comprehensive pipeline for converting various data formats into .docx, enabling efficient parsing and structured data extraction. The core of our methodology involves the construction of a vector database using Pinecone, which integrates seamlessly with LLMs to provide accurate, context-specific responses, particularly in environmental management and wastewater treatment operations. Through rigorous testing with both English and Chinese texts in diverse document formats, our results demonstrate a marked improvement in the precision and reliability of LLM outputs. The RAG-enhanced models displayed enhanced ability to generate contextually rich and technically accurate responses, underscoring the potential of vector knowledge bases in significantly boosting the performance of LLMs in specialized domains. This research not only illustrates the effectiveness of our method but also highlights its potential to revolutionize data processing and analysis in environmental sciences, setting a precedent for future advancements in AI-driven applications. Our code is available at this https URL.
[132] arXiv:2405.03990 [pdf, ps, other]: Title: TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks

Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang

Comments: 11 pages, 7 figures. This paper has been accepted by ICDCS 2024. arXiv admin note: substantial text overlap with arXiv:2404.14204

Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with $\left(1-\epsilon\right)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models.
[133] arXiv:2405.03991 [pdf, ps, html, other]: Title: Assemblage: Automatic Binary Dataset Construction for Machine Learning

Chang Liu, Rebecca Saul, Yihao Sun, Edward Raff, Maya Fuchs, Townsend Southard Pantano, James Holt, Kristopher Micinski

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Binary code is pervasive, and binary analysis is a key task in reverse engineering, malware classification, and vulnerability discovery. Unfortunately, while there exist large corpuses of malicious binaries, obtaining high-quality corpuses of benign binaries for modern systems has proven challenging (e.g., due to licensing issues). Consequently, machine learning based pipelines for binary analysis utilize either costly commercial corpuses (e.g., VirusTotal) or open-source binaries (e.g., coreutils) available in limited quantities. To address these issues, we present Assemblage: an extensible cloud-based distributed system that crawls, configures, and builds Windows PE binaries to obtain high-quality binary corpuses suitable for training state-of-the-art models in binary analysis. We have run Assemblage on AWS over the past year, producing 890k Windows PE and 428k Linux ELF binaries across 29 configurations. Assemblage is designed to be both reproducible and extensible, enabling users to publish "recipes" for their datasets, and facilitating the extraction of a wide array of features. We evaluated Assemblage by using its data to train modern learning-based pipelines for compiler provenance and binary function similarity. Our results illustrate the practical need for robust corpuses of high-quality Windows PE binaries in training modern learning-based binary analyses. Assemblage can be downloaded from this https URL
[134] arXiv:2405.03992 [pdf, ps, other]: Title: Research on financial fraud algorithm based on federal learning and big data technology

Xinye Sha

Subjects: Computational Engineering, Finance, and Science (cs.CE)

With the deepening of the digitization degree of financial business, financial fraud presents more complex and hidden characteristics, which poses a severe challenge to the risk prevention and control ability of financial institutions. At the same time, the vigorous development of big data technology provides massive potential information resources, and federated learning, as an emerging distributed machine learning paradigm, can realize multi-party data collaborative modeling under the premise of protecting data privacy. This paper first elaborates the basic principle, advantages and unique value of federated learning in solving data silos and protecting user privacy. Aiming at the needs of financial fraud detection, this paper discusses the design of a federated learning architecture suitable for this scenario, including selecting a suitable model type (such as a neural network) and setting reasonable data partitioning and updating rules. The central theme of the dissertation revolves around the exploration and execution of an algorithm for detecting financial fraud, which is grounded in federated learning methodologies. With a federated learning framework, each participant trains the model locally and exchanges only model parameters rather than raw data, enabling iterative optimization of the global model while protecting data privacy. To ascertain the efficacy and superiority of the suggested algorithm, a meticulous experimental investigation is both devised and executed. A real-world financial fraud dataset is selected to compare the fraud detection performance using traditional centralized learning and federated learning. The findings from the experiments reveal that the federated learning-based financial fraud algorithm achieves a substantial reduction in the likelihood of data privacy breaches without compromising on high detection accuracies.
[135] arXiv:2405.03994 [pdf, ps, other]: Title: A Guide to Re-Implementing Agent-based Models: Experiences from the HUMAT Model

Önder Gürcan, Timo Szczepanska, Patrycja Antosz

Comments: 12 pages, 5 figures, The 18th annual Social Simulation Conference (SSC 2023)

Subjects: Multiagent Systems (cs.MA)

Replicating existing agent-based models poses significant challenges, particularly for those new to the field. This article presents an all-encompassing guide to re-implementing agent-based models, encompassing vital concepts such as comprehending the original model, utilizing agent-based modeling frameworks, simulation design, model validation, and more. By embracing the proposed guide, researchers and practitioners can gain a profound understanding of the entire re-implementation process, resulting in heightened accuracy and reliability of simulations for complex systems. Furthermore, this article showcases the re-implementation of the HUMAT socio-cognitive architecture, with a specific focus on designing a versatile, language-independent model. The encountered challenges and pitfalls in the re-implementation process are thoroughly discussed, empowering readers with practical insights. Embrace this guide to expedite model development while ensuring robust and precise simulations.
[136] arXiv:2405.03995 [pdf, ps, html, other]: Title: Deep Event-based Object Detection in Autonomous Driving: A Survey

Bingquan Zhou, Jie Jiang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Object detection plays a critical role in autonomous driving, where accurately and efficiently detecting objects in fast-moving scenes is crucial. Traditional frame-based cameras face challenges in balancing latency and bandwidth, necessitating innovative solutions. Event cameras have emerged as promising sensors for autonomous driving due to their low latency, high dynamic range, and low power consumption. However, effectively utilizing the asynchronous and sparse event data presents challenges, particularly in maintaining low latency and lightweight architectures for object detection. This paper provides an overview of object detection using event data in autonomous driving, showcasing the competitive benefits of event cameras.
[137] arXiv:2405.03998 [pdf, ps, html, other]: Title: Sketch Then Generate: Providing Incremental User Feedback and Guiding LLM Code Generation through Language-Oriented Code Sketches

Chen Zhu-Tian, Zeyu Xiong, Xiaoshuo Yao, Elena Glassman

Comments: 4 pages

Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)

Crafting effective prompts for code generation or editing with Large Language Models (LLMs) is not an easy task. Particularly, the absence of immediate, stable feedback during prompt crafting hinders effective interaction, as users are left to mentally imagine possible outcomes until the code is generated. In response, we introduce Language-Oriented Code Sketching, an interactive approach that provides instant, incremental feedback in the form of code sketches (i.e., incomplete code outlines) during prompt crafting. This approach converts a prompt into a code sketch by leveraging the inherent linguistic structures within the prompt and applying classic natural language processing techniques. The sketch then serves as an intermediate placeholder that not only previews the intended code structure but also guides the LLM towards the desired code, thereby enhancing human-LLM interaction. We conclude by discussing the approach's applicability and future plans.
[138] arXiv:2405.03999 [pdf, ps, other]: Title: Interaction Design for Human-AI Choreography Co-creation

Yimeng Liu

Comments: GenAICHI: CHI 2024 Workshop on Generative AI and HCI

Subjects: Human-Computer Interaction (cs.HC)

Human-AI co-creation aims to combine human and AI strengths for artistic results exceeding individual capabilities. Frameworks exist for painting, music, and poetry, but choreography's embodied nature demands a dedicated approach. This paper explores AI-assisted choreography techniques (e.g., generative ideation, embodied improvisation) and analyzes interaction design -- how humans and AI collaborate and communicate -- to inform the design considerations of future human-AI choreography co-creation systems.
[139] arXiv:2405.04000 [pdf, ps, other]: Title: Distributed Invariant Kalman Filter for Cooperative Localization using Matrix Lie Groups

Yizhi Zhou, Yufan Liu, Pengxiang Zhu, Xuan Wang

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper studies the problem of Cooperative Localization (CL) for multi-robot systems, where a group of mobile robots jointly localize themselves by using measurements from onboard sensors and shared information from other robots. We propose a novel distributed invariant Kalman Filter (DInEKF) based on the Lie group theory, to solve the CL problem in a 3-D environment. Unlike the standard EKF which computes the Jacobians based on the linearization at the state estimate, DInEKF defines the robots' motion model on matrix Lie groups and offers the advantage of state estimate-independent Jacobians. This significantly improves the consistency of the estimator. Moreover, the proposed algorithm is fully distributed, relying solely on each robot's ego-motion measurements and information received from its one-hop communication neighbors. The effectiveness of the proposed algorithm is validated in both Monte-Carlo simulations and real-world experiments. The results show that the proposed DInEKF outperforms the standard distributed EKF in terms of both accuracy and consistency.
[140] arXiv:2405.04007 [pdf, ps, html, other]: Title: SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

Yuying Ge, Sijie Zhao, Chen Li, Yixiao Ge, Ying Shan

Comments: Technical Report; Dataset released in this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this technical report, we introduce SEED-Data-Edit: a unique hybrid dataset for instruction-guided image editing, which aims to facilitate image manipulation using open-form language. SEED-Data-Edit is composed of three distinct types of data: (1) High-quality editing data produced by an automated pipeline, ensuring a substantial volume of diverse image editing pairs. (2) Real-world scenario data collected from the internet, which captures the intricacies of user intentions for promoting the practical application of image editing in the real world. (3) High-precision multi-turn editing data annotated by humans, which involves multiple rounds of edits for simulating iterative editing processes. The combination of these diverse data sources makes SEED-Data-Edit a comprehensive and versatile dataset for training language-guided image editing models. We fine-tune a pretrained Multimodal Large Language Model (MLLM) that unifies comprehension and generation with SEED-Data-Edit. The instruction-tuned model demonstrates promising results, indicating the potential and effectiveness of SEED-Data-Edit in advancing the field of instructional image editing. The datasets are released in this https URL.
[141] arXiv:2405.04009 [pdf, ps, html, other]: Title: Structured Click Control in Transformer-based Interactive Segmentation

Long Xu, Yongquan Chen, Rui Huang, Feng Wu, Shiwu Lai

Comments: 10 pages, 6 figures, submitted to NeurIPS 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Click-point-based interactive segmentation has received widespread attention due to its efficiency. However, it's hard for existing algorithms to obtain precise and robust responses after multiple clicks. In this case, the segmentation results tend to have little change or are even worse than before. To improve the robustness of the response, we propose a structured click intent model based on graph neural networks, which adaptively obtains graph nodes via the global similarity of user-clicked Transformer tokens. Then the graph nodes will be aggregated to obtain structured interaction features. Finally, the dual cross-attention will be used to inject structured interaction features into vision Transformer features, thereby enhancing the control of clicks over segmentation results. Extensive experiments demonstrated the proposed algorithm can serve as a general structure in improving Transformer-based interactive segmentation performance. The code and data will be released at this https URL.
[142] arXiv:2405.04010 [pdf, ps, html, other]: Title: Explainability-Informed Targeted Malware Misclassification

Quincy Card, Kshitiz Aryal, Maanak Gupta

Subjects: Cryptography and Security (cs.CR)

In recent years, there has been a surge in malware attacks across critical infrastructures, requiring further research and development of appropriate response and remediation strategies in malware detection and classification. Several works have used machine learning models for malware classification into categories, and deep neural networks have shown promising results. However, these models have shown their vulnerabilities against intentionally crafted adversarial attacks, which yield misclassification of a malicious file. Our paper explores such adversarial vulnerabilities of neural network based malware classification systems in the dynamic and online analysis environments. To evaluate our approach, we trained Feed Forward Neural Networks (FFNN) to classify malware categories based on features obtained from dynamic and online analysis environments. We use the state-of-the-art method, SHapley Additive exPlanations (SHAP), for the feature attribution for malware classification, to inform the adversarial attackers about the features with significant importance on classification decision. Using the explainability-informed features, we perform targeted misclassification adversarial white-box evasion attacks using the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks against the trained classifier. Our results demonstrated a high evasion rate for some instances of attacks, showing a clear vulnerability of a malware classifier for such attacks. We offer recommendations for a balanced approach and a benchmark for much-needed future research into evasion attacks against malware classifiers and into developing more robust and trustworthy solutions.
[143] arXiv:2405.04012 [pdf, ps, html, other]: Title: Latency and Energy Minimization in NOMA-Assisted MEC Network: A Federated Deep Reinforcement Learning Approach

Arian Ahmadi, Anders Høst-Madsen, Zixiang Xiong

Subjects: Systems and Control (eess.SY)

Multi-access edge computing (MEC) is seen as a vital component of forthcoming 6G wireless networks, aiming to support emerging applications that demand high service reliability and low latency. However, ensuring the ultra-reliable and low-latency performance of MEC networks poses a significant challenge due to uncertainties associated with wireless links, constraints imposed by communication and computing resources, and the dynamic nature of network traffic. Enabling ultra-reliable and low-latency MEC mandates efficient load balancing jointly with resource allocation. In this paper, we investigate the joint optimization problem of offloading decisions, computation and communication resource allocation to minimize the expected weighted sum of delivery latency and energy consumption in a non-orthogonal multiple access (NOMA)-assisted MEC network. Given that the formulated problem is a mixed-integer non-linear programming (MINLP) problem, a new multi-agent federated deep reinforcement learning (FDRL) solution based on double deep Q-network (DDQN) is developed to efficiently optimize the offloading strategies across the MEC network while accelerating the learning process of the Internet-of-Things (IoT) devices. Simulation results show that the proposed FDRL scheme can effectively reduce the weighted sum of delivery latency and energy consumption of IoT devices in the MEC network and outperform the baseline approaches.
[144] arXiv:2405.04015 [pdf, ps, html, other]: Title: Certified Policy Verification and Synthesis for MDPs under Distributional Reach-avoidance Properties

S. Akshay, Krishnendu Chatterjee, Tobias Meggendorfer, Đorđe Žikelić

Comments: Extended version of a paper accepted at IJCAI 2024

Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

Markov Decision Processes (MDPs) are a classical model for decision making in the presence of uncertainty. Often they are viewed as state transformers with planning objectives defined with respect to paths over MDP states. An increasingly popular alternative is to view them as distribution transformers, giving rise to a sequence of probability distributions over MDP states. For instance, reachability and safety properties in modeling robot swarms or chemical reaction networks are naturally defined in terms of probability distributions over states. Verifying such distributional properties is known to be hard and often beyond the reach of classical state-based verification techniques.
In this work, we consider the problems of certified policy (i.e. controller) verification and synthesis in MDPs under distributional reach-avoidance specifications. By certified we mean that, along with a policy, we also aim to synthesize a (checkable) certificate ensuring that the MDP indeed satisfies the property. Thus, given the target set of distributions and an unsafe set of distributions over MDP states, our goal is to either synthesize a certificate for a given policy or synthesize a policy along with a certificate, proving that the target distribution can be reached while avoiding unsafe distributions. To solve this problem, we introduce the novel notion of distributional reach-avoid certificates and present automated procedures for (1) synthesizing a certificate for a given policy, and (2) synthesizing a policy together with the certificate, both providing formal guarantees on certificate correctness. Our experimental evaluation demonstrates the ability of our method to solve several non-trivial examples, including a multi-agent robot-swarm model, to synthesize certified policies and to certify existing policies.
[145] arXiv:2405.04017 [pdf, ps, html, other]: Title: An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

Zhifa Ke, Zaiwen Wen, Junyu Zhang

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network. New proof techniques are developed and an improved $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity is derived. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves an $\tilde{\mathcal{O}}(\epsilon^{-1})$ complexity under Markovian sampling, as opposed to the best known $\tilde{\mathcal{O}}(\epsilon^{-2})$ complexity in the existing literature.
[146] arXiv:2405.04020 [pdf, ps, html, other]: Title: Metric Distortion of Line-up Elections: The Right Person for the Right Job

Christopher Jerrett, Yue Han, Elliot Anshelevich

Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS)

We provide mechanisms and new metric distortion bounds for line-up elections. In such elections, a set of $n$ voters, $k$ candidates, and $\ell$ positions are all located in a metric space. The goal is to choose a set of candidates and assign them to different positions, so as to minimize the total cost of the voters. The cost of each voter consists of the distances from itself to the chosen candidates (measuring how much the voter likes the chosen candidates, or how similar it is to them), as well as the distances from the candidates to the positions they are assigned to (measuring the fitness of the candidates for their positions). Our mechanisms, however, do not know the exact distances, and instead produce good outcomes while only using a smaller amount of information, resulting in small distortion.
We consider several different types of information: ordinal voter preferences, ordinal position preferences, and knowing the exact locations of candidates and positions, but not those of voters. In each of these cases, we provide constant distortion bounds, thus showing that only a small amount of information is enough to form outcomes close to optimum in line-up elections.
[147] arXiv:2405.04021 [pdf, ps, html, other]: Title: Robust and Reusable Fuzzy Extractors for Low-entropy Rate Randomness Sources

Somnath Panja, Shaoquan Jiang, Reihaneh Safavi-Naini

Subjects: Cryptography and Security (cs.CR)

Fuzzy extractors (FE) are cryptographic primitives that extract a reliable cryptographic key from noisy real-world random sources such as biometric sources. The FE generation algorithm takes a source sample, extracts a key and generates some helper data that will be used by the reproduction algorithm to recover the key. Reusability of FE guarantees that security holds when FE is used multiple times with the same source, and robustness of FE requires that tampering with the helper data be detectable.
In this paper, we consider information theoretic FEs, define a strong notion of reusability, and propose strongly robust and reusable FEs (srrFE) that provide the strongest combined notion of reusability and robustness for FEs. We give two constructions, one for reusable FEs and one for srrFE with information theoretic (IT) security for structured sources. The constructions are for structured sources and use a sample-then-lock approach. We discuss each construction and show their unique properties in relation to existing work.
Construction 2 is the first robust and reusable FE with IT-security without assuming a random oracle. The robustness is achieved by using an IT-secure MAC with security against key-shift attacks, which can be of independent interest.
[148] arXiv:2405.04025 [pdf, ps, html, other]: Title: Optimal Group Fair Classifiers from Linear Post-Processing

Ruicheng Xian, Han Zhao

Comments: Code is at this https URL

Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

We propose a post-processing algorithm for fair classification that mitigates model bias under a unified family of group fairness criteria covering statistical parity, equal opportunity, and equalized odds, applicable to multi-class problems and both attribute-aware and attribute-blind settings. It achieves fairness by re-calibrating the output score of the given base model with a "fairness cost" -- a linear combination of the (predicted) group memberships. Our algorithm is based on a representation result showing that the optimal fair classifier can be expressed as a linear post-processing of the loss function and the group predictor, derived via using these as sufficient statistics to reformulate the fair classification problem as a linear program. The parameters of the post-processor are estimated by solving the empirical LP. Experiments on benchmark datasets show the efficiency and effectiveness of our algorithm at reducing disparity compared to existing algorithms, including in-processing, especially on larger problems.
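The "fairness cost" re-calibration described above can be illustrated with a minimal sketch. It assumes the post-processed prediction is the class that maximizes the base score minus a linear combination of predicted group memberships. The function name `fair_predict` and the hand-set `weights` are hypothetical; in the paper the post-processor parameters are estimated by solving an empirical LP, which is not reproduced here.

```python
def fair_predict(scores, group_probs, weights):
    """Pick the class with the highest score after subtracting a fairness cost.

    scores:      per-class base-model scores for one example
    group_probs: predicted group-membership probabilities for that example
    weights:     weights[c][g] is a hypothetical cost of predicting class c
                 for group g (hand-set here, not estimated from data)
    """
    adjusted = []
    for c, score in enumerate(scores):
        # Fairness cost: linear combination of the predicted group memberships.
        cost = sum(w * p for w, p in zip(weights[c], group_probs))
        adjusted.append(score - cost)
    # Index of the largest adjusted score (first index wins ties).
    return max(range(len(adjusted)), key=adjusted.__getitem__)
```

With all weights set to zero the function reduces to the base model's argmax; a positive weight on a (class, group) pair makes that class less likely to be chosen for examples predicted to belong to that group.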
[149] arXiv:2405.04028 [pdf, ps, html, other]: Title: Masked Graph Transformer for Large-Scale Recommendation

Huiyuan Chen, Zhe Xu, Chin-Chia Michael Yeh, Vivian Lai, Yan Zheng, Minghua Xu, Hanghang Tong

Subjects: Information Retrieval (cs.IR)

Graph Transformers have garnered significant attention for learning graph-structured data, thanks to their superb ability to capture long-range dependencies among nodes. However, the quadratic space and time complexity hinders the scalability of Graph Transformers, particularly for large-scale recommendation. Here we propose an efficient Masked Graph Transformer, named MGFormer, capable of capturing all-pair interactions among nodes with a linear complexity. To achieve this, we treat all user/item nodes as independent tokens, enhance them with positional embeddings, and feed them into a kernelized attention module. Additionally, we incorporate learnable relative degree information to appropriately reweigh the attentions. Experimental results show the superior performance of our MGFormer, even with a single attention layer.
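How a kernelized attention module reaches linear complexity can be shown with a generic sketch. This is the standard linear-attention reordering, not MGFormer itself: the feature map `elu(x) + 1` is an assumption, and positional embeddings and the learnable degree reweighting are omitted. The key/value sums are computed once and shared by every query, so the cost grows linearly in the number of tokens.

```python
import math

def feature_map(x):
    # elu(x) + 1: a positive feature map often used for kernelized attention
    # (an assumption here; the abstract does not name the kernel).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_attention(Q, K, V):
    """Kernelized attention in O(n) tokens: phi(Q) @ (phi(K)^T @ V), row-normalized."""
    d, dv = len(Q[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # running sum of phi(k_j) v_j^T
    z = [0.0] * d                       # running sum of phi(k_j)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = feature_map(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out

def quadratic_attention(Q, K, V):
    """Reference version that forms all n*n pairwise weights explicitly."""
    out = []
    for q in Q:
        fq = feature_map(q)
        w = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in K]
        total = sum(w)
        out.append([sum(wj * v[b] for wj, v in zip(w, V)) / total
                    for b in range(len(V[0]))])
    return out
```

Both functions compute the same output up to floating-point error; only the order of the multiplications differs.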
[150] arXiv:2405.04029 [pdf, ps, html, other]: Title: Enabling Privacy-Preserving and Publicly Auditable Federated Learning

Huang Zeng (1), Anjia Yang (1), Jian Weng (1), Min-Rong Chen (2), Fengjun Xiao (3 and 4), Yi Liu (1), Ye Yao (4)

Comments: ICC 2024 - 2024 IEEE International Conference on Communications Conference Program

Subjects: Cryptography and Security (cs.CR)

Federated learning (FL) has attracted widespread attention because it supports the joint training of models by multiple participants without moving private datasets. However, there are still many security issues in FL that deserve discussion. In this paper, we consider three major issues: 1) how to ensure that the training process can be publicly audited by any third party; 2) how to avoid the influence of malicious participants on training; 3) how to ensure that private gradients and models are not leaked to third parties. Many solutions have been proposed to address these issues, while solving the above three problems simultaneously is seldom considered. In this paper, we propose a publicly auditable and privacy-preserving federated learning scheme that is resistant to malicious participants uploading gradients with wrong directions and enables anyone to audit and verify the correctness of the training process. In particular, we design a robust aggregation algorithm capable of detecting gradients with wrong directions from malicious participants. Then, we design a random vector generation algorithm and combine it with zero sharing and blockchain technologies to make the joint training process publicly auditable, meaning anyone can verify the correctness of the training. Finally, we conduct a series of experiments, and the experimental results show that the model generated by the protocol is comparable in accuracy to the original FL approach while keeping security advantages.
[151] arXiv:2405.04030 [pdf, ps, other]: Title: Uncovering implementable dormant pruning decisions from three different stakeholder perspectives

Deanna Flynn, Abhinav Jain, Heather Knight, Cristina G. Wilson, Cindy Grimm

Comments: 36 pages; 21 figures

Subjects: Robotics (cs.RO)

Dormant pruning, or the removal of unproductive portions of a tree while a tree is not actively growing, is an important orchard task to help maintain yield, requiring years to build expertise. Because of long training periods and an increasing labor shortage in agricultural jobs, pruning could benefit from robotic automation. However, to program robots to prune branches, we first need to understand how pruning decisions are made, and what variables in the environment (e.g., branch size and thickness) we need to capture. Working directly with three pruning stakeholders -- horticulturists, growers, and pruners -- we find that each group of human experts approaches pruning decision-making differently. To capture this knowledge, we present three studies and two extracted pruning protocols from field work conducted in Prosser, Washington in January 2022 and 2023. We interviewed six stakeholders (two in each group) and observed pruning across three cultivars -- Bing Cherries, Envy Apples, and Jazz Apples -- and two tree architectures -- Upright Fruiting Offshoot and V-Trellis. Leveraging participant interviews and video data, this analysis uses grounded coding to extract pruning terminology, discover horticultural contexts that influence pruning decisions, and find implementable pruning heuristics for autonomous systems. The results include a validated terminology set, which we offer for use by both pruning stakeholders and roboticists, to communicate general pruning concepts and heuristics. The results also highlight seven pruning heuristics utilizing this terminology set that would be relevant for use by future autonomous robot pruning systems, and characterize three discovered horticultural contexts (i.e., environmental management, crop-load management, and replacement wood) across all three cultivars.
[152] arXiv:2405.04032 [pdf, ps, html, other]: Title: Locally Differentially Private In-Context Learning

Chunyan Zheng, Keke Sun, Wenhao Zhao, Haibo Zhou, Lixin Jiang, Shaoyang Song, Chunlai Zhou

Comments: This paper was published at LREC-Coling 2024

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Large pretrained language models (LLMs) have shown surprising In-Context Learning (ICL) ability. An important application in deploying large language models is to augment LLMs with a private database for some specific task. The main problem with this promising commercial use is that LLMs have been shown to memorize their training data, and their prompt data are vulnerable to membership inference attacks (MIA) and prompt leaking attacks. In order to deal with this problem, we treat LLMs as untrusted in privacy and propose a locally differentially private framework of in-context learning (LDP-ICL) in settings where labels are sensitive. Considering the mechanisms of in-context learning in Transformers by gradient descent, we provide an analysis of the trade-off between privacy and utility in such LDP-ICL for classification. Moreover, we apply LDP-ICL to the discrete distribution estimation problem. Finally, we perform several experiments to support our analysis.
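The abstract does not say which local mechanism is applied to the sensitive labels. As a rough illustration only, the sketch below assumes k-ary randomized response, a standard locally differentially private mechanism for categorical values, and shows how a label distribution can be estimated from the perturbed labels. The function names `k_rr` and `estimate_distribution` are hypothetical and are not taken from the paper.

```python
import math
import random


def k_rr(label, labels, epsilon, rng):
    """Perturb one label with k-ary randomized response (assumed mechanism)."""
    k = len(labels)
    # Keep the true label with probability e^eps / (e^eps + k - 1).
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return label
    # Otherwise report one of the other k - 1 labels uniformly at random.
    others = [l for l in labels if l != label]
    return rng.choice(others)


def estimate_distribution(noisy, labels, epsilon):
    """Debias observed frequencies of perturbed labels into an estimate."""
    k = len(labels)
    n = len(noisy)
    e = math.exp(epsilon)
    p = e / (e + k - 1)  # probability of reporting the true label
    q = 1 / (e + k - 1)  # probability of reporting any specific other label
    estimate = {}
    for l in labels:
        freq = noisy.count(l) / n
        # E[freq] = q + pi_l * (p - q), so invert that affine map.
        estimate[l] = (freq - q) / (p - q)
    return estimate


if __name__ == "__main__":
    rng = random.Random(0)
    labels = ["negative", "neutral", "positive"]
    true_labels = ["positive"] * 700 + ["negative"] * 300
    noisy = [k_rr(y, labels, epsilon=1.0, rng=rng) for y in true_labels]
    print(estimate_distribution(noisy, labels, epsilon=1.0))
```

Smaller values of `epsilon` flip more labels, which strengthens privacy but increases the variance of the estimate. This is one simple instance of the privacy-utility trade-off the abstract refers to.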
[153] arXiv:2405.04034 [pdf, ps, other]: Title: Differentially Private Post-Processing for Fair Regression

Ruicheng Xian, Qiaobo Li, Gautam Kamath, Han Zhao

Comments: ICML 2024. Code is at this https URL

Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computers and Society (cs.CY)

This paper describes a differentially private post-processing algorithm for learning fair regressors satisfying statistical parity, addressing privacy concerns of machine learning models trained on sensitive data, as well as fairness concerns of their potential to propagate historical biases. Our algorithm can be applied to post-process any given regressor to improve fairness by remapping its outputs. It consists of three steps: first, the output distributions are estimated privately via histogram density estimation and the Laplace mechanism, then their Wasserstein barycenter is computed, and the optimal transports to the barycenter are used for post-processing to satisfy fairness. We analyze the sample complexity of our algorithm and provide a fairness guarantee, revealing a trade-off between the statistical bias and variance induced from the choice of the number of bins in the histogram, in which using fewer bins always favors fairness at the expense of error.
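The first of the three steps, private histogram density estimation with the Laplace mechanism, can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes regressor outputs lie in a known range `[lo, hi]` and that neighboring datasets differ by adding or removing one record, so each count has sensitivity 1. The function name and parameters are hypothetical.

```python
import random


def private_histogram_density(values, n_bins, epsilon, lo=0.0, hi=1.0, rng=None):
    """Return noisy bin probabilities for values assumed to lie in [lo, hi]."""
    rng = rng or random.Random()
    counts = [0] * n_bins
    for v in values:
        v = min(max(v, lo), hi)  # clamp to the assumed output range
        idx = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1

    # Laplace mechanism: add Laplace(0, 1/epsilon) noise to every count.
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    scale = 1.0 / epsilon
    noisy = [
        c + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for c in counts
    ]

    # Clipping and normalizing are post-processing and keep the guarantee.
    noisy = [max(x, 0.0) for x in noisy]
    total = sum(noisy)
    if total == 0.0:
        return [1.0 / n_bins] * n_bins
    return [x / total for x in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    outputs = [rng.random() for _ in range(500)]
    print(private_histogram_density(outputs, n_bins=10, epsilon=1.0, rng=rng))
```

With fewer bins, each bin holds more samples relative to the fixed noise scale, so the estimate is less noisy but coarser. This is the kind of bias-variance trade-off in the number of bins that the abstract mentions.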
[154] arXiv:2405.04036 [pdf, ps, html, other]: Title: uTNT: Unikernels for Efficient and Flexible Internet Probing

Maxime Letemple, Gaulthier Gain, Sami Ben Mariem, Laurent Mathy, Benoit Donnet

Comments: 10 pages, 11 figures, IFIP extended-abstract

Subjects: Operating Systems (cs.OS); Networking and Internet Architecture (cs.NI)

The last twenty years have seen the development and popularity of network measurement infrastructures. Internet measurement platforms have become common and have demonstrated their relevance in Internet understanding and security observation. However, despite their popularity, those platforms lack flexibility and reactivity, as they are usually used for longitudinal measurements. As a consequence, they may fail to detect events that are security or Internet-related. During the same period, operating systems have evolved to virtual machines (VMs) as self-contained units for running applications, with the recent rise of unikernels, ultra-lightweight VMs tailored for specific applications, eliminating the need for a host OS. In this paper, we advocate that measurement infrastructures could take advantage of unikernels to become more flexible and efficient. We propose uTNT, a proof-of-concept unikernel-based implementation of TNT, a traceroute extension able to reveal MPLS tunnels. This paper documents the full toolchain for porting TNT into a unikernel and evaluates uTNT performance with respect to more traditional approaches. The paper also discusses a use case for which uTNT could be well suited. uTNT source code is publicly available on Gitlab.
[155] arXiv:2405.04038 [pdf, ps, html, other]: Title: Self-Replicating and Self-Employed Smart Contract on Ethereum Blockchain

Atsushi Masumori, Norihiro Maruyama, Takashi Ikegami

Comments: This paper was originally written in 2022 for submission to ALIFE 2022

Subjects: Neural and Evolutionary Computing (cs.NE)

Blockchain is the underlying technology for cryptocurrencies such as Bitcoin. Blockchain is a robust distributed ledger that uses consensus algorithms to approve transactions in a decentralized manner, making malicious tampering extremely difficult. Ethereum, one of the blockchains, can be seen as an unstoppable computer that is shared by users around the world and can run Turing-complete programs. In order to run any program on Ethereum, Ether (the currency on Ethereum) is required. In other words, Ether can be seen as a kind of energy in the Ethereum world. We developed self-replicating and self-employed agents on the Ethereum blockchain that earn this energy by themselves in order to replicate themselves. The agents can issue their own tokens and gain Ether each time the tokens are sold. When a certain amount of Ether is accumulated, the agent replicates itself and leaves offspring. The goal of this project is to implement artificial agents that live for themselves, not as tools for humans, in the open cyberspace connected to the real world.
[156] arXiv:2405.04039 [pdf, ps, html, other]: Title: Utilizing GPT to Enhance Text Summarization: A Strategy to Minimize Hallucinations

Hassan Shakil, Zeydy Ortiz, Grant C. Forbes

Comments: 9 pages, 3 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this research, we use the DistilBERT model to generate extractive summaries and the T5 model to generate abstractive summaries. We also generate hybrid summaries by combining the DistilBERT and T5 models. Central to our research is the implementation of a GPT-based refining process to minimize the common problem of hallucinations that occur in AI-generated summaries. We evaluate unrefined summaries and, after refining, we also assess the refined summaries using a range of traditional and novel metrics, demonstrating marked improvements in the accuracy and reliability of the summaries. The results highlight significant improvements in reducing hallucinatory content, thereby increasing the factual integrity of the summaries.
[157] arXiv:2405.04041 [pdf, ps, html, other]: Title: Feature Map Convergence Evaluation for Functional Module

Ludan Zhang, Chaoyi Chen, Lei He, Keqiang Li

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Autonomous driving perception models are typically composed of multiple functional modules that interact through complex relationships to accomplish environment understanding. However, perception models are predominantly optimized as a black box through end-to-end training, lacking independent evaluation of functional modules, which poses difficulties for interpretability and optimization. As a pioneering effort on this issue, we propose an evaluation method based on feature map analysis to gauge the convergence of a model, thereby assessing the training maturity of its functional modules. We construct a quantitative metric named the Feature Map Convergence Score (FMCS) and develop the Feature Map Convergence Evaluation Network (FMCE-Net) to measure and predict the convergence degree of models, respectively. FMCE-Net achieves remarkable predictive accuracy for FMCS across multiple image classification experiments, validating the efficacy and robustness of the introduced approach. To the best of our knowledge, this is the first independent evaluation method for functional modules, offering a new paradigm for the training assessment of perception models.
[158] arXiv:2405.04042 [pdf, ps, html, other]: Title: Space-time Reinforcement Network for Video Object Segmentation

Yadang Chen, Wentao Zhu, Zhi-Xin Yang, Enhua Wu

Comments: Accepted by ICME 2024. 6 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent video object segmentation (VOS) networks typically use memory-based methods: for each query frame, the mask is predicted by space-time matching to memory frames. Although these methods achieve superior performance, they suffer from two issues: 1) Challenging data can destroy the space-time coherence between adjacent video frames. 2) Pixel-level matching leads to undesired mismatches caused by noise or distractors. To address these issues, we first propose to generate an auxiliary frame between adjacent frames, serving as an implicit short-temporal reference for the query frame. Next, we learn a prototype for each video object so that prototype-level matching can be performed between the query and memory. Experiments demonstrate that our network outperforms the state-of-the-art method on DAVIS 2017, achieving a J&F score of 86.4%, and attains a competitive result of 85.0% on YouTube VOS 2018. In addition, our network exhibits a high inference speed of 32+ FPS.
[159] arXiv:2405.04044 [pdf, ps, html, other]: Title: DMOFC: Discrimination Metric-Optimized Feature Compression

Changsheng Gao, Yiheng Jiang, Li Li, Dong Liu, Feng Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Feature compression, as an important branch of video coding for machines (VCM), has attracted significant attention and exploration. However, the existing methods mainly focus on intra-feature similarity, such as the Mean Squared Error (MSE) between the reconstructed and original features, while neglecting the importance of inter-feature relationships. In this paper, we analyze the inter-feature relationships, focusing on feature discriminability in machine vision and underscoring its significance in feature compression. To maintain the feature discriminability of reconstructed features, we introduce a discrimination metric for feature compression. The discrimination metric is designed to ensure that the distance between features of the same category is smaller than the distance between features of different categories. Furthermore, we explore the relationship between the discrimination metric and the discriminability of the original features. Experimental results confirm the effectiveness of the proposed discrimination metric and reveal there exists a trade-off between the discrimination metric and the discriminability of the original features.
[160] arXiv:2405.04046 [pdf, ps, other]: Title: MBCT: A Monero-Based Covert Transmission Approach with On-chain Dynamic Session Key Negotiation

Zhenshuai Yue, Haoran Zhu, Xiaolin Chang, Jelena Mišić, Vojislav B. Mišić, Junchao Fan

Subjects: Cryptography and Security (cs.CR)

The limitations of traditional covert transmission (CT) approaches have hindered the application of CT, while blockchain technology offers a new avenue. Current blockchain-based CT approaches require off-chain negotiation of critical information and often overlook the dynamic updating of session keys, which increases the risk of message and key leakage. Additionally, in some approaches the covert transactions exhibit obvious characteristics that can be easily detected by third parties. Moreover, most approaches do not address the issue of decreased reliability of message transmission in blockchain attack scenarios. Bitcoin- and Ethereum-based approaches also have the issue of transaction linkability, which can be tackled by Monero-based approaches because of the privacy protection mechanisms in Monero. However, Monero-based CT has the problem of sender repudiation. In this paper, we propose a novel Monero-Based CT approach (MBCT), which enables dynamic on-chain session key updating without off-chain negotiation. MBCT can assure non-repudiation of transmission participants, confidentiality of keys, reliability of message transmission, and fewer observable characteristics. These are achieved by the three components in MBCT, namely, a sender authentication method, a dynamic on-chain session key updating method, and a state feedback method. We implement MBCT in Monero-this http URL and the experimental results demonstrate the high embedding capacity of MBCT.
[161] arXiv:2405.04048 [pdf, ps, other]: Title: Philosophy of Cognitive Science in the Age of Deep Learning

Raphaël Millière

Comments: Forthcoming in WIREs Cognitive Science

Subjects: Computation and Language (cs.CL)

Deep learning has enabled major advances across most areas of artificial intelligence research. This remarkable progress extends beyond mere engineering achievements and holds significant relevance for the philosophy of cognitive science. Deep neural networks have made significant strides in overcoming the limitations of older connectionist models that once occupied the centre stage of philosophical debates about cognition. This development is directly relevant to long-standing theoretical debates in the philosophy of cognitive science. Furthermore, ongoing methodological challenges related to the comparative evaluation of deep neural networks stand to benefit greatly from interdisciplinary collaboration with philosophy and cognitive science. The time is ripe for philosophers to explore foundational issues related to deep learning and cognition; this perspective paper surveys key areas where their contributions can be especially fruitful.
[162] arXiv:2405.04049 [pdf, ps, html, other]: Title: Watermarking Neuromorphic Brains: Intellectual Property Protection in Spiking Neural Networks

Hamed Poursiami, Ihsen Alouani, Maryam Parsa

Comments: 7 pages, 7 figures

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

As spiking neural networks (SNNs) gain traction in deploying neuromorphic computing solutions, protecting their intellectual property (IP) has become crucial. Without adequate safeguards, proprietary SNN architectures are at risk of theft, replication, or misuse, which could lead to significant financial losses for the owners. While IP protection techniques have been extensively explored for artificial neural networks (ANNs), their applicability and effectiveness for the unique characteristics of SNNs remain largely unexplored. In this work, we pioneer an investigation into adapting two prominent watermarking approaches, namely, fingerprint-based and backdoor-based mechanisms to secure proprietary SNN architectures. We conduct thorough experiments to evaluate the impact on fidelity, resilience against overwrite threats, and resistance to compression attacks when applying these watermarking techniques to SNNs, drawing comparisons with their ANN counterparts. This study lays the groundwork for developing neuromorphic-aware IP protection strategies tailored to the distinctive dynamics of SNNs.
[163] arXiv:2405.04050 [pdf, ps, html, other]: Title: Learning Linear Block Error Correction Codes

Yoni Choukroun, Lior Wolf

Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)

Error correction codes are a crucial part of the physical communication layer, ensuring the reliable transfer of data over noisy channels. The design of optimal linear block codes capable of being efficiently decoded is of major concern, especially for short block lengths. While neural decoders have recently demonstrated their advantage over classical decoding techniques, the neural design of the codes remains a challenge. In this work, we propose for the first time a unified encoder-decoder training of binary linear block codes. To this end, we adapt the coding setting to support efficient and differentiable training of the code for end-to-end optimization over the order two Galois field. We also propose a novel Transformer model in which the self-attention masking is performed in a differentiable fashion for the efficient backpropagation of the code gradient. Our results show that (i) the proposed decoder outperforms existing neural decoding on conventional codes, (ii) the suggested framework generates codes that outperform the analogous conventional codes, and (iii) the codes we developed not only excel with our decoder but also show enhanced performance with traditional decoding techniques.
[164] arXiv:2405.04051 [pdf, ps, html, other]: Title: On the quantization goodness of polar lattices

Ling Liu, Shanxiang Lyu, Cong Ling, Baoming Bai

Comments: 12 pages, 5 figures, submitted to IEEE for possible publication

Subjects: Information Theory (cs.IT)

In this work, we prove that polar lattices, when tailored for lossy compression, are quantization-good in the sense that their normalized second moments approach $\frac{1}{2\pi e}$ as the dimension of lattices increases. It has been predicted by Zamir et al. \cite{ZamirQZ96} that the Entropy Coded Dithered Quantization (ECDQ) system using quantization-good lattices can achieve the rate-distortion bound of i.i.d. Gaussian sources. In our previous work \cite{LingQZ}, we established that polar lattices are indeed capable of attaining the same objective. It is reasonable to conjecture that polar lattices also demonstrate quantization goodness in the context of lossy compression. This study confirms this hypothesis.
[165] arXiv:2405.04052 [pdf, ps, html, other]: Title: Minimizing the Minimizers via Alphabet Reordering

Hilde Verbeek, Lorraine A.K. Ayad, Grigorios Loukides, Solon P. Pissis

Comments: Extended version of a paper accepted at CPM 2024

Subjects: Data Structures and Algorithms (cs.DS)

Minimizers sampling is one of the most widely-used mechanisms for sampling strings [Roberts et al., Bioinformatics 2004]. Let $S=S[1]\ldots S[n]$ be a string over a totally ordered alphabet $\Sigma$. Further let $w\geq 2$ and $k\geq 1$ be two integers. The minimizer of $S[i\mathinner{.\,.} i+w+k-2]$ is the smallest position in $[i,i+w-1]$ where the lexicographically smallest length-$k$ substring of $S[i\mathinner{.\,.} i+w+k-2]$ starts. The set of minimizers over all $i\in[1,n-w-k+2]$ is the set $\mathcal{M}_{w,k}(S)$ of the minimizers of $S$. We consider the following basic problem: Given $S$, $w$, and $k$, can we efficiently compute a total order on $\Sigma$ that minimizes $|\mathcal{M}_{w,k}(S)|$? We show that this is unlikely by proving that the problem is NP-hard for any $w\geq 2$ and $k\geq 1$. Our result provides theoretical justification as to why there exist no exact algorithms for minimizing the minimizers samples, while there exists a plethora of heuristics for the same purpose.
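The definition above can be turned into a small, direct computation. The sketch below follows the abstract's 1-based positions and leftmost tie-breaking. The exhaustive search over alphabet orders is shown only to make the optimization problem concrete: it takes factorial time and is not an algorithm from the paper. The function names are hypothetical.

```python
from itertools import permutations


def minimizers(s, w, k, order):
    """Return the set M_{w,k}(S) of 1-based minimizer positions under `order`."""
    rank = {c: r for r, c in enumerate(order)}
    n = len(s)
    result = set()
    for i in range(1, n - w - k + 3):  # i in [1, n-w-k+2]
        best_pos, best_key = None, None
        for j in range(i, i + w):  # candidate start positions in [i, i+w-1]
            # Length-k substring starting at position j, compared by rank.
            key = tuple(rank[c] for c in s[j - 1:j - 1 + k])
            # Strict '<' keeps the smallest (leftmost) position on ties.
            if best_key is None or key < best_key:
                best_pos, best_key = j, key
        result.add(best_pos)
    return result


def best_order_bruteforce(s, w, k):
    """Try every total order on the alphabet; feasible only for tiny alphabets."""
    alphabet = sorted(set(s))
    best = min(permutations(alphabet), key=lambda o: len(minimizers(s, w, k, o)))
    return best, len(minimizers(s, w, k, best))


if __name__ == "__main__":
    print(minimizers("abbb", 2, 1, "ab"))        # {1, 2, 3}
    print(minimizers("abbb", 2, 1, "ba"))        # {2, 3}
    print(best_order_bruteforce("abbb", 2, 1))   # (('b', 'a'), 2)
```

Even on this four-letter string, swapping the order of `a` and `b` changes the number of sampled positions from 3 to 2.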
[166] arXiv:2405.04053 [pdf, ps, other]: Title: Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT

Hassan Shakil, Atqiya Munawara Mahi, Phuoc Nguyen, Zeydy Ortiz, Mamoun T. Mardini

Comments: 10 pages, 5 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This research examines the effectiveness of OpenAI's GPT models as independent evaluators of text summaries generated by six transformer-based models from Hugging Face: DistilBART, BERT, ProphetNet, T5, BART, and PEGASUS. We evaluated these summaries based on essential properties of a high-quality summary - conciseness, relevance, coherence, and readability - using traditional metrics such as ROUGE and Latent Semantic Analysis (LSA). Uniquely, we also employed GPT not as a summarizer but as an evaluator, allowing it to independently assess summary quality without predefined metrics. Our analysis revealed significant correlations between GPT evaluations and traditional metrics, particularly in assessing relevance and coherence. The results demonstrate GPT's potential as a robust tool for evaluating text summaries, offering insights that complement established metrics and providing a basis for comparative analysis of transformer-based models in natural language processing tasks.
[167] arXiv:2405.04054 [pdf, ps, other]: Title: What Impacts the Quality of the User Answers when Asked about the Current Context?

Ivano Bison, Haonan Zhao, Fausto Giunchiglia

Comments: 25 pages, 16 figures, under review by Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Subjects: Human-Computer Interaction (cs.HC)

Sensor data provide an objective view of reality but fail to capture the subjective motivations behind an individual's behavior. This latter information is crucial for learning about the various dimensions of the personal context, thus increasing predictability. The main limitation is the human input, which is often not of the quality that is needed. The work so far has focused on the usually high number of missing answers. The focus of this paper is on the number of mistakes made when answering questions. This paper makes three main contributions. First, we show that the user's reaction time, i.e., the time before starting to respond, is the main cause of low answer quality, where its effects are both direct and indirect, the latter relating to its impact on the completion time, i.e., the time taken to compile the response. Second, we identify the specific exogenous (e.g., the situational or temporal context) and endogenous (e.g., mood, personality traits) factors that influence the reaction time, as well as the completion time. Third, we show how reaction and completion time combine their effects on the answer quality. The paper concludes with a set of actionable recommendations.
[168] arXiv:2405.04061 [pdf, ps, other]: Title: Generalized Cauchy-Schwarz Divergence and Its Deep Learning Applications

Mingfei Lu, Shujian Yu, Robert Jenssen, Badong Chen

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Divergence measures play a central role in machine learning and become increasingly essential in deep learning. However, valid and computationally efficient divergence measures for multiple (more than two) distributions are scarcely investigated. This becomes particularly crucial in areas where the simultaneous management of multiple distributions is both unavoidable and essential. Examples include clustering, multi-source domain adaptation or generalization, and multi-view learning, among others. Although calculating the mean of pairwise distances between any two distributions serves as a common way to quantify the total divergence among multiple distributions, it is crucial to acknowledge that this approach is not straightforward and requires significant computational resources. In this study, we introduce a new divergence measure for multiple distributions named the generalized Cauchy-Schwarz divergence (GCSD), which is inspired by the classic Cauchy-Schwarz divergence. Additionally, we provide a closed-form sample estimator based on kernel density estimation, making it convenient and straightforward to use in various machine-learning applications. Finally, we apply the proposed GCSD to two challenging machine learning tasks, namely deep learning-based clustering and the problem of multi-source domain adaptation. The experimental results showcase the impressive performance of GCSD in both tasks, highlighting its potential application in machine-learning areas that involve quantifying multiple distributions.
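For reference, the classic two-distribution Cauchy-Schwarz divergence that the abstract cites as inspiration is commonly written as below (some authors use a convention that differs by a constant factor). The generalized multi-distribution form proposed in the paper is not given in the abstract and is not reproduced here.

```latex
D_{\mathrm{CS}}(p, q)
  = -\log \frac{\left( \int p(x)\, q(x)\, dx \right)^{2}}
               {\int p(x)^{2}\, dx \; \int q(x)^{2}\, dx}
```

By the Cauchy-Schwarz inequality the ratio is at most 1, so $D_{\mathrm{CS}}(p, q) \geq 0$, with equality exactly when $p = q$ almost everywhere.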
[169] arXiv:2405.04063 [pdf, ps, html, other]: Title: xNose: A Test Smell Detector for C#

Partha P. Paul, Md Tonoy Akanda, M. Raihan Ullah, Dipto Mondal, Nazia S. Chowdhury, Fazle M. Tawsif

Comments: Full report of our ICSE'24 poster

Subjects: Software Engineering (cs.SE)

Test smells, similar to code smells, can negatively impact both the test code and the production code being tested. Despite extensive research on test smells in languages like Java, Scala, and Python, automated tools for detecting test smells in C# are lacking. This paper aims to bridge this gap by extending the study of test smells to C#, and developing a tool (xNose) to identify test smells in this language and analyze their distribution across projects. We identified 16 test smells from prior studies that were language-independent and had equivalent features in C# and evaluated xNose, achieving a precision score of 96.97% and a recall score of 96.03%. In addition, we conducted an empirical study to determine the prevalence of test smells in xUnit-based C# projects. This analysis sheds light on the frequency and distribution of test smells, deepening our understanding of their impact on C# projects and test suites. The development of xNose and our analysis of test smells in C# code aim to assist developers in maintaining code quality by addressing potential issues early in the development process.
[170] arXiv:2405.04064 [pdf, ps, html, other]: Title: MFA-Net: Multi-Scale feature fusion attention network for liver tumor segmentation

Yanli Yuan, Bingbing Wang, Chuan Zhang, Jingyi Xu, Ximeng Liu, Liehuang Zhu

Subjects: Artificial Intelligence (cs.AI)

Segmentation of organs of interest in medical CT images is beneficial for diagnosis of diseases. Though recent methods based on Fully Convolutional Neural Networks (F-CNNs) have shown success in many segmentation tasks, fusing features from images with different scales is still a challenge: (1) Due to the lack of spatial awareness, F-CNNs share the same weights at different spatial locations. (2) F-CNNs can only obtain surrounding information through local receptive fields. To address the above challenge, we propose a new segmentation framework based on attention mechanisms, named MFA-Net (Multi-Scale Feature Fusion Attention Network). The proposed framework can learn more meaningful feature maps among multiple scales and result in more accurate automatic segmentation. We compare our proposed MFA-Net with SOTA methods on two 2D liver CT datasets. The experimental results show that our MFA-Net produces more precise segmentation on images with different scales.
[171] arXiv:2405.04065 [pdf, ps, html, other]: Title: FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference

Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu

Comments: 14 pages

Subjects: Computation and Language (cs.CL)

Retrieval-Augmented Language Modeling (RALM), which integrates large language models (LLMs) with relevant documents from an external corpus, is a proven method for enabling the LLM to generate information beyond the scope of its pre-training corpus. Previous work that utilizes retrieved content by simply prepending it to the input incurs a high runtime cost, which degrades the inference efficiency of the LLMs because they fail to use the Key-Value (KV) cache efficiently. In this paper, we propose FlashBack, a modular RALM designed to improve the inference efficiency of RALM with an appending context pattern while maintaining decent performance after specific fine-tuning, without heavily disrupting the knowledge integrity of the LLM. FlashBack appends retrieved documents at the end of the context, instead of prepending them, to utilize the KV cache efficiently. Our experiments show that the inference speed of FlashBack is up to $4\times$ faster than the prepending method on a 7B LLM (Llama 2). By bypassing unnecessary re-computation, it achieves significantly faster inference, and this heightened efficiency will substantially reduce inference cost. Our code will be publicly available.
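The effect of appending rather than prepending can be illustrated with a toy model of a prefix-based KV cache. In a causal decoder, the cached keys and values of a token depend only on the tokens before it, so the cache stays valid for the longest prefix that two prompts share. The sketch below counts how many tokens would need recomputation when the retrieved document changes. It is an illustrative assumption and not FlashBack's implementation, and the token lists and function names are hypothetical.

```python
def shared_prefix_len(a, b):
    """Length of the longest common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def tokens_to_recompute(prev_prompt, new_prompt):
    """Tokens of new_prompt not covered by a cache built for prev_prompt."""
    return len(new_prompt) - shared_prefix_len(prev_prompt, new_prompt)


if __name__ == "__main__":
    context = list(range(100))  # 100 context tokens kept across retrievals
    doc_a = ["A"] * 20          # previously retrieved document
    doc_b = ["B"] * 20          # newly retrieved document

    # Prepending: the change sits at the front, so no prefix is shared.
    print(tokens_to_recompute(doc_a + context, doc_b + context))  # 120

    # Appending: the context is a shared prefix and stays cached.
    print(tokens_to_recompute(context + doc_a, context + doc_b))  # 20
```

In this toy setting only the 20 tokens of the new document are recomputed when documents are appended, compared with all 120 tokens when they are prepended.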
[172] arXiv:2405.04066 [pdf, ps, html, other]: Title: Characterizing Regional Importance in Cities with Human Mobility Motifs in Metro Networks

Shuyang Shi, Ding Lyu, Lin Wang, Xiaofan Wang, Guanrong Chen

Subjects: Social and Information Networks (cs.SI); Systems and Control (eess.SY)

Uncovering higher-order spatiotemporal dependencies within human mobility networks offers valuable insights into the analysis of urban structures. In most existing studies, human mobility networks are typically constructed by aggregating all trips without distinguishing who takes which specific trip. Instead, we claim individual mobility motifs, higher-order structures generated by daily trips of people, as fundamental units of human mobility networks. In this paper, we propose two network construction frameworks at the level of mobility motifs in characterizing regional importance in cities. Firstly, we enhance the structural dependencies within mobility motifs and proceed to construct mobility networks based on the enhanced mobility motifs. Secondly, taking inspiration from PageRank, we speculate that people would allocate values of importance to destinations according to their trip intentions. A motif-wise network construction framework is proposed based on the established mechanism. Leveraging large-scale metro data across cities, we construct three types of human mobility networks and characterize the regional importance by node importance indicators. Our comparison results suggest that the motif-based mobility network outperforms the classic mobility network, thus highlighting the efficacy of the introduced human mobility motifs. Finally, we demonstrate that the performance in characterizing the regional importance is significantly improved by our motif-wise framework.
[173] arXiv:2405.04068 [pdf, ps, other]: Title: An Improved Reversible Data Hiding Algorithm Based on Reconstructed Mapping for PVO-k

Yusen Zhang, Haoyun Xu, Jingwen Li

Subjects: Cryptography and Security (cs.CR)

Reversible Data Hiding (RDH) is a practical and efficient technique for information encryption. Among its methods, the Pixel-Value Ordering (PVO) algorithm and its variants primarily modify prediction errors to embed information. However, both the classic PVO and its improved versions, such as IPVO and PVO-k, share a common limitation: their maximum data embedding capacity for a given grayscale image is relatively low. This poses a challenge when large amounts of data need to be embedded into an image. In response to these issues, this paper proposes an improved design targeting the PVO-k algorithm. We have reconstructed the mapping scheme of the PVO-k algorithm to maximize the number of pixels that can embed encrypted information. Experimental validations show that our proposed scheme significantly surpasses previous algorithms in terms of the maximum data embedding capacity. For instance, when embedding information into a grayscale image of an airplane, our method's capacity exceeds that of PVO-k by 11,207 bits, PVO by 8,004 bits, and IPVO by 4,562 bits. The results demonstrate that our algorithm holds substantial advantages over existing methods and introduces innovative mapping ideas, laying a foundation for future research in reversible data hiding in images.
[174] arXiv:2405.04071 [pdf, ps, html, other]: Title: IMU-Aided Event-based Stereo Visual Odometry

Junkai Niu, Sheng Zhong, Yi Zhou

Comments: 10 pages, 7 figures, ICRA

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Direct methods for event-based visual odometry solve the mapping and camera pose tracking sub-problems by establishing implicit data association in a way that the generative model of events is exploited. The main bottlenecks faced by state-of-the-art work in this field include the high computational complexity of mapping and the limited accuracy of tracking. In this paper, we improve our previous direct pipeline, Event-based Stereo Visual Odometry, in terms of accuracy and efficiency. To speed up the mapping operation, we propose an efficient strategy of edge-pixel sampling according to the local dynamics of events. The mapping performance in terms of completeness and local smoothness is also improved by combining the temporal stereo results and the static stereo results. To circumvent the degeneracy issue of camera pose tracking in recovering the yaw component of general 6-DoF motion, we introduce as a prior the gyroscope measurements via pre-integration. Experiments on publicly available datasets justify our improvement. We release our pipeline as open-source software for future research in this field.
[175] arXiv:2405.04078 [pdf, ps, html, other]: Title: WISER: Weak supervISion and supErvised Representation learning to improve drug response prediction in cancer

Kumar Shubham, Aishwarya Jayagopal, Syed Mohammed Danish, Prathosh AP, Vaibhav Rajan

Comments: ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Cancer, a leading cause of death globally, occurs due to genomic changes and manifests heterogeneously across patients. To advance research on personalized treatment strategies, the effectiveness of various drugs on cells derived from cancers (`cell lines') is experimentally determined in laboratory settings. Nevertheless, variations in the distribution of genomic data and drug responses between cell lines and humans arise due to biological and environmental differences. Moreover, while genomic profiles of many cancer patients are readily available, the scarcity of corresponding drug response data limits the ability to train machine learning models that can predict drug response in patients effectively. Recent cancer drug response prediction methods have largely followed the paradigm of unsupervised domain-invariant representation learning followed by a downstream drug response classification step. Introducing supervision in both stages is challenging due to heterogeneous patient response to drugs and limited drug response data. This paper addresses these challenges through a novel representation learning method in the first phase and weak supervision in the second. Experimental results on real patient data demonstrate the efficacy of our method (WISER) over state-of-the-art alternatives on predicting personalized drug response.
[176] arXiv:2405.04079 [pdf, ps, html, other]: Title: Leveraging swarm capabilities to assist other systems

Miquel Kegeleirs, David Garzón Ramos, Guillermo Legarda Herranz, Ilyes Gharbi, Jeanne Szpirer, Ken Hasselmann, Lorenzo Garattoni, Gianpiero Francesca, Mauro Birattari

Comments: Presented at the "Breaking swarm stereotypes" workshop at ICRA 2024

Subjects: Robotics (cs.RO)

Most studies in swarm robotics treat the swarm as an isolated system of interest. We argue that the prevailing view of swarms as self-sufficient, independent systems limits the scope of potential applications for swarm robotics. A robot swarm could act as a support in a heterogeneous system comprising other robots and/or human operators, in particular by quickly providing access to a large amount of data acquired in large unknown environments. Tasks such as target identification and tracking, scouting, or monitoring/surveillance could benefit from this approach.
[177] arXiv:2405.04080 [pdf, ps, html, other]: Title: Derisking of subsynchronous torsional oscillations in power systems with conventional and inverter-based generation

Nicolas Bonafé, Julian Freytes, Hani Saad

Subjects: Systems and Control (eess.SY)

This article proposes an application of a derisking methodology of subsynchronous torsional oscillations considering a realistic use case. The main objective is to summarize and draft a synthetic paper clarifying the complete methodology, highlighting the main information needed step-by-step. For exemplification, a real model from a decommissioned oil power plant is adopted, where a fictitious high voltage direct current power link is connected. In this article, stress is laid on details of the application of the derisking methods: the unit interaction factor and the complex torque coefficients method. Then, the different steps to obtain results are explicitly explained. Moreover, the design and tuning process of a supplementary subsynchronous damping controller is discussed. This mitigation section uses minimal information to correctly damp the unstable oscillations, as one would expect from industrial projects where data sharing may be limited. Finally, the resources needed to perform each step of the study are summarized.
[178] arXiv:2405.04081 [pdf, ps, other]: Title: Counterfactual and Semifactual Explanations in Abstract Argumentation: Formal Foundations, Complexity and Computation

Gianvincenzo Alfano, Sergio Greco, Francesco Parisi, Irina Trubitsyna

Subjects: Artificial Intelligence (cs.AI)

Explainable Artificial Intelligence and Formal Argumentation have received significant attention in recent years. Argumentation-based systems often lack explainability while supporting decision-making processes. Counterfactual and semifactual explanations are interpretability techniques that provide insights into the outcome of a model by generating alternative hypothetical instances. While there has been important work on counterfactual and semifactual explanations for Machine Learning models, less attention has been devoted to these kinds of problems in argumentation. In this paper, we explore counterfactual and semifactual reasoning in abstract Argumentation Framework. We investigate the computational complexity of counterfactual- and semifactual-based reasoning problems, showing that they are generally harder than classical argumentation problems such as credulous and skeptical acceptance. Finally, we show that counterfactual and semifactual queries can be encoded in weak-constrained Argumentation Framework, and provide a computational strategy through ASP solvers.
[179] arXiv:2405.04082 [pdf, ps, html, other]: Title: Logic-Skill Programming: An Optimization-based Approach to Sequential Skill Planning

Teng Xue, Amirreza Razmjoo, Suhan Shetty, Sylvain Calinon

Subjects: Robotics (cs.RO)

Recent advances in robot skill learning have unlocked the potential to construct task-agnostic skill libraries, facilitating the seamless sequencing of multiple simple manipulation primitives (aka. skills) to tackle significantly more complex tasks. Nevertheless, determining the optimal sequence for independently learned skills remains an open problem, particularly when the objective is given solely in terms of the final geometric configuration rather than a symbolic goal. To address this challenge, we propose Logic-Skill Programming (LSP), an optimization-based approach that sequences independently learned skills to solve long-horizon tasks. We formulate a first-order extension of a mathematical program to optimize the overall cumulative reward of all skills within a plan, abstracted by the sum of value functions. To solve such programs, we leverage the use of Tensor Train to construct the value function space, and rely on alternations between symbolic search and skill value optimization to find the appropriate skill skeleton and optimal subgoal sequence. Experimental results indicate that the obtained value functions provide a superior approximation of cumulative rewards compared to state-of-the-art Reinforcement Learning methods. Furthermore, we validate LSP in three manipulation domains, encompassing both prehensile and non-prehensile primitives. The results demonstrate its capability to identify the optimal solution over the full logic and geometric path. The real-robot experiments showcase the effectiveness of our approach to cope with contact uncertainty and external disturbances in the real world.
[180] arXiv:2405.04086 [pdf, ps, other]: Title: Optimizing Language Model's Reasoning Abilities with Weak Supervision

Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang

Subjects: Computation and Language (cs.CL)

While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on datasets extensively annotated by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities with minimal human supervision. In this work, we introduce self-reinforcement, which begins with Supervised Fine-Tuning (SFT) of the model using a small collection of annotated questions. Then it iteratively improves LLMs by learning from the differences in responses from the SFT and unfinetuned models on unlabeled questions. Our approach is efficient and does not rely heavily on extensive human-annotated explanations. However, current reasoning benchmarks typically only include golden-reference answers or rationales. Therefore, we present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales across various domains, such as brainteasers, puzzles, riddles, parajumbles, and critical reasoning tasks. A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities. Our experiments underscore the significance of PuzzleBen, as well as the effectiveness of our methodology as a promising direction in future endeavors. Our dataset and code will be published soon on Anonymity Link.
[181] arXiv:2405.04091 [pdf, ps, html, other]: Title: Randomized iterative methods for generalized absolute value equations: Solvability and error bounds

Jiaxin Xie, Huoduo Qi, Deren Han

Subjects: Numerical Analysis (math.NA)

Randomized iterative methods, such as the Kaczmarz method and its variants, have gained growing attention due to their simplicity and efficiency in solving large-scale linear systems. Meanwhile, absolute value equations (AVE) have attracted increasing interest due to their connection with the linear complementarity problem. In this paper, we investigate the application of randomized iterative methods to generalized AVE (GAVE). Our approach differs from most existing works in that we tackle GAVE with non-square coefficient matrices. We establish more comprehensive sufficient and necessary conditions for characterizing the solvability of GAVE and propose precise error bound conditions. Furthermore, we introduce a flexible and efficient randomized iterative algorithmic framework for solving GAVE, which employs sampling matrices drawn from user-specified distributions. This framework is capable of encompassing many well-known methods, including the Picard iteration method and the randomized Kaczmarz method. Leveraging our findings on solvability and error bounds, we establish both almost sure convergence and linear convergence rates for this versatile algorithmic framework. Finally, we present numerical examples to illustrate the advantages of the new algorithms.
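As background for the framework described above, the following is a minimal sketch of the classical randomized Kaczmarz method for a consistent linear system Ax = b. It is not the paper's GAVE algorithm; the function name `randomized_kaczmarz`, the iteration count, and the seed are illustrative assumptions. Each step samples a row with probability proportional to its squared norm and projects the iterate onto that row's hyperplane.

```python
import random


def randomized_kaczmarz(A, b, iterations=2000, seed=0):
    """Approximate a solution of a consistent system A x = b.

    A is a list of rows (all with non-zero norm), b a list of floats.
    """
    rng = random.Random(seed)
    n = len(A[0])
    row_norms = [sum(v * v for v in row) for row in A]  # squared row norms
    x = [0.0] * n
    for _ in range(iterations):
        # Sample row i with probability proportional to ||a_i||^2.
        i = rng.choices(range(len(A)), weights=row_norms)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / row_norms[i]
        # Project x onto the hyperplane {z : a_i . z = b_i}.
        x = [xj + step * a for a, xj in zip(A[i], x)]
    return x


# Overdetermined but consistent example with exact solution (1, 2).
x_hat = randomized_kaczmarz([[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]],
                            [4.0, 7.0, -1.0])
```

The example uses a non-square coefficient matrix only to echo the setting of the abstract; the update itself is the standard linear-system version.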
[182] arXiv:2405.04093 [pdf, ps, html, other]: Title: DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects

Da Fu, Mingfei Rong, Eun-Hu Kim, Hao Huang, Witold Pedrycz

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Accurate classification of fine-grained images remains a challenge in backbones based on convolutional operations or self-attention mechanisms. This study proposes novel dual-current neural networks (DCNN), which combine the advantages of convolutional operations and self-attention mechanisms to improve the accuracy of fine-grained image classification. The main novel design features for constructing a weakly supervised learning backbone model DCNN include (a) extracting heterogeneous data, (b) keeping the feature map resolution unchanged, (c) expanding the receptive field, and (d) fusing global representations and local features. Experimental results demonstrated that using DCNN as the backbone network for classifying certain fine-grained benchmark datasets achieved performance advantage improvements of 13.5--19.5% and 2.2--12.9%, respectively, compared to other advanced convolution or attention-based fine-grained backbones.
[183] arXiv:2405.04095 [pdf, ps, html, other]: Title: Going Proactive and Explanatory Against Malware Concept Drift

Yiling He, Junchi Lei, Zhan Qin, Kui Ren

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Deep learning-based malware classifiers face significant challenges due to concept drift. The rapid evolution of malware, especially with new families, can depress classification accuracy to near-random levels. Previous research has primarily focused on detecting drift samples, relying on expert-led analysis and labeling for model retraining. However, these methods often lack a comprehensive understanding of malware concepts and provide limited guidance for effective drift adaptation, leading to unstable detection performance and high human labeling costs.
To address these limitations, we introduce DREAM, a novel system designed to surpass the capabilities of existing drift detectors and to establish an explanatory drift adaptation process. DREAM enhances drift detection through model sensitivity and data autonomy. The detector, trained in a semi-supervised approach, proactively captures malware behavior concepts through classifier feedback. During testing, it utilizes samples generated by the detector itself, eliminating reliance on extensive training data. For drift adaptation, DREAM enlarges human intervention, enabling revisions of malware labels and concept explanations embedded within the detector's latent space. To ensure a comprehensive response to concept drift, it facilitates a coordinated update process for both the classifier and the detector. Our evaluation shows that DREAM can effectively improve the drift detection accuracy and reduce the expert analysis effort in adaptation across different malware datasets and classifiers.
[184] arXiv:2405.04097 [pdf, ps, html, other]: Title: Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Multimedia (cs.MM)

The emergence of contemporary deepfakes has attracted significant attention in machine learning research, as artificial intelligence (AI) generated synthetic media increases the incidence of misinterpretation and is difficult to distinguish from genuine content. Currently, machine learning techniques have been extensively studied for automatically detecting deepfakes. However, human perception has been less explored. Malicious deepfakes could ultimately cause public and social problems. Can we humans correctly perceive the authenticity of the content of the videos we watch? The answer is obviously uncertain; therefore, this paper aims to evaluate the human ability to discern deepfake videos through a subjective study. We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models. To this end, we used gamification concepts to provide 110 participants (55 native English speakers and 55 non-native English speakers) with a web-based platform where they could access a series of 40 videos (20 real and 20 fake) to determine their authenticity. Each participant performed the experiment twice with the same 40 videos in different random orders. The videos are manually selected from the FakeAVCeleb dataset. We found that all AI models performed better than humans when evaluated on the same 40 videos. The study also reveals that while deception is not impossible, humans tend to overestimate their detection capabilities. Our experimental results may help benchmark human versus machine performance, advance forensics analysis, and enable adaptive countermeasures.
[185] arXiv:2405.04098 [pdf, ps, other]: Title: Binarized Simplicial Convolutional Neural Networks

Yi Yan, Ercan E. Kuruoglu

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Graph Neural Networks have a limitation of solely processing features on graph nodes, neglecting data on high-dimensional structures such as edges and triangles. Simplicial Convolutional Neural Networks (SCNN) represent higher-order structures using simplicial complexes to break this limitation albeit still lacking time efficiency. In this paper, we propose a novel neural network architecture on simplicial complexes named Binarized Simplicial Convolutional Neural Networks (Bi-SCNN) based on the combination of simplicial convolution with a binary-sign forward propagation strategy. The usage of the Hodge Laplacian on a binary-sign forward propagation enables Bi-SCNN to efficiently and effectively represent simplicial features that have higher-order structures than traditional graph node representations. Compared to the previous Simplicial Convolutional Neural Networks, the reduced model complexity of Bi-SCNN shortens the execution time without sacrificing the prediction performance and is less prone to the over-smoothing effect. Experimenting with real-world citation and ocean-drifter data confirmed that our proposed Bi-SCNN is efficient and accurate.
[186] arXiv:2405.04099 [pdf, ps, html, other]: Title: Effect of realistic oscillator phase noise on the performance of cell-free networks

Igor Zhilin, Evgenii Vinogradov

Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

To keep supporting 6G requirements, the radio access infrastructure will increasingly densify. Cell-free (CF) networks offer extreme flexibility by coherently serving users with multiple Access Points (APs). This paradigm requires precise and stable phase synchronization. In this article, we adapt the standardized 5G NR setup (subcarrier spacing, OFDM symbol duration and allocation) to investigate the effect of Phase Noise (PN) on the simulated performance of scalable CF networks. In contrast to the prior literature relying on the simplified model of a free-running oscillator with the Wiener process, we deploy a realistic hardware-inspired phase noise model reproducing the Local Oscillator (LO) phase drift. Our results demonstrate that even affordable LOs offer sufficient stability to ensure negligible loss of uplink Spectral Efficiency (SE) on the time scale of the standardized 5G Transmission Time Interval of 1 ms. This study substantiates the feasibility of CF networks based on 5G standards.
[187] arXiv:2405.04100 [pdf, ps, html, other]: Title: ESP: Extro-Spective Prediction for Long-term Behavior Reasoning in Emergency Scenarios

Dingrui Wang, Zheyuan Lai, Yuda Li, Yi Wu, Yuexin Ma, Johannes Betz, Ruigang Yang, Wei Li

Comments: Accepted by ICRA 2024 as Oral Presentation

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Emergent-scene safety is the key milestone for fully autonomous driving, and reliable on-time prediction is essential to maintain safety in emergency scenarios. However, these emergency scenarios are long-tailed and hard to collect, which restricts the system from getting reliable predictions. In this paper, we build a new dataset, which aims at the long-term prediction with the inconspicuous state variation in history for the emergency event, named the Extro-Spective Prediction (ESP) problem. Based on the proposed dataset, a flexible feature encoder for ESP is introduced to various prediction methods as a seamless plug-in, and its consistent performance improvement underscores its efficacy. Furthermore, a new metric named clamped temporal error (CTE) is proposed to give a more comprehensive evaluation of prediction performance, especially in time-sensitive emergency events of subseconds. Interestingly, as our ESP features can be described in human-readable language naturally, the application of integrating into ChatGPT also shows huge potential. The ESP-dataset and all benchmarks are released at this https URL.
[188] arXiv:2405.04101 [pdf, ps, html, other]: Title: Continual Learning in the Presence of Repetition

Hamed Hemati, Lorenzo Pellegrini, Xiaotian Duan, Zixuan Zhao, Fangfang Xia, Marc Masana, Benedikt Tscheschner, Eduardo Veas, Yuxiang Zheng, Shiji Zhao, Shao-Yuan Li, Sheng-Jun Huang, Vincenzo Lomonaco, Gido M. van de Ven

Comments: Preprint; Challenge Report of the 4th Workshop on Continual Learning in Computer Vision at CVPR

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Continual learning (CL) provides a framework for training models in ever-evolving environments. Although re-occurrence of previously seen objects or tasks is common in real-world problems, the concept of repetition in the data stream is not often considered in standard benchmarks for CL. Unlike with the rehearsal mechanism in buffer-based strategies, where sample repetition is controlled by the strategy, repetition in the data stream naturally stems from the environment. This report provides a summary of the CLVision challenge at CVPR 2023, which focused on the topic of repetition in class-incremental learning. The report initially outlines the challenge objective and then describes three solutions proposed by finalist teams that aim to effectively exploit the repetition in the stream to learn continually. The experimental results from the challenge highlight the effectiveness of ensemble-based solutions that employ multiple versions of similar modules, each trained on different but overlapping subsets of classes. This report underscores the transformative potential of taking a different perspective in CL by employing repetition in the data stream to foster innovative strategy design.
[189] arXiv:2405.04102 [pdf, ps, html, other]: Title: Analysis of Markovian Arrivals and Service with Applications to Intermittent Overload

Isaac Grosof, Yige Hong, Mor Harchol-Balter

Comments: 27 pages

Subjects: Performance (cs.PF); Probability (math.PR)

Almost all queueing analysis assumes i.i.d. arrivals and service. In reality, arrival and service rates fluctuate over time. In particular, it is common for real systems to intermittently experience overload, where the arrival rate temporarily exceeds the service rate, which an i.i.d. model cannot capture. We consider the MAMS system, where the arrival and service rates each vary according to an arbitrary finite-state Markov chain, allowing intermittent overload to be modeled.
We derive the first explicit characterization of mean queue length in the MAMS system, with explicit bounds for all arrival and service chains at all loads. Our bounds are tight in heavy traffic. We prove even stronger bounds for the important special case of two-level arrivals with intermittent overload.
Our key contribution is an extension to the drift method, based on the novel concepts of relative arrivals and relative completions. These quantities allow us to tractably capture the transient correlational effect of the arrival and service processes on the mean queue length.
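The notion of intermittent overload can be illustrated with a small worked computation. The sketch below assumes the simplest case: a two-level arrival process driven by a two-state continuous-time Markov chain and a constant service rate (the MAMS model lets the service rate vary as well). The function name `two_level_load` and all numeric values are hypothetical. It computes the long-run load from the stationary distribution of the modulating chain, not the paper's queue-length bounds.

```python
def two_level_load(lam_low, lam_high, mu, alpha, beta):
    """Long-run load of a queue whose arrival rate switches between two levels.

    alpha: switching rate low -> high; beta: switching rate high -> low.
    mu: service rate, assumed constant in this sketch.
    """
    pi_high = alpha / (alpha + beta)          # stationary fraction of time in "high"
    mean_arrival = (1.0 - pi_high) * lam_low + pi_high * lam_high
    return mean_arrival / mu


# During the high phase the arrival rate (1.5) exceeds the service rate (1.0),
# yet the system is stable overall because the long-run load is below 1.
rho = two_level_load(lam_low=0.5, lam_high=1.5, mu=1.0, alpha=0.1, beta=0.4)
```

Here the chain spends a fraction 0.1 / (0.1 + 0.4) = 0.2 of the time in the overloaded phase, so the long-run load is 0.8 × 0.5 + 0.2 × 1.5 = 0.7.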
[190] arXiv:2405.04103 [pdf, ps, html, other]: Title: COM3D: Leveraging Cross-View Correspondence and Cross-Modal Mining for 3D Retrieval

Hao Wu, Ruochong LI, Hao Wang, Hui Xiong

Comments: Accepted by ICME 2024 oral

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we investigate an open research task of cross-modal retrieval between 3D shapes and textual descriptions. Previous approaches mainly rely on point cloud encoders for feature extraction, which may ignore key inherent features of 3D shapes, including depth, spatial hierarchy, geometric continuity, etc. To address this issue, we propose COM3D, making the first attempt to exploit the cross-view correspondence and cross-modal mining to enhance the retrieval performance. Notably, we augment the 3D features through a scene representation transformer, to generate cross-view correspondence features of 3D shapes, which enrich the inherent features and enhance their compatibility with text matching. Furthermore, we propose to optimize the cross-modal matching process based on the semi-hard negative example mining method, in an attempt to improve the learning efficiency. Extensive quantitative and qualitative experiments demonstrate the superiority of our proposed COM3D, achieving state-of-the-art results on the Text2Shape dataset.
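The abstract does not spell out its mining criterion, so the following sketch uses the common triplet-loss definition of a semi-hard negative: a negative that lies farther from the anchor than the positive does, but still within the margin. The function name `semi_hard_negatives` and the distance values are hypothetical.

```python
def semi_hard_negatives(d_pos, d_negs, margin):
    """Return indices of negatives with d_pos < d_neg < d_pos + margin.

    d_pos: anchor-to-positive distance; d_negs: anchor-to-negative distances.
    """
    return [j for j, d in enumerate(d_negs) if d_pos < d < d_pos + margin]


# Anchor-positive distance 0.5, margin 0.5: only distances in (0.5, 1.0) qualify.
selected = semi_hard_negatives(0.5, [0.3, 0.6, 0.9, 1.2], margin=0.5)
```

Negatives closer than the positive (index 0) are "hard" and those beyond the margin (index 3) contribute no loss, so only the middle two are kept.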
[191] arXiv:2405.04108 [pdf, ps, other]: Title: A2-DIDM: Privacy-preserving Accumulator-enabled Auditing for Distributed Identity of DNN Model

Tianxiu Xie, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Recent booming development of Generative Artificial Intelligence (GenAI) has facilitated an emerging model commercialization for the purpose of reinforcement on model performance, such as licensing or trading Deep Neural Network (DNN) models. However, DNN model trading may trigger concerns of the unauthorized replications or misuses over the model, so that the benefit of the model ownership will be violated. Model identity auditing is a challenging issue in protecting intellectual property of DNN models and verifying the integrity and ownership of models for guaranteeing trusts in transactions is one of the critical obstacles. In this paper, we focus on the above issue and propose a novel Accumulator-enabled Auditing for Distributed Identity of DNN Model (A2-DIDM) that utilizes blockchain and zero-knowledge techniques to protect data and function privacy while ensuring the lightweight on-chain ownership verification. The proposed model presents a scheme of identity records via configuring model weight checkpoints with corresponding zero-knowledge proofs, which incorporates predicates to capture incremental state changes in model weight checkpoints. Our scheme ensures both computational integrity of DNN training process and programmability, so that the uniqueness of the weight checkpoint sequence in a DNN model is preserved, ensuring the correctness of the model identity auditing. In addition, A2-DIDM also addresses privacy protections in distributed identity via a proposed method of accumulators. We systematically analyze the security and robustness of our proposed model and further evaluate the effectiveness and usability of auditing DNN model identities.
[192] arXiv:2405.04109 [pdf, ps, html, other]: Title: The Malware as a Service ecosystem

Constantinos Patsakis, David Arroyo, Fran Casino

Comments: To appear in D Gritzalis, KKR Choo, C Patsakis, "Malware - Handbook of Prevention and Detection", Springer (Advances in Information Security)

Subjects: Cryptography and Security (cs.CR)

The goal of this chapter is to illuminate the operational frameworks, key actors, and significant cybersecurity implications of the Malware as a Service (MaaS) ecosystem. Highlighting the transformation of malware proliferation into a service-oriented model, the chapter discusses how MaaS democratises access to sophisticated cyberattack capabilities, enabling even those with minimal technical knowledge to execute catastrophic cyberattacks. The discussion extends to the roles within the MaaS ecosystem, including malware developers, affiliates, initial access brokers, and the essential infrastructure providers that support these nefarious activities. The study emphasises the profound challenges MaaS poses to traditional cybersecurity defences, rendered ineffective against the constantly evolving and highly adaptable threats generated by MaaS platforms. With the increase in malware sophistication, there is a parallel call for a paradigm shift in defensive strategies, advocating for dynamic analysis, behavioural detection, and the integration of AI and machine learning techniques. By exploring the intricacies of the MaaS ecosystem, including the economic motivations driving its growth and the blurred lines between legitimate service models and cyber crime, the chapter presents a comprehensive overview intended to foster a deeper understanding among researchers and cybersecurity professionals. The ultimate goal is to aid in developing more effective strategies for combating the spread of commoditised malware threats and safeguarding against the increasing accessibility and scalability of cyberattacks facilitated by the MaaS model.
[193] arXiv:2405.04111 [pdf, ps, html, other]: Title: Adaptive Least Mean pth Power Graph Neural Networks

Changran Peng, Yi Yan, Ercan E. Kuruoglu

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

In the presence of impulsive noise and missing observations, accurate online prediction of time-varying graph signals poses a crucial challenge in numerous application domains. We propose the Adaptive Least Mean $p^{th}$ Power Graph Neural Networks (LMP-GNN), a universal framework combining adaptive filtering and graph neural networks for online graph signal estimation. LMP-GNN retains the advantage of adaptive filtering in handling noise and missing observations as well as the online update capability. The incorporated graph neural network within the LMP-GNN can train and update filter parameters online instead of relying on the predefined filter parameters of previous methods, outputting more accurate prediction results. The adaptive update scheme of the LMP-GNN follows the solution of an $l_p$-norm optimization, rooted in the minimum dispersion criterion, and yields robust estimation results for time-varying graph signals under impulsive noise. A special case of LMP-GNN named the Sign-GNN is also provided and analyzed. Experimental results on two real-world datasets of temperature graph and traffic graph under four different noise distributions demonstrate the effectiveness and robustness of our proposed LMP-GNN.
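For readers unfamiliar with the least mean p-th power criterion, here is a minimal sketch of the classical (non-graph) adaptive filter update obtained by gradient descent on |e|^p. It is not the LMP-GNN itself; the function name `lmp_update` and the step size are illustrative assumptions. With p = 2 the rule is the LMS update up to a factor of 2, and with p = 1 it reduces to a sign-error update, the classical analogue of what the name Sign-GNN suggests.

```python
def lmp_update(w, x, d, mu, p):
    """One least-mean-p-th-power step for a linear filter y = w . x.

    w: current weights, x: input vector, d: desired output,
    mu: step size, p: power of the error norm (p >= 1).
    """
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    if e == 0:
        return list(w)                       # nothing to correct
    sign = 1.0 if e > 0 else -1.0
    # Negative gradient of |e|^p with respect to w is p * |e|^(p-1) * sign(e) * x.
    gain = p * abs(e) ** (p - 1) * sign
    return [wi + mu * gain * xi for wi, xi in zip(w, x)]


w_sign = lmp_update([0.0, 0.0], [1.0, 2.0], d=3.0, mu=0.1, p=1)  # sign-error step
w_lms = lmp_update([0.0, 0.0], [1.0, 2.0], d=3.0, mu=0.1, p=2)   # LMS-like step
```

For p close to 1 the update magnitude no longer grows with the error, which is why such rules are less sensitive to impulsive outliers than the p = 2 case.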
[194] arXiv:2405.04114 [pdf, ps, other]: Title: Acceleration Algorithms in GNNs: A Survey

Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui

Comments: 9 pages, 3 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph-based tasks. However, their inefficiency in training and inference presents challenges for scaling up to real-world and large-scale graph applications. To address the critical challenges, a range of algorithms have been proposed to accelerate training and inference of GNNs, attracting increasing attention from the research community. In this paper, we present a systematic review of acceleration algorithms in GNNs, which can be categorized into three main topics based on their purpose: training acceleration, inference acceleration, and execution acceleration. Specifically, we summarize and categorize the existing approaches for each main topic, and provide detailed characterizations of the approaches within each category. Additionally, we review several libraries related to acceleration algorithms in GNNs and discuss our Scalable Graph Learning (SGL) library. Finally, we propose promising directions for future research. A complete summary is presented in our GitHub repository: this https URL.
[195] arXiv:2405.04115 [pdf, ps, html, other]: Title: A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning

Xiaoyang Xu, Mengda Yang, Wenzhe Yi, Ziang Li, Juan Wang, Hongxin Hu, Yong Zhuang, Yaxin Liu

Comments: Accepted to CVPR 2024

Subjects: Cryptography and Security (cs.CR)

Split Learning (SL) is a distributed learning framework renowned for its privacy-preserving features and minimal computational requirements. Previous research consistently highlights the potential privacy breaches in SL systems by server adversaries reconstructing training data. However, these studies often rely on strong assumptions or compromise system utility to enhance attack performance. This paper introduces a new semi-honest Data Reconstruction Attack on SL, named Feature-Oriented Reconstruction Attack (FORA). In contrast to prior works, FORA relies on limited prior knowledge, specifically that the server utilizes auxiliary samples from the public without knowing any client's private information. This allows FORA to conduct the attack stealthily and achieve robust performance. The key vulnerability exploited by FORA is the revelation of the model representation preference in the smashed data output by the victim client. FORA constructs a substitute client through feature-level transfer learning, aiming to closely mimic the victim client's representation preference. Leveraging this substitute client, the server trains the attack model to effectively reconstruct private data. Extensive experiments showcase FORA's superior performance compared to state-of-the-art methods. Furthermore, the paper systematically evaluates the proposed method's applicability across diverse settings and advanced defense strategies.
[196] arXiv:2405.04116 [pdf, ps, other]: Title: Semi-implicit Lagrangian Voronoi Approximation for the incompressible Navier-Stokes equations

Ondřej Kincl, Ilya Peshkov, Walter Boscheri

Comments: 22 pages, 13 figures

Subjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)

We introduce Semi-Implicit Lagrangian Voronoi Approximation (SILVA), a novel numerical method for the solution of the incompressible Euler and Navier-Stokes equations, which combines the efficiency of semi-implicit time marching schemes with the robustness of time-dependent Voronoi tessellations. In SILVA, the numerical solution is stored at particles, which move with the fluid velocity and also play the role of the generators of the computational mesh. The Voronoi mesh is rapidly regenerated at each time step, allowing large deformations with topology changes. As opposed to the reconnection-based Arbitrary-Lagrangian-Eulerian schemes, we need no remapping stage. A semi-implicit scheme is devised in the context of moving Voronoi meshes to project the velocity field onto a divergence-free manifold. We validate SILVA by illustrative benchmarks, including viscous, inviscid, and multi-phase flows. Compared to its closest competitor, the Incompressible Smoothed Particle Hydrodynamics (ISPH) method, SILVA offers a sparser stiffness matrix and facilitates the implementation of no-slip and free-slip boundary conditions.
[197] arXiv:2405.04118 [pdf, ps, other]: Title: Policy Learning with a Language Bottleneck

Megha Srivastava, Cedric Colas, Dorsa Sadigh, Jacob Andreas

Comments: 18 pages, 13 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Modern AI systems such as self-driving cars and game-playing agents achieve superhuman performance, but often lack human-like features such as generalization, interpretability and human inter-operability. Inspired by the rich interactions between language and decision-making in humans, we introduce Policy Learning with a Language Bottleneck (PLLB), a framework enabling AI agents to generate linguistic rules that capture the strategies underlying their most rewarding behaviors. PLLB alternates between a rule generation step guided by language models, and an update step where agents learn new policies guided by rules. In a two-player communication game, a maze solving task, and two image reconstruction tasks, we show that PLLB agents are not only able to learn more interpretable and generalizable behaviors, but can also share the learned rules with human users, enabling more effective human-AI coordination.
[198] arXiv:2405.04120 [pdf, ps, html, other]: Title: Movable Antennas-Enabled Two-User Multicasting: Do We Really Need Alternating Optimization for Minimum Rate Maximization?

Guojie Hu, Qingqing Wu, Donghui Xu, Kui Xu, Jiangbo Si, Yunlong Cai, Naofal Al-Dhahir

Subjects: Information Theory (cs.IT)

Movable antenna (MA) technology, which can reconfigure wireless channels by flexibly moving antenna positions in a specified region, has great potential for improving communication performance. In this paper, we consider a new setup of MAs-enabled multicasting, where we adopt a simple setting in which a linear MA array-enabled source (${\rm{S}}$) transmits a common message to two single-antenna users ${\rm{U}}_1$ and ${\rm{U}}_2$. We aim to maximize the minimum rate among these two users, by jointly optimizing the transmit beamforming and antenna positions at ${\rm{S}}$. Instead of utilizing the widely-used alternating optimization (AO) approach, we reveal, with rigorous proof, that the above two variables can be optimized separately: i) the optimal antenna positions can be firstly determined via the successive convex approximation technique, based on the rule of maximizing the correlation between ${\rm{S}}$-${\rm{U}}_1$ and ${\rm{S}}$-${\rm{U}}_2$ channels; ii) afterwards, the optimal closed-form transmit beamforming can be derived via simple arguments. Compared to AO, this new approach yields the same performance but reduces the computational complexities significantly. Moreover, it can provide insightful conclusions which are not possible with AO.
[199] arXiv:2405.04121 [pdf, ps, other]: Title: ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation

Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin

Comments: 9 pages, 6 figures, ICME 2024 oral

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cross-modal knowledge transfer enhances point cloud representation learning in LiDAR semantic segmentation. Despite its potential, the "weak teacher challenge" arises due to repetitive and non-diverse car camera images and sparse, inaccurate ground truth labels. To address this, we propose the Efficient Image-to-LiDAR Knowledge Transfer (ELiTe) paradigm. ELiTe introduces Patch-to-Point Multi-Stage Knowledge Distillation, transferring comprehensive knowledge from the Vision Foundation Model (VFM), extensively trained on diverse open-world images. This enables effective knowledge transfer to a lightweight student model across modalities. ELiTe employs Parameter-Efficient Fine-Tuning to strengthen the VFM teacher and expedite large-scale model training with minimal costs. Additionally, we introduce the Segment Anything Model based Pseudo-Label Generation approach to enhance low-quality image labels, facilitating robust semantic representations. Efficient knowledge transfer in ELiTe yields state-of-the-art results on the SemanticKITTI benchmark, outperforming real-time inference models. Our approach achieves this with significantly fewer parameters, confirming its effectiveness and efficiency.
[200] arXiv:2405.04122 [pdf, ps, html, other]: Title: Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning

Chunlin Tian, Zhan Shi, Xinpeng Qin, Li Li, Chengzhong Xu

Comments: Accepted by ICML 2024

Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated Learning (FL) enables multiple devices to collaboratively train a shared model while ensuring data privacy. The selection of participating devices in each training round critically affects both the model performance and training efficiency, especially given the vast heterogeneity in training capabilities and data distribution across devices. To address these challenges, we introduce a novel device selection solution called FedRank, which is an end-to-end, ranking-based approach that is pre-trained by imitation learning against state-of-the-art analytical approaches. It not only considers data and system heterogeneity at runtime but also adaptively and efficiently chooses the most suitable clients for model training. Specifically, FedRank views client selection in FL as a ranking problem and employs a pairwise training strategy for the smart selection process. Additionally, an imitation learning-based approach is designed to counteract the cold-start issues often seen in state-of-the-art learning-based approaches. Experimental results reveal that FedRank boosts model accuracy by 5.2% to 56.9%, accelerates training convergence by up to $2.01 \times$, and reduces energy consumption by up to 40.1%.
[201] arXiv:2405.04124 [pdf, ps, html, other]: Title: Comparative Study of Recurrent Neural Networks for Virtual Analog Audio Effects Modeling

Riccardo Simionato, Stefano Fasciani

Comments: arXiv admin note: text overlap with arXiv:1810.06603 by other authors

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

Analog electronic circuits are at the core of an important category of musical devices. The nonlinear features of their electronic components give analog musical devices a distinctive timbre and sound quality, making them highly desirable. Artificial neural networks have rapidly gained popularity for the emulation of analog audio effects circuits, particularly recurrent networks. While neural approaches have been successful in accurately modeling distortion circuits, they require architectural improvements that account for parameter conditioning and low latency response. In this article, we explore the application of recent machine learning advancements for virtual analog modeling. We compare State Space models and Linear Recurrent Units against the more common Long Short Term Memory networks. These have shown promising ability in sequence to sequence modeling tasks, showing a notable improvement in signal history encoding. Our comparative study uses these black box neural modeling techniques with a variety of audio effects. We evaluate the performance and limitations using multiple metrics aiming to assess the models' ability to accurately replicate energy envelopes, frequency contents, and transients in the audio signal. To incorporate control parameters we employ the Feature wise Linear Modulation method. Long Short Term Memory networks exhibit better accuracy in emulating distortions and equalizers, while the State Space model, followed by Long Short Term Memory networks when integrated in an encoder decoder structure, outperforms others in emulating saturation and compression. When considering long time variant characteristics, the State Space model demonstrates the greatest accuracy. The Long Short Term Memory and, in particular, Linear Recurrent Unit networks present more tendency to introduce audio artifacts.
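As a rough illustration of the Feature-wise Linear Modulation idea mentioned in the abstract above, the sketch below scales and shifts each feature using values computed from the control parameters. This is a minimal sketch, not the authors' implementation: the function names `linear` and `film`, the plain linear maps that produce the scale and shift, and all weight values are hypothetical and chosen only for illustration.

```python
def linear(weights, bias, cond):
    """Apply a linear map to the conditioning vector (one output per weight row)."""
    return [sum(w * c for w, c in zip(row, cond)) + b
            for row, b in zip(weights, bias)]


def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: out_i = gamma_i(cond) * features_i + beta_i(cond)."""
    gamma = linear(w_gamma, b_gamma, cond)  # per-feature scale derived from the controls
    beta = linear(w_beta, b_beta, cond)     # per-feature shift derived from the controls
    return [g * h + b for g, h, b in zip(gamma, features, beta)]


# Hypothetical example: two hidden features conditioned on one control knob.
hidden = [1.0, -2.0]
knob = [0.5]
modulated = film(hidden, knob,
                 w_gamma=[[2.0], [0.0]], b_gamma=[0.0, 1.0],
                 w_beta=[[0.0], [4.0]], b_beta=[0.0, 0.0])
print(modulated)
```

In a trained model the scale and shift would come from learned layers; here they are fixed so the arithmetic is easy to follow.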
[202] arXiv:2405.04125 [pdf, ps, other]: Title: Optimizing Prosumer Policies in Periodic Double Auctions Inspired by Equilibrium Analysis (Extended Version)

Bharat Manvi, Sanjay Chandlekar, Easwar Subramanian

Comments: 12 pages, 3 figures, Accepted at IJCAI 2024

Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)

We consider a periodic double auction (PDA) wherein the main participants are wholesale suppliers and brokers representing retailers. The suppliers are represented by a composite supply curve and the brokers are represented by individual bids. Additionally, the brokers can participate in small-scale selling by placing individual asks; hence, they act as prosumers. Specifically, in a PDA, the prosumers who are net buyers have multiple opportunities to buy or sell multiple units of a commodity with the aim of minimizing the cost of buying across multiple rounds of the PDA. Formulating optimal bidding strategies for such a PDA setting involves planning across current and future rounds while considering the bidding strategies of other agents. In this work, we propose Markov perfect Nash equilibrium (MPNE) policies for a setup where multiple prosumers with knowledge of the composite supply curve compete to procure commodities. Thereafter, the MPNE policies are used to develop an algorithm called MPNE-BBS for the case wherein the prosumers need to re-construct an approximate composite supply curve using past auction information. The efficacy of the proposed algorithm is demonstrated on the PowerTAC wholesale market simulator against several baselines and state-of-the-art bidding policies.
[203] arXiv:2405.04126 [pdf, ps, html, other]: Title: Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning

Karim Galliamov, Leila Khaertdinova, Karina Denisova

Comments: 17 pages, 4 figures, Accepted to AINL-2024

Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)

The latest developments in Natural Language Processing (NLP) have demonstrated remarkable progress in a code-text retrieval problem. As the Transformer-based models used in this task continue to increase in size, the computational costs and time required for end-to-end fine-tuning become substantial. This poses a significant challenge for adapting and utilizing these models when computational resources are limited. Motivated by these concerns, we propose a fine-tuning framework that leverages Parameter-Efficient Fine-Tuning (PEFT) techniques. Moreover, we adopt contrastive learning objectives to improve the quality of bimodal representations learned by transformer models. Additionally, for PEFT methods we provide extensive benchmarking, the lack of which has been highlighted as a crucial problem in the literature. Based on the thorough experimentation with the CodeT5+ model conducted on two datasets, we demonstrate that the proposed fine-tuning framework has the potential to improve code-text retrieval performance by tuning only 0.4% parameters at most.
[204] arXiv:2405.04128 [pdf, ps, html, other]: Title: Fine-grained Speech Sentiment Analysis in Chinese Psychological Support Hotlines Based on Large-scale Pre-trained Model

Zhonglong Chen, Changwei Song, Yining Chen, Jianqiang Li, Guanghui Fu, Yongsheng Tong, Qing Zhao

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Suicide and suicidal behaviors remain significant challenges for public policy and healthcare. In response, psychological support hotlines have been established worldwide to provide immediate help to individuals in mental crises. The effectiveness of these hotlines largely depends on accurately identifying callers' emotional states, particularly underlying negative emotions indicative of increased suicide risk. However, the high demand for psychological interventions often results in a shortage of professional operators, highlighting the need for an effective speech emotion recognition model. This model would automatically detect and analyze callers' emotions, facilitating integration into hotline services. Additionally, it would enable large-scale data analysis of psychological support hotline interactions to explore psychological phenomena and behaviors across populations. Our study utilizes data from the Beijing psychological support hotline, the largest suicide hotline in China. We analyzed speech data from 105 callers containing 20,630 segments and categorized them into 11 types of negative emotions. We developed a negative emotion recognition model and a fine-grained multi-label classification model using a large-scale pre-trained model. Our experiments indicate that the negative emotion recognition model achieves a maximum F1-score of 76.96%. However, it shows limited efficacy in the fine-grained multi-label classification task, with the best model achieving only a 41.74% weighted F1-score. We conducted an error analysis for this task, discussed potential future improvements, and considered the clinical application possibilities of our study. All the code is publicly available.
[205] arXiv:2405.04133 [pdf, ps, other]: Title: Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method

Peisong He, Leyao Zhu, Jiaxing Li, Shiqi Wang, Haoliang Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generative models have made significant advances in creating realistic videos, which raises security concerns. However, this emerging risk has not been adequately addressed due to the absence of a benchmark dataset for AI-generated videos. In this paper, we first construct a video dataset using advanced diffusion-based video generation algorithms with various semantic contents. In addition, typical lossy video operations applied during network transmission are used to generate degraded samples. Then, by analyzing local and global temporal defects of current AI-generated videos, we construct a novel detection framework that adaptively learns local motion information and global appearance variation to expose fake videos. Finally, experiments are conducted to evaluate the generalization and robustness of different spatial and temporal domain detection methods; the results can serve as a baseline and demonstrate the research challenges for future studies.
[206] arXiv:2405.04134 [pdf, ps, other]: Title: Geometry and Dynamics of LayerNorm

Paul M. Riechers

Subjects: Machine Learning (cs.LG)

A technical note aiming to offer deeper intuition for the LayerNorm function common in deep neural networks. LayerNorm is defined relative to a distinguished 'neural' basis, but it does more than just normalize the corresponding vector elements. Rather, it implements a composition -- of linear projection, nonlinear scaling, and then affine transformation -- on input activation vectors. We develop both a new mathematical expression and geometric intuition, to make the net effect more transparent. We emphasize that, when LayerNorm acts on an N-dimensional vector space, all outcomes of LayerNorm lie within the intersection of an (N-1)-dimensional hyperplane and the interior of an N-dimensional hyperellipsoid. This intersection is the interior of an (N-1)-dimensional hyperellipsoid, and typical inputs are mapped near its surface. We find the direction and length of the principal axes of this (N-1)-dimensional hyperellipsoid via the eigen-decomposition of a simply constructed matrix.
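The three-stage composition described in the abstract above (linear projection, nonlinear scaling, affine transformation) can be sketched in a few lines. This is a minimal sketch of the standard LayerNorm definition, not the paper's new expression: the function name `layernorm_steps`, the `eps` value, and the sample vector are assumptions made for illustration.

```python
import math


def layernorm_steps(x, gamma, beta, eps=1e-5):
    """Return the intermediate results of standard LayerNorm on one vector."""
    n = len(x)
    mean = sum(x) / n
    # 1) Linear projection: subtracting the mean removes the component along
    #    the all-ones direction, so the result lies in an (N-1)-dim hyperplane.
    centered = [xi - mean for xi in x]
    # 2) Nonlinear scaling: divide by sqrt(variance + eps).
    var = sum(c * c for c in centered) / n
    scale = math.sqrt(var + eps)
    normalized = [c / scale for c in centered]
    # 3) Affine transformation: element-wise gain and bias.
    out = [g * z + b for g, z, b in zip(gamma, normalized, beta)]
    return centered, normalized, out


x = [1.0, 2.0, 4.0, 7.0]  # hypothetical activation vector
centered, normalized, out = layernorm_steps(x, gamma=[1.0] * 4, beta=[0.0] * 4)
norm = math.sqrt(sum(z * z for z in normalized))
print(sum(normalized))           # approximately 0: the vector lies on the hyperplane
print(norm, math.sqrt(len(x)))   # norm is just below sqrt(N) because eps > 0
```

Before the affine step, the normalized vector sums to zero and its length is slightly less than sqrt(N). With a non-uniform gain, that sphere is stretched into the hyperellipsoid the abstract refers to.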
[207] arXiv:2405.04135 [pdf, ps, html, other]: Title: In-context Learning for Automated Driving Scenarios

Ziqi Zhou, Jingyue Zhang, Jingyuan Zhang, Boyue Wang, Tianyu Shi, Alaa Khamis

Comments: 7 pages, 6 figures, 35 references

Subjects: Artificial Intelligence (cs.AI)

One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively. This paper introduces an innovative approach utilizing Large Language Models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way. We developed a framework where instructions and dynamic environment descriptions are input into the LLM. The LLM then utilizes this information to assist in generating rewards, thereby steering the behavior of RL agents towards patterns that more closely resemble human driving. The experimental results demonstrate that this approach not only makes RL agents more anthropomorphic but also reaches better performance. Additionally, various strategies for reward-proxy and reward-shaping are investigated, revealing the significant impact of prompt design on shaping an AD vehicle's behavior. These findings offer a promising direction for the development of more advanced and human-like automated driving systems. Our experimental data and source code can be found here.
[208] arXiv:2405.04136 [pdf, ps, html, other]: Title: Enriched BERT Embeddings for Scholarly Publication Classification

Benjamin Wolff, Eva Seidlmayer, Konrad U. Förstner

Comments: 8 pages, 2 figures, NSLP2024 conference

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

With the rapid expansion of academic literature and the proliferation of preprints, researchers face growing challenges in manually organizing and labeling large volumes of articles. The NSLP 2024 FoRC Shared Task I addresses this challenge in the form of a competition. The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article. This paper presents our results. Initially, we enrich the dataset (containing English scholarly articles sourced from ORKG and arXiv), then leverage different pre-trained language models (PLMs), specifically BERT, and explore their efficacy in transfer learning for this downstream task. Our experiments encompass feature-based and fine-tuned transfer learning approaches using diverse PLMs optimized for scientific tasks, including SciBERT, SciNCL, and SPECTER2. We conduct hyperparameter tuning and investigate the impact of data augmentation from bibliographic databases such as OpenAlex, Semantic Scholar, and Crossref. Our results demonstrate that fine-tuning pre-trained models substantially enhances classification performance, with SPECTER2 emerging as the most accurate model. Moreover, enriching the dataset with additional metadata improves classification outcomes significantly, especially when integrating information from S2AG, OpenAlex and Crossref. Our best-performing approach achieves a weighted F1-score of 0.7415. Overall, our study contributes to the advancement of reliable automated systems for scholarly publication categorization, offering a potential solution to the laborious manual curation process, thereby helping researchers locate relevant resources efficiently.
[209] arXiv:2405.04138 [pdf, ps, other]: Title: GPT-Enabled Cybersecurity Training: A Tailored Approach for Effective Awareness

Nabil Al-Dhamari, Nathan Clarke

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

This study explores the limitations of traditional Cybersecurity Awareness and Training (CSAT) programs and proposes an innovative solution using Generative Pre-Trained Transformers (GPT) to address these shortcomings. Traditional approaches lack personalization and adaptability to individual learning styles. To overcome these challenges, the study integrates GPT models to deliver highly tailored and dynamic cybersecurity learning experiences. Leveraging natural language processing capabilities, the proposed approach personalizes training modules based on individual trainee profiles, helping to ensure engagement and effectiveness. An experiment was conducted using a GPT model to provide a real-time and adaptive CSAT experience by generating customized training content. The findings demonstrate a significant improvement over traditional programs, addressing issues of engagement, dynamicity, and relevance. GPT-powered CSAT programs offer a scalable and effective solution to enhance cybersecurity awareness, providing personalized training content that better prepares individuals to mitigate cybersecurity risks in their specific roles within the organization.
[210] arXiv:2405.04144 [pdf, ps, other]: Title: Lossy Compression with Data, Perception, and Classification Constraints

Yuhan Wang, Youlong Wu, Shuai Ma, Ying-Jun Angela Zhang

Comments: 10 pages, in part submitted to ITW 2024

Subjects: Information Theory (cs.IT)

Balancing diverse task objectives under limited rate is crucial for developing robust multi-task deep learning (DL) models and improving performance across various domains. In this paper, we consider the lossy compression problem with human-centric and task-oriented metrics, such as perceptual quality and classification accuracy. We investigate two ternary relationships, namely, the rate-distortion-classification (RDC) and rate-perception-classification (RPC). For both RDC and RPC functions, we derive the closed-form expressions of the optimal rate for both binary and Gaussian sources. Notably, both RDC and RPC relationships exhibit distinct characteristics compared to the previous RDP tradeoff proposed by Blau et al. Then, we conduct experiments by implementing a DL-based image compression framework, incorporating rate, distortion, perception, and classification constraints. The experimental results verify the theoretical characteristics of RDC and RPC tradeoffs, providing information-theoretical insights into the design of loss functions to balance diverse task objectives in deep learning.
[211] arXiv:2405.04146 [pdf, ps, html, other]: Title: pFedLVM: A Large Vision Model (LVM)-Driven and Latent Feature-Based Personalized Federated Learning Framework in Autonomous Driving

Wei-Bin Kou, Qingfeng Lin, Ming Tang, Sheng Xu, Rongguang Ye, Yang Leng, Shuai Wang, Zhenyu Chen, Guangxu Zhu, Yik-Chung Wu

Comments: This paper was submitted to IEEE Transactions on Mobile Computing (TMC) on Apr. 6th, 2024

Subjects: Robotics (cs.RO)

Deep learning-based Autonomous Driving (AD) models often exhibit poor generalization due to data heterogeneity in an ever domain-shifting environment. While Federated Learning (FL) could improve the generalization of an AD model (known as FedAD system), conventional models often struggle with under-fitting as the amount of accumulated training data progressively increases. To address this issue, instead of conventional small models, employing Large Vision Models (LVMs) in FedAD is a viable option for better learning of representations from a vast volume of data. However, implementing LVMs in FedAD introduces three challenges: (I) the extremely high communication overheads associated with transmitting LVMs between participating vehicles and a central server; (II) the lack of computing resources to deploy LVMs on each vehicle; (III) the performance drop due to the LVM focusing on shared features but overlooking local vehicle characteristics. To overcome these challenges, we propose pFedLVM, an LVM-Driven, Latent Feature-Based Personalized Federated Learning framework. In this approach, the LVM is deployed only on the central server, which effectively alleviates the computational burden on individual vehicles. Furthermore, what is exchanged between the central server and the vehicles is the learned features rather than the LVM parameters, which significantly reduces communication overhead. In addition, we utilize both shared features from all participating vehicles and individual characteristics from each vehicle to establish a personalized learning mechanism. This enables each vehicle's model to learn features from others while preserving its personalized characteristics, thereby outperforming globally shared models trained in general FL. Extensive experiments demonstrate that pFedLVM outperforms the existing state-of-the-art approaches.
[212] arXiv:2405.04151 [pdf, ps, html, other]: Title: Gas Source Localization Using physics Guided Neural Networks

Victor Scott Prieto Ruiz, Patrick Hinsen, Thomas Wiedemann, Constantin Christof, Dmitriy Shutin

Comments: Submission to the International Symposium of Electronic Nose (ISOEN)

Subjects: Machine Learning (cs.LG)

This work discusses a novel method for estimating the location of a gas source based on spatially distributed concentration measurements taken, e.g., by a mobile robot or flying platform that follows a predefined trajectory to collect samples. The proposed approach uses a Physics-Guided Neural Network to approximate the gas dispersion with the source location as an additional network input. After an initial offline training phase, the neural network can be used to efficiently solve the inverse problem of localizing the gas source based on measurements. The proposed approach allows avoiding rather costly numerical simulations of gas physics needed for solving inverse problems. Our experiments show that the method localizes the source well, even when dealing with measurements affected by noise.
[213] arXiv:2405.04152 [pdf, ps, other]: Title: CAKE: Sharing Slices of Confidential Data on Blockchain

Edoardo Marangone, Michele Spina, Claudio Di Ciccio, Ingo Weber

Subjects: Cryptography and Security (cs.CR)

Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
[214] arXiv:2405.04156 [pdf, ps, html, other]: Title: How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability

Jorge García-Carrasco, Alejandro Maté, Juan Trujillo

Journal-ref: Jorge García-Carrasco, Alejandro Maté, Juan Carlos Trujillo. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3322-3330, 2024

Subjects: Machine Learning (cs.LG)

Transformer-based language models are treated as black-boxes because of their large number of parameters and complex internal interactions, which is a serious safety concern. Mechanistic Interpretability (MI) intends to reverse-engineer neural network behaviors in terms of human-understandable components. In this work, we focus on understanding how GPT-2 Small performs the task of predicting three-letter acronyms. Previous works in the MI field have focused so far on tasks that predict a single token. To the best of our knowledge, this is the first work that tries to mechanistically understand a behavior involving the prediction of multiple consecutive tokens. We discover that the prediction is performed by a circuit composed of 8 attention heads (~5% of the total heads) which we classified in three groups according to their role. We also demonstrate that these heads concentrate the acronym prediction functionality. In addition, we mechanistically interpret the most relevant heads of the circuit and find out that they use positional information which is propagated via the causal mask mechanism. We expect this work to lay the foundation for understanding more complex behaviors involving multiple-token predictions.
[215] arXiv:2405.04157 [pdf, ps, html, other]: Title: Two-dimensional Kripke Semantics I: Presheaves

G. A. Kavvos

Journal-ref: LIPIcs, Volume 299, FSCD 2024, article 11

Subjects: Logic in Computer Science (cs.LO); Category Theory (math.CT); Logic (math.LO)

The study of modal logic has witnessed tremendous development following the introduction of Kripke semantics. However, recent developments in programming languages and type theory have led to a second way of studying modalities, namely through their categorical semantics. We show how the two correspond.
[216] arXiv:2405.04160 [pdf, ps, other]: Title: A Causal Explainable Guardrails for Large Language Models

Zhixuan Chu, Yan Wang, Longfei Li, Zhibo Wang, Zhan Qin, Kui Ren

Comments: 23 pages

Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) have shown impressive performance in natural language tasks, but their outputs can exhibit undesirable attributes or biases. Existing methods for steering LLMs towards desired attributes often assume unbiased representations and rely solely on steering prompts. However, the representations learned from pre-training can introduce semantic biases that influence the steering process, leading to suboptimal results. We propose LLMGuardaril, a novel framework that incorporates causal analysis and adversarial learning to obtain unbiased steering representations in LLMs. LLMGuardaril systematically identifies and blocks the confounding effects of biases, enabling the extraction of unbiased steering representations. Additionally, it includes an explainable component that provides insights into the alignment between the generated output and the desired direction. Experiments demonstrate LLMGuardaril's effectiveness in steering LLMs towards desired attributes while mitigating biases. Our work contributes to the development of safe and reliable LLMs that align with desired attributes. We discuss the limitations and future research directions, highlighting the need for ongoing research to address the ethical implications of large language models.
[217] arXiv:2405.04161 [pdf, ps, other]: Title: Opportunities for machine learning in scientific discovery

Ricardo Vinuesa, Jean Rabault, Hossein Azizpour, Stefan Bauer, Bingni W. Brunton, Arne Elofsson, Elias Jarlebring, Hedvig Kjellstrom, Stefano Markidis, David Marlevi, Paola Cinnella, Steven L. Brunton

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Technological advancements have substantially increased computational power and data availability, enabling the application of powerful machine-learning (ML) techniques across various fields. However, our ability to leverage ML methods for scientific discovery, i.e. to obtain fundamental and formalized knowledge about natural processes, is still in its infancy. In this review, we explore how the scientific community can increasingly leverage ML techniques to achieve scientific discoveries. We observe that the applicability and opportunity of ML depend strongly on the nature of the problem domain, and whether we have full (e.g., turbulence), partial (e.g., computational biochemistry), or no (e.g., neuroscience) a-priori knowledge about the governing equations and physical properties of the system. Although challenges remain, principled use of ML is opening up new avenues for fundamental scientific discoveries. Throughout these diverse fields, there is a theme that ML is enabling researchers to embrace complexity in observational data that was previously intractable to classic analysis and numerical investigations.
[218] arXiv:2405.04163 [pdf, ps, html, other]: Title: MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization

Gunjan Balde, Soumyadeep Roy, Mainack Mondal, Niloy Ganguly

Comments: 13 pages, Accepted to the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 (Main) Track

Subjects: Computation and Language (cs.CL)

This work presents a dynamic vocabulary adaptation strategy, MEDVOC, for fine-tuning pre-trained language models (PLMs) like BertSumAbs, BART, and PEGASUS for improved medical text summarization. In contrast to existing domain adaptation approaches in summarization, MEDVOC treats vocabulary as an optimizable parameter and optimizes the PLM vocabulary based on a fragment score conditioned only on the downstream task's reference summaries. Unlike previous works on vocabulary adaptation (limited only to classification tasks), optimizing vocabulary based on summarization tasks requires an extremely costly intermediate fine-tuning step on large summarization datasets. To that end, our novel fragment score-based hyperparameter search significantly reduces this fine-tuning time -- from 450 days to less than 2 days on average. Furthermore, while previous works on vocabulary adaptation are often tied to single PLMs, MEDVOC is designed to be deployable across multiple PLMs (with varying model vocabulary sizes, pre-training objectives, and model sizes) -- bridging the limited vocabulary overlap between the biomedical literature domain and PLMs. MEDVOC outperforms baselines by 15.74% in terms of Rouge-L in the zero-shot setting and shows gains of 17.29% in high Out-Of-Vocabulary (OOV) concentrations. Our human evaluation shows MEDVOC generates more faithful medical summaries (88% compared to 59% in baselines). We make the codebase publicly available at this https URL.
[219] arXiv:2405.04164 [pdf, ps, other]: Title: Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation

Ryan Wong, Necati Cihan Camgoz, Richard Bowden

Comments: Accepted at ICLR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automatic Sign Language Translation requires the integration of both computer vision and natural language processing to effectively bridge the communication gap between sign and spoken languages. However, the deficiency in large-scale training data to support sign language translation means we need to leverage resources from spoken language. We introduce Sign2GPT, a novel framework for sign language translation that utilizes large-scale pretrained vision and language models via lightweight adapters for gloss-free sign language translation. The lightweight adapters are crucial for sign language translation, due to the constraints imposed by limited dataset sizes and the computational requirements when training with long sign videos. We also propose a novel pretraining strategy that directs our encoder to learn sign representations from automatically extracted pseudo-glosses without requiring gloss order information or annotations. We evaluate our approach on two public benchmark sign language translation datasets, namely RWTH-PHOENIX-Weather 2014T and CSL-Daily, and improve on state-of-the-art gloss-free translation performance by a significant margin.
[220] arXiv:2405.04165 [pdf, ps, other]: Title: LingML: Linguistic-Informed Machine Learning for Enhanced Fake News Detection

Jasraj Singh, Fang Liu, Hong Xu, Bee Chin Ng, Wei Zhang

Comments: 7 pages

Subjects: Computation and Language (cs.CL)

Nowadays, information spreads at an unprecedented pace on social media, and discerning truth from misinformation and fake news has become an acute societal challenge. Machine learning (ML) models have been employed to identify fake news but are far from perfect, facing challenging problems such as limited accuracy, interpretability, and generalizability. In this paper, we enhance ML-based solutions with linguistics input and propose LingML, linguistic-informed ML, for fake news detection. We conducted an experimental study with a popular dataset on fake news during the pandemic. The experimental results show that our proposed solution is highly effective. There are fewer than two errors out of every ten attempts with only linguistic input used in ML, and the knowledge is highly explainable. When linguistics input is integrated with advanced large-scale ML models for natural language processing, our solution outperforms existing ones with a 1.8% average error rate. LingML creates a new path with linguistics to push the frontier of effective and efficient fake news detection. It also sheds light on real-world multi-disciplinary applications requiring both ML and domain expertise to achieve optimal performance.
[221] arXiv:2405.04167 [pdf, ps, html, other]: Title: Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

Aobo Li, Jinjian Wu, Yongxu Liu, Leida Li

Comments: Accepted by CVPR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

The annotation of blind image quality assessment (BIQA) is labor-intensive and time-consuming, especially for authentic images. Training on synthetic data is expected to be beneficial, but synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that introducing more distortion types in the synthetic dataset may not improve, and may even harm, generalization to authentic image quality assessment. To solve this challenge, we propose distortion-guided unsupervised domain adaptation for BIQA (DGQA), a novel framework that leverages adaptive multi-domain selection via prior knowledge from distortion to match the data distribution between the source domains and the target domain, thereby reducing negative transfer from the outlier source domains. Extensive experiments on two cross-domain settings (synthetic distortion to authentic distortion and synthetic distortion to algorithmic distortion) have demonstrated the effectiveness of our proposed DGQA. In addition, DGQA is orthogonal to existing model-based BIQA methods, and can be used in combination with such models to improve performance with less training data.
[222] arXiv:2405.04168 [pdf, ps, html, other]: Title: Three variations of Heads or Tails Game for Bitcoin

Cyril Grunspan, Ricardo Perez-Marco

Comments: 16 pages

Subjects: Cryptography and Security (cs.CR); Probability (math.PR)

We present three very simple variants of the classic Heads or Tails game using chips, each of which contributes to our understanding of the Bitcoin protocol. The first variant addresses the issue of temporary Bitcoin forks, which occur when two miners discover blocks simultaneously. We determine the threshold at which an honest but temporarily "Byzantine" miner persists in mining on their fork to save their orphaned blocks. The second variant of the Heads or Tails game is biased in favor of the player and helps to explain why the difficulty adjustment formula is vulnerable to attacks on Nakamoto's consensus. We derive directly and in a simple way, without relying on a Markov decision solver as was the case until now, the threshold beyond which a miner without connectivity finds it advantageous to adopt a deviant mining strategy on Bitcoin. The third variant of the Heads or Tails game is unbiased and demonstrates that this issue in the difficulty adjustment formula can be fully rectified. Our results are in agreement with the existing literature, which we clarify both qualitatively and quantitatively using very simple models and scripts that are easy to implement.
[223] arXiv:2405.04170 [pdf, ps, other]: Title: D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models

Duygu Altinok

Comments: accepted to SemEval-2024, ranked 9th on Task 2

Subjects: Computation and Language (cs.CL)

Large language models (LLMs) have garnered significant attention and widespread usage due to their impressive performance in various tasks. However, they are not without their own set of challenges, including issues such as hallucinations, factual inconsistencies, and limitations in numerical-quantitative reasoning. Evaluating LLMs in miscellaneous reasoning tasks remains an active area of research. Prior to the breakthrough of LLMs, Transformers had already proven successful in the medical domain, effectively employed for various natural language understanding (NLU) tasks. Following this trend, LLMs have also been trained and utilized in the medical domain, raising concerns regarding factual accuracy, adherence to safety protocols, and inherent limitations. In this paper, we focus on evaluating the natural language inference capabilities of popular open-source and closed-source LLMs using clinical trial reports as the dataset. We present the performance results of each LLM and further analyze their performance on a development set, particularly focusing on challenging instances that involve medical abbreviations and require numerical-quantitative reasoning. Gemini, our leading LLM, achieved a test set F1-score of 0.748, securing the ninth position on the task scoreboard. Our work is the first of its kind, offering a thorough examination of the inference capabilities of LLMs within the medical domain.
[224] arXiv:2405.04171 [pdf, ps, other]: Title: FedStale: leveraging stale client updates in federated learning

Angelo Rodio, Giovanni Neglia

Comments: 33 pages, 5 figures, preprint

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Federated learning algorithms, such as FedAvg, are negatively affected by data heterogeneity and partial client participation. To mitigate the latter problem, global variance reduction methods, like FedVARP, leverage stale model updates for non-participating clients. These methods are effective under homogeneous client participation. Yet, this paper shows that, when some clients participate much less than others, aggregating updates with different levels of staleness can detrimentally affect the training process. Motivated by this observation, we introduce FedStale, a novel algorithm that updates the global model in each round through a convex combination of "fresh" updates from participating clients and "stale" updates from non-participating ones. By adjusting the weight in the convex combination, FedStale interpolates between FedAvg, which only uses fresh updates, and FedVARP, which treats fresh and stale updates equally. Our analysis of FedStale convergence yields the following novel findings: i) it integrates and extends previous FedAvg and FedVARP analyses to heterogeneous client participation; ii) it underscores how the least participating client influences convergence error; iii) it provides practical guidelines to best exploit stale updates, showing that their usefulness diminishes as data heterogeneity decreases and participation heterogeneity increases. Extensive experiments featuring diverse levels of client data and participation heterogeneity not only confirm these findings but also show that FedStale outperforms both FedAvg and FedVARP in many settings.
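The interpolation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact update rule, so the FedAvg-style and FedVARP-style aggregates below are simplifying assumptions, and the names `fedstale_style_aggregate`, `fresh`, `stale`, and `beta` are hypothetical. With `beta = 0` the result reduces to the average of fresh updates, and with `beta = 1` fresh and stale updates are weighted equally.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fedstale_style_aggregate(
    fresh: Dict[int, Vector],  # updates from clients participating this round
    stale: Dict[int, Vector],  # last stored update of every client
    beta: float,               # weight given to the stale-aware aggregate
) -> Vector:
    """Convex combination of a fresh-only and a stale-aware aggregate.

    Assumed forms (not taken from the paper):
      fresh-only  = mean of fresh updates over participants
      stale-aware = mean of all stored updates
                    + mean over participants of (fresh - stored)
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    fresh_only = mean(list(fresh.values()))
    stored_avg = mean(list(stale.values()))
    correction = mean(
        [[f - s for f, s in zip(fresh[c], stale[c])] for c in fresh]
    )
    stale_aware = [a + c for a, c in zip(stored_avg, correction)]
    return [(1.0 - beta) * a + beta * b for a, b in zip(fresh_only, stale_aware)]


# Three clients, only client 0 participates in this round.
stale_updates = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [3.0, 1.0]}
fresh_updates = {0: [2.0, 2.0]}
global_update = fedstale_style_aggregate(fresh_updates, stale_updates, beta=0.5)
```

In this sketch, choosing `beta` closer to 0 discounts stale information, which matches the abstract's guideline that stale updates become less useful as participation heterogeneity increases.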
[225] arXiv:2405.04175 [pdf, ps, html, other]: Title: Topicwise Separable Sentence Retrieval for Medical Report Generation

Junting Zhao, Yang Zhou, Zhihao Chen, Huazhu Fu, Liang Wan

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Automated radiology reporting holds immense clinical potential in alleviating the burdensome workload of radiologists and mitigating diagnostic bias. Recently, retrieval-based report generation methods have garnered increasing attention due to their inherent advantages in terms of the quality and consistency of generated reports. However, due to the long-tail distribution of the training data, these models tend to learn frequently occurring sentences and topics, overlooking the rare topics. Regrettably, the descriptions of rare topics often indicate critical findings that should be mentioned in the report. To address this problem, we introduce Topicwise Separable Sentence Retrieval (Teaser) for medical report generation. To ensure comprehensive learning of both common and rare topics, we categorize queries into common and rare types to learn differentiated topics, and then propose Topic Contrastive Loss to effectively align topics and queries in the latent space. Moreover, we integrate an Abstractor module following the extraction of visual features, which aids the topic decoder in gaining a deeper understanding of the visual observational intent. Experiments on the MIMIC-CXR and IU X-ray datasets demonstrate that Teaser surpasses state-of-the-art models, while also validating its capability to effectively represent rare topics and establish more dependable correspondences between queries and topics.
[226] arXiv:2405.04177 [pdf, ps, other]: Title: A 49.8mm2 Fully Integrated, 1.5m Transmission-Range, High-Data-Rate IR-UWB Transmitter for Brain Implants

Cong Ding, Mingxiang Gao, Anja K. Skrivervik, Mahsa Shoaran

Subjects: Systems and Control (eess.SY)

To address the challenge of extending the transmission range of implantable TXs while also minimizing their size and power consumption, this paper introduces a transcutaneous, high data-rate, fully integrated IR-UWB transmitter that employs a novel co-designed power amplifier (PA) and antenna interface for enhanced performance. With the co-designed interface, we achieved the smallest footprint of 49.8 mm$^2$ and the longest transmission range of 1.5 m compared to the state-of-the-art IR-UWB TXs.
[227] arXiv:2405.04180 [pdf, ps, html, other]: Title: Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models

Zhixuan Chu, Lei Zhang, Yichen Sun, Siqiao Xue, Zhibo Wang, Zhan Qin, Kui Ren

Comments: arXiv admin note: text overlap with arXiv:2306.08302, arXiv:2403.05131 by other authors

Subjects: Machine Learning (cs.LG)

The rapid advancement in text-to-video (T2V) generative models has enabled the synthesis of high-fidelity video content guided by textual descriptions. Despite this significant progress, these models are often susceptible to hallucination, generating content that contradicts the input text, which poses a challenge to their reliability and practical deployment. To address this critical issue, we introduce the SoraDetector, a novel unified framework designed to detect hallucinations across diverse large T2V models, including the cutting-edge Sora model. Our framework is built upon a comprehensive analysis of hallucination phenomena, categorizing them based on their manifestation in the video content. Leveraging state-of-the-art keyframe extraction techniques and multimodal large language models, SoraDetector first evaluates the consistency between the extracted video content summary and textual prompts, then constructs static and dynamic knowledge graphs (KGs) from frames to detect hallucination both in single frames and across frames. Sora Detector provides robust and quantifiable measures of consistency and of static and dynamic hallucination. In addition, we have developed the Sora Detector Agent to automate the hallucination detection process and generate a complete video quality report for each input video. Lastly, we present a novel meta-evaluation benchmark, T2VHaluBench, meticulously crafted to facilitate the evaluation of advancements in T2V hallucination detection. Through extensive experiments on videos generated by Sora and other large T2V models, we demonstrate the efficacy of our approach in accurately detecting hallucinations. The code and dataset can be accessed via GitHub.
[228] arXiv:2405.04181 [pdf, ps, html, other]: Title: Detecting music deepfakes is easy but actually hard

Darius Afchar, Gabriel Meseguer Brocal, Romain Hennequin

Comments: Under review

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

In the face of a new era of generative models, the detection of artificially generated content has become a matter of utmost importance. The ability to create credible minute-long music deepfakes in a few seconds on user-friendly platforms poses a real threat of fraud on streaming services and unfair competition to human artists. This paper demonstrates the possibility (and surprising ease) of training classifiers on datasets comprising real audio and fake reconstructions, achieving a convincing accuracy of 99.8%. To our knowledge, this marks the first publication of a music deepfake detector, a tool that will help in the regulation of music forgery. Nevertheless, informed by decades of literature on forgery detection in other fields, we stress that a good test score is not the end of the story. We step back from the straightforward ML framework and expose many facets that could be problematic with such a deployed detector: calibration, robustness to audio manipulation, generalisation to unseen models, interpretability and possibility for recourse. This second part acts as a position for future research steps in the field and a caveat to a flourishing market of fake content checkers.
[229] arXiv:2405.04188 [pdf, ps, other]: Title: Behavioral Manifolds: Representing the Landscape of Grasp Affordances in the Relative Pose Space

Michael Zechmair, Yannick Morel

Comments: 8 pages, 8 figures

Subjects: Robotics (cs.RO)

The use of machine learning to investigate grasp affordances has received extensive attention over the past several decades. The existing literature provides a robust basis to build upon, though a number of aspects may be improved. Results are commonly expressed in terms of grasp configuration, with little consideration for the manner in which the grasp may be (re-)produced from a reachability and trajectory planning perspective. In addition, the majority of existing learning approaches focus on producing a single viable grasp, offering little transparency on how the result was reached, or insights on its robustness. We propose a different perspective on grasp affordance learning, explicitly accounting for grasp synthesis; that is, the manner in which manipulator kinematics are used to allow materialization of grasps. The approach makes it possible to explicitly map the grasp policy space in terms of generated grasp types and associated grasp quality. Results of numerical simulations illustrate the merit of the method and highlight the manner in which it may promote a greater degree of explainability for otherwise opaque reinforcement processes.
[230] arXiv:2405.04189 [pdf, ps, other]: Title: Artificial Intelligence-powered fossil shark tooth identification: Unleashing the potential of Convolutional Neural Networks

Andrea Barucci, Giulia Ciacci, Pietro Liò, Tiago Azevedo, Andrea Di Cencio, Marco Merella, Giovanni Bianucci, Giulia Bosio, Simone Casati, Alberto Collareta

Comments: 40 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

All fields of knowledge are being impacted by Artificial Intelligence. In particular, the Deep Learning paradigm enables the development of data analysis tools that support subject matter experts in a variety of sectors, from physics up to the recognition of ancient languages. Palaeontology is now observing this trend as well. This study explores the capability of Convolutional Neural Networks (CNNs), a particular class of Deep Learning algorithms specifically crafted for computer vision tasks, to classify images of isolated fossil shark teeth gathered from online datasets as well as from the authors' experience on Peruvian Miocene and Italian Pliocene fossil assemblages. The shark taxa that are included in the final, composite dataset (which consists of more than one thousand images) are representative of both extinct and extant genera, namely, Carcharhinus, Carcharias, Carcharocles, Chlamydoselachus, Cosmopolitodus, Galeocerdo, Hemipristis, Notorynchus, Prionace and Squatina. We developed a CNN, named SharkNet-X, specifically tailored to our recognition task, reaching a 5-fold cross-validated mean accuracy of 0.85 in identifying images containing a single shark tooth. Furthermore, we produced a visualization of the features extracted from images using the last dense layer of the CNN, achieved through the application of the dimensionality-reduction technique t-SNE. In addition, in order to understand and explain the behaviour of the CNN while giving a palaeontological point of view on the results, we introduced the explainability method SHAP. To the best of our knowledge, this is the first instance in which this method is applied to the field of palaeontology. The main goal of this work is to showcase how Deep Learning techniques can aid in identifying isolated fossil shark teeth, paving the way for developing new information tools for automating the recognition and classification of fossils.
[231] arXiv:2405.04191 [pdf, ps, html, other]: Title: Effective and Robust Adversarial Training against Data and Label Corruptions

Peng-Fei Zhang, Zi Huang, Xin-Shun Xu, Guangdong Bai

Comments: 12 pages, 8 figures

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Corruptions due to data perturbations and label noise are prevalent in datasets from unreliable sources and pose significant threats to model training. Despite existing efforts in developing robust models, current learning methods commonly overlook the possible co-existence of both corruptions, limiting the effectiveness and practicability of the model. In this paper, we develop an Effective and Robust Adversarial Training (ERAT) framework to simultaneously handle two types of corruption (i.e., data and label) without prior knowledge of their specifics. We propose a hybrid adversarial training scheme built around multiple potential adversarial perturbations, alongside semi-supervised learning based on class-rebalancing sample selection, to enhance the resilience of the model to dual corruption. On the one hand, in the proposed adversarial training, the perturbation generation module learns multiple surrogate malicious data perturbations by taking a DNN model as the victim, while the model is trained to maintain semantic consistency between the original data and the hybrid perturbed data. This is expected to enable the model to cope with unpredictable perturbations in real-world data corruption. On the other hand, a class-rebalancing data selection strategy is designed to fairly differentiate clean labels from noisy labels. Semi-supervised learning is performed accordingly by discarding noisy labels. Extensive experiments demonstrate the superiority of the proposed ERAT framework.
[232] arXiv:2405.04192 [pdf, ps, html, other]: Title: Quantum software experiments: A reporting and laboratory package structure guidelines

Enrique Moguel, José Antonio Parejo, Antonio Ruiz-Cortés, Jose Garcia-Alonso, Juan Manuel Murillo

Comments: 10 pages, 1 figure, and 6 tables

Subjects: Software Engineering (cs.SE)

Background. In the realm of software engineering, there are widely accepted guidelines for reporting and creating laboratory packages. Unfortunately, the landscape differs considerably in the emerging field of quantum computing. To the best of our knowledge, no standardized guidelines exist for describing experiments or outlining the necessary structures for quantum software laboratory packages. Aims. This paper endeavors to enhance the replicability and verifiability of quantum software experiments. Method. This objective is pursued through the proposition of guidelines for reporting and the delineation of a structure for laboratory packages tailored to quantum computing experiments. Specifically, we advocate for an extension and adaption of established guidelines in experimental software engineering, integrating novel elements to address the specific requirements of quantum software engineering. Results. In validating the utility and effectiveness of the proposed guidelines, we conducted a review encompassing 11 works (5 focusing on reporting guidelines and 6 on laboratory packages). In particular, this review highlighted the absence of standardized guidelines and structure of laboratory packages for quantum software experiments. Conclusions. Our assessment revealed gaps in information and opportunities for enhancement within the evaluated papers and laboratory packages. Our proposal contributes to the advancement of quantum software engineering research, taking a fundamental step toward fostering rigorous and reliable scientific research in this emerging paradigm.
[233] arXiv:2405.04195 [pdf, ps, other]: Title: Rational methods for abstract linear, non-homogeneous problems without order reduction

Carlos Arranz Simón, Cesar Palencia

Comments: 14 pages, 4 tables

Subjects: Numerical Analysis (math.NA)

Starting from an A-stable rational approximation to $\mathrm{e}^z$ of order $p$, $$r(z)= 1+ z+ \cdots + z^p/ p! + O(z^{p+1}),$$ families of stable methods are proposed to time discretize abstract IVPs of the type $u'(t) = A u(t) + f(t)$. These numerical procedures turn out to be of order $p$, thus overcoming the order reduction phenomenon, and only one evaluation of $f$ per step is required.
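As a worked illustration of the expansion above (not taken from the paper), consider the (1,1) Padé approximant, a standard A-stable rational approximation of order $p = 2$. Whether this particular $r$ is among the methods the authors study is an assumption.

```latex
\[
r(z) = \frac{1 + z/2}{1 - z/2}
     = 1 + z + \frac{z^2}{2} + \frac{z^3}{4} + O(z^4),
\qquad
r(z) - \mathrm{e}^z = \frac{z^3}{12} + O(z^4).
\]
```

The expansion matches $\mathrm{e}^z$ through the $z^2$ term, so $p = 2$, and $|r(z)| \le 1$ whenever $\operatorname{Re} z \le 0$, which is the A-stability condition. Applied to the homogeneous problem $u'(t) = A u(t)$, this $r$ yields the Crank-Nicolson (trapezoidal) scheme.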
[234] arXiv:2405.04198 [pdf, ps, html, other]: Title: Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts

Changyuan Zhao, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin (Sherman) Shen, Khaled B. Letaief

Comments: 9 pages, 4 figures

Subjects: Cryptography and Security (cs.CR)

AI technologies have become more widely adopted in wireless communications. As an emerging type of AI technology, generative artificial intelligence (GAI) has gained much attention in communication security. Due to their powerful learning ability, GAI models have demonstrated superiority over conventional AI methods. However, GAI still has several limitations, including high computational complexity and limited adaptability. Mixture of Experts (MoE), which uses multiple expert models for prediction through a gate mechanism, offers possible solutions. We first review GAI models' applications in physical layer communication security, discuss limitations, and explore how MoE can help GAI overcome these limitations. Furthermore, we propose an MoE-enabled GAI framework for network optimization problems for communication security. To demonstrate the framework's effectiveness, we provide a case study in a cooperative friendly jamming scenario. The experimental results show that the MoE-enabled framework effectively assists the GAI algorithm, addresses its limitations, and enhances communication security.
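The gate mechanism mentioned in the abstract can be illustrated with a minimal, generic Python sketch of dense softmax gating. This is not the authors' framework: the names `moe_predict`, `experts`, and `gate_scores` are hypothetical, and scalar-valued experts are assumed for brevity.

```python
import math
from typing import Callable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(
    x: float,
    experts: Sequence[Callable[[float], float]],
    gate_scores: Callable[[float], Sequence[float]],
) -> float:
    """Combine expert outputs using gate weights that depend on the input."""
    weights = softmax(gate_scores(x))          # one weight per expert, sums to 1
    outputs = [expert(x) for expert in experts]
    return sum(w * o for w, o in zip(weights, outputs))


# Two toy experts and a gate that scores them equally for every input.
toy_experts = [lambda x: 2.0 * x, lambda x: x + 10.0]
uniform_gate = lambda x: [0.0, 0.0]
prediction = moe_predict(4.0, toy_experts, uniform_gate)
```

With equal gate scores, the prediction is the plain average of the expert outputs. A gate that favors one expert shifts the prediction toward that expert's output.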
[235] arXiv:2405.04206 [pdf, ps, other]: Title: NOVA: NoC-based Vector Unit for Mapping Attention Layers on a CNN Accelerator

Mohit Upadhyay, Rohan Juneja, Weng-Fai Wong, Li-Shiuan Peh

Comments: 6 pages, 8 figures

Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Attention mechanisms are becoming increasingly popular, being used in neural network models in multiple domains such as natural language processing (NLP) and vision applications, especially at the edge. However, attention layers are difficult to map onto existing neuro accelerators since they have a much higher density of non-linear operations, which leads to inefficient utilization of today's vector units. This work introduces NOVA, a NoC-based Vector Unit that can perform non-linear operations within the NoC of the accelerators, and can be overlaid onto existing neuro accelerators to map attention layers at the edge. Our results show that the NOVA architecture is up to 37.8x more power-efficient than state-of-the-art hardware approximators when running existing attention-based neural networks.
[236] arXiv:2405.04211 [pdf, ps, html, other]: Title: Breast Histopathology Image Retrieval by Attention-based Adversarially Regularized Variational Graph Autoencoder with Contrastive Learning-Based Feature Extraction

Nematollah Saeidi, Hossein Karshenas, Bijan Shoushtarian, Sepideh Hatamikia, Ramona Woitek, Amirreza Mahbod

Comments: 31 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Breast cancer is a significant global health concern, particularly for women. Early detection and appropriate treatment are crucial in mitigating its impact, with histopathology examinations playing a vital role in swift diagnosis. However, these examinations often require a substantial workforce and experienced medical experts for proper recognition and cancer grading. Automated image retrieval systems have the potential to assist pathologists in identifying cancerous tissues, thereby accelerating the diagnostic process. Nevertheless, due to considerable variability among the tissue and cell patterns in histological images, proposing an accurate image retrieval model is very challenging.
This work introduces a novel attention-based adversarially regularized variational graph autoencoder model for breast histological image retrieval. Additionally, we incorporated cluster-guided contrastive learning as the graph feature extractor to boost the retrieval performance. We evaluated the proposed model's performance on two publicly available datasets of breast cancer histological images and achieved superior or very competitive retrieval performance, with average mAP scores of 96.5% for the BreakHis dataset and 94.7% for the BACH dataset, and mVP scores of 91.9% and 91.3%, respectively.
Our proposed retrieval model has the potential to be used in clinical settings to enhance diagnostic performance and ultimately benefit patients.
[237] arXiv:2405.04212 [pdf, ps, html, other]: Title: Green Tsetlin Redefining Efficiency in Tsetlin Machine Frameworks

Sondre Glimsdal, Sebastian Østby, Tobias M. Brambo, Eirik M. Vinje

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Green Tsetlin (GT) is a Tsetlin Machine (TM) framework developed to solve real-world problems using TMs. Several frameworks already exist that provide access to TM implementations. However, these either lack features or have a research-first focus. GT is an easy-to-use framework that aims to lower the complexity and provide a production-ready TM implementation that is great for experienced practitioners and beginners. To this end, GT establishes a clear separation between training and inference. A C++ backend with a Python interface provides competitive training and inference performance, with the option of running in pure Python. It also integrates support for critical components such as exporting trained models, hyper-parameter search, and cross-validation out-of-the-box.
[238] arXiv:2405.04214 [pdf, ps, html, other]: Title: On the Gelfand Problem and Viscosity Matrices for Two-Dimensional Hyperbolic Systems of Conservation Laws

Shaoshuai Chu, Igor Kliakhandler, Alexander Kurganov

Subjects: Numerical Analysis (math.NA)

We present counter-intuitive examples of viscous regularizations of a two-dimensional strictly hyperbolic system of conservation laws. The regularizations are obtained using two different viscosity matrices. While for both of the constructed "viscous" systems waves propagating in either the $x$- or $y$-direction are stable, oblique waves may be linearly unstable. Numerical simulations fully corroborate these analytical results. To the best of our knowledge, this is the first nontrivial result related to the multidimensional Gelfand problem. Our conjectures provide a direct answer to Gelfand's problem in both the one- and multi-dimensional cases.
[239] arXiv:2405.04215 [pdf, ps, html, other]: Title: NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions

Elliot Gestrin, Marco Kuhlmann, Jendrik Seipp

Comments: Accepted for the ICAPS 2024 Workshop on Human-Aware and Explainable Planning

Subjects: Artificial Intelligence (cs.AI)

Today's classical planners are powerful, but modeling input tasks in formats such as PDDL is tedious and error-prone. In contrast, planning with Large Language Models (LLMs) allows for almost any input text, but offers no guarantees on plan quality or even soundness. In an attempt to merge the best of these two approaches, some work has begun to use LLMs to automate parts of the PDDL creation process. However, these methods still require various degrees of expert input. We present NL2Plan, the first domain-agnostic offline LLM-driven planning system. NL2Plan uses an LLM to incrementally extract the necessary information from a short text prompt before creating a complete PDDL description of both the domain and the problem, which is finally solved by a classical planner. We evaluate NL2Plan on four planning domains and find that it solves 10 out of 15 tasks - a clear improvement over a plain chain-of-thought reasoning LLM approach, which only solves 2 tasks. Moreover, in two out of the five failure cases, instead of returning an invalid plan, NL2Plan reports that it failed to solve the task. In addition to using NL2Plan in end-to-end mode, users can inspect and correct all of its intermediate results, such as the PDDL representation, increasing explainability and making it an assistive tool for PDDL creation.
[240] arXiv:2405.04219 [pdf, ps, html, other]: Title: Iterative Experience Refinement of Software-Developing Agents

Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun

Comments: Work in progress

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Software Engineering (cs.SE)

Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents' adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently-used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance. Moreover, experience elimination facilitates achieving better performance using just 11.54% of a high-quality subset.
[241] arXiv:2405.04225 [pdf, ps, other]: Title: Long-term usage of the off-grid photovoltaic system with lithium-ion battery-based energy storage system on high mountains: A case study in Paiyun Lodge on Mt. Jade in Taiwan

Hsien-Ching Chung

Comments: 28 pages, 12 figures, 4 tables

Subjects: Systems and Control (eess.SY); Image and Video Processing (eess.IV); Signal Processing (eess.SP)

Energy supply on high mountains remains an open issue since grid connection is unavailable. In the past, diesel generators with lead-acid battery energy storage systems (ESSs) were applied in most cases. Recently, photovoltaic (PV) systems with lithium-ion (Li-ion) battery ESSs have become an appropriate method for solving this problem in a greener way. In 2016, an off-grid PV system with a Li-ion battery ESS was installed in Paiyun Lodge on Mt. Jade (the highest lodge in Taiwan). After more than 7 years of operation, the aging of the whole electric power system has become a critical issue for long-term usage. In this work, a method is established for analyzing the massive energy data (over 7 million rows) and estimating the health of the Li-ion battery system through indicators such as daily operation patterns as well as C-rate, temperature, and accumulated energy distributions. The completed electric power improvement project dealing with the power system aging is reported. Based on the long-term usage experience, a simple cost analysis model comparing lead-acid and Li-ion battery systems is built, explaining that the expensive Li-ion batteries can compete with the cheap lead-acid batteries for long-term usage on high mountains. This case study provides engineers and researchers with a fundamental understanding of the long-term usage of off-grid PV ESSs and engineering on high mountains.
[242] arXiv:2405.04233 [pdf, ps, other]: Title: Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models

Fan Bao, Chendong Xiang, Gang Yue, Guande He, Hongzhou Zhu, Kaiwen Zheng, Min Zhao, Shilong Liu, Yaole Wang, Jun Zhu

Comments: Project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We introduce Vidu, a high-performance text-to-video generator that is capable of producing 1080p videos up to 16 seconds in a single generation. Vidu is a diffusion model with U-ViT as its backbone, which unlocks the scalability and the capability for handling long videos. Vidu exhibits strong coherence and dynamism, and is capable of generating both realistic and imaginative videos, as well as understanding some professional photography techniques, on par with Sora -- the most powerful reported text-to-video generator. Finally, we perform initial experiments on other controllable video generation, including canny-to-video generation, video prediction and subject-driven generation, which demonstrate promising results.
[243] arXiv:2405.04235 [pdf, ps, other]: Title: LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-based Planning

Zeyu Feng, Hao Luan, Pranav Goyal, Harold Soh

Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Operating effectively in complex environments while complying with specified constraints is crucial for the safe and successful deployment of robots that interact with and operate around people. In this work, we focus on generating long-horizon trajectories that adhere to novel static and temporally-extended constraints/instructions at test time. We propose a data-driven diffusion-based framework, LTLDoG, that modifies the inference steps of the reverse process given an instruction specified using finite linear temporal logic ($\text{LTL}_f$). LTLDoG leverages a satisfaction value function on $\text{LTL}_f$ and guides the sampling steps using its gradient field. This value function can also be trained to generalize to new instructions not observed during training, enabling flexible test-time adaptability. Experiments in robot navigation and manipulation illustrate that the method is able to generate trajectories that satisfy formulae that specify obstacle avoidance and visitation sequences.
[244] arXiv:2405.04236 [pdf, ps, html, other]: Title: Semantic API Alignment: Linking High-level User Goals to APIs

Robert Feldt, Riccardo Coppola

Subjects: Software Engineering (cs.SE)

Large Language Models (LLMs) are becoming key in automating and assisting various software development tasks, including text-based tasks in requirements engineering but also in coding. Typically, these models are used to automate small portions of existing tasks, but we present a broader vision to span multiple steps from requirements engineering to implementation using existing libraries. This approach, which we call Semantic API Alignment (SEAL), aims to bridge the gap between a user's high-level goals and the specific functions of one or more APIs.
In this position paper, we propose a system architecture where a set of LLM-powered ``agents'' match such high-level objectives with appropriate API calls. This system could facilitate automated programming by finding matching links or, alternatively, explaining mismatches to guide manual intervention or further development.
As an initial pilot, our paper demonstrates this concept by applying LLMs to Goal-Oriented Requirements Engineering (GORE), via sub-goal analysis, for aligning with REST API specifications, specifically through a case study involving a GitHub statistics API. We discuss the potential of our approach to enhance complex tasks in software development and requirements engineering and outline future directions for research.
[245] arXiv:2405.04237 [pdf, ps, other]: Title: QR factorization of ill-conditioned tall-and-skinny matrices on distributed-memory systems

Nenad Mijić, Abhiram Kaushik, Davor Davidović

Comments: 12 pages, 10 figures, 2 tables

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Performance (cs.PF)

In this paper we present a novel algorithm developed for computing the QR factorisation of extremely ill-conditioned tall-and-skinny matrices on distributed memory systems. The algorithm is based on the communication-avoiding CholeskyQR2 algorithm and its block Gram-Schmidt variant. The latter improves the numerical stability of the CholeskyQR2 algorithm and significantly reduces the loss of orthogonality even for matrices with condition numbers up to $10^{15}$. Currently, there is no distributed GPU version of this algorithm available in the literature which prevents the application of this method to very large matrices. In our work we provide a distributed implementation of this algorithm and also introduce a modified version that improves the performance, especially in the case of extremely ill-conditioned matrices. The main innovation of our approach lies in the interleaving of the CholeskyQR steps with the Gram-Schmidt orthogonalisation, which ensures that update steps are performed with fully orthogonalised panels. The obtained orthogonality and numerical stability of our modified algorithm is equivalent to CholeskyQR2 with Gram-Schmidt and other state-of-the-art methods. Weak scaling tests performed with our test matrices show significant performance improvements. In particular, our algorithm outperforms state-of-the-art Householder-based QR factorisation algorithms available in ScaLAPACK by a factor of $6$ on CPU-only systems and up to $80\times$ on GPU-based systems with distributed memory.
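For orientation, the baseline CholeskyQR2 scheme named in the abstract can be sketched on a single process in plain Python. This is only the textbook building block (two passes of Cholesky-based QR), not the authors' modified, distributed, or GPU algorithm, and it omits the Gram-Schmidt interleaving; all function names below are hypothetical.

```python
import math

def gram(A):
    """Return the n x n Gram matrix A^T A of a tall matrix A (list of rows)."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

def cholesky_upper(G):
    """Return upper-triangular R with R^T R = G (G symmetric positive definite)."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return [[L[j][i] for j in range(n)] for i in range(n)]  # transpose of L

def solve_right_upper(A, R):
    """Return Q = A R^{-1} by solving q R = a for every row a of A."""
    n = len(R)
    Q = []
    for a in A:
        q = [0.0] * n
        for j in range(n):
            q[j] = (a[j] - sum(q[k] * R[k][j] for k in range(j))) / R[j][j]
        Q.append(q)
    return Q

def matmul(X, Y):
    """Plain dense matrix product X @ Y."""
    return [[sum(x[k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for x in X]

def cholesky_qr(A):
    """One CholeskyQR pass: R from the Cholesky factor of A^T A, Q = A R^{-1}."""
    R = cholesky_upper(gram(A))
    return solve_right_upper(A, R), R

def cholesky_qr2(A):
    """Two CholeskyQR passes; the second pass repairs the orthogonality of Q."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, matmul(R2, R1)  # A = Q (R2 R1)
```

The Gram matrix squares the condition number of the input, so this plain variant breaks down for badly conditioned matrices; that limitation is what the Gram-Schmidt variants discussed in the abstract are meant to address.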
[246] arXiv:2405.04241 [pdf, ps, html, other]: Title: Exploring the Potential of Robot-Collected Data for Training Gesture Classification Systems

Alejandro Garcia-Sosa, Jose J. Quintana-Hernandez, Miguel A. Ferrer Ballester, Cristina Carmona-Duarte

Journal-ref: IGS2023, 2023, 116-120

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Sensors and Artificial Intelligence (AI) have revolutionized the analysis of human movement, but the scarcity of specific samples presents a significant challenge in training intelligent systems, particularly in the context of diagnosing neurodegenerative diseases. This study investigates the feasibility of utilizing robot-collected data to train classification systems traditionally trained with human-collected data. As a proof of concept, we recorded a database of numeric characters using an ABB robotic arm and an Apple Watch. We compare the classification performance of the trained systems using both human-recorded and robot-recorded data. Our primary objective is to determine the potential for accurate identification of numeric characters written by humans wearing a smartwatch, using robotic movement as training data. The findings of this study offer valuable insights into the feasibility of using robot-collected data for training classification systems. This research holds broad implications across various domains that require reliable identification, particularly in scenarios where access to human-specific data is limited.
[247] arXiv:2405.04245 [pdf, ps, html, other]: Title: Exploring Correlations of Self-supervised Tasks for Graphs

Taoran Fang, Wei Zhou, Yifei Sun, Kaiqiao Han, Lvbin Ma, Yang Yang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Graph self-supervised learning has sparked a research surge in training informative representations without accessing any labeled data. However, our understanding of graph self-supervised learning remains limited, and the inherent relationships between various self-supervised tasks are still unexplored. Our paper aims to provide a fresh understanding of graph self-supervised learning based on task correlations. Specifically, we evaluate the performance of the representations trained by one specific task on other tasks and define correlation values to quantify task correlations. Through this process, we unveil the task correlations between various self-supervised tasks and can measure their expressive capabilities, which are closely related to downstream performance. By analyzing the correlation values between tasks across various datasets, we reveal the complexity of task correlations and the limitations of existing multi-task learning methods. To obtain more capable representations, we propose Graph Task Correlation Modeling (GraphTCM) to illustrate the task correlations and utilize it to enhance graph self-supervised training. The experimental results indicate that our method significantly outperforms existing methods across various downstream tasks.
[248] arXiv:2405.04246 [pdf, ps, html, other]: Title: Dataset and Models for Item Recommendation Using Multi-Modal User Interactions

Simone Borg Bruun, Krisztian Balog, Maria Maistro

Subjects: Information Retrieval (cs.IR)

While recommender systems with multi-modal item representations (image, audio, and text) have been widely explored, learning recommendations from multi-modal user interactions (e.g., clicks and speech) remains an open problem. We study the case of multi-modal user interactions in a setting where users engage with a service provider through multiple channels (website and call center). In such cases, incomplete modalities naturally occur, since not all users interact through all the available channels. To address these challenges, we publish a real-world dataset that allows progress in this under-researched area. We further present and benchmark various methods for leveraging multi-modal user interactions for item recommendations, and propose a novel approach that specifically deals with missing modalities by mapping user interactions to a common feature space. Our analysis reveals important interactions between the different modalities and that a frequently occurring modality can enhance learning from a less frequent one.
[249] arXiv:2405.04249 [pdf, ps, other]: Title: Federated Learning for Cooperative Inference Systems: The Case of Early Exit Networks

Caelin Kaplan, Tareq Si Salem, Angelo Rodio, Chuan Xu, Giovanni Neglia

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

As Internet of Things (IoT) technology advances, end devices like sensors and smartphones are progressively equipped with AI models tailored to their local memory and computational constraints. Local inference reduces communication costs and latency; however, these smaller models typically underperform compared to more sophisticated models deployed on edge servers or in the cloud. Cooperative Inference Systems (CISs) address this performance trade-off by enabling smaller devices to offload part of their inference tasks to more capable devices. These systems often deploy hierarchical models that share numerous parameters, exemplified by Deep Neural Networks (DNNs) that utilize strategies like early exits or ordered dropout. In such instances, Federated Learning (FL) may be employed to jointly train the models within a CIS. Yet, traditional training methods have overlooked the operational dynamics of CISs during inference, particularly the potential high heterogeneity in serving rates across clients. To address this gap, we propose a novel FL approach designed explicitly for use in CISs that accounts for these variations in serving rates. Our framework not only offers rigorous theoretical guarantees, but also surpasses state-of-the-art (SOTA) training algorithms for CISs, especially in scenarios where inference request rates or data availability are uneven among clients.
[250] arXiv:2405.04250 [pdf, ps, html, other]: Title: Weighted Least-Squares PARSIM

Jiabao He, Cristian R. Rojas, Håkan Hjalmarsson

Subjects: Systems and Control (eess.SY)

Subspace identification methods (SIMs) have proven very powerful for estimating linear state-space models. To overcome the deficiencies of classical SIMs, a significant number of algorithms has appeared over the last two decades, where most of them involve a common intermediate step, that is to estimate the range space of the extended observability matrix. In this contribution, an optimized version of the parallel and parsimonious SIM (PARSIM), PARSIM\textsubscript{opt}, is proposed by using weighted least-squares. It not only inherits all the benefits of PARSIM but also attains the best linear unbiased estimator for the above intermediate step. Furthermore, inspired by SIMs based on the predictor form, consistent estimates of the optimal weighting matrix for weighted least-squares are derived. Essential similarities, differences and simulated comparisons of some key SIMs related to our method are also presented.
[251] arXiv:2405.04251 [pdf, ps, html, other]: Title: A General Model for Detecting Learner Engagement: Implementation and Evaluation

Somayeh Malekshahi, Javad M. Kheyridoost, Omid Fatemi

Comments: 13 pages, 2 Postscript figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Considering learner engagement has a mutual benefit for both learners and instructors. Instructors can help learners increase their attention, involvement, motivation, and interest. On the other hand, instructors can improve their instructional performance by evaluating the cumulative results of all learners and upgrading their training programs. This paper proposes a general, lightweight model for selecting and processing features to detect learners' engagement levels while preserving the sequential temporal relationship over time. During training and testing, we analyzed the videos from the publicly available DAiSEE dataset to capture the dynamic essence of learner engagement. We have also proposed an adaptation policy to find new labels that utilize the affective states of this dataset related to education, thereby improving the models' judgment. The suggested model achieves an accuracy of 68.57% in a specific implementation and outperforms the studied state-of-the-art models detecting learners' engagement levels.
[252] arXiv:2405.04252 [pdf, ps, other]: Title: VAEneu: A New Avenue for VAE Application on Probabilistic Forecasting

Alireza Koochali, Ensiye Tahaei, Andreas Dengel, Sheraz Ahmed

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper presents VAEneu, an innovative autoregressive method for multistep ahead univariate probabilistic time series forecasting. We employ the conditional VAE framework and optimize the lower bound of the predictive distribution likelihood function by adopting the Continuous Ranked Probability Score (CRPS), a strictly proper scoring rule, as the loss function. This novel pipeline results in forecasting sharp and well-calibrated predictive distributions. Through a comprehensive empirical study, VAEneu is rigorously benchmarked against 12 baseline models across 12 datasets. The results unequivocally demonstrate VAEneu's remarkable forecasting performance. VAEneu provides a valuable tool for quantifying future uncertainties, and our extensive empirical study lays the foundation for future comparative studies for univariate multistep ahead probabilistic forecasting.
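As a rough illustration of the scoring rule mentioned above, the CRPS of a forecast represented by samples can be estimated with the standard energy form, CRPS(F, y) = E|X - y| - (1/2) E|X - X'|. The sketch below is a generic sample-based estimator under that assumption; it is not taken from the paper, and the function name is hypothetical.

```python
def crps_from_samples(samples, y):
    """Estimate CRPS(F, y) from draws of the predictive distribution F.

    Uses the energy form E|X - y| - 0.5 * E|X - X'|, where both expectations
    are replaced by averages over the given samples.
    """
    m = len(samples)
    # Average distance between forecast samples and the observation.
    accuracy = sum(abs(x - y) for x in samples) / m
    # Half the average pairwise distance between samples (rewards sharpness).
    spread = sum(abs(a - b) for a in samples for b in samples) / (2 * m * m)
    return accuracy - spread
```

With a single sample the spread term vanishes and the score reduces to the absolute error, which is why CRPS is often described as a probabilistic generalization of MAE.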
[253] arXiv:2405.04257 [pdf, ps, other]: Title: Insights from Basilisk: Are Open-Source EDA Tools Ready for a Multi-Million-Gate, Linux-Booting RV64 SoC Design?

Philippe Sauter, Thomas Benz, Frank K. Gürkaynak, Luca Benini

Comments: 8 pages, 6 figures, submitted at IWLS 2024

Subjects: Hardware Architecture (cs.AR)

Designing complex, multi-million-gate application-specific integrated circuits requires robust and mature electronic design automation (EDA) tools. We describe our efforts in enhancing the open-source Yosys+Openroad EDA flow to implement Basilisk, a fully open-source, Linux-booting RV64GC system-on-chip (SoC) design. We analyze the quality-of-results impact of our enhancements to synthesis tools, interfaces between EDA tools, logic optimization scripts, and a newly open-sourced library of optimized arithmetic macro-operators. We also introduce a streamlined physical design flow with an improved power grid and cell placement integration. Our Basilisk SoC design was taped out in IHP's open 130 nm technology. It achieves an operating frequency of 77 MHz (51 logic levels) under typical conditions, a 2.3x improvement compared to the baseline open-source EDA flow, while also reducing logic area by 1.6x. Furthermore, tool runtime was reduced by 2.5x, and peak RAM usage decreased by 2.9x. Through collaboration with EDA tool developers and domain experts, Basilisk establishes solid "proof of existence" for a fully open-source EDA flow used in designing a competitive multi-million-gate digital SoC.
[254] arXiv:2405.04258 [pdf, ps, html, other]: Title: A Weighted Least-Squares Method for Non-Asymptotic Identification of Markov Parameters from Multiple Trajectories

Jiabao He, Cristian R. Rojas, Håkan Hjalmarsson

Subjects: Systems and Control (eess.SY)

Markov parameters play a key role in system identification. There exist many algorithms where these parameters are estimated using least-squares in a first, pre-processing, step, including subspace identification and multi-step least-squares algorithms, such as Weighted Null-Space Fitting. Recently, there has been an increasing interest in non-asymptotic analysis of estimation algorithms. In this contribution we identify the Markov parameters using weighted least-squares and present a non-asymptotic analysis for such an estimator. To cover both stable and unstable systems, multiple trajectories are collected. We show that with the optimal weighting matrix, weighted least-squares gives a tighter error bound than ordinary least-squares for the case of non-uniformly distributed measurement errors. Moreover, as the optimal weighting matrix depends on the system's true parameters, we introduce two methods to consistently estimate the optimal weighting matrix, where the convergence rate of these estimates is also provided. Numerical experiments demonstrate improvements of weighted least-squares over ordinary least-squares in finite sample settings.
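For readers unfamiliar with the two estimators being compared, the generic textbook forms are recalled below for a linear regression $y = \Phi\theta + e$ with noise covariance $\Sigma$. The symbols $\Phi$, $\theta$, $e$, $\Sigma$ and $W$ are illustrative assumptions and are not the paper's notation.

```latex
\begin{align}
  \hat{\theta}_{\mathrm{OLS}} &= (\Phi^{\top}\Phi)^{-1}\Phi^{\top} y, \\
  \hat{\theta}_{\mathrm{WLS}} &= (\Phi^{\top} W \Phi)^{-1}\Phi^{\top} W y,
  \qquad W = \Sigma^{-1} \ \text{(optimal weighting)}.
\end{align}
```

When $\Sigma$ is a multiple of the identity the two estimators coincide; otherwise the choice $W = \Sigma^{-1}$ yields the best linear unbiased estimator, which is consistent with the abstract's remark that the benefit appears for non-uniformly distributed measurement errors.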
[255] arXiv:2405.04260 [pdf, ps, other]: Title: Verified Neural Compressed Sensing

Rudy Bunel, Krishnamurthy (Dj) Dvijotham, M. Pawan Kumar, Alessandro De Palma, Robert Stanforth

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task, with the proof of correctness generated by an automated verification algorithm without any human input. Prior work on neural network verification has focused on partial specifications that, even when satisfied, are not sufficient to ensure that a neural network never makes errors. We focus on applying neural network verification to computational tasks with a precise notion of correctness, where a verifiably correct neural network provably solves the task at hand with no caveats. In particular, we develop an approach to train and verify the first provably correct neural networks for compressed sensing, i.e., recovering sparse vectors from a number of measurements smaller than the dimension of the vector. We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements. Furthermore, we show that the complexity of the network (number of neurons/layers) can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.
[256] arXiv:2405.04261 [pdf, ps, other]: Title: Graph Reconstruction from Noisy Random Subgraphs

Andrew McGregor, Rik Sengupta

Comments: 6 pages, to appear in ISIT 2024

Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS)

We consider the problem of reconstructing an undirected graph $G$ on $n$ vertices given multiple random noisy subgraphs or "traces". Specifically, a trace is generated by sampling each vertex with probability $p_v$, then taking the resulting induced subgraph on the sampled vertices, and then adding noise in the form of either (a) deleting each edge in the subgraph with probability $1-p_e$, or (b) deleting each edge with probability $f_e$ and transforming a non-edge into an edge with probability $f_e$. We show that, under mild assumptions on $p_v$, $p_e$ and $f_e$, if $G$ is selected uniformly at random, then $O(p_e^{-1} p_v^{-2} \log n)$ or $O((f_e-1/2)^{-2} p_v^{-2} \log n)$ traces suffice to reconstruct $G$ with high probability. In contrast, if $G$ is arbitrary, then $\exp(\Omega(n))$ traces are necessary even when $p_v=1, p_e=1/2$.
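The trace-generation process described above can be sketched as follows. This is a minimal illustration, assuming vertices are labelled $0,\dots,n-1$ and that a trace keeps the original labels (an assumption not stated in the abstract); the function name and the `noise`/`rate` parameters are hypothetical.

```python
import random
from itertools import combinations

def sample_trace(n, edges, p_v, noise, rate, rng=None):
    """Draw one noisy random subgraph ("trace") of a graph on vertices 0..n-1.

    noise == "delete": model (a), each surviving edge is kept with
                       probability rate (= p_e), i.e. deleted w.p. 1 - p_e.
    noise == "flip":   model (b), each vertex pair of the induced subgraph
                       has its edge/non-edge status flipped w.p. rate (= f_e).
    Returns (sampled_vertices, set_of_edges_as_sorted_tuples).
    """
    rng = rng or random.Random()
    edge_set = {tuple(sorted(e)) for e in edges}
    # Step 1: sample each vertex independently with probability p_v.
    kept = [v for v in range(n) if rng.random() < p_v]
    trace = set()
    # Step 2: walk over all pairs of the induced subgraph and apply the noise.
    for u, v in combinations(kept, 2):
        present = (u, v) in edge_set
        if noise == "delete":
            present = present and rng.random() < rate
        elif noise == "flip":
            if rng.random() < rate:
                present = not present
        else:
            raise ValueError("noise must be 'delete' or 'flip'")
        if present:
            trace.add((u, v))
    return kept, trace
```

For example, `sample_trace(4, [(0, 1), (2, 3)], 0.5, "delete", 0.9)` draws one trace under model (a) with $p_v = 0.5$ and $p_e = 0.9$.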
[257] arXiv:2405.04263 [pdf, ps, other]: Title: Energy-Efficient Deployment of Stateful FaaS Vertical Applications on Edge Data Networks

Claudio Cicconetti, Raffaele Bruno, Andrea Passarella

Comments: Accepted for presentation at IEEE ICCCN 2024

Subjects: Networking and Internet Architecture (cs.NI)

5G and beyond support the deployment of vertical applications, which is particularly appealing in combination with network slicing and edge computing to create a logically isolated environment for executing customer services. Even if serverless computing has gained significant interest as a cloud-native technology, its adoption at the edge is lagging, especially because of the need to support stateful tasks, which are commonplace in, e.g., cognitive services, but not fully amenable to being deployed on limited and decentralized computing infrastructures. In this work, we study the emerging paradigm of stateful Function as a Service (FaaS) with lightweight task abstractions in WebAssembly. Specifically, we assess the implications of deploying inter-dependent tasks with an internal state on edge computing resources using a stateless vs. stateful approach and then derive a mathematical model to estimate the energy consumption of a workload with given characteristics, considering the power used for both processing and communication. The model is used in extensive simulations to determine the impact of key factors and assess the energy trade-offs of stateless vs. stateful.
[258] arXiv:2405.04266 [pdf, ps, other]: Title: Self-Stabilizing MIS Computation in the Beeping Model

George Giakkoupis, Volker Turau, Isabella Ziccardi

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We consider self-stabilizing algorithms to compute a Maximal Independent Set (MIS) in the extremely weak beeping communication model. The model consists of an anonymous network with synchronous rounds. In each round, each vertex can optionally transmit a signal to all its neighbors (beep). After the transmission of a signal, each vertex can only differentiate between no signal received, or at least one signal received. We assume that vertices have some knowledge about the topology of the network.
We revisit the non-self-stabilizing algorithm proposed by Jeavons, Scott, and Xu (2013), which computes an MIS in the beeping model. We enhance this algorithm to be self-stabilizing, and explore two different variants, which differ in the knowledge about the topology available to the vertices. In the first variant, every vertex knows an upper bound on the maximum degree $\Delta$ of the graph. For this case, we prove that the proposed self-stabilizing version maintains the same run-time as the original algorithm, i.e. it stabilizes after $O(\log n)$ rounds w.h.p. on any $n$-vertex graph. In the second variant, each vertex only knows an upper bound on its own degree. For this case, we prove that the algorithm stabilizes after $O(\log n\cdot \log \log n)$ rounds on any $n$-vertex graph, w.h.p.
[259] arXiv:2405.04271 [pdf, ps, other]: Title: Generating Feature Vectors from Phonetic Transcriptions in Cross-Linguistic Data Formats

Arne Rubehn, Jessica Nieder, Robert Forkel, Johann-Mattis List

Comments: To appear in the Proceedings of the 2024 Meeting of the Society for Computation in Linguistics (SCiL)

Subjects: Computation and Language (cs.CL)

When comparing speech sounds across languages, scholars often make use of feature representations of individual sounds in order to determine fine-grained sound similarities. Although binary feature systems for large numbers of speech sounds have been proposed, large-scale computational applications often face the challenge that the proposed feature systems -- even if they list features for several thousand sounds -- only cover a smaller part of the numerous speech sounds reflected in actual cross-linguistic data. In order to address the problem of missing data for attested speech sounds, we propose a new approach that can create binary feature vectors dynamically for all sounds that can be represented in the standardized version of the International Phonetic Alphabet proposed by the Cross-Linguistic Transcription Systems (CLTS) reference catalog. Since CLTS is actively used in large data collections, covering more than 2,000 distinct language varieties, our procedure for the generation of binary feature vectors provides immediate access to a very large collection of multilingual wordlists. Testing our feature system in different ways on different datasets shows that the system not only provides a straightforward means to compare the similarity of speech sounds, but also has the potential to be used in future cross-linguistic machine learning applications.
[260] arXiv:2405.04275 [pdf, ps, other]: Title: Grey-box Recursive Parameter Identification of a Nonlinear Dynamic Model for Mineral Flotation

Rodrigo A. González, Paulina Quintanilla

Subjects: Systems and Control (eess.SY)

This study presents a grey-box recursive identification technique to estimate key parameters in a mineral flotation process across two scenarios. The method is applied to a nonlinear physics-based dynamic model validated at a laboratory scale, allowing real-time updates of two model parameters, n and C, in response to changing conditions. The proposed approach effectively adapts to process variability and allows for continuous adjustments based on operational fluctuations, resulting in a significantly improved estimation of concentrate grade - one key performance indicator. In Scenario 1, parameters n and C achieved fit metrics of 97.99 and 96.86, respectively, with concentrate grade estimations improving from 75.1 to 98.69 using recursive identification. In Scenario 2, the fit metrics for n and C were 96.27 and 95.48, respectively, with the concentrate grade estimations increasing from 96.27 to 99.45 with recursive identification. The results demonstrate the effectiveness of the proposed grey-box recursive identification method in accurately estimating parameters and predicting concentrate grade in a mineral flotation process.
[261] arXiv:2405.04278 [pdf, ps, other]: Title: Uncertainty Quantification Metrics for Deep Regression

Zilian Xiong, Simon Kristoffersson Lind, Per-Erik Forssén, Volker Krüger

Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty allows downstream modules to reason about the safety of its actions. In this work, we address metrics for evaluating such an uncertainty. Specifically, we focus on regression tasks, and investigate Area Under Sparsification Error (AUSE), Calibration Error, Spearman's Rank Correlation, and Negative Log-Likelihood (NLL). Using synthetic regression datasets, we look into how those metrics behave under four typical types of uncertainty, their stability regarding the size of the test set, and reveal their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman's Rank Correlation for evaluating uncertainties and recommend replacing it with AUSE.
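To make the AUSE metric named above concrete, the following is a minimal sketch of one common way to compute it, not the authors' implementation. It assumes absolute error as the per-sample error, removal of one sample at a time, and no normalization of the curves; the function names `sparsification_curve` and `ause` are hypothetical.

```python
def sparsification_curve(errors, order):
    """Mean error of the samples left after removing the first k samples in `order`."""
    ranked = [errors[i] for i in order]
    curve = []
    for k in range(len(ranked)):
        kept = ranked[k:]
        curve.append(sum(kept) / len(kept))
    return curve


def ause(abs_errors, uncertainties):
    """Area between the uncertainty-ranked and the oracle sparsification curves.

    Assumption: samples are removed from most to least uncertain, and the
    oracle removes them from largest to smallest true error.
    """
    n = len(abs_errors)
    by_uncertainty = sorted(range(n), key=lambda i: uncertainties[i], reverse=True)
    by_error = sorted(range(n), key=lambda i: abs_errors[i], reverse=True)
    predicted = sparsification_curve(abs_errors, by_uncertainty)
    oracle = sparsification_curve(abs_errors, by_error)
    gap = [p - o for p, o in zip(predicted, oracle)]
    # Trapezoidal rule over the removed fraction k / n.
    return sum((gap[k] + gap[k + 1]) / 2 for k in range(n - 1)) / n
```

Under these assumptions, an uncertainty estimate that ranks samples exactly as their true errors do gives an AUSE of zero, and a worse ranking gives a larger value.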
[262] arXiv:2405.04279 [pdf, ps, other]: Title: Task Presentation and Human Perception in Interactive Video Retrieval

Nina Willis, Abraham Bernstein, Luca Rossetto

Subjects: Multimedia (cs.MM)

Interactive video retrieval is a cooperative process between humans and retrieval systems. Large-scale evaluation campaigns, however, often overlook human factors, such as the effects of perception, attention, and memory, when assessing media retrieval systems. Consequently, their setups fall short of emulating realistic retrieval scenarios. In this paper, we design novel task presentation modes based on concepts in media memorability, implement the pipelines necessary for processing target video segments, and build a custom experimental platform for the final evaluation. In order to study the effects of different task representation schemes, we conduct a large crowdsourced experiment. Our findings demonstrate that the way in which the target of a video retrieval task is presented has a substantial influence on the difficulty of the retrieval task and that individuals can successfully retrieve a target video segment despite reducing or even altering the provided hints, opening up a discussion around future evaluation protocols in the domain of interactive media retrieval.
[263] arXiv:2405.04280 [pdf, ps, other]: Title: Modal Folding: Discovering Smooth Folding Patterns for Sheet Materials using Strain-Space Modes

Pengbin Tang, Ronan Hinchet, Roi Poranne, Bernhard Thomaszewski, Stelian Coros

Comments: 9 pages, SIGGRAPH 2024 Conference

Subjects: Graphics (cs.GR)

Folding can transform mundane objects such as napkins into stunning works of art. However, finding new folding transformations for sheet materials is a challenging problem that requires expertise and real-world experimentation. In this paper, we present Modal Folding -- an automated approach for discovering energetically optimal folding transformations, i.e., large deformations that require little mechanical work. For small deformations, minimizing internal energy for fixed displacement magnitudes leads to the well-known elastic eigenmodes. While linear modes provide promising directions for bending, they cannot capture the rotational motion required for folding. To overcome this limitation, we introduce strain-space modes -- nonlinear analogues of elastic eigenmodes that operate on per-element curvatures instead of vertices. Using strain-space modes to determine target curvatures for bending elements, we can generate complex nonlinear folding motions by simply minimizing the sheet's internal energy. Our modal folding approach offers a systematic and automated way to create complex designs. We demonstrate the effectiveness of our method with simulation results for a range of shapes and materials, and validate our designs with physical prototypes.
[264] arXiv:2405.04282 [pdf, ps, html, other]: Title: CoqPyt: Proof Navigation in Python in the Era of LLMs

Pedro Carrott, Nuno Saavedra, Kyle Thompson, Sorin Lerner, João F. Ferreira, Emily First

Comments: Accepted to FSE '24 Demonstrations Track

Subjects: Software Engineering (cs.SE)

Proof assistants enable users to develop machine-checked proofs regarding software-related properties. Unfortunately, the interactive nature of these proof assistants imposes most of the proof burden on the user, making formal verification a complex and time-consuming endeavor. Recent automation techniques based on neural methods address this issue, but require good programmatic support for collecting data and interacting with proof assistants. This paper presents CoqPyt, a Python tool for interacting with the Coq proof assistant. CoqPyt improves on other Coq-related tools by providing novel features, such as the extraction of rich premise data. We expect our work to aid development of tools and techniques, especially LLM-based, designed for proof synthesis and repair. A video describing and demonstrating CoqPyt is available at: this https URL.
[265] arXiv:2405.04283 [pdf, ps, other]: Title: PDCCH Scheduling via Maximum Independent Set

Lorenzo Maggi, Alvaro Valcarce Rial, Aloïs Herzog, Suresh Kalyanasundaram, Rakshak Agrawal

Subjects: Information Theory (cs.IT)

In 5G, the Physical Downlink Control CHannel (PDCCH) carries crucial information enabling the User Equipment (UE) to connect in UL and DL. UEs are unaware of the frequency location at which PDCCH is encoded, hence they need to perform blind decoding over a limited set of possible candidates. We address the problem faced by the gNodeB of selecting PDCCH candidates for each UE to optimize data transmission. We formulate it as a Maximum Weighted Independent Set (MWIS) problem, which is known to be NP-hard and cannot even be approximated. A solution method called Weight-to-Degree Ratio (WDR) Greedy emerges as a strong contender for practical implementations due to its favorable performance-to-complexity trade-off and theoretical performance guarantees.
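The abstract does not define the WDR ratio, so the following is only a sketch of a classical greedy heuristic for MWIS, not the authors' algorithm. It assumes the ratio is weight divided by (degree + 1), with the degree recomputed on the remaining graph; the function name `wdr_greedy` and the input layout are hypothetical.

```python
def wdr_greedy(weights, adjacency):
    """Greedy independent set: repeatedly pick the vertex with the largest
    weight / (degree + 1) and delete it together with its neighbours.

    Assumptions: `weights` maps vertex -> positive weight, and `adjacency`
    maps vertex -> iterable of neighbours and is symmetric.
    """
    remaining = {v: set(adjacency.get(v, ())) for v in weights}
    chosen = []
    while remaining:
        best = max(remaining, key=lambda v: weights[v] / (len(remaining[v]) + 1))
        chosen.append(best)
        removed = remaining[best] | {best}
        for v in removed:
            remaining.pop(v, None)
        # Degrees are updated so later ratios refer to the remaining graph.
        for neighbours in remaining.values():
            neighbours -= removed
    return chosen
```

In the scheduling setting, vertices would stand for PDCCH candidates and edges for conflicts between them; that mapping is an illustration inferred from the problem statement, not a detail taken from the paper.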
[266] arXiv:2405.04285 [pdf, ps, html, other]: Title: On the Foundations of Earth and Climate Foundation Models

Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, Yilei Shi

Subjects: Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Foundation models have enormous potential in advancing Earth and climate sciences; however, current approaches may not be optimal as they focus on a few basic features of a desirable Earth and climate foundation model. Crafting the ideal Earth foundation model, we define eleven features which would allow such a foundation model to be beneficial for any geoscientific downstream application in an environmental- and human-centric manner. We further shed light on the way forward to achieve the ideal model and to evaluate Earth foundation models. What comes after foundation models? Energy efficient adaptation, adversarial defenses, and interpretability are among the emerging directions.
[267] arXiv:2405.04286 [pdf, ps, other]: Title: Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore

Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xuebo Liu, Lidia S. Chao, Min Zhang

Subjects: Computation and Language (cs.CL)

The efficacy of a large language model (LLM) generated text detector depends substantially on the availability of sizable training data. White-box zero-shot detectors, which require no such data, are nonetheless limited by the accessibility of the source model of the LLM-generated text. In this paper, we propose a simple but effective black-box zero-shot detection approach, predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts. This approach entails computing the Grammar Error Correction Score (GECScore) for the given text to distinguish between human-written and LLM-generated text. Extensive experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods, achieving an average AUROC of 98.7% and showing strong robustness against paraphrase and adversarial perturbation attacks.
[268] arXiv:2405.04287 [pdf, ps, other]: Title: Asymmetry of Frequency Distribution in Power Systems: Sources, Impact and Control

Taulant Kerci, Federico Milano

Subjects: Systems and Control (eess.SY)

This letter analyses the sources of asymmetry of frequency probability distributions (PDs) and their impact on the dynamic behaviour of power systems. The letter also discusses how secondary control can reduce this asymmetry. We also propose an asymmetry index based on the difference between the left and right-hand side standard deviations of the frequency PDs. The IEEE 9-bus system and real-world data obtained from the Irish transmission system serve to show that losses, saturations and wind generation lead to asymmetric PDs. A relevant result is that the droop-based frequency support provided by wind generation using a tight deadband of 15 mHz significantly increases the asymmetry of the frequency PDs.
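As a rough illustration of an index built from one-sided standard deviations, the sketch below splits frequency samples at a reference value and subtracts the left-hand spread from the right-hand spread. The abstract does not give the exact definition, so the choice of reference, the sign convention, and the lack of normalization are all assumptions, and the function names are hypothetical.

```python
import math


def one_sided_std(samples, reference, side):
    """Root-mean-square deviation from `reference` of the samples on one side."""
    if side == "left":
        deviations = [x - reference for x in samples if x < reference]
    else:
        deviations = [x - reference for x in samples if x > reference]
    if not deviations:
        return 0.0
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def asymmetry_index(samples, reference):
    """Right-hand minus left-hand spread (assumed sign convention)."""
    right = one_sided_std(samples, reference, "right")
    left = one_sided_std(samples, reference, "left")
    return right - left
```

With this convention, a distribution that is symmetric about the reference gives an index close to zero, and a heavier over-frequency tail gives a positive value.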
[269] arXiv:2405.04289 [pdf, ps, other]: Title: Direct Training High-Performance Deep Spiking Neural Networks: A Review of Theories and Methods

Chenlin Zhou, Han Zhang, Liutao Yu, Yumin Ye, Zhaokun Zhou, Liwei Huang, Zhengyu Ma, Xiaopeng Fan, Huihui Zhou, Yonghong Tian

Comments: 29 pages

Subjects: Neural and Evolutionary Computing (cs.NE)

Spiking neural networks (SNNs) offer a promising energy-efficient alternative to artificial neural networks (ANNs), by virtue of their high biological plausibility, rich spatial-temporal dynamics, and event-driven computation. The direct training algorithms based on the surrogate gradient method provide sufficient flexibility to design novel SNN architectures and explore the spatial-temporal dynamics of SNNs. According to previous studies, the performance of models is highly dependent on their sizes. Recently, directly trained deep SNNs have achieved great progress on both neuromorphic datasets and large-scale static datasets. Notably, transformer-based SNNs show comparable performance with their ANN counterparts. In this paper, we provide a new perspective to summarize the theories and methods for training deep SNNs with high performance in a systematic and comprehensive way, including theory fundamentals, spiking neuron models, advanced SNN models and residual architectures, software frameworks and neuromorphic hardware, applications, and future trends. The reviewed papers are collected at this https URL.
[270] arXiv:2405.04290 [pdf, ps, other]: Title: Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map

Yuxuan Xia, Erik Stenborg, Junsheng Fu, Gustaf Hendeby

Comments: 27th International Conference on Information Fusion

Subjects: Robotics (cs.RO); Signal Processing (eess.SP)

High-definition maps with accurate lane-level information are crucial for autonomous driving, but the creation of these maps is a resource-intensive process. To this end, we present a cost-effective solution to create lane-level roadmaps using only the global navigation satellite system (GNSS) and a camera on customer vehicles. Our proposed solution utilizes a prior standard-definition (SD) map, GNSS measurements, visual odometry, and lane marking edge detection points, to simultaneously estimate the vehicle's 6D pose, its position within an SD map, and also the 3D geometry of traffic lines. This is achieved using a Bayesian simultaneous localization and multi-object tracking filter, where the estimation of traffic lines is formulated as a multiple extended object tracking problem, solved using a trajectory Poisson multi-Bernoulli mixture (TPMBM) filter. In TPMBM filtering, traffic lines are modeled using B-spline trajectories, and each trajectory is parameterized by a sequence of control points. The proposed solution has been evaluated using experimental data collected by a test vehicle driving on a highway. Preliminary results show that the traffic line estimates, overlaid on the satellite image, generally align with the lane markings up to some lateral offsets.
[271] arXiv:2405.04292 [pdf, ps, other]: Title: Mitigating Clickbait: An Approach to Spoiler Generation Using Multitask Learning

Sayantan Pal, Souvik Das, Rohini K. Srihari

Comments: Accepted in ICON 2023

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This study introduces 'clickbait spoiling', a novel technique designed to detect, categorize, and generate spoilers as succinct text responses, countering the curiosity induced by clickbait content. By leveraging a multi-task learning framework, our model's generalization capabilities are significantly enhanced, effectively addressing the pervasive issue of clickbait. The crux of our research lies in generating appropriate spoilers, be it a phrase, an extended passage, or multiple, depending on the spoiler type required. Our methodology integrates two crucial techniques: a refined spoiler categorization method and a modified version of the Question Answering (QA) mechanism, incorporated within a multi-task learning paradigm for optimized spoiler extraction from context. Notably, we have included fine-tuning methods for models capable of handling longer sequences to accommodate the generation of extended spoilers. This research highlights the potential of sophisticated text processing techniques in tackling the omnipresent issue of clickbait, promising an enhanced user experience in the digital realm.
[272] arXiv:2405.04294 [pdf, ps, other]: Title: Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework

Xiangpeng Wan, Haicheng Deng, Kai Zou, Shiqi Xu

Subjects: Artificial Intelligence (cs.AI)

Structured finance, which involves restructuring diverse assets into securities like MBS, ABS, and CDOs, enhances capital market efficiency but presents significant due diligence challenges. This study explores the integration of artificial intelligence (AI) with traditional asset review processes to improve efficiency and accuracy in structured finance. Using both open-source and closed-source large language models (LLMs), we demonstrate that AI can automate the verification of information between loan applications and bank statements effectively. While closed-source models such as GPT-4 show superior performance, open-source models like LLAMA3 offer a cost-effective alternative. Dual-agent systems further increase accuracy, though this comes with higher operational costs. This research highlights AI's potential to minimize manual errors and streamline due diligence, suggesting a broader application of AI in financial document analysis and risk management.
[273] arXiv:2405.04296 [pdf, ps, other]: Title: Open Implementation and Study of BEST-RQ for Speech Processing

Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève

Comments: Accepted in IEEE ICASSP 2024 workshop on Self-supervision in Audio, Speech and Beyond (SASB 2024)

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Self-Supervised Learning (SSL) has proven to be useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ) is an SSL method that has shown great performance on Automatic Speech Recognition (ASR) while being simpler than other SSL methods, such as wav2vec 2.0. Despite BEST-RQ's great performance, details are lacking in the original paper, such as the amount of GPU/TPU hours used in pre-training, and there is no official easy-to-use open-source implementation. Furthermore, BEST-RQ has not been evaluated on other downstream tasks aside from ASR and speech translation. In this work, we describe a re-implementation of a Random-projection quantizer and perform a preliminary study with a comparison to wav2vec 2.0 on four downstream tasks. We discuss the details and differences of our implementation. We show that a random projection quantizer can achieve downstream performance similar to that of wav2vec 2.0 while decreasing training time by over a factor of two.
[274] arXiv:2405.04297 [pdf, ps, other]: Title: Certifying Phase Abstraction

Nils Froleyks, Emily Yu, Armin Biere, Keijo Heljanko

Subjects: Symbolic Computation (cs.SC)

Certification helps to increase trust in formal verification of safety-critical systems which require assurance on their correctness. In hardware model checking, a widely used formal verification technique, phase abstraction is considered one of the most commonly used preprocessing techniques. We present an approach to certify an extended form of phase abstraction using a generic certificate format. As in earlier work, our approach involves constructing a witness circuit with an inductive invariant property that certifies the correctness of the entire model checking process, which is then validated by an independent certificate checker. We have implemented and evaluated the proposed approach including certification for various preprocessing configurations on hardware model checking competition benchmarks. As an improvement on previous work in this area, the proposed method is able to efficiently complete certification with an overhead of a fraction of model checking time.
[275] arXiv:2405.04299 [pdf, ps, html, other]: Title: ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers

Jinke Li, Xiao He, Chonghua Zhou, Xiaoqiang Cheng, Yang Wen, Dan Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D occupancy, an advanced perception technology for driving scenarios, represents the entire scene without distinguishing between foreground and background by quantifying the physical space into a grid map. The widely adopted projection-first deformable attention, efficient in transforming image features into 3D representations, encounters challenges in aggregating multi-view features due to sensor deployment constraints. To address this issue, we propose our learning-first view attention mechanism for effective multi-view feature aggregation. Moreover, we showcase the scalability of our view attention across diverse multi-view 3D tasks, such as map construction and 3D object detection. Leveraging the proposed view attention as well as an additional multi-frame streaming temporal attention, we introduce ViewFormer, a vision-centric transformer-based framework for spatiotemporal feature aggregation. To further explore occupancy-level flow representation, we present FlowOcc3D, a benchmark built on top of existing high-quality datasets. Qualitative and quantitative analyses on this benchmark reveal the potential to represent fine-grained dynamic scenes. Extensive experiments show that our approach significantly outperforms prior state-of-the-art methods. The codes and benchmark will be released soon.
[276] arXiv:2405.04300 [pdf, ps, other]: Title: Behaviour Planning: A Toolkit for Diverse Planning

Mustafa F Abdelwahed, Joan Espasa, Alice Toniolo, Ian P. Gent

Subjects: Artificial Intelligence (cs.AI)

Diverse planning is the problem of generating plans with distinct characteristics. This is valuable for many real-world scenarios, including applications related to plan recognition and business process automation. In this work, we introduce Behaviour Planning, a diverse planning toolkit that can characterise and generate diverse plans based on modular diversity models. We present a qualitative framework for describing diversity models, a planning approach for generating plans aligned with any given diversity model, and provide a practical implementation of an SMT-based behaviour planner. We showcase how the qualitative approach offered by Behaviour Planning allows it to overcome various challenges faced by previous approaches. Finally, the experimental evaluation shows the effectiveness of Behaviour Planning in generating diverse plans compared to state-of-the-art approaches.
[277] arXiv:2405.04304 [pdf, ps, other]: Title: Accelerating Speculative Decoding using Dynamic Speculation Length

Jonathan Mamou, Oren Pereg, Daniel Korat, Moshe Berchansky, Nadav Timor, Moshe Wasserblat, Roy Schwartz

Subjects: Computation and Language (cs.CL)

Speculative decoding is a promising method for reducing the inference latency of large language models. The effectiveness of the method depends on the speculation length (SL) - the number of tokens generated by the draft model at each iteration. The vast majority of speculative decoding approaches use the same SL for all iterations. In this work, we show that this practice is suboptimal. We introduce DISCO, a DynamIc SpeCulation length Optimization method that uses a classifier to dynamically adjust the SL at each iteration, while provably preserving the decoding quality. Experiments with four benchmarks demonstrate average speedup gains of 10.3% relative to our best baselines.
[278] arXiv:2405.04305 [pdf, ps, other]: Title: A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields

Raiyan Rahman, Christopher Indris, Goetz Bramesfeld, Tianxiao Zhang, Kaidong Li, Xiangyu Chen, Ivan Grijalva, Brian McCornack, Daniel Flippo, Ajay Sharda, Guanghui Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Aphid infestations are one of the primary causes of extensive damage to wheat and sorghum fields and are one of the most common vectors for plant viruses, resulting in significant agricultural yield losses. To address this problem, farmers often resort to the inefficient use of chemical pesticides that have negative health and environmental impacts. As a result, a large amount of pesticide is wasted on areas without significant pest infestation. This brings to attention the urgent need for an intelligent autonomous system that can locate and spray sufficiently large infestations selectively within the complex crop canopies. We have developed a large multi-scale dataset for aphid cluster detection and segmentation, collected from actual sorghum fields and meticulously annotated to include clusters of aphids. Our dataset comprises a total of 54,742 image patches, showcasing a variety of viewpoints, diverse lighting conditions, and multiple scales, highlighting its effectiveness for real-world applications. In this study, we trained and evaluated four real-time semantic segmentation models and three object detection models specifically for aphid cluster segmentation and detection. Considering the balance between accuracy and efficiency, Fast-SCNN delivered the most effective segmentation results, achieving 80.46% mean precision, 81.21% mean recall, and 91.66 frames per second (FPS). For object detection, RT-DETR exhibited the best overall performance with a 61.63% mean average precision (mAP), 92.6% mean recall, and 72.55 FPS on an NVIDIA V100 GPU. Our experiments further indicate that aphid cluster segmentation is more suitable for assessing aphid infestations than using detection models.
[279] arXiv:2405.04307 [pdf, ps, other]: Title: Improving Offline Reinforcement Learning with Inaccurate Simulators

Yiwen Hou, Haoyuan Sun, Jinming Ma, Feng Wu

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Offline reinforcement learning (RL) provides a promising approach to avoid costly online interaction with the real environment. However, the performance of offline RL highly depends on the quality of the datasets, which may cause extrapolation error in the learning process. In many robotic applications, an inaccurate simulator is often available. However, the data collected from the inaccurate simulator cannot be directly used in offline RL due to the well-known exploration-exploitation dilemma and the dynamic gap between inaccurate simulation and the real environment. To address these issues, we propose a novel approach to combine the offline dataset and the inaccurate simulation data more effectively. Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset. Given this, we collect data from the inaccurate simulator starting from the distribution provided by the generator and reweight the simulated data using the discriminator. Our experimental results in the D4RL benchmark and a real-world manipulation task confirm that our method can benefit more from both the inaccurate simulator and the limited offline datasets to achieve better performance than the state-of-the-art methods.
[280] arXiv:2405.04308 [pdf, ps, other]: Title: Quality with Just Enough Diversity in Evolutionary Policy Search

Paul Templier, Luca Grillotti, Emmanuel Rachelson, Dennis G. Wilson, Antoine Cully

Subjects: Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

Evolution Strategies (ES) are effective gradient-free optimization methods that can be competitive with gradient-based approaches for policy search. ES only rely on the total episodic scores of solutions in their population, from which they estimate fitness gradients for their update with no access to true gradient information. However, this makes them sensitive to deceptive fitness landscapes, and they tend to only explore one way to solve a problem. Quality-Diversity methods such as MAP-Elites introduced additional information with behavior descriptors (BD) to return a population of diverse solutions, which helps exploration but leads to a large part of the evaluation budget not being focused on finding the best performing solution. Here we show that behavior information can also be leveraged to find the best policy by identifying promising search areas which can then be efficiently explored with ES. We introduce the framework of Quality with Just Enough Diversity (JEDi) which learns the relationship between behavior and fitness to focus evaluations on solutions that matter. When trying to reach higher fitness values, JEDi outperforms both QD and ES methods on hard exploration tasks like mazes and on complex control problems with large policies.
[281] arXiv:2405.04309 [pdf, ps, html, other]: Title: Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling

Jiawei Shi, Hui Deng, Yuchao Dai

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Even though Non-rigid Structure-from-Motion (NRSfM) has been extensively studied and great progress has been made, there are still key challenges that hinder its broad real-world applications: 1) the inherent motion/rotation ambiguity requires either explicit camera motion recovery with extra constraint or complex Procrustean Alignment; 2) existing low-rank modeling of the global shape can over-penalize drastic deformations in the 3D shape sequence. This paper proposes to resolve the above issues from a spatial-temporal modeling perspective. First, we propose a novel Temporally-smooth Procrustean Alignment module that estimates 3D deforming shapes and adjusts the camera motion by aligning the 3D shape sequence consecutively. Our new alignment module remedies the requirement of complex reference 3D shape during alignment, which is more conducive to non-isotropic deformation modeling. Second, we propose a spatial-weighted approach to enforce the low-rank constraint adaptively at different locations to accommodate drastic spatially-variant deformation reconstruction better. Our modeling outperforms existing low-rank based methods, and extensive experiments across different datasets validate the effectiveness of our method.
[282] arXiv:2405.04311 [pdf, ps, other]: Title: Cross-IQA: Unsupervised Learning for Image Quality Assessment

Zhen Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Automatic perception of image quality is a challenging problem that impacts billions of Internet and social media users daily. To advance research in this field, we propose a no-reference image quality assessment (NR-IQA) method termed Cross-IQA based on the vision transformer (ViT) model. The proposed Cross-IQA method can learn image quality features from unlabeled image data. We construct the pretext task of synthesized image reconstruction to extract image quality information in an unsupervised manner based on ViT blocks. The pretrained encoder of Cross-IQA is used to fine-tune a linear regression model for score prediction. Experimental results show that Cross-IQA can achieve state-of-the-art performance in assessing the low-frequency degradation information (e.g., color change, blurring, etc.) of images compared with the classical full-reference IQA and NR-IQA under the same datasets.
[283] arXiv:2405.04312 [pdf, ps, other]: Title: Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion models have shown remarkable performance in image generation in recent years. However, due to a quadratic increase in memory when generating ultra-high-resolution images (e.g. 4096*4096), the resolution of generated images is often limited to 1024*1024. In this work, we propose a unidirectional block attention mechanism that can adaptively adjust the memory overhead during the inference process and handle global dependencies. Building on this module, we adopt the DiT structure for upsampling and develop an infinite super-resolution model capable of upsampling images of various shapes and resolutions. Comprehensive experiments show that our model achieves SOTA performance in generating ultra-high-resolution images in both machine and human evaluation. Compared to commonly used UNet structures, our model can save more than 5x memory when generating 4096*4096 images. The project URL is this https URL.
[284] arXiv:2405.04319 [pdf, ps, html, other]: Title: GLIDS: A Global Latency Information Dissemination System

Cyrill Krähenbühl, Seyedali Tabaeiaghdaei, Simon Scherrer, Matthias Frei, Adrian Perrig

Comments: Extended version of the 7-page conference paper "Toward Global Latency Transparency" published at the IFIP Networking 2024 Conference

Subjects: Networking and Internet Architecture (cs.NI)

A recent advance in networking is the deployment of path-aware multipath network architectures, where network endpoints are given multiple network paths to send their data on. In this work, we tackle the challenge of selecting paths for latency-sensitive applications. Even today's path-aware networks, which are much smaller than the current Internet, already offer dozens and in several cases over a hundred paths to a given destination, making it impractical to measure all path latencies to find the lowest latency path. Furthermore, for short flows, performing latency measurements may not provide benefits as the flow may finish before completing the measurements. To overcome these issues, we argue that endpoints should be provided with a latency estimate before sending any packets, enabling latency-aware path choice for the first packet sent. As we cannot predict the end-to-end latency due to dynamically changing queuing delays, we measure and disseminate the propagation latency, enabling novel use cases and solving concrete problems in current network protocols. We present the Global Latency Information Dissemination System (GLIDS), which is a step toward global latency transparency through the dissemination of propagation latency information.
[285] arXiv:2405.04322 [pdf, ps, other]: Title: Genetic Drift Regularization: on preventing Actor Injection from breaking Evolution Strategies

Paul Templier, Emmanuel Rachelson, Antoine Cully, Dennis G. Wilson

Subjects: Neural and Evolutionary Computing (cs.NE)

Evolutionary Algorithms (EA) have been successfully used for the optimization of neural networks for policy search, but they still remain sample inefficient and underperforming in some cases compared to gradient-based reinforcement learning (RL). Various methods combine the two approaches, many of them training an RL algorithm on data from EA evaluations and injecting the RL actor into the EA population. However, when using Evolution Strategies (ES) as the EA, the RL actor can drift genetically far from the ES distribution and injection can cause a collapse of the ES performance. Here, we highlight the phenomenon of genetic drift where the actor genome and the ES population distribution progressively drift apart, leading to injection having a negative impact on the ES. We introduce Genetic Drift Regularization (GDR), a simple regularization method in the actor training loss that prevents the actor genome from drifting away from the ES. We show that GDR can improve ES convergence on problems where RL learns well, but also helps RL training on other tasks, and fixes the injection issues better than previous controlled injection methods.
[286] arXiv:2405.04323 [pdf, ps, html, other]: Title: Beyond human subjectivity and error: a novel AI grading system

Alexandra Gobrecht, Felix Tuma, Moritz Möller, Thomas Zöller, Mark Zakhvatkin, Alexandra Wuttig, Holger Sommerfeldt, Sven Schütt

Subjects: Artificial Intelligence (cs.AI)

The grading of open-ended questions is a high-effort, high-impact task in education. Automating this task promises a significant reduction in workload for education professionals, as well as more consistent grading outcomes for students, by circumventing human subjectivity and error. While recent breakthroughs in AI technology might facilitate such automation, this has not been demonstrated at scale. In this paper, we introduce a novel automatic short answer grading (ASAG) system. The system is based on a fine-tuned open-source transformer model which we trained on a large set of exam data from university courses across a large range of disciplines. We evaluated the trained model's performance against held-out test data in a first experiment and found high accuracy levels across a broad spectrum of unseen questions, even in unseen courses. We further compared the performance of our model with that of certified human domain experts in a second experiment: we first assembled another test dataset from real historical exams - the historic grades contained in that data were awarded to students in a regulated, legally binding examination process; we therefore considered them as ground truth for our experiment. We then asked certified human domain experts and our model to grade the historic student answers again without disclosing the historic grades. Finally, we compared the grades thus obtained with the historic grades (our ground truth). We found that for the courses examined, the model deviated less from the official historic grades than the human re-graders - the model's median absolute error was 44% smaller than the human re-graders', implying that the model is more consistent than humans in grading. These results suggest that leveraging AI-enhanced grading can reduce human subjectivity, improve consistency and thus ultimately increase fairness.
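The comparison metric named in the abstract, median absolute error against the historic grades, can be sketched as follows. The grade lists are made-up illustrative numbers, not data from the paper, and the function names are assumptions.

```python
from statistics import median


def median_absolute_error(reference, regraded):
    """Median of |reference - regraded| over paired grades."""
    if len(reference) != len(regraded):
        raise ValueError("grade lists must have the same length")
    return median(abs(r - g) for r, g in zip(reference, regraded))


# Illustrative values only (not from the paper).
historic = [10, 7, 5, 8, 3]   # ground-truth grades
model = [9, 7, 6, 8, 4]       # hypothetical model re-grades
human = [8, 9, 5, 6, 5]       # hypothetical human re-grades

mae_model = median_absolute_error(historic, model)
mae_human = median_absolute_error(historic, human)

# Relative reduction of the model's error with respect to the human re-graders.
relative_reduction = 1 - mae_model / mae_human
print(mae_model, mae_human, relative_reduction)
```

A figure such as "44% smaller" corresponds to `relative_reduction` being 0.44 in this formulation.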
[287] arXiv:2405.04324 [pdf, ps, other]: Title: Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, Yi Zhou, Chris Johnson, Aanchal Goyal, Hima Patel, Yousaf Shah, Petros Zerfos, Heiko Ludwig, Asim Munawar, Maxwell Crouse, Pavan Kapanipathi, Shweta Salaria, Bob Calio, Sophia Wen, Seetharami Seelam, Brian Belgodere, Carlos Fonseca, Amith Singhee, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda

Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)

Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code model family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reach state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g. code generation, fixing and explanation), making it a versatile all-around code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use.
[288] arXiv:2405.04325 [pdf, ps, html, other]: Title: Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation

Atharvan Dogra, Ameet Deshpande, John Nay, Tanmay Rajpurohit, Ashwin Kalyan, Balaraman Ravindran

Subjects: Computation and Language (cs.CL)

Recent developments in large language models (LLMs), while offering a powerful foundation for developing natural language agents, raise safety concerns about them and the autonomous agents built upon them. Deception is one potential capability of AI agents of particular concern, which we refer to as an act or statement that misleads, hides the truth, or promotes a belief that is not true in its entirety or in part. We move away from the conventional understanding of deception through straight-out lying, making objective selfish decisions, or giving false information, as seen in previous AI safety research. We target a specific category of deception achieved through obfuscation and equivocation. We broadly explain the two types of deception by analogizing them with the rabbit-out-of-hat magic trick, where (i) the rabbit either comes out of a hidden trap door or (ii) (our focus) the audience is completely distracted to see the magician bring out the rabbit right in front of them using sleight of hand or misdirection. Our novel testbed framework displays intrinsic deception capabilities of LLM agents in a goal-driven environment when directed to be deceptive in their natural language generations in a two-agent adversarial dialogue system built upon the legislative task of "lobbying" for a bill. Along the lines of a goal-driven environment, we show developing deceptive capacity through a reinforcement learning setup, building it around the theories of language philosophy and cognitive psychology. We find that the lobbyist agent increases its deceptive capabilities by ~ 40% (relative) through subsequent reinforcement trials of adversarial interactions, and our deception detection mechanism shows a detection capability of up to 92%. Our results highlight potential issues in agent-human interaction, with agents potentially manipulating humans towards its programmed end-goal.
[289] arXiv:2405.04327 [pdf, ps, other]: Title: Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation

Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Bärmann, Seymanur Aktı, Hazım Kemal Ekenel, Alexander Waibel

Comments: CVPR2024 NTIRE Workshop

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the task of talking face generation, the objective is to generate a face video with lips synchronized to the corresponding audio while preserving visual details and identity information. Current methods face the challenge of learning accurate lip synchronization while avoiding detrimental effects on visual quality, as well as robustly evaluating such synchronization. To tackle these problems, we propose utilizing an audio-visual speech representation expert (AV-HuBERT) for calculating lip synchronization loss during training. Moreover, leveraging AV-HuBERT's features, we introduce three novel lip synchronization evaluation metrics, aiming to provide a comprehensive assessment of lip synchronization performance. Experimental results, along with a detailed ablation study, demonstrate the effectiveness of our approach and the utility of the proposed evaluation metrics.
[290] arXiv:2405.04330 [pdf, ps, other]: Title: How to reveal the rank of a matrix?

Anil Damle, Silke Glas, Alex Townsend, Annan Yu

Subjects: Numerical Analysis (math.NA)

We study algorithms called rank-revealers that reveal a matrix's rank structure. Such algorithms form a fundamental component in matrix compression, singular value estimation, and column subset selection problems. While column-pivoted QR (CPQR) has been widely adopted due to its practicality, it is not always a rank-revealer. Conversely, Gaussian elimination (GE) with a pivoting strategy known as global maximum volume pivoting is guaranteed to estimate a matrix's singular values but its exponential algorithmic complexity limits its interest to theory. We show that the concept of local maximum volume pivoting is a crucial and practical pivoting strategy for rank-revealers based on GE and QR, showing that it is both necessary and sufficient. This insight elevates Gu and Eisenstat's rank-revealing QR as an archetypal rank-revealer, and we devise a version that is less than $2\times$ more computationally expensive than CPQR. We unify the landscape of rank-revealers by considering GE and QR together and prove that the success of any pivoting strategy can be assessed by benchmarking it against a local maximum volume pivot.
[291] arXiv:2405.04332 [pdf, ps, other]: Title: WALLETRADAR: Towards Automating the Detection of Vulnerabilities in Browser-based Cryptocurrency Wallets

Pengcheng Xia, Yanhui Guo, Zhaowen Lin, Jun Wu, Pengbo Duan, Ningyu He, Kailong Wang, Tianming Liu, Yinliang Yue, Guoai Xu, Haoyu Wang

Comments: Just accepted by the Automated Software Engineering Journal

Subjects: Cryptography and Security (cs.CR)

Cryptocurrency wallets, acting as fundamental infrastructure to the blockchain ecosystem, have seen significant user growth, particularly among browser-based wallets (i.e., browser extensions). However, this expansion accompanies security challenges, making these wallets prime targets for malicious activities. Despite a substantial user base, there is not only a significant gap in comprehensive security analysis but also a pressing need for specialized tools that can aid developers in reducing vulnerabilities during the development process. To fill the void, we present a comprehensive security analysis of browser-based wallets in this paper, along with the development of an automated tool designed for this purpose. We first compile a taxonomy of security vulnerabilities resident in cryptocurrency wallets by harvesting historical security reports. Based on this, we design WALLETRADAR, an automated detection framework that can accurately identify security issues based on static and dynamic analysis. Evaluation of 96 popular browser-based wallets shows WALLETRADAR's effectiveness, by successfully automating the detection process in 90% of these wallets with high precision. This evaluation has led to the discovery of 116 security vulnerabilities corresponding to 70 wallets. By the time of this paper, we have received confirmations of 10 vulnerabilities from 8 wallet developers, with over $2,000 bug bounties. Further, we observed that 12 wallet developers have silently fixed 16 vulnerabilities after our disclosure. WALLETRADAR can effectively automate the identification of security risks in cryptocurrency wallets, thereby enhancing software development quality and safety in the blockchain ecosystem.
[292] arXiv:2405.04333 [pdf, ps, other]: Title: A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI

Hannah Chafetz, Sampriti Saxena, Stefaan G. Verhulst

Comments: 58 pages

Subjects: Artificial Intelligence (cs.AI)

Since late 2022, generative AI has taken the world by storm, with widespread use of tools including ChatGPT, Gemini, and Claude. Generative AI and large language model (LLM) applications are transforming how individuals find and access data and knowledge. However, the intricate relationship between open data and generative AI, and the vast potential it holds for driving innovation in this field remain underexplored areas. This white paper seeks to unpack the relationship between open data and generative AI and explore possible components of a new Fourth Wave of Open Data: Is open data becoming AI ready? Is open data moving towards a data commons approach? Is generative AI making open data more conversational? Will generative AI improve open data quality and provenance? Towards this end, we provide a new Spectrum of Scenarios framework. This framework outlines a range of scenarios in which open data and generative AI could intersect and what is required from a data quality and provenance perspective to make open data ready for those specific scenarios. These scenarios include: pretraining, adaptation, inference and insight generation, data augmentation, and open-ended exploration. Through this process, we found that in order for data holders to embrace generative AI to improve open data access and develop greater insights from open data, they first must make progress around five key areas: enhance transparency and documentation, uphold quality and integrity, promote interoperability and standards, improve accessibility and usability, and address ethical considerations.
[293] arXiv:2405.04336 [pdf, ps, html, other]: Title: Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction

Zhihao Wen, Yuan Fang, Pengcheng Wei, Fayao Liu, Zhenghua Chen, Min Wu

Comments: 12 pages

Subjects: Artificial Intelligence (cs.AI)

Predicting Remaining Useful Life (RUL) plays a crucial role in the prognostics and health management of industrial systems that involve a variety of interrelated sensors. Given a constant stream of time series sensory data from such systems, deep learning models have risen to prominence at identifying complex, nonlinear temporal dependencies in these data. In addition to the temporal dependencies of individual sensors, spatial dependencies emerge as important correlations among these sensors, which can be naturally modelled by a temporal graph that describes time-varying spatial relationships. However, the majority of existing studies have relied on capturing discrete snapshots of this temporal graph, a coarse-grained approach that leads to loss of temporal information. Moreover, given the variety of heterogeneous sensors, it becomes vital that such inherent heterogeneity is leveraged for RUL prediction in temporal sensor graphs. To capture the nuances of the temporal and spatial relationships and heterogeneous characteristics in an interconnected graph of sensors, we introduce a novel model named Temporal and Heterogeneous Graph Neural Networks (THGNN). Specifically, THGNN aggregates historical data from neighboring nodes to accurately capture the temporal dynamics and spatial correlations within the stream of sensor data in a fine-grained manner. Moreover, the model leverages Feature-wise Linear Modulation (FiLM) to address the diversity of sensor types, significantly improving the model's capacity to learn the heterogeneity in the data sources. Finally, we have validated the effectiveness of our approach through comprehensive experiments. Our empirical findings demonstrate significant advancements on the N-CMAPSS dataset, achieving improvements of up to 19.2% and 31.6% in terms of two different evaluation metrics over state-of-the-art methods.
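Feature-wise Linear Modulation (FiLM), mentioned in the abstract, scales and shifts each feature with parameters that depend on a conditioning input. The sketch below shows only that affine step. The per-sensor-type lookup table is a hypothetical stand-in for the learned conditioning network, and all numeric values are made up; it is not the THGNN implementation.

```python
def film(features, gamma, beta):
    """Apply FiLM: element-wise gamma * x + beta."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


# Hypothetical modulation parameters per sensor type. In a trained model these
# would be produced by a learned function of the sensor type, not a fixed table.
FILM_PARAMS = {
    "temperature": ([1.0, 0.5, 2.0], [0.0, 0.1, -1.0]),
    "pressure": ([0.2, 1.0, 1.0], [0.5, 0.0, 0.0]),
}


def modulate(features, sensor_type):
    gamma, beta = FILM_PARAMS[sensor_type]
    return film(features, gamma, beta)


x = [1.0, 2.0, 3.0]
out_temperature = modulate(x, "temperature")
out_pressure = modulate(x, "pressure")
print(out_temperature, out_pressure)
```

The same input features yield different representations for different sensor types, which is how the modulation exposes heterogeneity to the rest of the network.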
[294] arXiv:2405.04342 [pdf, ps, other]: Title: The Curse of Diversity in Ensemble-Based Exploration

Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin, Aaron Courville

Comments: Published as a conference paper at ICLR 2024

Subjects: Machine Learning (cs.LG)

We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated data in the shared training data for each ensemble member, as well as the inefficiency of the individual ensemble members to learn from such highly off-policy data. We thus name this phenomenon the curse of diversity. We find that several intuitive solutions -- such as a larger replay buffer or a smaller ensemble size -- either fail to consistently mitigate the performance loss or undermine the advantages of ensembling. Finally, we demonstrate the potential of representation learning to counteract the curse of diversity with a novel method named Cross-Ensemble Representation Learning (CERL) in both discrete and continuous control domains. Our work offers valuable insights into an unexpected pitfall in ensemble-based exploration and raises important caveats for future applications of similar approaches.
[295] arXiv:2405.04344 [pdf, ps, html, other]: Title: Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition

Chenxi Qiu

Comments: To be published in IJCAI 2024

Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Metric Differential Privacy (mDP) extends the concept of Differential Privacy (DP) to serve as a new paradigm of data perturbation. It is designed to protect secret data represented in general metric space, such as text data encoded as word embeddings or geo-location data on the road network or grid maps. To derive an optimal data perturbation mechanism under mDP, a widely used method is linear programming (LP), which, however, might suffer from a polynomial explosion of decision variables, rendering it impractical in large-scale mDP.
In this paper, our objective is to develop a new computation framework to enhance the scalability of the LP-based mDP. Considering the connections established by the mDP constraints among the secret records, we partition the original secret dataset into various subsets. Building upon the partition, we reformulate the LP problem for mDP and solve it via Benders Decomposition, which is composed of two stages: (1) a master program to manage the perturbation calculation across subsets and (2) a set of subproblems, each managing the perturbation derivation within a subset. Our experimental results on multiple datasets, including geo-location data in the road network/grid maps, text data, and synthetic data, underscore our proposed mechanism's superior scalability and efficiency.
[296] arXiv:2405.04345 [pdf, ps, other]: Title: Novel View Synthesis with Neural Radiance Fields for Industrial Robot Applications

Markus Hillemann, Robert Langendörfer, Max Heiken, Max Mehltretter, Andreas Schenk, Martin Weinmann, Stefan Hinz, Christian Heipke, Markus Ulrich

Comments: 8 pages, 8 figures, accepted for publication in The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (ISPRS Archives) 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Neural Radiance Fields (NeRFs) have become a rapidly growing research field with the potential to revolutionize typical photogrammetric workflows, such as those used for 3D scene reconstruction. As input, NeRFs require multi-view images with corresponding camera poses as well as the interior orientation. In the typical NeRF workflow, the camera poses and the interior orientation are estimated in advance with Structure from Motion (SfM). But the quality of the resulting novel views, which depends on different parameters such as the number and distribution of available images, as well as the accuracy of the related camera poses and interior orientation, is difficult to predict. In addition, SfM is a time-consuming pre-processing step, and its quality strongly depends on the image content. Furthermore, the undefined scaling factor of SfM hinders subsequent steps in which metric information is required. In this paper, we evaluate the potential of NeRFs for industrial robot applications. We propose an alternative to SfM pre-processing: we capture the input images with a calibrated camera that is attached to the end effector of an industrial robot and determine accurate camera poses with metric scale based on the robot kinematics. We then investigate the quality of the novel views by comparing them to ground truth, and by computing an internal quality measure based on ensemble methods. For evaluation purposes, we acquire multiple datasets that pose challenges for reconstruction typical of industrial applications, like reflective objects, poor texture, and fine structures. We show that the robot-based pose determination reaches similar accuracy as SfM in non-demanding cases, while having clear advantages in more challenging scenarios. Finally, we present first results of applying the ensemble method to estimate the quality of the synthetic novel view in the absence of a ground truth.
[297] arXiv:2405.04346 [pdf, ps, other]: Title: Revisiting character-level adversarial attacks

Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher

Comments: Accepted in ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Adversarial attacks in Natural Language Processing apply perturbations at the character or token level. Token-level attacks, gaining prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adversarial examples. While character-level attacks easily maintain semantics, they have received less attention as they cannot easily adopt popular gradient-based methods, and are thought to be easy to defend against. Challenging these beliefs, we introduce Charmer, an efficient query-based adversarial attack capable of achieving high attack success rate (ASR) while generating highly similar adversarial examples. Our method successfully targets both small (BERT) and large (Llama 2) models. Specifically, on BERT with SST-2, Charmer improves the ASR by 4.84 percentage points and the USE similarity by 8 percentage points with respect to the previous art. Our implementation is available at this https URL.
[298] arXiv:2405.04347 [pdf, ps, other]: Title: Development of discontinuous Galerkin methods for hyperbolic systems that preserve a curl or a divergence constraint

Vincent Perrier (CAGIRE, LMAP)

Subjects: Numerical Analysis (math.NA)

Some hyperbolic systems are known to include implicit preservation of differential constraints: these are for example the time conservation of the curl or the divergence of a vector that appear as an implicit constraint. In this article, we show that this kind of constraint can be easily conserved at the discrete level with the classical discontinuous Galerkin method, provided the right approximation space is used for the vectorial space, and under some mild assumption on the numerical flux. For this, we develop a discrete differential geometry framework for some well chosen piece-wise polynomial vector approximation space. More precisely, we define the discrete Hodge star operator, the exterior derivative, and their adjoints. The discrete adjoint divergence and curl are proven to be exactly preserved by the discontinuous Galerkin method under a small assumption on the numerical flux. Numerical tests are performed on the wave system, the two dimensional Maxwell system and the induction equation, and confirm that the differential constraints are preserved at machine precision while keeping the high order of accuracy.
[299] arXiv:2405.04354 [pdf, ps, html, other]: Title: A transversality theorem for semi-algebraic sets with application to signal recovery from the second moment and cryo-EM

Tamir Bendory, Nadav Dym, Dan Edidin, Arun Suresh

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Algebraic Geometry (math.AG)

Semi-algebraic priors are ubiquitous in signal processing and machine learning. Prevalent examples include a) linear models where the signal lies in a low-dimensional subspace; b) sparse models where the signal can be represented by only a few coefficients under a suitable basis; and c) a large family of neural network generative models. In this paper, we prove a transversality theorem for semi-algebraic sets in orthogonal or unitary representations of groups: with a suitable dimension bound, a generic translate of any semi-algebraic set is transverse to the orbits of the group action. This, in turn, implies that if a signal lies in a low-dimensional semi-algebraic set, then it can be recovered uniquely from measurements that separate orbits.
As an application, we consider the implications of the transversality theorem to the problem of recovering signals that are translated by random group actions from their second moment. As a special case, we discuss cryo-EM: a leading technology to constitute the spatial structure of biological molecules, which serves as our prime motivation. In particular, we derive explicit bounds for recovering a molecular structure from the second moment under a semi-algebraic prior and deduce information-theoretic implications. We also obtain information-theoretic bounds for three additional applications: factoring Gram matrices, multi-reference alignment, and phase retrieval. Finally, we deduce bounds for designing permutation invariant separators in machine learning.
[300] arXiv:2405.04355 [pdf, ps, html, other]: Title: SmmPack: Obfuscation for SMM Modules with TPM Sealed Key

Kazuki Matsuo, Satoshi Tanda, Kuniyasu Suzaki, Yuhei Kawakoya, Tatsuya Mori

Comments: 21 pages, 7 figures, 4 tables. Accepted at the 21st Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA 2024)

Subjects: Cryptography and Security (cs.CR)

System Management Mode (SMM) is the highest-privileged operating mode of x86 and x86-64 processors. Through SMM exploitation, attackers can tamper with the Unified Extensible Firmware Interface (UEFI) firmware, disabling the security mechanisms implemented by the operating system and hypervisor. Vulnerabilities enabling SMM code execution are often reported as Common Vulnerabilities and Exposures (CVEs); however, no security mechanisms currently exist to prevent attackers from analyzing those vulnerabilities. To increase the cost of vulnerability analysis of SMM modules, we introduced SmmPack. The core concept of SmmPack involves encrypting an SMM module with the key securely stored in a Trusted Platform Module (TPM). We assessed the effectiveness of SmmPack in preventing attackers from obtaining and analyzing SMM modules using various acquisition methods. Our results show that SmmPack significantly increases the cost by narrowing down the means of module acquisition. Furthermore, we demonstrated that SmmPack operates without compromising the performance of the original SMM modules. We also clarified the management and adoption methods of SmmPack, as well as the procedure for applying BIOS updates, and demonstrated that the implementation of SmmPack is realistic.
[301] arXiv:2405.04356 [pdf, ps, other]: Title: Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation

Jihyun Kim, Changjae Oh, Hoseok Do, Soohyun Kim, Kwanghoon Sohn

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a new multi-modal face image generation method that converts a text prompt and a visual input, such as a semantic mask or scribble map, into a photo-realistic face image. To do this, we combine the strengths of Generative Adversarial Networks (GANs) and diffusion models (DMs) by employing the multi-modal features in the DM into the latent space of the pre-trained GANs. We present a simple mapping and a style modulation network to link the two models and convert meaningful representations in feature maps and attention maps into latent codes. With GAN inversion, the estimated latent codes can be used to generate 2D or 3D-aware facial images. We further present a multi-step training strategy that reflects textual and structural representations into the generated image. Our proposed network produces realistic 2D, multi-view, and stylized face images, which align well with inputs. We validate our method by using pre-trained 2D and 3D GANs, and our results outperform existing methods. Our project page is available at this https URL.
[302] arXiv:2405.04357 [pdf, ps, html, other]: Title: Global Scale Self-Supervised Channel Charting with Sensor Fusion

Omid Esrafilian, Mohsen Ahadi, Florian Kaltenberger, David Gesbert

Comments: This paper is submitted to the Globecom 2024 conference

Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)

The sensing and positioning capabilities foreseen in 6G have great potential for technology advancements in various domains, such as future smart cities and industrial use cases. Channel charting has emerged as a promising technology in recent years for radio frequency-based sensing and localization. However, the accuracy of these techniques is still far behind the numbers envisioned in 6G. To reduce this gap, in this paper, we propose a novel channel charting technique capitalizing on the time of arrival measurements from surrounding Transmission Reception Points (TRPs) along with their locations and leveraging sensor fusion in channel charting by incorporating laser scanner data during the training phase of our algorithm. The proposed algorithm remains self-supervised during training and test phases, requiring no geometrical models or user position ground truth. Simulation results validate the achievement of a sub-meter level localization accuracy using our algorithm 90% of the time, outperforming the state-of-the-art channel charting techniques and the traditional triangulation-based approaches.
[303] arXiv:2405.04359 [pdf, ps, other]: Title: A Personalizable Controller for the Walking Assistive omNi-Directional Exo-Robot (WANDER)

A. Fortuna, M. Lorenzini, M. Leonori, JM. Gandarias, P. Balatti, Y. Cho, E. De Momi, A. Ajoudani

Comments: 6 pages, 4 figures, IEEE International Conference on Robotics and Automation (2024)

Subjects: Robotics (cs.RO)

Preserving and encouraging mobility in the elderly and adults with chronic conditions is of paramount importance. However, existing walking aids are either inadequate to provide sufficient support to users' stability or too bulky and poorly maneuverable to be used outside hospital environments. In addition, they all lack adaptability to individual requirements. To address these challenges, this paper introduces WANDER, a novel Walking Assistive omNi-Directional Exo-Robot. It consists of an omnidirectional platform and a robust aluminum structure mounted on top of it, which provides partial body weight support. A comfortable and minimally restrictive coupling interface embedded with a force/torque sensor allows detection of users' intentions, which are translated into command velocities by means of a variable admittance controller. An optimization technique based on users' preferences, i.e., Preference-Based Optimization (PBO), guides the choice of the admittance parameters (i.e., virtual mass and damping) to better fit subject-specific needs and characteristics. Experiments with twelve healthy subjects exhibited a significant decrease in energy consumption and jerk when using WANDER with PBO parameters as well as improved user performance and comfort. The great interpersonal variability in the optimized parameters highlights the importance of personalized control settings when walking with an assistive device, aiming to enhance users' comfort and mobility while ensuring reliable physical support.
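An admittance controller of the kind described maps a measured interaction force to a commanded velocity through a virtual mass and damping. A minimal one-dimensional sketch follows, assuming the standard law M * dv/dt + D * v = F integrated with explicit Euler. The parameter values, the function name, and the fixed (non-variable) parameters are illustrative assumptions, not WANDER's controller.

```python
def admittance_step(velocity, force, virtual_mass, damping, dt):
    """One explicit-Euler step of M * dv/dt + D * v = F."""
    acceleration = (force - damping * velocity) / virtual_mass
    return velocity + dt * acceleration


# Illustrative parameters only (not taken from the paper).
virtual_mass = 10.0   # kg
damping = 20.0        # N*s/m
dt = 0.01             # s
force = 10.0          # N, constant push measured by the force/torque sensor

velocity = 0.0
for _ in range(1000):  # simulate 10 s
    velocity = admittance_step(velocity, force, virtual_mass, damping, dt)

# Under a constant force the commanded velocity settles near force / damping.
print(round(velocity, 3))
```

Lower damping gives a higher steady-state velocity for the same push, and lower virtual mass gives a faster response, which is why these two parameters are natural targets for per-user tuning.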
[304] arXiv:2405.04363 [pdf, ps, other]: Title: Some Notes on the Sample Complexity of Approximate Channel Simulation

Gergely Flamich, Lennie Wells

Comments: Accepted as a spotlight paper at the first 'Learn to Compress' Workshop@ ISIT 2024

Subjects: Information Theory (cs.IT)

Channel simulation algorithms can efficiently encode random samples from a prescribed target distribution $Q$ and find applications in machine learning-based lossy data compression. However, algorithms that encode exact samples usually have random runtime, limiting their applicability when a consistent encoding time is desirable. Thus, this paper considers approximate schemes with a fixed runtime instead. First, we strengthen a result of Agustsson and Theis and show that there is a class of pairs of target distribution $Q$ and coding distribution $P$, for which the runtime of any approximate scheme scales at least super-polynomially in $D_\infty[Q \Vert P]$. We then show, by contrast, that if we have access to an unnormalised Radon-Nikodym derivative $r \propto dQ/dP$ and knowledge of $D_{KL}[Q \Vert P]$, we can exploit global-bound, depth-limited A* coding to ensure $\mathrm{TV}[Q \Vert P] \leq \epsilon$ and maintain optimal coding performance with a sample complexity of only $\exp_2\big((D_{KL}[Q \Vert P] + o(1)) \big/ \epsilon\big)$.
[305] arXiv:2405.04370 [pdf, ps, other]: Title: Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos

Junyi Ma, Jingyi Xu, Xieyuanli Chen, Hesheng Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Understanding how humans would behave during hand-object interaction is vital for applications in service robot manipulation and extended reality. To achieve this, some recent works have been proposed to simultaneously predict hand trajectories and object affordances on human egocentric videos. They are regarded as the representation of future hand-object interactions, indicating potential human motion and motivation. However, the existing approaches mostly adopt the autoregressive paradigm for unidirectional prediction, which lacks mutual constraints within the holistic future sequence, and accumulates errors along the time axis. Meanwhile, these works largely overlook the effect of camera egomotion on first-person view predictions. To address these limitations, we propose a novel diffusion-based interaction prediction method, namely Diff-IP2D, to forecast future hand trajectories and object affordances concurrently in an iterative non-autoregressive manner. We transform the sequential 2D images into latent feature space and design a denoising diffusion model to predict future latent interaction features conditioned on past ones. Motion features are further integrated into the conditional denoising process to make Diff-IP2D aware of the camera wearer's dynamics for more accurate interaction prediction. The experimental results show that our method significantly outperforms the state-of-the-art baselines on both the off-the-shelf metrics and our proposed new evaluation protocol. This highlights the efficacy of leveraging a generative paradigm for 2D hand-object interaction prediction. The code of Diff-IP2D will be released at this https URL.
[306] arXiv:2405.04371 [pdf, ps, other]: Title: Community Detection for Heterogeneous Multiple Social Networks

Ziqing Zhu, Guan Yuan, Tao Zhou, Jiuxin Cao

Comments: This paper was accepted by IEEE Transactions on Computational Social Systems (TCSS)

Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

The community plays a crucial role in understanding user behavior and network characteristics in social networks. Some users can use multiple social networks at once for a variety of objectives. These users are called overlapping users who bridge different social networks. Detecting communities across multiple social networks is vital for interaction mining, information diffusion, and behavior migration analysis among networks. This paper presents a community detection method based on nonnegative matrix tri-factorization for multiple heterogeneous social networks, which formulates a common consensus matrix to represent the global fused community. Specifically, the proposed method involves creating adjacency matrices based on network structure and content similarity, followed by alignment matrices which distinguish overlapping users in different social networks. With the generated alignment matrices, the method could enhance the fusion degree of the global community by detecting overlapping user communities across networks. The effectiveness of the proposed method is evaluated with new metrics on Twitter, Instagram, and Tumblr datasets. The results of the experiments demonstrate its superior performance in terms of community quality and community fusion.
[307] arXiv:2405.04372 [pdf, ps, other]: Title: Explainable machine learning for predicting shellfish toxicity in the Adriatic Sea using long-term monitoring data of HABs

Martin Marzidovšek, Janja Francé, Vid Podpečan, Stanka Vadnjal, Jožica Dolenc, Patricija Mozetič

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In this study, explainable machine learning techniques are applied to predict the toxicity of mussels in the Gulf of Trieste (Adriatic Sea) caused by harmful algal blooms. By analysing a newly created 28-year dataset containing records of toxic phytoplankton in mussel farming areas and toxin concentrations in mussels (Mytilus galloprovincialis), we train and evaluate the performance of ML models to accurately predict diarrhetic shellfish poisoning (DSP) events. The random forest model provided the best prediction of positive toxicity results based on the F1 score. Explainability methods such as permutation importance and SHAP identified key species (Dinophysis fortii and D. caudata) and environmental factors (salinity, river discharge and precipitation) as the best predictors of DSP outbreaks. These findings are important for improving early warning systems and supporting sustainable aquaculture practices.
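For reference, the F1 score used to rank the models is the harmonic mean of precision and recall on the positive class. The sketch below is a minimal pure-Python version; the function name and the 0/1 label encoding (1 for a positive toxicity result) are assumptions for illustration, not details taken from the study.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    if true_pos == 0:
        # No correct positive predictions: precision or recall is zero or undefined.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Because true negatives do not enter the formula, the score is not inflated by the many non-toxic samples that a monitoring dataset with rare positive events would typically contain.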
[308] arXiv:2405.04373 [pdf, ps, other]: Title: Leveraging LSTM and GAN for Modern Malware Detection

Ishita Gupta, Sneha Kumari, Priya Jha, Mohona Ghosh

Comments: 11 pages

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

The malware boom poses a danger to cyberspace comparable to that of climate change to ecosystems. Despite significant investments in cybersecurity technologies and staff training, the global community remains locked in a perpetual war with cybersecurity threats. The many and changing forms of malware continuously push the limits of cybersecurity practitioners, who employ approaches such as detection and mitigation to cope with the problem. Older methods such as signature-based detection and behavioral analysis are slow to adapt to the rapid evolution of malware types. Consequently, this paper proposes using deep learning models, specifically LSTM networks and GANs, to improve malware detection accuracy and speed. Leveraging raw bytestream data and deep learning architectures, this fast-growing, state-of-the-art AI technology provides better accuracy and performance than traditional methods. The LSTM and GAN models are integrated to generate synthetic data, which expands the training datasets and in turn improves detection accuracy. The paper uses the VirusShare dataset, which contains more than one million unique malware samples, as the training and evaluation set for the presented models. With thorough data preparation, including tokenization and augmentation, and model training, the LSTM and GAN models achieve better performance on these tasks than plain classifiers. The results show 98% accuracy, indicating that deep learning can play a decisive role in proactive cybersecurity defense. In addition, the paper studies ensemble learning and model fusion methods as a way to reduce biases and increase model complexity.
[309] arXiv:2405.04376 [pdf, ps, other]: Title: Towards Stability of Parameter-free Optimization

Yijiang Pang, Shuyang Yu, Bao Hoang, Jiayu Zhou

Subjects: Machine Learning (cs.LG)

Hyperparameter tuning, particularly the selection of an appropriate learning rate in adaptive gradient training methods, remains a challenge. To tackle this challenge, in this paper, we propose a novel parameter-free optimizer, AdamG (Adam with the golden step size), designed to automatically adapt to diverse optimization problems without manual tuning. The core technique underlying AdamG is our golden step size derived for the AdaGrad-Norm algorithm, which is expected to help AdaGrad-Norm preserve the tuning-free convergence and approximate the optimal step size in expectation w.r.t. various optimization scenarios. To better evaluate tuning-free performance, we propose a novel evaluation criterion, stability, to comprehensively assess the efficacy of parameter-free optimizers in addition to classical performance criteria. Empirical results demonstrate that compared with other parameter-free baselines, AdamG achieves superior performance, which is consistently on par with Adam using a manually tuned learning rate across various optimization tasks.
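For context, the AdaGrad-Norm algorithm that the abstract builds on scales a single global step size by the accumulated squared gradient norms. The sketch below shows only that standard base update; the names `adagrad_norm` and `quadratic_grad`, the defaults `eta=1.0` and `b0=1.0`, and the toy objective are assumptions for illustration. The golden step size and AdamG itself are not reproduced here.

```python
import math


def adagrad_norm(grad_fn, x0, eta=1.0, b0=1.0, steps=300):
    """Run AdaGrad-Norm: one scalar accumulator b_t^2 shared by all coordinates."""
    x = list(x0)
    b_sq = b0 * b0
    for _ in range(steps):
        g = grad_fn(x)
        # Accumulate the squared gradient norm, then step with eta / b_t.
        b_sq += sum(gi * gi for gi in g)
        step = eta / math.sqrt(b_sq)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def quadratic_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * ||x||^2."""
    return list(x)
```

Calling `adagrad_norm(quadratic_grad, [3.0, -4.0])` drives the iterate toward the origin. The effective step shrinks as gradients accumulate, and `eta` is the quantity a parameter-free method would aim to choose automatically.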
[310] arXiv:2405.04377 [pdf, ps, other]: Title: Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing

Boqiang Zhang, Hongtao Xie, Zuan Gao, Yuxin Wang

Comments: Accepted to CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Scene text images contain not only style information (font, background) but also content information (character, texture). Different scene text tasks need different information, but previous representation learning methods use tightly coupled features for all tasks, resulting in sub-optimal performance. We propose a Disentangled Representation Learning framework (DARLING) aimed at disentangling these two types of features for improved adaptability in better addressing various downstream tasks (choose what you really need). Specifically, we synthesize a dataset of image pairs with identical style but different content. Based on the dataset, we decouple the two types of features by the supervision design. Concretely, we directly split the visual representation into style and content features: the content features are supervised by a text recognition loss, while an alignment loss aligns the style features in the image pairs. Then, style features are employed in reconstructing the counterpart image via an image decoder with a prompt that indicates the counterpart's content. Such an operation effectively decouples the features based on their distinctive properties. To the best of our knowledge, this is the first work in the field of scene text to disentangle the inherent properties of text images. Our method achieves state-of-the-art performance in Scene Text Recognition, Removal, and Editing.
[311] arXiv:2405.04378 [pdf, ps, html, other]: Title: Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting

Ola Shorinwa, Johnathan Tucker, Aliyah Smith, Aiden Swann, Timothy Chen, Roya Firoozi, Monroe Kennedy III, Mac Schwager

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

We present Splat-MOVER, a modular robotics stack for open-vocabulary robotic manipulation, which leverages the editability of Gaussian Splatting (GSplat) scene representations to enable multi-stage manipulation tasks. Splat-MOVER consists of: (i) ASK-Splat, a GSplat representation that distills latent codes for language semantics and grasp affordance into the 3D scene. ASK-Splat enables geometric, semantic, and affordance understanding of 3D scenes, which is critical for many robotics tasks; (ii) SEE-Splat, a real-time scene-editing module using 3D semantic masking and infilling to visualize the motions of objects that result from robot interactions in the real world. SEE-Splat creates a "digital twin" of the evolving environment throughout the manipulation task; and (iii) Grasp-Splat, a grasp generation module that uses ASK-Splat and SEE-Splat to propose candidate grasps for open-world objects. ASK-Splat is trained in real-time from RGB images in a brief scanning phase prior to operation, while SEE-Splat and Grasp-Splat run in real-time during operation. We demonstrate the superior performance of Splat-MOVER in hardware experiments on a Kinova robot compared to two recent baselines in four single-stage, open-vocabulary manipulation tasks, as well as in four multi-stage manipulation tasks using the edited scene to reflect scene changes due to prior manipulation stages, which is not possible with the existing baselines. Code for this project and a link to the project page will be made available soon.
[312] arXiv:2405.04382 [pdf, ps, html, other]: Title: Large Language Models Cannot Explain Themselves

Advait Sarkar

Comments: In Proceedings of the ACM CHI 2024 Workshop on Human-Centered Explainable AI (HCXAI 2024)

Subjects: Human-Computer Interaction (cs.HC)

Large language models can be prompted to produce text. They can also be prompted to produce "explanations" of their output. But these are not really explanations, because they do not accurately reflect the mechanical process underlying the prediction. The illusion that they reflect the reasoning process can result in significant harms. These "explanations" can be valuable, but for promoting critical thinking rather than for understanding the model. I propose a recontextualisation of these "explanations", using the term "exoplanations" to draw attention to their exogenous nature. I discuss some implications for design and technology, such as the inclusion of appropriate guardrails and responses when models are prompted to generate explanations.
[313] arXiv:2405.04386 [pdf, ps, other]: Title: Pragmatist Intelligence: Where the Principle of Usefulness Can Take ANNs

Antonio Bikić, Sayan Mukherjee

Comments: 16 pages

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Artificial neural networks (ANNs) perform extraordinarily on numerous tasks including classification or prediction, e.g., speech processing and image classification. These new functions are based on a computational model that is enabled to select freely all necessary internal model parameters as long as it eventually delivers the functionality it is supposed to exhibit. Here, we review the connection between the model parameter selection in machine learning (ML) algorithms running on ANNs and the epistemological theory of neopragmatism focusing on the theory's utility and anti-representationalist aspects. To understand the consequences of the model parameter selection of an ANN, we suggest using neopragmatist theories whose implications are well studied. Incidentally, neopragmatism's notion of optimization is also based on utility considerations. This means that applying this approach elegantly reveals the inherent connections between optimization in ML, using a numerical method during the learning phase, and optimization in the ethical theory of consequentialism, where it occurs as a maxim of action. We suggest that these connections originate from the way relevance is calculated in ML systems. This could ultimately reveal a tendency for specific actions in ML systems.
[314] arXiv:2405.04387 [pdf, ps, other]: Title: Parallelized Multi-Agent Bayesian Optimization in Lava

Shay Snyder (1), Derek Gobin (1), Victoria Clerico (1), Sumedh R. Risbud (2), Maryam Parsa (1) ((1) George Mason University, (2) Intel Labs)

Comments: 4 pages, 2 figures, 2 algorithms, 2 tables

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In parallel with the continuously increasing parameter space dimensionality, search and optimization algorithms should support distributed parameter evaluations to reduce cumulative runtime. Intel's neuromorphic optimization library, Lava-Optimization, was introduced as an abstract optimization system compatible with neuromorphic systems developed in the broader Lava software framework. In this work, we introduce Lava Multi-Agent Optimization (LMAO) with native support for distributed parameter evaluations communicating with a central Bayesian optimization system. LMAO provides an abstract framework for deploying distributed optimization and search algorithms within the Lava software framework. Moreover, LMAO introduces support for random and grid search along with process connections across multiple levels of mathematical precision. We evaluate the algorithmic performance of LMAO with a traditional non-convex optimization problem, a fixed-precision transductive spiking graph neural network for citation graph classification, and a neuromorphic satellite scheduling problem. Our results highlight LMAO's efficient scaling to multiple processes, reducing cumulative runtime and minimizing the likelihood of converging to local optima.
[315] arXiv:2405.04390 [pdf, ps, other]: Title: DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving

Chen Min, Dawei Zhao, Liang Xiao, Jian Zhao, Xinli Xu, Zheng Zhu, Lei Jin, Jianshu Li, Yulan Guo, Junliang Xing, Liping Jing, Yiming Nie, Bin Dai

Comments: Accepted by CVPR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision-centric autonomous driving has recently attracted wide attention due to its lower cost. Pre-training is essential for extracting a universal representation. However, current vision-centric pre-training typically relies on either 2D or 3D pre-text tasks, overlooking the temporal characteristics of autonomous driving as a 4D scene understanding task. In this paper, we address this challenge by introducing a world model-based autonomous driving 4D representation learning framework, dubbed DriveWorld, which is capable of pre-training from multi-camera driving videos in a spatio-temporal fashion. Specifically, we propose a Memory State-Space Model for spatio-temporal modelling, which consists of a Dynamic Memory Bank module for learning temporal-aware latent dynamics to predict future changes and a Static Scene Propagation module for learning spatial-aware latent statics to offer comprehensive scene contexts. We additionally introduce a Task Prompt to decouple task-aware features for various downstream tasks. The experiments demonstrate that DriveWorld delivers promising results on various autonomous driving tasks. When pre-trained with the OpenScene dataset, DriveWorld achieves a 7.5% increase in mAP for 3D object detection, a 3.0% increase in IoU for online mapping, a 5.0% increase in AMOTA for multi-object tracking, a 0.1m decrease in minADE for motion forecasting, a 3.0% increase in IoU for occupancy prediction, and a 0.34m reduction in average L2 error for planning.
[316] arXiv:2405.04392 [pdf, ps, other]: Title: BILTS: A novel bi-invariant local trajectory-shape descriptor for rigid-body motion

Arno Verduyn, Erwin Aertbeliën, Glenn Maes, Joris De Schutter, Maxim Vochten

Comments: This work has been submitted as a regular research paper for consideration in the IEEE Transactions on Robotics. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Robotics (cs.RO); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV)

Measuring the similarity between motions and established motion models is crucial for motion analysis, recognition, generation, and adaptation. To enhance similarity measurement across diverse contexts, invariant motion descriptors have been proposed. However, for rigid-body motion, few invariant descriptors exist that are bi-invariant, meaning invariant to both the body and world reference frames used to describe the motion. Moreover, their robustness to singularities is limited. This paper introduces a novel Bi-Invariant Local Trajectory-Shape descriptor (BILTS) and a corresponding dissimilarity measure. Mathematical relationships between BILTS and existing descriptors are derived, providing new insights into their properties. The paper also includes an algorithm to reproduce the motion from the BILTS descriptor, demonstrating its bidirectionality and usefulness for trajectory generation. Experimental validation using datasets of daily-life activities shows the higher robustness of the BILTS descriptor compared to the bi-invariant ISA descriptor. This higher robustness supports the further application of bi-invariant descriptors for motion recognition and generalization.
[317] arXiv:2405.04395 [pdf, ps, other]: Title: PACIFISTA: Conflict Evaluation and Management in Open RAN

Pietro Brach del Prever, Salvatore D'Oro, Leonardo Bonati, Michele Polese, Maria Tsampazi, Heiko Lehmann, Tommaso Melodia

Comments: 12 pages, 15 figures, 7 tables

Subjects: Networking and Internet Architecture (cs.NI)

The O-RAN ALLIANCE is defining architectures, interfaces, operations, and security requirements for cellular networks based on Open Radio Access Network (RAN) principles. In this context, O-RAN introduced the RAN Intelligent Controllers (RICs) to enable dynamic control of cellular networks via data-driven applications referred to as rApps and xApps. RICs enable for the first time truly intelligent and self-organizing cellular networks. However, enabling the execution of many Artificial Intelligence (AI) algorithms taking autonomous control decisions to fulfill diverse (and possibly conflicting) goals poses unprecedented challenges. For instance, the execution of one xApp aiming at maximizing throughput and one aiming at minimizing energy consumption would inevitably result in diametrically opposed resource allocation strategies. Therefore, conflict management becomes a crucial component of any functional intelligent O-RAN system. This article studies the problem of conflict mitigation in O-RAN and proposes PACIFISTA, a framework to detect, characterize, and mitigate conflicts. PACIFISTA leverages a profiling pipeline to test O-RAN applications in a sandbox environment, and combines hierarchical graphs with statistical models to detect the existence of conflicts and evaluate their severity. Experiments on Colosseum and OpenRAN Gym demonstrate PACIFISTA's ability to predict conflicts and provide valuable information before potentially conflicting xApps are deployed in production systems. We demonstrate that even O-RAN applications with similar goals can result in 16% throughput loss, and show how applications with conflicting goals might cause severe instability and result in up to 30% performance degradation. We also show that PACIFISTA can help operators to identify coexisting applications and maintain performance degradation below a tolerable threshold.
[318] arXiv:2405.04396 [pdf, ps, other]: Title: Predicting Transonic Flowfields in Non-Homogeneous Unstructured Grids Using Autoencoder Graph Convolutional Networks

Gabriele Immordino, Andrea Vaiuso, Andrea Da Ronch, Marcello Righi

Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

This paper focuses on addressing challenges posed by non-homogeneous unstructured grids, commonly used in Computational Fluid Dynamics (CFD). Their prevalence in CFD scenarios has motivated the exploration of innovative approaches for generating reduced-order models. The core of our approach centers on geometric deep learning, specifically the use of graph convolutional networks (GCNs). The novel Autoencoder GCN architecture enhances prediction accuracy by propagating information to distant nodes and emphasizing influential points. This architecture, with GCN layers and encoding/decoding modules, reduces dimensionality based on pressure-gradient values. The autoencoder structure improves the network's capability to identify key features, contributing to a more robust and accurate predictive model. To validate the proposed methodology, we analyzed two different test cases: a wing-only model and a wing-body configuration. Precise reconstruction of steady-state distributed quantities within a two-dimensional parametric space underscores the reliability and versatility of the implemented approach.
[319] arXiv:2405.04399 [pdf, ps, html, other]: Title: Solving ill-conditioned linear algebraic systems using methods that improve conditioning

A.S. Leonov

Subjects: Numerical Analysis (math.NA)

We consider the solution of systems of linear algebraic equations (SLAEs) with an ill-conditioned or degenerate exact matrix and an approximate right-hand side. An approach to solving such a problem is proposed and justified, which makes it possible to improve the conditioning of the SLAE matrix and, as a result, obtain an approximate solution that is stable to perturbations of the right-hand side with higher accuracy than other methods. The approach is implemented by an algorithm that uses so-called minimal pseudoinverse matrices. The results of numerical experiments are presented that confirm the theoretical results of the article.
[320] arXiv:2405.04400 [pdf, ps, html, other]: Title: Decentralized Algorithms for Out-of-System Interference Suppression in Distributed MIMO

Zakir Hussain Shaik, Erik G. Larsson

Comments: 5 pages, 2 figures, to be published in IEEE Wireless Communications Letters

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Out-of-system (OoS) interference is a potential limitation for distributed networks that operate in unlicensed spectrum or in a spectrum sharing scenario. The OoS interference differs from the in-system interference in that OoS signals and their associated channels (or even their statistics) are completely unknown. In this paper, we propose a novel distributed algorithm that can mitigate OoS interference in the uplink and suppress the signal transmission in the OoS direction in the downlink. To estimate the OoS interference, each access point (AP), upon receiving an estimate of OoS interference from a previous AP, computes a better estimate of OoS interference by rotate-and-average using Procrustes method and forwards the estimates to the next AP. This process continues until the central processing unit (CPU) receives the final estimate. Our method has comparable performance to that of a fully centralized interference rejection combining algorithm and has much lower fronthaul load requirements.
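To make the rotate-and-average idea concrete, the following is a simplified real-valued, two-dimensional analogue: the estimate received from the previous AP is rotated onto the local estimate with the least-squares (Procrustes) rotation, and the two are then averaged. All function names, the restriction to 2-D rotations, and the equal-weight average are assumptions for illustration; the abstract does not specify the dimensions, weighting, or data layout used in the paper.

```python
import math


def procrustes_angle(source, target):
    """Angle of the rotation R minimizing sum ||R s_i - t_i||^2 over 2-D point pairs."""
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)


def rotate(points, angle):
    """Rotate each 2-D point counter-clockwise by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def rotate_and_average(previous, local):
    """Align the previous estimate to the local one, then average them equally."""
    aligned = rotate(previous, procrustes_angle(previous, local))
    return [((ax + lx) / 2.0, (ay + ly) / 2.0)
            for (ax, ay), (lx, ly) in zip(aligned, local)]
```

Aligning before averaging matters because estimates that differ only by a rotation would otherwise partially cancel when summed.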
[321] arXiv:2405.04402 [pdf, ps, html, other]: Title: Utility-driven Optimization of TTL Cache Hierarchies under Network Delays

Karim S. Elsayed, Fabien Geyer, Amr Rizk

Subjects: Networking and Internet Architecture (cs.NI)

We optimize hierarchies of Time-to-Live (TTL) caches under random network delays. A TTL cache assigns individual eviction timers to cached objects that are usually refreshed upon a hit where upon a miss the object requires a random time to be fetched from a parent cache. Due to their object decoupling property, TTL caches are of particular interest since the optimization of a per-object utility enables service differentiation. However, state-of-the-art exact TTL cache optimization does not extend beyond single TTL caches, especially under network delays. In this paper, we leverage the object decoupling effect to formulate the non-linear utility maximization problem for TTL cache hierarchies in terms of the exact object hit probability under random network delays. We iteratively solve the utility maximization problem to find the optimal per-object TTLs. Further, we show that the exact model suffers from tractability issues for large hierarchies and propose a machine learning approach to estimate the optimal TTL values for large systems. Finally, we provide numerical and data center trace-based evaluations for both methods showing the significant offloading improvement due to TTL optimization considering the network delays.
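As a baseline for intuition, consider a single TTL cache whose timer is refreshed on every request, with Poisson requests of rate λ and zero fetch delay. A request is then a hit exactly when the gap to the previous request is shorter than the TTL, which gives the hit probability 1 - exp(-λ * TTL). The sketch below checks this closed form against a small simulation. The function names and parameter values are assumptions for illustration, and the hierarchies and random network delays treated in the paper are deliberately left out of this simplified setting.

```python
import math
import random


def ttl_hit_probability(rate, ttl):
    """Closed-form hit probability for a reset-on-request TTL cache, zero delay."""
    return 1.0 - math.exp(-rate * ttl)


def simulate_hit_ratio(rate, ttl, requests=200_000, seed=0):
    """Estimate the hit ratio by sampling exponential gaps between requests."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(requests):
        gap = rng.expovariate(rate)  # time since the previous request
        if gap < ttl:                # object has not expired yet
            hits += 1
    return hits / requests
```

Because each object's hit probability depends only on its own rate and TTL, per-object utilities can be tuned independently, which is the decoupling property the abstract refers to.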
[322] arXiv:2405.04403 [pdf, ps, other]: Title: Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks

Georgios Pantazopoulos, Amit Parekh, Malvina Nikandrou, Alessandro Suglia

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Augmenting Large Language Models (LLMs) with image-understanding capabilities has resulted in a boom of high-performing Vision-Language models (VLMs). While studying the alignment of LLMs to human values has received widespread attention, the safety of VLMs has not received the same attention. In this paper, we explore the impact of jailbreaking on three state-of-the-art VLMs, each using a distinct modeling approach. By comparing each VLM to their respective LLM backbone, we find that each VLM is more susceptible to jailbreaking. We consider this as an undesirable outcome from visual instruction-tuning, which imposes a forgetting effect on an LLM's safety guardrails. Therefore, we provide recommendations for future work based on evaluation strategies that aim to highlight the weaknesses of a VLM, as well as take safety measures into account during visual instruction tuning.
[323] arXiv:2405.04404 [pdf, ps, html, other]: Title: Vision Mamba: A Comprehensive Survey and Taxonomy

Xiao Liu, Chenxu Zhang, Lei Zhang

Comments: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

The State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamic systems. This model has witnessed numerous applications in several fields, including control theory, signal processing, economics and machine learning. In the field of deep learning, state space models are used to process sequence data in tasks such as time series analysis, natural language processing (NLP) and video understanding. By mapping sequence data to state space, long-term dependencies in the data can be better captured. In particular, modern SSMs have shown strong representational capabilities in NLP, especially in long sequence modeling, while maintaining linear time complexity. Notably, based on the latest state-space models, Mamba merges time-varying parameters into SSMs and formulates a hardware-aware algorithm for efficient training and inference. Given its impressive efficiency and strong long-range dependency modeling capability, Mamba is expected to become a new AI architecture that may outperform Transformer. Recently, a number of works have attempted to study the potential of Mamba in various fields, such as general vision, multi-modal, medical image analysis and remote sensing image analysis, by extending Mamba from the natural language domain to the visual domain. To fully understand Mamba in the visual domain, we conduct a comprehensive survey and present a taxonomy study. This survey focuses on Mamba's application to a variety of visual tasks and data types, and discusses its predecessors, recent advances and far-reaching impact on a wide range of domains. Since Mamba is now on an upward trend, please notify us if you have new findings; new progress on Mamba will be included in this survey in a timely manner and updated on the Mamba project at this https URL.
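The linear-time behavior mentioned above follows from the recurrent form of a discretized SSM, in which each step updates a hidden state from the previous one. The sketch below uses a scalar state and the standard zero-order-hold discretization; the function names and the scalar, time-invariant setting are assumptions for illustration. Mamba's input-dependent parameters and hardware-aware scan are not reproduced here.

```python
import math


def discretize(a, b, delta):
    """Zero-order-hold discretization of h'(t) = a * h(t) + b * x(t), with a != 0."""
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(inputs, a, b, c, delta):
    """Run h_t = a_bar * h_{t-1} + b_bar * x_t and y_t = c * h_t over a sequence."""
    a_bar, b_bar = discretize(a, b, delta)
    h = 0.0
    outputs = []
    for x in inputs:  # one constant-cost update per element: linear in length
        h = a_bar * h + b_bar * x
        outputs.append(c * h)
    return outputs
```

For a constant unit input with a = -1 and b = c = 1, the n-th output equals 1 - exp(-n * delta), matching the continuous-time step response sampled at t = n * delta.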
[324] arXiv:2405.04405 [pdf, ps, other]: Title: Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation

Pei Liu, Luping Ji

Comments: ICML 2024

Subjects: Machine Learning (cs.LG)

Uncertainty estimation (UE), as an effective means of quantifying predictive uncertainty, is crucial for safe and reliable decision-making, especially in high-risk scenarios. Existing UE schemes usually assume that there are completely-labeled samples to support fully-supervised learning. In practice, however, many UE tasks often have no sufficiently-labeled data to use, such as the Multiple Instance Learning (MIL) with only weak instance annotations. To bridge this gap, this paper, for the first time, addresses the weakly-supervised issue of Multi-Instance UE (MIUE) and proposes a new baseline scheme, Multi-Instance Residual Evidential Learning (MIREL). Particularly, at the fine-grained instance UE with only weak supervision, we derive a multi-instance residual operator through the Fundamental Theorem of Symmetric Functions. On this operator derivation, we further propose MIREL to jointly model the high-order predictive distribution at bag and instance levels for MIUE. Extensive experiments empirically demonstrate that our MIREL not only could often make existing MIL networks perform better in MIUE, but also could surpass representative UE methods by large margins, especially in instance-level UE tasks.
[325] arXiv:2405.04406 [pdf, ps, html, other]: Title: R\'enyi divergence guarantees for hashing with linear codes

Madhura Pathegama, Alexander Barg

Subjects: Information Theory (cs.IT)

We consider the problem of distilling uniform random bits from an unknown source with a given $p$-entropy using linear hashing. As our main result, we estimate the expected $p$-divergence from the uniform distribution over the ensemble of random linear codes for all integer $p\ge 2$. The proof relies on analyzing how additive noise, determined by a random element of the code from the ensemble, acts on the source distribution. This action leads to the transformation of the source distribution into an approximately uniform one, a process commonly referred to as distribution smoothing. We also show that hashing with Reed-Muller matrices reaches intrinsic randomness of memoryless Bernoulli sources in the $l_p$ sense for all integer $p\ge 2$.
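For orientation, the standard Rényi divergence of order $p$ from the uniform distribution can be written in closed form. This is a sketch under the assumption that the "$p$-divergence" in the abstract refers to this quantity; the abstract itself does not give a definition. The symbols $P$ (a distribution on $M$ outcomes), $U_M$ (the uniform distribution on the same outcomes) and $H_p$ (the Rényi entropy of order $p$) are notation introduced here for illustration.

```latex
D_p(P \,\|\, U_M)
  \;=\; \frac{1}{p-1}\,\log\!\Big( M^{\,p-1} \sum_{x} P(x)^{p} \Big)
  \;=\; \log M \;-\; H_p(P),
\qquad
H_p(P) \;=\; \frac{1}{1-p}\,\log \sum_{x} P(x)^{p} .
```

Under this reading, the divergence from uniform is exactly the gap between the maximum possible entropy $\log M$ and the order-$p$ entropy of $P$, so it vanishes precisely when $P$ is uniform.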
[326] arXiv:2405.04407 [pdf, ps, html, other]: Title: Super-Exponential Regret for UCT, AlphaGo and Variants

Laurent Orseau, Remi Munos

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We improve the proofs of the lower bounds of Coquelin and Munos (2007) that demonstrate that UCT can have $\exp(\dots\exp(1)\dots)$ regret (with $\Omega(D)$ exp terms) on the $D$-chain environment, and that a `polynomial' UCT variant has $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment -- the original proofs contain an oversight for rewards bounded in $[0, 1]$, which we fix in the present draft. We also adapt the proofs to AlphaGo's MCTS and its descendants (e.g., AlphaZero, Leela Zero) to show $\exp_2(\exp_2(D - O(\log D)))$ regret.
[327] arXiv:2405.04408 [pdf, ps, other]: Title: DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Jiaxin Zhang, Dezhi Peng, Chongyu Liu, Peirong Zhang, Lianwen Jin

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Document image restoration is a crucial aspect of Document AI systems, as the quality of document images significantly influences the overall performance. Prevailing methods address distinct restoration tasks independently, leading to intricate systems and the incapability to harness the potential synergies of multi-task learning. To overcome this challenge, we propose DocRes, a generalist model that unifies five document image restoration tasks including dewarping, deshadowing, appearance enhancement, deblurring, and binarization. To instruct DocRes to perform various restoration tasks, we propose a novel visual prompt approach called Dynamic Task-Specific Prompt (DTSPrompt). The DTSPrompt for different tasks comprises distinct prior features, which are additional characteristics extracted from the input image. Beyond its role as a cue for task-specific execution, DTSPrompt can also serve as supplementary information to enhance the model's performance. Moreover, DTSPrompt is more flexible than prior visual prompt approaches as it can be seamlessly applied and adapted to inputs with high and variable resolutions. Experimental results demonstrate that DocRes achieves competitive or superior performance compared to existing state-of-the-art task-specific models. This underscores the potential of DocRes across a broader spectrum of document image restoration tasks. The source code is publicly available at this https URL
[328] arXiv:2405.04412 [pdf, ps, html, other]: Title: The Silicone Ceiling: Auditing GPT's Race and Gender Biases in Hiring

Lena Armstrong, Abbey Liu, Stephen MacNeil, Danaë Metaxa

Subjects: Computers and Society (cs.CY)

Large language models (LLMs) are increasingly being introduced in workplace settings, with the goals of improving efficiency and fairness. However, concerns have arisen regarding these models' potential to reflect or exacerbate social biases and stereotypes. This study explores the potential impact of LLMs on hiring practices. To do so, we conduct an algorithm audit of race and gender biases in one commonly-used LLM, OpenAI's GPT-3.5, taking inspiration from the history of traditional offline resume audits. We conduct two studies using names with varied race and gender connotations: resume assessment (Study 1) and resume generation (Study 2). In Study 1, we ask GPT to score resumes with 32 different names (4 names for each combination of the 2 gender and 4 racial groups) and two anonymous options across 10 occupations and 3 evaluation tasks (overall rating, willingness to interview, and hireability). We find that the model reflects some biases based on stereotypes. In Study 2, we prompt GPT to create resumes (10 for each name) for fictitious job candidates. When generating resumes, GPT reveals underlying biases; women's resumes had occupations with less experience, while Asian and Hispanic resumes had immigrant markers, such as non-native English and non-U.S. education and work experiences. Our findings contribute to a growing body of literature on LLM biases, in particular when used in workplace contexts.
[329] arXiv:2405.04416 [pdf, ps, html, other]: Title: DistGrid: Scalable Scene Reconstruction with Distributed Multi-resolution Hash Grid

Sidun Liu, Peng Qiao, Zongxin Ye, Wenyu Li, Yong Dou

Comments: Originally submitted to Siggraph Asia 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Neural Radiance Field (NeRF) achieves extremely high quality in object-scaled and indoor scene reconstruction. However, there exist some challenges when reconstructing large-scale scenes. MLP-based NeRFs suffer from limited network capacity, while volume-based NeRFs are heavily memory-consuming when the scene resolution increases. Recent approaches propose to geographically partition the scene and learn each sub-region using an individual NeRF. Such partitioning strategies help volume-based NeRFs exceed the single GPU memory limit and scale to larger scenes. However, this approach requires multiple background NeRFs to handle out-of-partition rays, which leads to redundancy of learning. Inspired by the fact that the background of the current partition is the foreground of an adjacent partition, we propose a scalable scene reconstruction method based on joint Multi-resolution Hash Grids, named DistGrid. In this method, the scene is divided into multiple closely-paved yet non-overlapped Axis-Aligned Bounding Boxes, and a novel segmented volume rendering method is proposed to handle cross-boundary rays, thereby eliminating the need for background NeRFs. The experiments demonstrate that our method outperforms existing methods on all evaluated large-scale scenes, and provides visually plausible scene reconstruction. The scalability of our method on reconstruction quality is further evaluated qualitatively and quantitatively.
[330] arXiv:2405.04420 [pdf, ps, html, other]: Title: Fully Automated Selfish Mining Analysis in Efficient Proof Systems Blockchains

Krishnendu Chatterjee, Amirali Ebrahimzadeh, Mehrdad Karrabi, Krzysztof Pietrzak, Michelle Yeo, Đorđe Žikelić

Subjects: Cryptography and Security (cs.CR)

We study selfish mining attacks in longest-chain blockchains like Bitcoin, but where the proof of work is replaced with efficient proof systems -- like proofs of stake or proofs of space -- and consider the problem of computing an optimal selfish mining attack which maximizes expected relative revenue of the adversary, thus minimizing the chain quality. To this end, we propose a novel selfish mining attack that aims to maximize this objective and formally model the attack as a Markov decision process (MDP). We then present a formal analysis procedure which computes an $\epsilon$-tight lower bound on the optimal expected relative revenue in the MDP and a strategy that achieves this $\epsilon$-tight lower bound, where $\epsilon>0$ may be any specified precision. Our analysis is fully automated and provides formal guarantees on the correctness. We evaluate our selfish mining attack and observe that it achieves superior expected relative revenue compared to two considered baselines.
Concurrent work [Sarenche FC'24] performs an automated analysis of selfish mining in predictable longest-chain blockchains based on efficient proof systems. Predictable means the randomness for the challenges is fixed for many blocks (as used, e.g., in Ouroboros), while we consider unpredictable (Bitcoin-like) chains where the challenge is derived from the previous block.
[331] arXiv:2405.04428 [pdf, ps, html, other]: Title: BBK: a simpler, faster algorithm for enumerating maximal bicliques in large sparse bipartite graphs

Alexis Baudin, Clémence Magnien, Lionel Tabourier

Comments: 21 pages, 4 figures, 3 tables

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

Bipartite graphs are a prevalent modeling tool for real-world networks, capturing interactions between vertices of two different types. Within this framework, bicliques emerge as crucial structures when studying dense subgraphs: they are sets of vertices such that all vertices of the first type interact with all vertices of the second type. Therefore, they allow identifying groups of closely related vertices of the network, such as individuals with similar interests or webpages with similar contents. This article introduces a new algorithm designed for the exhaustive enumeration of maximal bicliques within a bipartite graph. This algorithm, called BBK for Bipartite Bron-Kerbosch, is a new extension to the bipartite case of the Bron-Kerbosch algorithm, which enumerates the maximal cliques in standard (non-bipartite) graphs. It is faster than the state-of-the-art algorithms and allows the enumeration on massive bipartite graphs that are not manageable with existing implementations. We analyze it theoretically to establish two complexity formulas: one as a function of the input and one as a function of the output characteristics of the algorithm. We also provide an open-access implementation of BBK in C++, which we use to experiment and validate its efficiency on massive real-world datasets and show that its execution time is shorter in practice than state-of-the-art algorithms. These experiments also show that the order in which the vertices are processed, as well as the choice of one of the two types of vertices on which to initiate the enumeration, have an impact on the computation time.
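As background for the abstract above, the classic Bron-Kerbosch algorithm for ordinary (non-bipartite) graphs can be sketched as follows. This is the textbook version with pivoting, not the BBK algorithm of the paper, whose details the abstract does not give; the function name `bron_kerbosch`, the adjacency-dictionary input format and the example graph are illustrative assumptions.

```python
def bron_kerbosch(adj):
    """Enumerate all maximal cliques of an undirected graph.

    `adj` maps each vertex to the set of its neighbors (assumed symmetric,
    with no self-loops). This is the classic algorithm with pivoting,
    shown only as background; it is NOT the paper's BBK algorithm.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend r,
        # x: vertices already explored (used to reject non-maximal cliques).
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex covering the most candidates, so that
        # neighbors of the pivot can be skipped at this level.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Example: a triangle 1-2-3 with a pendant edge 3-4.
example_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
example_cliques = set(bron_kerbosch(example_graph))
```

On the example graph the maximal cliques are the triangle {1, 2, 3} and the edge {3, 4}. A biclique enumerator would differ in that candidate sets are kept separately for the two vertex types.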
[332] arXiv:2405.04431 [pdf, ps, other]: Title: Optimizing Information Freshness in IoT Systems with Update Rate Constraints: A Token-Based Approach

Erfan Delfani, Nikolaos Pappas

Comments: IFIP/IEEE Networking 2024

Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

In Internet of Things (IoT) status update systems, where information is sampled and subsequently transmitted from a source to a destination node, the imperative necessity lies in maintaining the timeliness of information and updating the system with optimal frequency. Optimizing information freshness in resource-limited status update systems often involves Constrained Markov Decision Process (CMDP) problems with update rate constraints. Solving CMDP problems, especially with multiple constraints, is a challenging task. To address this, we present a token-based approach that transforms CMDP into an unconstrained MDP, simplifying the solution process. We apply this approach to systems with one and two update rate constraints for optimizing Age of Incorrect Information (AoII) and Age of Information (AoI) metrics, respectively, and explore the analytical and numerical aspects. Additionally, we introduce an iterative triangle bisection method for solving the CMDP problems with two constraints, comparing its results with the token-based MDP approach. Our findings show that the token-based approach yields superior performance over baseline policies, converging to the optimal policy as the maximum number of tokens increases.
[333] arXiv:2405.04432 [pdf, ps, other]: Title: Designing the Network Intelligence Stratum for 6G Networks

Paola Soto, Miguel Camelo, Gines Garcia-Aviles, Esteban Municio, Marco Gramaglia, Evangelos Kosmatos, Nina Slamnik-Kriještorac, Danny De Vleeschauwer, Antonio Bazco-Nogueras, Lidia Fuentes, Joaquin Ballesteros, Andra Lutu, Luca Cominardi, Ivan Paez, Sergi Alcalá-Marín, Livia Elena Chatzieleftheriou, Andres Garcia-Saavedra, Marco Fiore

Subjects: Networking and Internet Architecture (cs.NI)

As network complexity escalates, there is an increasing need for more sophisticated methods to manage and operate these networks, focusing on enhancing efficiency, reliability, and security. A wide range of Artificial Intelligence (AI)/Machine Learning (ML) models are being developed in response. These models are pivotal in automating decision-making, conducting predictive analyses, managing networks proactively, enhancing security, and optimizing network performance. They are foundational in shaping the future of networks, collectively forming what is known as Network Intelligence (NI). Prominent Standard-Defining Organizations (SDOs) are integrating NI into future network architectures, particularly emphasizing the closed-loop approach. However, existing methods for seamlessly integrating NI into network architectures are not yet fully effective. This paper introduces an in-depth architectural design for a Network Intelligence Stratum (NI Stratum). This stratum is supported by a novel end-to-end NI orchestrator that supports closed-loop NI operations across various network domains. The primary goal of this design is to streamline the deployment and coordination of NI throughout the entire network infrastructure, tackling issues related to scalability, conflict resolution, and effective data management. We detail exhaustive workflows for managing the NI lifecycle and demonstrate a reference implementation of the NI Stratum, focusing on its compatibility and integration with current network systems and open-source platforms such as Kubernetes and Kubeflow, as well as on its validation on real-world environments. The paper also outlines major challenges and open issues in deploying and managing NI.
[334] arXiv:2405.04433 [pdf, ps, other]: Title: Learning local Dirichlet-to-Neumann maps of nonlinear elliptic PDEs with rough coefficients

Miranda Boutilier, Konstantin Brenner, Larissa Miguez

Comments: 19 pages, 14 figures, submitted to CEMRACS 2023 proceedings

Subjects: Numerical Analysis (math.NA)

Partial differential equations (PDEs) involving high contrast and oscillating coefficients are common in scientific and industrial applications. Numerical approximation of these PDEs is a challenging task that can be addressed, for example, by multi-scale finite element analysis. For linear problems, the multi-scale finite element method (MsFEM) is well established and some viable extensions to non-linear PDEs are known. However, some features of the method seem to be intrinsically linearity-based. In particular, traditional MsFEM relies on the reuse of computations. For example, the stiffness matrix can be calculated just once, while being used for several right-hand sides, or as part of a multi-level iterative algorithm. Roughly speaking, the offline phase of the method amounts to pre-assembling the local linear Dirichlet-to-Neumann (DtN) operators. We present some preliminary results concerning the combination of MsFEM with machine learning tools. The extension of MsFEM to nonlinear problems is achieved by means of learning local nonlinear DtN maps. The resulting learning-based multi-scale method is tested on a set of model nonlinear PDEs involving the $p$-Laplacian and degenerate nonlinear diffusion.
[335] arXiv:2405.04434 [pdf, ps, other]: Title: DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

DeepSeek-AI

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models. The model checkpoints are available at this https URL.
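The headline figures in the abstract above can be turned into ratios with a few lines of arithmetic. The sketch below uses only the numbers quoted in the abstract (236B total and 21B activated parameters, 42.5% training-cost saving, 93.3% KV-cache reduction); the variable names are illustrative, and the derived ratios are simple consequences of those figures rather than additional results from the paper.

```python
# Figures quoted in the abstract (parameter counts in billions).
total_params_b = 236.0
active_params_b = 21.0

# Fraction of parameters activated per token: roughly 8.9%.
active_fraction = active_params_b / total_params_b

# A 93.3% KV-cache reduction leaves 6.7% of the baseline cache,
# i.e. the cache is roughly 15 times smaller than the baseline's.
kv_cache_remaining = 1.0 - 0.933
kv_cache_shrink_factor = 1.0 / kv_cache_remaining

# Saving 42.5% of training cost leaves 57.5% of the baseline cost.
training_cost_remaining = 1.0 - 0.425

print(f"activated fraction:      {active_fraction:.3f}")
print(f"KV cache shrink factor:  {kv_cache_shrink_factor:.1f}x")
print(f"training cost remaining: {training_cost_remaining:.3f}")
```

All comparisons are relative to DeepSeek 67B, the baseline named in the abstract.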
[336] arXiv:2405.04435 [pdf, ps, html, other]: Title: Fast Exact Retrieval for Nearest-neighbor Lookup (FERN)

Richard Zhu

Comments: NAACL 2024 SRW

Subjects: Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS)

Exact nearest neighbor search is a computationally intensive process, and even its simpler sibling -- vector retrieval -- can be computationally complex. This is exacerbated when retrieving vectors which have high dimension $d$ relative to the number of vectors, $N$, in the database. Exact nearest neighbor retrieval has been generally acknowledged to be an $O(Nd)$ problem with no sub-linear solutions. Attention has instead shifted towards Approximate Nearest-Neighbor (ANN) retrieval techniques, many of which have sub-linear or even logarithmic time complexities. However, if our intuition from binary search problems (e.g. $d=1$ vector retrieval) carries, there ought to be a way to retrieve an organized representation of vectors without brute-forcing our way to a solution. For low dimension (e.g. $d=2$ or $d=3$ cases), kd-trees provide an $O(d\log N)$ algorithm for retrieval. Unfortunately, in practice the algorithm deteriorates rapidly to an $O(dN)$ solution at high dimensions (e.g. $d=128$). We propose a novel algorithm for logarithmic Fast Exact Retrieval for Nearest-neighbor lookup (FERN), inspired by kd-trees. The algorithm achieves $O(d\log N)$ look-up with 100% recall on 10 million $d=128$ uniformly randomly generated vectors. Code available at this https URL.
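The two baselines the abstract contrasts can be sketched in a few lines: the brute-force $O(Nd)$ scan, and the $d=1$ case where a sorted list allows an $O(\log N)$ binary search. This is only an illustration of those baselines, not the FERN algorithm, which the abstract does not describe; the function names and sample data are assumptions made for the example.

```python
import bisect
import math


def nearest_brute_force(vectors, query):
    """Exact nearest neighbor by scanning all N vectors: O(N d) time.

    Returns the index of the vector with the smallest squared Euclidean
    distance to `query` (the first such index on ties).
    """
    best_index, best_dist = -1, math.inf
    for i, v in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(v, query))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def nearest_sorted_1d(sorted_values, query):
    """Exact nearest neighbor for d = 1 on a non-empty sorted list: O(log N).

    Binary search finds the insertion point; the answer is one of the
    two values adjacent to it.
    """
    pos = bisect.bisect_left(sorted_values, query)
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sorted_values)]
    return min(candidates, key=lambda i: abs(sorted_values[i] - query))


points = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
values = [1.0, 4.0, 9.0, 16.0]
nearest_point = nearest_brute_force(points, (0.9, 1.2))   # index into points
nearest_value = nearest_sorted_1d(values, 10.0)           # index into values
```

The one-dimensional case works because a total order lets half of the data be discarded per comparison; kd-trees try to recover that property by splitting on one coordinate at a time.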
[337] arXiv:2405.04437 [pdf, ps, other]: Title: vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar

Comments: 15 pages, 12 figures, 8 tables

Subjects: Machine Learning (cs.LG); Operating Systems (cs.OS)

Efficient use of GPU memory is essential for high-throughput LLM inference. Prior systems reserved memory for the KV-cache ahead-of-time, resulting in wasted capacity due to internal fragmentation. Inspired by OS-based virtual memory systems, vLLM proposed PagedAttention to enable dynamic memory allocation for KV-cache. This approach eliminates fragmentation, enabling high-throughput LLM serving with larger batch sizes. However, to be able to allocate physical memory dynamically, PagedAttention changes the layout of KV-cache from contiguous virtual memory to non-contiguous virtual memory. This change requires attention kernels to be rewritten to support paging, and the serving framework to implement a memory manager. Thus, the PagedAttention model leads to software complexity, portability issues, redundancy and inefficiency.
In this paper, we propose vAttention for dynamic KV-cache memory management. In contrast to PagedAttention, vAttention retains KV-cache in contiguous virtual memory and leverages existing low-level system support for demand paging to enable on-demand physical memory allocation. Thus, vAttention unburdens the attention kernel developer from having to explicitly support paging and avoids re-implementation of memory management in the serving framework. We show that vAttention enables seamless dynamic memory management for unchanged implementations of various attention kernels. vAttention also generates tokens up to 1.97x faster than vLLM, while processing input prompts up to 3.92x and 1.45x faster than the PagedAttention variants of FlashAttention and FlashInfer.
[338] arXiv:2405.04441 [pdf, ps, other]: Title: Designing, Developing, and Validating Network Intelligence for Scaling in Service-Based Architectures based on Deep Reinforcement Learning

Paola Soto, Miguel Camelo, Danny De Vleeschauwer, Yorick De Bock, Nina Slamnik-Kriještorac, Chia-Yu Chang, Natalia Gaviria, Erik Mannens, Juan F. Botero, Steven Latré

Subjects: Networking and Internet Architecture (cs.NI)

Automating network processes without human intervention is crucial for the complex 6G environment. This requires zero-touch management and orchestration, the integration of Network Intelligence (NI) into the network architecture, and the efficient lifecycle management of intelligent functions. Reinforcement Learning (RL) plays a key role in this context, offering intelligent decision-making capabilities suited to networks' dynamic nature. Despite its potential, integrating RL poses challenges in model development and application. To tackle those issues, we delve into designing, developing, and validating RL algorithms for scaling network functions in service-based network architectures such as Open Radio Access Network (O-RAN). It builds upon and expands previous research on RL lifecycle management by proposing several RL algorithms and Reward Functions (RFns). Our proposed methodology is anchored on a dual approach: firstly, it evaluates the training performance of these algorithms under varying RFns, and secondly, it validates their performance after being trained to discern the practical applicability in real-world settings. We show that, despite significant progress, the development stage of RL techniques for networking applications, particularly in scaling scenarios, still leaves room for significant improvements. This study underscores the importance of ongoing research and development to enhance the practicality and resilience of RL techniques in real-world networking environments.
[339] arXiv:2405.04442 [pdf, ps, html, other]: Title: AugmenTory: A Fast and Flexible Polygon Augmentation Library

Tanaz Ghahremani, Mohammad Hoseyni, Mohammad Javad Ahmadi, Pouria Mehrabi, Amirhossein Nikoofard

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Data augmentation is a key technique for addressing the challenge of limited datasets, which have become a major component in the training procedures of image processing. Techniques such as geometric transformations and color space adjustments have been thoroughly tested for their ability to artificially expand training datasets and generate semi-realistic data for training purposes. Polygons play a crucial role in instance segmentation and have seen a surge in use across advanced models, such as YOLOv8. Despite their growing popularity, the lack of specialized libraries hampers the polygon-augmentation process. This paper introduces a novel solution to this challenge, embodied in the newly developed AugmenTory library. Notably, AugmenTory offers reduced computational demands in both time and space compared to existing methods. Additionally, the library includes a postprocessing thresholding feature. The AugmenTory package is publicly available on GitHub, where interested users can access the source code: this https URL
[340] arXiv:2405.04443 [pdf, ps, html, other]: Title: POV Learning: Individual Alignment of Multimodal Models using Human Perception

Simon Werner, Katharina Christ, Laura Bernardy, Marion G. Müller, Achim Rettinger

Subjects: Artificial Intelligence (cs.AI)

Aligning machine learning systems with human expectations is mostly attempted by training with manually vetted human behavioral samples, typically explicit feedback. This is done on a population level since the context that captures the subjective Point-Of-View (POV) of a concrete person in a specific situation is not retained in the data. However, we argue that alignment on an individual level can considerably boost the subjective predictive performance for the individual user interacting with the system. Since perception differs for each person, the same situation is observed differently. Consequently, the basis for decision making and the subsequent reasoning processes and observable reactions differ. We hypothesize that individual perception patterns can be used for improving the alignment on an individual level. We test this by integrating perception information into machine learning systems and measuring their predictive performance with respect to individual subjective assessments. For our empirical study, we collect a novel data set of multimodal stimuli and corresponding eye tracking sequences for the novel task of Perception-Guided Crossmodal Entailment and tackle it with our Perception-Guided Multimodal Transformer. Our findings suggest that exploiting individual perception signals for the machine learning of subjective human assessments provides a valuable cue for individual alignment. It not only improves the overall predictive performance from the point-of-view of the individual user but might also contribute to steering AI systems towards every person's individual expectations and values.
[341] arXiv:2405.04453 [pdf, ps, html, other]: Title: Towards Continual Knowledge Graph Embedding via Incremental Distillation

Jiajun Liu, Wenjun Ke, Peng Wang, Ziyu Shang, Jinhua Gao, Guozheng Li, Ke Ji, Yanhe Liu

Comments: Accepted by AAAI 2024

Subjects: Artificial Intelligence (cs.AI)

Traditional knowledge graph embedding (KGE) methods typically require preserving the entire knowledge graph (KG) with significant training costs when new knowledge emerges. To address this issue, the continual knowledge graph embedding (CKGE) task has been proposed to train the KGE model by learning emerging knowledge efficiently while simultaneously preserving decent old knowledge. However, the explicit graph structure in KGs, which is critical for the above goal, has been heavily ignored by existing CKGE methods. On the one hand, existing methods usually learn new triples in a random order, destroying the inner structure of new KGs. On the other hand, old triples are preserved with equal priority, failing to alleviate catastrophic forgetting effectively. In this paper, we propose a competitive method for CKGE based on incremental distillation (IncDE), which considers the full use of the explicit graph structure in KGs. First, to optimize the learning order, we introduce a hierarchical strategy, ranking new triples for layer-by-layer learning. By employing the inter- and intra-hierarchical orders together, new triples are grouped into layers based on the graph structure features. Secondly, to preserve the old knowledge effectively, we devise a novel incremental distillation mechanism, which facilitates the seamless transfer of entity representations from the previous layer to the next one, promoting old knowledge preservation. Finally, we adopt a two-stage training paradigm to avoid the over-corruption of old knowledge influenced by under-trained new knowledge. Experimental results demonstrate the superiority of IncDE over state-of-the-art baselines. Notably, the incremental distillation mechanism contributes to improvements of 0.2%-6.5% in the mean reciprocal rank (MRR) score.
[342] arXiv:2405.04457 [pdf, ps, html, other]: Title: Towards Geographic Inclusion in the Evaluation of Text-to-Image Models

Melissa Hall, Samuel J. Bell, Candace Ross, Adina Williams, Michal Drozdzal, Adriana Romero Soriano

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Rapid progress in text-to-image generative models coupled with their deployment for visual content creation has magnified the importance of thoroughly evaluating their performance and identifying potential biases. In pursuit of models that generate images that are realistic, diverse, visually appealing, and consistent with the given prompt, researchers and practitioners often turn to automated metrics to facilitate scalable and cost-effective performance profiling. However, commonly-used metrics often fail to account for the full diversity of human preference; even in-depth human evaluations often face challenges with subjectivity, especially as interpretations of evaluation criteria vary across regions and cultures. In this work, we conduct a large, cross-cultural study to examine how much annotators in Africa, Europe, and Southeast Asia vary in their perception of geographic representation, visual appeal, and consistency in real and generated images from state-of-the-art public APIs. We collect over 65,000 image annotations and 20 survey responses. We contrast human annotations with common automated metrics, finding that human preferences vary notably across geographic location and that current metrics do not fully account for this diversity. For example, annotators in different locations often disagree on whether exaggerated, stereotypical depictions of a region are considered geographically representative. In addition, the utility of automatic evaluations is dependent on assumptions about their set-up, such as the alignment of feature extractors with human perception of object similarity or the definition of "appeal" captured in reference datasets used to ground evaluations. We recommend steps for improved automatic and human evaluations.
[343] arXiv:2405.04459 [pdf, ps, other]: Title: A Significantly Better Class of Activation Functions Than ReLU Like Activation Functions

Mathew Mithra Noel, Yug Oswal

Comments: 14 pages

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

This paper introduces a significantly better class of activation functions than the almost universally used ReLU-like and sigmoidal classes of activation functions. Two new activation functions, referred to as the Cone and Parabolic-Cone, are proposed; they differ drastically from popular activation functions and significantly outperform them on the CIFAR-10 and Imagenette benchmarks. The cone activation functions are positive only on a finite interval and are strictly negative outside it, except at the end-points of the interval, where they become zero. Thus the set of inputs that produce a positive output for a neuron with cone activation functions is a hyperstrip and not a half-space as is the usual case. Since a hyperstrip is the region between two parallel hyperplanes, it allows neurons to more finely divide the input feature space into positive and negative classes than with infinitely wide half-spaces. In particular, the XOR function can be learned by a single neuron with cone-like activation functions. Both the cone and parabolic-cone activation functions are shown to achieve higher accuracies with significantly fewer neurons on benchmarks. The results presented in this paper indicate that many nonlinear real-world datasets may be separated with fewer hyperstrips than half-spaces. The Cone and Parabolic-Cone activation functions have larger derivatives than ReLU and are shown to significantly speed up training.
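The single-neuron XOR claim can be illustrated with a small sketch. The abstract does not give formulas, so the definitions below, `cone(z) = 1 - |z - 1|` and `parabolic_cone(z) = z * (2 - z)`, are assumptions chosen to match the stated properties (positive on a finite interval, here (0, 2), zero at its end-points, negative elsewhere). The weights (1, 1) and bias 0 are likewise hand-picked for illustration, not learned.

```python
def cone(z):
    # Assumed form: positive on (0, 2), zero at z = 0 and z = 2,
    # negative everywhere else.
    return 1.0 - abs(z - 1.0)


def parabolic_cone(z):
    # Assumed form: a downward parabola with the same sign pattern.
    return z * (2.0 - z)


def xor_neuron(x1, x2, activation=cone):
    """Single neuron with hand-picked weights (1, 1) and bias 0.

    The pre-activation z = x1 + x2 equals 0, 1 or 2 on binary inputs,
    and only z = 1 lies strictly inside the positive strip (0, 2).
    """
    z = 1.0 * x1 + 1.0 * x2 + 0.0
    return 1 if activation(z) > 0.0 else 0


truth_table = {(a, b): xor_neuron(a, b) for a in (0, 1) for b in (0, 1)}
```

A half-space neuron cannot reproduce this table because XOR is not linearly separable, whereas the strip 0 < x1 + x2 < 2 contains exactly the two inputs with a single 1.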
[344] arXiv:2405.04463 [pdf, ps, other]: Title: Large-Scale MPC: Scaling Private Iris Code Uniqueness Checks to Millions of Users

Remco Bloemen, Daniel Kales, Philipp Sippl, Roman Walch

Subjects: Cryptography and Security (cs.CR)

In this work we tackle privacy concerns in biometric verification systems that typically require server-side processing of sensitive data (e.g., fingerprints and Iris Codes). Concretely, we design a solution that allows us to query whether a given Iris Code is similar to one contained in a given database, while all queries and datasets are being protected using secure multiparty computation (MPC). Addressing the substantial performance demands of operational systems like World ID and aid distributions by the Red Cross, we propose new protocols to improve performance by more than three orders of magnitude compared to the recent state-of-the-art system Janus (S&P 24). Our final protocol can achieve a throughput of over a million Iris Code comparisons per second on a single CPU core, while protecting the privacy of both the query and database Iris Codes. We additionally investigate GPU acceleration for some building blocks of our protocol, which results in further speedups of over 38x compared to the respective multi-threaded CPU implementation.
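For orientation, the similarity query itself can be sketched in the clear, without any MPC protection. The sketch assumes that codes are bit strings with validity masks and that "similar" means a fractional Hamming distance below a threshold; this metric, the threshold value 0.35, and all function names are assumptions for illustration, not details taken from the abstract.

```python
def fractional_hamming(code_a: int, mask_a: int, code_b: int, mask_b: int) -> float:
    """Fraction of differing bits among positions valid in both masks."""
    valid = mask_a & mask_b
    n_valid = bin(valid).count("1")
    if n_valid == 0:
        return 1.0  # no comparable bits: treat as maximally distant
    diff = (code_a ^ code_b) & valid
    return bin(diff).count("1") / n_valid


def is_unique(query, database, threshold: float = 0.35) -> bool:
    """True if no database entry is closer to the query than the (assumed) threshold."""
    q_code, q_mask = query
    return all(
        fractional_hamming(q_code, q_mask, code, mask) >= threshold
        for code, mask in database
    )


if __name__ == "__main__":
    db = [(0b10101010, 0xFF)]
    print(is_unique((0b10101011, 0xFF), db))  # one bit differs out of eight
    print(is_unique((0b01010101, 0xFF), db))  # every bit differs
```

A protected protocol would have to compute the same predicate on secret-shared codes and masks; the plaintext version is only a reference for what is being computed.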
[345] arXiv:2405.04467 [pdf, ps, html, other]: Title: Online List Labeling with Near-Logarithmic Writes

Martin P. Seybold

Comments: 12 pages, 1 figure. Improved version of a rejected draft

Subjects: Data Structures and Algorithms (cs.DS)

In the Online List Labeling problem, a set of $n \leq N$ elements from a totally ordered universe must be stored in sorted order in an array with $m=N+\lceil\varepsilon N \rceil$ slots, where $\varepsilon \in (0,1]$ is constant, while an adversary chooses elements that must be inserted and deleted from the set.
We devise a skip-list based algorithm for maintaining order against an oblivious adversary and show that the expected amortized number of writes is $O(\varepsilon^{-1}\log (n) \operatorname{poly}(\log \log n))$ per update.
[346] arXiv:2405.04471 [pdf, ps, other]: Title: Universal Spatial Audio Transcoder

Amaia Sagasti, Davide Scaini, Daniel Arteaga

Comments: 12 pages, 8 figures. Accepted for presentation at the AES 156th Convention, Madrid, Spain (June 2024)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

This paper addresses the challenges associated with both the conversion between different spatial audio formats and the decoding of a spatial audio format to a specific loudspeaker layout. Existing approaches often rely on layout remapping tools, which may not guarantee optimal conversion from a psychoacoustic perspective. To overcome these challenges, we present the Universal Spatial Audio Transcoder (USAT) method and its corresponding open source implementation. USAT generates an optimal decoder or transcoder for any input spatial audio format, adapting it to any output format or 2D/3D loudspeaker configuration. Drawing upon optimization techniques based on psychoacoustic principles, the algorithm maximizes the preservation of spatial information. We present examples of the decoding and transcoding of several audio formats, and show that the USAT approach is advantageous compared to the most common methods in the field.
[347] arXiv:2405.04477 [pdf, ps, html, other]: Title: Designing an Objective-Driven Test Method for the Comparative Performance Evaluation of Commercial DTI Solutions for Counter UAS systems

Ali Mohamoud, Johan van de Pol, Hanno Hildmann, Rob van Heijster, Beatrice Masini, Martijn van den Heuvel, Amber van Keeken

Comments: Published to make content available online, though this is still work in progress and subject to additions and change

Subjects: Software Engineering (cs.SE)

Unmanned Aerial Systems (UASs), or drones, are becoming increasingly cheap and commercially available. There has been much emphasis on developing and deploying Counter-UAS systems (C-UASs) with Detection, Tracking and Identification (DTI) solutions. However, the capabilities of these systems are hard to benchmark. Performance claims of these systems are currently not supported by evidence. In addition, no standard test methodologies are available for these DTI systems, and differing test methodologies make comparison of these systems hard or impossible. We report on the definition, development and verification of an objective-driven test method and corresponding comparative performance evaluation for commercial DTI solutions for C-UASs. The developed methodology is based on end-user scenarios that are operationally relevant. The test methodology is based on a generic DTI system lay-out and is detailed towards detection, tracking and identification, taking into account contextual information and end-user input. The comparative performance evaluation is developed to enable the use of the methodology in a relevant environment, thereby taking into account any potential environmental aspect that might influence DTI system performance. Validation of the work in a relevant environment has been done in three operational trials. The operational trial results show that the method allows for performance evaluation at component level (i.e., detection, tracking or identification component) and at system level (combinations of these components and integrated DTI system of system solutions).
[348] arXiv:2405.04478 [pdf, ps, other]: Title: Exploration of Novel Neuromorphic Methodologies for Materials Applications

Derek Gobin (1), Shay Snyder (1), Guojing Cong (2), Shruti R. Kulkarni (2), Catherine Schuman (3), Maryam Parsa (1) ((1) George Mason University, (2) Oak Ridge National Laboratory, (3) University of Tennessee - Knoxville)

Comments: 5 pages, 2 figures, 1 table

Subjects: Emerging Technologies (cs.ET)

Many of today's most interesting questions involve understanding and interpreting complex relationships within graph-based structures. For instance, in materials science, predicting material properties often relies on analyzing the intricate network of atomic interactions. Graph neural networks (GNNs) have emerged as a popular approach for these tasks; however, they suffer from limitations such as inefficient hardware utilization and over-smoothing. Recent advancements in neuromorphic computing offer promising solutions to these challenges. In this work, we evaluate two such neuromorphic strategies known as reservoir computing and hyperdimensional computing. We compare the performance of both approaches for bandgap classification and regression using a subset of the Materials Project dataset. Our results indicate recent advances in hyperdimensional computing can be applied effectively to better represent molecular graphs.
[349] arXiv:2405.04480 [pdf, ps, other]: Title: Concentration Tail-Bound Analysis of Coevolutionary and Bandit Learning Algorithms

Per Kristian Lehre, Shishen Lin

Comments: Accepted at International Joint Conference on Artificial Intelligence (IJCAI) 2024

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

Runtime analysis, as a branch of the theory of AI, studies how the number of iterations an algorithm takes before finding a solution (its runtime) depends on the design of the algorithm and the problem structure. Drift analysis is a state-of-the-art tool for estimating the runtime of randomised algorithms, such as evolutionary and bandit algorithms. Drift refers roughly to the expected progress towards the optimum per iteration. This paper considers the problem of deriving concentration tail-bounds on the runtime/regret of algorithms. It provides a novel drift theorem that gives precise exponential tail-bounds given positive, weak, zero and even negative drift. Previously, such exponential tail-bounds were missing in the case of weak, zero, or negative drift. Our drift theorem can be used to prove a strong concentration of the runtime/regret of algorithms in AI. For example, we prove that the regret of the RWAB bandit algorithm is highly concentrated, while previous analyses only considered the expected regret. This means that the algorithm obtains the optimum within a given time frame with high probability, i.e. a form of algorithm reliability. Moreover, our theorem implies that the time needed by the co-evolutionary algorithm RLS-PD to obtain a Nash equilibrium in a Bilinear max-min-benchmark problem is highly concentrated. However, we also prove that the algorithm forgets the Nash equilibrium, and the time until this occurs is highly concentrated. This highlights a weakness in the RLS-PD which should be addressed by future work.
[350] arXiv:2405.04484 [pdf, ps, html, other]: Title: OptPDE: Discovering Novel Integrable Systems via AI-Human Collaboration

Subhash Kantamneni, Ziming Liu, Max Tegmark

Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

Integrable partial differential equation (PDE) systems are of great interest in natural science, but are exceedingly rare and difficult to discover. To solve this, we introduce OptPDE, a first-of-its-kind machine learning approach that Optimizes PDEs' coefficients to maximize their number of conserved quantities, $n_{\rm CQ}$, and thus discover new integrable systems. We discover four families of integrable PDEs, one of which was previously known, and three of which have at least one conserved quantity but are new to the literature to the best of our knowledge. We investigate more deeply the properties of one of these novel PDE families, $u_t = (u_x+a^2u_{xxx})^3$. Our paper offers a promising schema of AI-human collaboration for integrable system discovery: machine learning generates interpretable hypotheses for possible integrable systems, which human scientists can verify and analyze, to truly close the discovery loop.
[351] arXiv:2405.04485 [pdf, ps, other]: Title: Adapting WavLM for Speech Emotion Recognition

Daria Diatlova, Anton Udalov, Vitalii Shutov, Egor Spirin

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Recently, the usage of speech self-supervised models (SSL) for downstream tasks has been drawing a lot of attention. While large pre-trained models commonly outperform smaller models trained from scratch, questions regarding the optimal fine-tuning strategies remain prevalent. In this paper, we explore the fine-tuning strategies of the WavLM Large model for the speech emotion recognition task on the MSP Podcast Corpus. More specifically, we perform a series of experiments focusing on using gender and semantic information from utterances. We then sum up our findings and describe the final model we used for submission to Speech Emotion Recognition Challenge 2024.
[352] arXiv:2405.04489 [pdf, ps, other]: Title: S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling

Minh Tran, Adrian De Luis, Haitao Liao, Ying Huang, Roy McCann, Alan Mantooth, Jack Cothren, Ngan Le

Comments: Preprint

Subjects: Computer Vision and Pattern Recognition (cs.CV)

As the impact of climate change escalates, the global necessity to transition to sustainable energy sources becomes increasingly evident. Renewable energies have emerged as a viable solution for users, with photovoltaic (PV) energy being a favored choice for small installations due to its reliability and efficiency. Accurate mapping of PV installations is crucial for understanding the extent of their adoption and informing energy policy. To meet this need, we introduce S3Former, designed to segment solar panels from aerial imagery and provide size and location information critical for analyzing the impact of such installations on the grid. Solar panel identification is challenging due to factors such as varying weather conditions, roof characteristics, Ground Sampling Distance variations and lack of appropriate initialization weights for optimized training. To tackle these complexities, S3Former features a Masked Attention Mask Transformer incorporating a self-supervised learning pretrained backbone. Specifically, our model leverages low-level and high-level features extracted from the backbone and incorporates an instance query mechanism into the Transformer architecture to enhance the localization of solar PV installations. We introduce a self-supervised learning phase (pretext task) to improve the initialization weights on the backbone of S3Former. We evaluate S3Former on diverse datasets and demonstrate improvement over state-of-the-art models.
[353] arXiv:2405.04490 [pdf, ps, other]: Title: Resource-Efficient and Self-Adaptive Quantum Search in a Quantum-Classical Hybrid System

Zihao Jiang, Zefan Du, Shaolun Ruan, Juntao Chen, Yong Wang, Long Cheng, Rajkumar Buyya, Ying Mao

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Quantum Physics (quant-ph)

Over the past decade, the rapid advancement of deep learning and big data applications has been driven by vast datasets and high-performance computing systems. However, as we approach the physical limits of semiconductor fabrication in the post-Moore's Law era, questions arise about the future of these applications. In parallel, quantum computing has made significant progress with the potential to break limits. Major companies like IBM, Google, and Microsoft provide access to noisy intermediate-scale quantum (NISQ) computers. Despite the theoretical promise of Shor's and Grover's algorithms, practical implementation on current quantum devices faces challenges, such as demanding additional resources and a high number of controlled operations. To tackle these challenges and optimize the utilization of limited onboard qubits, we introduce ReSaQuS, a resource-efficient index-value searching system within a quantum-classical hybrid framework. Building on Grover's algorithm, ReSaQuS employs an automatically managed iterative search approach. This method analyzes problem size, filters out less probable data points, and progressively reduces the dataset with decreasing qubit requirements. Implemented using Qiskit and evaluated through extensive experiments, ReSaQuS has demonstrated substantial reductions of up to 86.36% in cumulative qubit consumption and 72.72% in active periods, reinforcing its potential in optimizing quantum computing application deployment.
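The link between dataset size and qubit demand can be illustrated with textbook Grover arithmetic. This is a sketch under stated assumptions: a single marked item, an index register of ceil(log2 N) qubits, and roughly floor(pi/4 * sqrt(N)) iterations. The halving schedule is a hypothetical stand-in for the filtering step, which the abstract does not specify, and none of the function names come from ReSaQuS.

```python
import math


def index_qubits(n_items: int) -> int:
    """Qubits needed to index n_items entries (at least one)."""
    return max(1, (n_items - 1).bit_length())


def grover_iterations(n_items: int) -> int:
    """Approximate optimal Grover iteration count for one marked item."""
    return math.floor(math.pi / 4 * math.sqrt(n_items))


def shrinking_schedule(n_items: int, min_items: int = 2):
    """Hypothetical schedule: halve the candidate set each round."""
    rounds = []
    while n_items >= min_items:
        rounds.append((n_items, index_qubits(n_items), grover_iterations(n_items)))
        n_items //= 2
    return rounds


if __name__ == "__main__":
    for size, qubits, iters in shrinking_schedule(1024):
        print(f"items={size:5d} qubits={qubits:2d} iterations={iters}")
```

Each halving removes one index qubit, which is the general mechanism by which a progressively filtered dataset lowers qubit requirements across rounds.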
[354] arXiv:2405.04491 [pdf, ps, html, other]: Title: TorchDriveEnv: A Reinforcement Learning Benchmark for Autonomous Driving with Reactive, Realistic, and Diverse Non-Playable Characters

Jonathan Wilder Lavington, Ke Zhang, Vasileios Lioutas, Matthew Niedoba, Yunpeng Liu, Dylan Green, Saeid Naderiparizi, Xiaoxuan Liang, Setareh Dabiri, Adam Ścibior, Berend Zwartsenberg, Frank Wood

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Robotics (cs.RO)

The training, testing, and deployment of autonomous vehicles requires realistic and efficient simulators. Moreover, because of the high variability between different problems presented in different autonomous systems, these simulators need to be easy to use and easy to modify. To address these problems, we introduce TorchDriveSim and its benchmark extension TorchDriveEnv. TorchDriveEnv is a lightweight reinforcement learning benchmark programmed entirely in Python, which can be modified to test a number of different factors in learned vehicle behavior, including the effect of varying kinematic models, agent types, and traffic control patterns. Most importantly, unlike many replay-based simulation approaches, TorchDriveEnv is fully integrated with a state-of-the-art behavioral simulation API. This allows users to train and evaluate driving models alongside data-driven Non-Playable Characters (NPCs) whose initializations and driving behavior are reactive, realistic, and diverse. We illustrate the efficiency and simplicity of TorchDriveEnv by evaluating common reinforcement learning baselines in both training and validation environments. Our experiments show that TorchDriveEnv is easy to use, but difficult to solve.
[355] arXiv:2405.04494 [pdf, ps, html, other]: Title: Representation Learning of Daily Movement Data Using Text Encoders

Alexander Capstick, Tianyu Cui, Yu Chen, Payam Barnaghi

Comments: Accepted at ICLR 2024 Workshop on Learning from Time Series For Health: this https URL

Journal-ref: International Conference on Learning Representations 2024 Workshop on Learning from Time Series For Health

Subjects: Machine Learning (cs.LG)

Time-series representation learning is a key area of research for remote healthcare monitoring applications. In this work, we focus on a dataset of recordings of in-home activity from people living with Dementia. We design a representation learning method based on converting activity to text strings that can be encoded using a language model fine-tuned to transform data from the same participants within a $30$-day window to similar embeddings in the vector space. This allows for clustering and vector searching over participants and days, and the identification of activity deviations to aid with personalised delivery of care.
[356] arXiv:2405.04495 [pdf, ps, html, other]: Title: Toward In-Context Teaching: Adapting Examples to Students' Misconceptions

Alexis Ross, Jacob Andreas

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

When a teacher provides examples for a student to study, these examples must be informative, enabling a student to progress from their current state toward a target concept or skill. Good teachers must therefore simultaneously infer what students already know and adapt their teaching to students' changing state of knowledge. There is increasing interest in using computational models, particularly large language models, as pedagogical tools. As students, language models in particular have shown a remarkable ability to adapt to new tasks given small numbers of examples. But how effectively can these models adapt as teachers to students of different types? To study this question, we introduce a suite of models and evaluation methods we call AdapT. AdapT has two components: (1) a collection of simulated Bayesian student models that can be used for evaluation of automated teaching methods; (2) a platform for evaluation with human students, to characterize the real-world effectiveness of these methods. We additionally introduce (3) AToM, a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of future beliefs. In evaluations of simulated students across three learning domains (fraction arithmetic, English morphology, function learning), AToM systematically outperforms LLM-based and standard Bayesian teaching models. In human experiments, both AToM and LLMs outperform non-adaptive random example selection. Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
[357] arXiv:2405.04496 [pdf, ps, html, other]: Title: Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing

Yi Zuo, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Shuyuan Yang, Yuwei Guo

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing diffusion-based video editing methods have achieved impressive results in motion editing. Most of the existing methods focus on the motion alignment between the edited video and the reference video. However, these methods do not constrain the background and object content of the video to remain unchanged, which makes it possible for users to generate unexpected videos. In this paper, we propose a one-shot video motion editing method called Edit-Your-Motion that requires only a single text-video pair for training. Specifically, we design the Detailed Prompt-Guided Learning Strategy (DPL) to decouple spatio-temporal features in space-time diffusion models. DPL separates learning object content and motion into two training stages. In the first training stage, we focus on learning the spatial features (the features of object content) and breaking down the temporal relationships in the video frames by shuffling them. We further propose Recurrent-Causal Attention (RC-Attn) to learn the consistent content features of the object from unordered video frames. In the second training stage, we restore the temporal relationship in video frames to learn the temporal feature (the features of the background and object's motion). We also adopt the Noise Constraint Loss to smooth out inter-frame differences. Finally, in the inference stage, we inject the content features of the source object into the editing branch through a two-branch structure (editing branch and reconstruction branch). With Edit-Your-Motion, users can edit the motion of objects in the source video to generate more exciting and diverse videos. Comprehensive qualitative experiments, quantitative experiments and user preference studies demonstrate that Edit-Your-Motion performs better than other methods.
[358] arXiv:2405.04497 [pdf, ps, other]: Title: Unveiling Disparities in Web Task Handling Between Human and Web Agent

Kihoon Son, Jinhyeon Kwon, DeEun Choi, Tae Soo Kim, Young-Ho Kim, Sangdoo Yun, Juho Kim

Subjects: Human-Computer Interaction (cs.HC)

With the advancement of Large Language Models (LLMs) and Large Vision-Language Models (LVMs), agents have shown significant capabilities in various tasks, such as data analysis, gaming, or code generation. Recently, there has been a surge in research on web agents capable of performing tasks within the web environment. However, the web poses unforeseeable scenarios, challenging the generalizability of these agents. This study investigates the disparities between human and web agents' performance in web tasks (e.g., information search) by concentrating on planning, action, and reflection aspects during task execution. We conducted a web task study with a think-aloud protocol, revealing distinct cognitive actions and operations on websites employed by humans. Comparative examination of existing agent structures and human behavior with thought processes highlighted differences in knowledge updating and ambiguity handling when performing the task. Humans demonstrated a propensity for exploring and modifying plans based on additional information and investigating reasons for failure. These findings offer insights into designing planning, reflection, and information discovery modules for web agents and into designing methods for capturing implicit human knowledge in a web task.
[359] arXiv:2405.04498 [pdf, ps, other]: Title: Generative Planning with Fast Collision Checks for High Speed Navigation

Craig Knuth, Cora Dimmig, Brian Bittner

Comments: Presented at Agile Robotics ICRA 2024 workshop

Subjects: Robotics (cs.RO)

Reasoning about large numbers of diverse plans to achieve high speed navigation in cluttered environments remains a challenge for robotic systems even in the case of perfect perceptual information. Often, this is tackled by methods that iteratively optimize around a prior seeded trajectory and consequently restrict to local optima. We present a novel planning method using normalizing flows (NFs) to encode expert-styled motion primitives. We also present an accelerated collision checking framework that enables rejecting samples from the prior distribution before running them through the NF model for rapid sampling of collision-free trajectories. The choice of an NF as the generator permits a flexible way to encode diverse multi-modal behavior distributions while maintaining a smooth relation to the input space which allows approximating collision checks on NF inputs rather than outputs. We show comparable performance to model predictive path integral control in random cluttered environments and improved exit rates in a cul-de-sac environment. We conclude by discussing our plans for future work to improve both safety and performance of our controller.
[360] arXiv:2405.04503 [pdf, ps, html, other]: Title: Physics-data hybrid dynamic model of a multi-axis manipulator for sensorless dexterous manipulation and high-performance motion planning

Wu-Te Yang, Jyun-Ming Liao, Pei-Chun Lin

Comments: 26 pages, 16 figures

Subjects: Robotics (cs.RO)

We report on the development of an implementable physics-data hybrid dynamic model for an articulated manipulator to plan and operate in various scenarios. Meanwhile, the physics-based and data-driven dynamic models are studied in this research to select the best model for planning. The physics-based model is constructed using the Lagrangian method, and the loss terms include inertia loss, viscous loss, and friction loss. As for the data-driven model, three methods are explored, including DNN, LSTM, and XGBoost. Our modeling results demonstrate that, after comprehensive hyperparameter optimization, the XGBoost architecture outperforms DNN and LSTM in accurately representing manipulator dynamics. The hybrid model with physics-based and data-driven terms has the best performance among all models based on the RMSE criteria, and it only needs about 24k of training data. In addition, we developed a virtual force sensor of a manipulator using the observed external torque derived from the dynamic model and designed a motion planner through the physics-data hybrid dynamic model. The external torque contributes to forces and torque on the end effector, facilitating interaction with the surroundings, while the internal torque governs manipulator motion dynamics and compensates for internal losses. By estimating external torque via the difference between measured joint torque and internal losses, we implement a sensorless control strategy which is demonstrated through a peg-in-hole task. Lastly, a learning-based motion planner based on the hybrid dynamic model assists in planning time-efficient trajectories for the manipulator. This comprehensive approach underscores the efficacy of integrating physics-based and data-driven models for advanced manipulator control and planning in industrial environments.
[361] arXiv:2405.04513 [pdf, ps, html, other]: Title: Switchable Decision: Dynamic Neural Generation Networks

Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou

Comments: Accepted to ICML 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Auto-regressive generation models achieve competitive performance across many different NLP tasks such as summarization, question answering, and classification. However, they are also known for being slow in inference, which makes them challenging to deploy in real-time applications. We propose a switchable decision to accelerate inference by dynamically assigning computation resources for each data instance. Automatically making decisions on where to skip and how to balance quality and computation cost with constrained optimization, our dynamic neural generation networks enforce the efficient inference path and determine the optimized trade-off. Experiments across question answering, summarization, and classification benchmarks show that our method incurs less computation cost during inference while keeping the same accuracy. Extensive experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many NLP tasks.
[362] arXiv:2405.04515 [pdf, ps, other]: Title: A Transformer with Stack Attention

Jiaoda Li, Jennifer C. White, Mrinmaya Sachan, Ryan Cotterell

Comments: NAACL 2024

Subjects: Computation and Language (cs.CL)

Natural languages are believed to be (mildly) context-sensitive. Despite underpinning remarkably capable large language models, transformers are unable to model many context-free language tasks. In an attempt to address this limitation in the modeling power of transformer-based language models, we propose augmenting them with a differentiable, stack-based attention mechanism. Our stack-based attention mechanism can be incorporated into any transformer-based language model and adds a level of interpretability to the model. We show that the addition of our stack-based attention mechanism enables the transformer to model some, but not all, deterministic context-free languages.
[363] arXiv:2405.04517 [pdf, ps, other]: Title: xLSTM: Extended Long Short-Term Memory

Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories; in particular, they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: How far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
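To make "exponential gating with appropriate normalization and stabilization" concrete, here is a scalar sketch of one way such a cell step can be written. The abstract gives no equations, so everything here is an assumption for illustration: exponential input and forget gates, a normalizer state `n` that accumulates gate mass, and a running log-scale `m` subtracted before exponentiating so large pre-activations do not overflow. The function and argument names are hypothetical.

```python
import math


def stabilized_exp_gate_step(z_pre, i_pre, f_pre, o_pre, c_prev, n_prev, m_prev):
    """One scalar cell step with exponential input/forget gates kept in log space.

    c_prev and n_prev are assumed to be stored relative to the scale exp(m_prev).
    Returns (h, c, n, m) with c and n stored relative to exp(m).
    """
    m = max(f_pre + m_prev, i_pre)         # new stabilizer (log-domain maximum)
    i_gate = math.exp(i_pre - m)           # stabilized exponential input gate
    f_gate = math.exp(f_pre + m_prev - m)  # stabilized exponential forget gate
    c = f_gate * c_prev + i_gate * math.tanh(z_pre)  # cell state update
    n = f_gate * n_prev + i_gate                     # normalizer update
    o_gate = 1.0 / (1.0 + math.exp(-o_pre))          # sigmoid output gate
    h = o_gate * (c / n)                   # the common scale exp(m) cancels in c / n
    return h, c, n, m


if __name__ == "__main__":
    # Pre-activations of 1000 would overflow a naive math.exp call.
    h, c, n, m = stabilized_exp_gate_step(0.5, 1000.0, 1000.0, 0.0, 0.0, 0.0, 0.0)
    print(h, c, n, m)
```

Because `c` and `n` are rescaled by the same factor, the ratio `c / n` and therefore the output `h` match the unstabilized recurrence whenever the latter is computable.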
[364] arXiv:2405.04519 [pdf, ps, html, other]: Title: Local Advice and Local Decompression

Alkida Balliu, Sebastian Brandt, Fabian Kuhn, Krzysztof Nowicki, Dennis Olivetti, Eva Rotenberg, Jukka Suomela

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Algorithms with advice have received ample attention in the distributed and online settings, and they have recently proven useful also in dynamic settings. In this work we study local computation with advice: the goal is to solve a graph problem $\Pi$ with a distributed algorithm in $f(\Delta)$ communication rounds, for some function $f$ that only depends on the maximum degree $\Delta$ of the graph, and the key question is how many bits of advice per node are needed. Our main results are:
- Any locally checkable labeling problem can be solved in graphs with sub-exponential growth with only $1$ bit of advice per node. Moreover, we can make the set of nodes that carry advice bits arbitrarily sparse, that is, we can make arbitrarily small the ratio between nodes carrying a 1 and the nodes carrying a 0.
- The assumption of sub-exponential growth is necessary: assuming the Exponential-Time Hypothesis, there are LCLs that cannot be solved in general with any constant number of bits per node.
- In any graph we can find an almost-balanced orientation (indegrees and outdegrees differ by at most one) with $1$ bit of advice per node, and again we can make the advice arbitrarily sparse.
- As a corollary, we can also compress an arbitrary subset of edges so that a node of degree $d$ stores only $d/2 + 2$ bits, and we can decompress it locally, in $f(\Delta)$ rounds.
- In any graph of maximum degree $\Delta$, we can find a $\Delta$-coloring (if it exists) with $1$ bit of advice per node, and again, we can make the advice arbitrarily sparse.
- In any $3$-colorable graph, we can find a $3$-coloring with $1$ bit of advice per node. Here, it remains open whether we can make the advice arbitrarily sparse.
Our work shows that for many problems the key threshold is not whether we can achieve, say, $1$ bit of advice per node, but whether we can make the advice arbitrarily sparse.
[365] arXiv:2405.04520 [pdf, ps, other]: Title: NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts

Shudan Zhang, Hanlin Zhao, Xiao Liu, Qinkai Zheng, Zehan Qi, Xiaotao Gu, Xiaohan Zhang, Yuxiao Dong, Jie Tang

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Software Engineering (cs.SE)

Large language models (LLMs) have manifested strong ability to generate code for productive activities. However, current benchmarks for code synthesis, such as HumanEval, MBPP, and DS-1000, are predominantly oriented towards introductory tasks on algorithms and data science, insufficiently satisfying challenging requirements prevalent in real-world coding. To fill this gap, we propose NaturalCodeBench (NCB), a challenging code benchmark designed to mirror the complexity and variety of scenarios in real coding tasks. NCB comprises 402 high-quality problems in Python and Java, meticulously selected from natural user queries from online coding services, covering 6 different domains. Noting the extraordinary difficulty in creating test cases for real-world queries, we also introduce a semi-automated pipeline to enhance the efficiency of test case construction. Compared with manual solutions, it achieves an efficiency increase of more than 4 times. Our systematic experiments on 39 LLMs find that performance gaps on NCB between models with close HumanEval scores could still be significant, indicating a lack of focus on practical code synthesis scenarios or over-specified optimization on HumanEval. On the other hand, even the best-performing GPT-4 is still far from satisfactory on NCB. The evaluation toolkit and development set are available at this https URL.
[366] arXiv:2405.04525 [pdf, ps, other]: Title: Comparing Ways of Obtaining Candidate Orderings from Approval Ballots

Théo Delemazure, Chris Dong, Dominik Peters, Magdaléna Tydrichová

Comments: 43 pages including appendix, accepted to IJCAI 2024

Subjects: Computer Science and Game Theory (cs.GT)

To understand and summarize approval preferences and other binary evaluation data, it is useful to order the items on an axis which explains the data. In a political election using approval voting, this could be an ideological left-right axis such that each voter approves adjacent candidates, an analogue of single-peakedness. In a perfect axis, every approval set would be an interval, which is usually not possible, and so we need to choose an axis that gets closest to this ideal. The literature has developed algorithms for optimizing several objective functions (e.g., minimize the number of added approvals needed to get a perfect axis), but provides little help with choosing among different objectives. In this paper, we take a social choice approach and compare 5 different axis selection rules axiomatically, by studying the properties they satisfy. We establish some impossibility theorems, and characterize (within the class of scoring rules) the rule that chooses the axes that maximize the number of votes that form intervals, using the axioms of ballot monotonicity and resistance to cloning. Finally, we study the behavior of the rules on data from French election surveys, on the votes of justices of the US Supreme Court, and on synthetic data.
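The rule that maximizes the number of votes forming intervals can be sketched by brute force over all orderings of the candidates. This is only an illustrative sketch for small instances; the function names and the toy profile are assumptions, and nothing here reflects the algorithms or data used in the paper.

```python
from itertools import permutations

def is_interval(ballot, axis):
    """True if the approved candidates occupy consecutive positions on the axis."""
    if not ballot:
        return True
    positions = [axis.index(c) for c in ballot]
    return max(positions) - min(positions) + 1 == len(ballot)

def best_axes(candidates, ballots):
    """Return the best interval count and every axis achieving it (exhaustive search)."""
    best_score, best = -1, []
    for axis in permutations(candidates):
        score = sum(is_interval(b, axis) for b in ballots)
        if score > best_score:
            best_score, best = score, [axis]
        elif score == best_score:
            best.append(axis)
    return best_score, best

# Toy profile: four approval ballots (sets of approved candidates) over a, b, c, d.
profile = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "d"}]
score, axes = best_axes("abcd", profile)
print(score, axes[0])
```

In this toy profile no axis makes all four ballots intervals, since that would require the four candidates to be adjacent in a cycle. An axis and its reverse always tie, so the list of optimal axes comes in mirrored pairs.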
[367] arXiv:2405.04526 [pdf, ps, other]: Title: PoW Security-Latency under Random Delays and the Effect of Transaction Fees

Mustafa Doger, Sennur Ulukus, Nail Akar

Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM); Information Theory (cs.IT)

The safety guarantees and the security-latency problem of Nakamoto consensus have been extensively studied in the last decade under a bounded delay model. Recent studies have shown that the PoW protocol is secure under random delay models as well. In this paper, we analyze the security-latency problem, i.e., how secure a block is after it becomes k-deep in the blockchain, under general random delay distributions. We provide tight and explicit bounds which only require determining the distribution of the number of Poisson arrivals during the random delay. We further consider potential effects of the recent Bitcoin halving on the security-latency problem by extending our results.
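As a worked illustration of the quantity these bounds depend on, suppose blocks arrive as a Poisson process of rate $\lambda$ and the delay $D$ has distribution function $F$. The symbols $\lambda$, $\mu$, $D$, $F$ and $N$ are notation assumed here and are not taken from the abstract. Conditioning on the delay gives the distribution of the number $N$ of arrivals during it:

```latex
\Pr[N = k] \;=\; \int_0^\infty e^{-\lambda t}\,\frac{(\lambda t)^k}{k!}\,\mathrm{d}F(t),
\qquad k = 0, 1, 2, \ldots
```

For the special case of an exponentially distributed delay with rate $\mu$, the integral evaluates in closed form to a geometric distribution:

```latex
\Pr[N = k]
\;=\; \int_0^\infty e^{-\lambda t}\,\frac{(\lambda t)^k}{k!}\,\mu e^{-\mu t}\,\mathrm{d}t
\;=\; \frac{\mu}{\lambda + \mu}\left(\frac{\lambda}{\lambda + \mu}\right)^{k}.
```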
[368] arXiv:2405.04528 [pdf, ps, other]: Title: Implementing ISO/IEC TS 27560:2023 Consent Records and Receipts for GDPR and DGA

Harshvardhan J. Pandit, Jan Lindquist, Georg P. Krog

Subjects: Cryptography and Security (cs.CR)

The ISO/IEC TS 27560:2023 Privacy technologies - Consent record information structure provides guidance for the creation and maintenance of records regarding consent as machine-readable information. It also provides guidance on the use of this information to exchange such records between entities in the form of 'receipts'. In this article, we compare requirements regarding consent between ISO/IEC TS 27560:2023, ISO/IEC 29184:2020 Privacy Notices, and the EU's General Data Protection Regulation (GDPR) to show how these standards can be used to support GDPR compliance. We then use the Data Privacy Vocabulary (DPV) to implement ISO/IEC TS 27560:2023 and create interoperable consent records and receipts. We also discuss how this work benefits the implementation of the EU Data Governance Act (DGA), specifically for machine-readable consent forms.
[369] arXiv:2405.04532 [pdf, ps, other]: Title: QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han

Comments: The first three authors contribute equally to this project and are listed in the alphabetical order. Yujun Lin leads the quantization algorithm, Haotian Tang and Shang Yang lead the GPU kernels and the serving system. Code is available at this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)

Quantization can accelerate large language model (LLM) inference. Going beyond INT8 quantization, the research community is actively exploring even lower precision, such as INT4. Nonetheless, state-of-the-art INT4 quantization techniques only accelerate low-batch, edge LLM inference, failing to deliver performance gains in large-batch, cloud-based LLM serving. We uncover a critical issue: existing INT4 quantization methods suffer from significant runtime overhead (20-90%) when dequantizing either weights or partial sums on GPUs. To address this challenge, we introduce QoQ, a W4A8KV4 quantization algorithm with 4-bit weight, 8-bit activation, and 4-bit KV cache. QoQ stands for quattuor-octo-quattuor, which represents 4-8-4 in Latin. QoQ is implemented by the QServe inference library that achieves measured speedup. The key insight driving QServe is that the efficiency of LLM serving on GPUs is critically influenced by operations on low-throughput CUDA cores. Building upon this insight, in QoQ algorithm, we introduce progressive quantization that can allow low dequantization overhead in W4A8 GEMM. Additionally, we develop SmoothAttention to effectively mitigate the accuracy degradation incurred by 4-bit KV quantization. In the QServe system, we perform compute-aware weight reordering and take advantage of register-level parallelism to reduce dequantization latency. We also make fused attention memory-bound, harnessing the performance gain brought by KV4 quantization. As a result, QServe improves the maximum achievable serving throughput of Llama-3-8B by 1.2x on A100, 1.4x on L40S; and Qwen1.5-72B by 2.4x on A100, 3.5x on L40S, compared to TensorRT-LLM. Remarkably, QServe on L40S GPU can achieve even higher throughput than TensorRT-LLM on A100. Thus, QServe effectively reduces the dollar cost of LLM serving by 3x. Code is available at this https URL.
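To make the "4-bit" part of W4A8KV4 concrete, here is a generic sketch of uniform 4-bit quantization with a scale and an offset, and the matching dequantization step. It is a textbook-style illustration under assumed names. It is not the QoQ algorithm, its progressive quantization, or any QServe API.

```python
def quantize_4bit(values):
    """Map floats to integer codes in 0..15 using one scale and one offset."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0          # fall back to 1.0 when all values are equal
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Recover approximate floats from 4-bit codes."""
    return [q * scale + lo for q in codes]

weights = [-1.0, -0.3, 0.0, 0.42, 2.0]
codes, scale, lo = quantize_4bit(weights)
restored = dequantize(codes, scale, lo)
print(codes)
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The round-trip error per value is bounded by half the scale, which is why finer-grained scales reduce accuracy loss at the cost of more dequantization work.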
[370] arXiv:2405.04533 [pdf, ps, other]: Title: ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning

Jing Lin, Yao Feng, Weiyang Liu, Michael J. Black

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Numerous methods have been proposed to detect, estimate, and analyze properties of people in images, including the estimation of 3D pose, shape, contact, human-object interaction, emotion, and more. Each of these methods works in isolation instead of synergistically. Here we address this problem and build a language-driven human understanding system -- ChatHuman, which combines and integrates the skills of many different methods. To do so, we finetune a Large Language Model (LLM) to select and use a wide variety of existing tools in response to user inputs. In doing so, ChatHuman is able to combine information from multiple tools to solve problems more accurately than the individual tools themselves and to leverage tool output to improve its ability to reason about humans. The novel features of ChatHuman include leveraging academic publications to guide the application of 3D human-related tools, employing a retrieval-augmented generation model to generate in-context-learning examples for handling new tools, and discriminating and integrating tool results to enhance 3D human understanding. Our experiments show that ChatHuman outperforms existing models in both tool selection accuracy and performance across multiple 3D human-related tasks. ChatHuman is a step towards consolidating diverse methods for human analysis into a single, powerful, system for 3D human reasoning.
[371] arXiv:2405.04534 [pdf, ps, html, other]: Title: Tactile-Augmented Radiance Fields

Yiming Dou, Fengyu Yang, Yi Liu, Antonio Loquercio, Andrew Owens

Comments: CVPR 2024, Project page: this https URL, Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a scene representation, which we call a tactile-augmented radiance field (TaRF), that brings vision and touch into a shared 3D space. This representation can be used to estimate the visual and tactile signals for a given 3D position within a scene. We capture a scene's TaRF from a collection of photos and sparsely sampled touch probes. Our approach makes use of two insights: (i) common vision-based touch sensors are built on ordinary cameras and thus can be registered to images using methods from multi-view geometry, and (ii) visually and structurally similar regions of a scene share the same tactile features. We use these insights to register touch signals to a captured visual scene, and to train a conditional diffusion model that, provided with an RGB-D image rendered from a neural radiance field, generates its corresponding tactile signal. To evaluate our approach, we collect a dataset of TaRFs. This dataset contains more touch samples than previous real-world datasets, and it provides spatially aligned visual signals for each captured touch signal. We demonstrate the accuracy of our cross-modal generative model and the utility of the captured visual-tactile data on several downstream tasks. Project page: this https URL

[372] arXiv:2405.01780 (cross-list from quant-ph) [pdf, ps, other]: Title: Quantum Machine Learning: Quantum Kernel Methods

Sanjeev Naguleswaran

Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

Quantum algorithms based on quantum kernel methods have been investigated previously [1]. A quantum advantage is derived from the fact that it is possible to construct a family of datasets for which only quantum processing can recognise the intrinsic labelling patterns, while for classical computers the dataset looks like noise. This is due to the algorithm leveraging inherent efficiencies in the computation of logarithms in a cyclic group. The discrete log problem is a well-known advantage of quantum vs classical computation, where it is possible to generate all the members of the group using a single mathematical operation. Kernel methods are a powerful and popular technique in classical Machine Learning. The use of a quantum feature space that can only be calculated efficiently on a quantum computer potentially allows for deriving a quantum advantage. In this paper, we intend to first describe the application of such a kernel method to a Quantum version of the classical Support Vector Machine (SVM) algorithm to identify conditions under which a quantum advantage is realised. A data dependent projected quantum kernel was shown to provide significant advantage over classical kernels. Further, we present results of investigations and ideas pertaining to extending the use of quantum kernels as a feature extraction layer in a Convolutional Neural Network (CNN), which is a widely used architecture in deep-learning applications.
[373] arXiv:2405.03152 (cross-list from eess.AS) [pdf, ps, html, other]: Title: MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Despite notable advancements in automatic speech recognition (ASR), performance tends to degrade when faced with adverse conditions. Generative error correction (GER) leverages the exceptional text comprehension capabilities of large language models (LLM), delivering impressive performance in ASR error correction, where N-best hypotheses provide valuable information for transcription prediction. However, GER encounters challenges such as fixed N-best hypotheses, insufficient utilization of acoustic information, and limited specificity to multi-accent scenarios. In this paper, we explore the application of GER in multi-accent scenarios. Accents represent deviations from standard pronunciation norms, and the multi-task learning framework for simultaneous ASR and accent recognition (AR) has effectively addressed the multi-accent scenarios, making it a prominent solution. In this work, we propose a unified ASR-AR GER model, named MMGER, leveraging multi-modal correction, and multi-granularity correction. Multi-task ASR-AR learning is employed to provide dynamic 1-best hypotheses and accent embeddings. Multi-modal correction accomplishes fine-grained frame-level correction by force-aligning the acoustic features of speech with the corresponding character-level 1-best hypothesis sequence. Multi-granularity correction supplements the global linguistic information by incorporating regular 1-best hypotheses atop fine-grained multi-modal correction to achieve coarse-grained utterance-level correction. MMGER effectively mitigates the limitations of GER and tailors LLM-based ASR error correction for the multi-accent scenarios. Experiments conducted on the multi-accent Mandarin KeSpeech dataset demonstrate the efficacy of MMGER, achieving a 26.72% relative improvement in AR accuracy and a 27.55% relative reduction in ASR character error rate, compared to a well-established standard baseline.
[374] arXiv:2405.03692 (cross-list from eess.IV) [pdf, ps, other]: Title: Imitation Learning for Adaptive Video Streaming with Future Adversarial Information Bottleneck Principle

Shuoyao Wang, Jiawei Lin, Fangwei Ye

Comments: submitted to IEEE Journal

Subjects: Image and Video Processing (eess.IV); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

Adaptive video streaming plays a crucial role in ensuring high-quality video streaming services. Despite extensive research efforts devoted to Adaptive BitRate (ABR) techniques, the current reinforcement learning (RL)-based ABR algorithms may benefit the average Quality of Experience (QoE) but suffer from fluctuating performance in individual video sessions. In this paper, we present a novel approach that combines imitation learning with the information bottleneck technique, to learn from the complex offline optimal scenario rather than inefficient exploration. In particular, we leverage the deterministic offline bitrate optimization problem with the future throughput realization as the expert and formulate it as a mixed-integer non-linear programming (MINLP) problem. To enable large-scale training for improved performance, we propose an alternative optimization algorithm that efficiently solves the MINLP problem. To address the issues of overfitting due to the future information leakage in MINLP, we incorporate an adversarial information bottleneck framework. By compressing the video streaming state into a latent space, we retain only action-relevant information. Additionally, we introduce a future adversarial term to mitigate the influence of future information leakage, where Model Prediction Control (MPC) policy without any future information is employed as the adverse expert. Experimental results demonstrate the effectiveness of our proposed approach in significantly enhancing the quality of adaptive video streaming, providing a 7.30% average QoE improvement and a 30.01% average ranking reduction.
[375] arXiv:2405.03693 (cross-list from physics.data-an) [pdf, ps, other]: Title: Trajectory analysis through entropy characterization over coded representation

Roxana Peña-Mendieta, Ania Mesa-Rodríguez, Ernesto Estevez-Rams, Daniel Estevez-Moya, Danays Kunka

Subjects: Data Analysis, Statistics and Probability (physics.data-an); Information Theory (cs.IT)

Any continuous curve in a higher dimensional space can be considered a trajectory that can be parameterized by a single variable, usually taken as time. It is well known that a continuous curve can have a fractional dimensionality, which can be estimated using already standard algorithms. However, characterizing a trajectory from an entropic perspective is far less developed. The search for such characterization leads us to use chain coding to discretize the description of a curve. Calculating the entropy density and entropy-related magnitudes from the resulting finite alphabet code becomes straightforward. In such a way, the entropy of a trajectory can be defined and used as an effective tool to assert creativity and pattern formation from a Shannon perspective. Applying the procedure to actual experimental physiological data and modelled trajectories of astronomical dynamics proved the robustness of the entropic characterization in a wealth of trajectories of different origins and the insight that can be gained from its use.
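The chain-coding step can be sketched for a planar curve already sampled on an integer grid. This sketch assumes the standard 8-direction Freeman code and uses the plain single-symbol Shannon entropy as a simple stand-in for the entropy density discussed in the abstract. The function names and the toy curves are illustrative assumptions.

```python
import math
from collections import Counter

# 8-direction Freeman code: 0 = east, then counterclockwise in 45-degree steps.
FREEMAN = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
           (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(points):
    """Encode a grid path (consecutive points must be 8-neighbours) as direction symbols."""
    return [FREEMAN[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def shannon_entropy(symbols):
    """Shannon entropy in bits of the empirical symbol distribution."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((k / total) * math.log2(k / total) for k in counts.values())

line = [(0, 0), (1, 0), (2, 0), (3, 0)]
stairs = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]
print(chain_code(line), shannon_entropy(chain_code(line)))
print(chain_code(stairs), shannon_entropy(chain_code(stairs)))
```

A straight segment uses a single symbol and has zero entropy, while the staircase alternates between two symbols and has one bit per symbol. Capturing the order of symbols, as an entropy density does, would require block entropies rather than this single-symbol estimate.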
[376] arXiv:2405.03701 (cross-list from physics.soc-ph) [pdf, ps, other]: Title: QxEAI -- Automated probabilistic forecasting with Quantum-like evolutionary algorithm

Kevin Xin, Lizhi Xin

Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); General Economics (econ.GN)

Forecasting, to estimate future events, is crucial for business and decision-making. This paper proposes QxEAI, a methodology that produces a probabilistic forecast that utilizes a quantum-like evolutionary algorithm based on training a quantum-like logic decision tree and a classical value tree on a small number of related time series. By using different cycles of the Dow Jones Index (yearly, monthly, weekly, daily), we demonstrate how our methodology produces accurate forecasts while requiring little to no manual work.
[377] arXiv:2405.03713 (cross-list from eess.IV) [pdf, ps, html, other]: Title: Improve Cross-Modality Segmentation by Treating MRI Images as Inverted CT Scans

Hartmut Häntze, Lina Xu, Leonhard Donle, Felix J. Dorfner, Alessa Hering, Lisa C. Adams, Keno K. Bressem

Comments: 3 pages, 2 figures

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Computed tomography (CT) segmentation models frequently include classes that are not currently supported by magnetic resonance imaging (MRI) segmentation models. In this study, we show that a simple image inversion technique can significantly improve the segmentation quality of CT segmentation models on MRI data, by using the TotalSegmentator model, applied to T1-weighted MRI images, as an example. Image inversion is straightforward to implement and does not require dedicated graphics processing units (GPUs), thus providing a quick alternative to complex deep modality-transfer models for generating segmentation masks for MRI data.
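The abstract does not spell out the inversion formula. One plausible reading, assumed here, is to reflect each intensity about the image's own range so that the brightest voxel becomes the darkest and vice versa. The sketch below uses plain nested lists and a hypothetical function name; it is not the authors' code and does not involve TotalSegmentator.

```python
def invert_image(image):
    """Reflect intensities about the image's range: p -> max + min - p."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    return [[hi + lo - p for p in row] for row in image]

slice_2d = [[0, 50], [100, 200]]
inverted = invert_image(slice_2d)
print(inverted)
print(invert_image(inverted) == slice_2d)
```

Because the operation is a per-voxel affine map, it preserves the intensity range and applying it twice restores the input.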
[378] arXiv:2405.03726 (cross-list from q-bio.GN) [pdf, ps, other]: Title: sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures

Andac Demir, Elizaveta Solovyeva, James Boylan, Mei Xiao, Fabrizio Serluca, Sebastian Hoersch, Jeremy Jenkins, Murthy Devarakonda, Bulent Kiziltan

Comments: ICLR 2024, Machine Learning for Genomics Explorations Workshop

Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG)

Influenced by breakthroughs in LLMs, single-cell foundation models are emerging. While these models show successful performance in cell type clustering, phenotype classification, and gene perturbation response prediction, it remains to be seen if a simpler model could achieve comparable or better results, especially with limited data. This is important, as the quantity and quality of single-cell data typically fall short of the standards in textual data used for training LLMs. Single-cell sequencing often suffers from technical artifacts, dropout events, and batch effects. These challenges are compounded in a weakly supervised setting, where the labels of cell states can be noisy, further complicating the analysis. To tackle these challenges, we present sc-OTGM, streamlined with less than 500K parameters, making it approximately 100x more compact than the foundation models, offering an efficient alternative. sc-OTGM is an unsupervised model grounded in the inductive bias that the scRNAseq data can be generated from a combination of the finite multivariate Gaussian distributions. The core function of sc-OTGM is to create a probabilistic latent space utilizing a GMM as its prior distribution and distinguish between distinct cell populations by learning their respective marginal PDFs. It uses a Hit-and-Run Markov chain sampler to determine the OT plan across these PDFs within the GMM framework. We evaluated our model against a CRISPR-mediated perturbation dataset, called CROP-seq, consisting of 57 one-gene perturbations. Our results demonstrate that sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification through a recommender system. It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.
[379] arXiv:2405.03732 (cross-list from eess.IV) [pdf, ps, other]: Title: Accelerated MR Cholangiopancreatography with Deep Learning-based Reconstruction

Jinho Kim, Marcel Dominik Nickel, Florian Knoll

Comments: 20 pages, 6 figures, 2 tables

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This study accelerates MR cholangiopancreatography (MRCP) acquisitions using deep learning-based (DL) reconstruction at 3T and 0.55T. Thirty healthy volunteers underwent conventional two-fold MRCP scans at field strengths of 3T or 0.55T. We trained a variational network (VN) using retrospectively six-fold undersampled data obtained at 3T. We then evaluated our method against standard techniques such as parallel imaging (PI) and compressed sensing (CS), focusing on peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as metrics. Furthermore, considering that acquiring fully-sampled MRCP is impractical, we added a self-supervised DL reconstruction (SSDU) to the evaluation group. We also tested our method in a prospective accelerated scenario to reflect real-world clinical applications and evaluated its adaptability to MRCP at 0.55T. Our method demonstrated a remarkable reduction of average acquisition time from 599/542 to 255/180 seconds for MRCP at 3T/0.55T. In both retrospective and prospective undersampling scenarios, the PSNR and SSIM of VN were higher than those of PI, CS, and SSDU. At the same time, VN preserved the image quality of undersampled data, i.e., sharpness and the visibility of hepatobiliary ducts. In addition, VN also produced high quality reconstructions at 0.55T resulting in the highest PSNR and SSIM. In summary, VN trained for highly accelerated MRCP allows the acquisition time to be reduced by a factor of 2.4/3.0 at 3T/0.55T while maintaining the image quality of the conventional acquisition.
[380] arXiv:2405.03762 (cross-list from eess.IV) [pdf, ps, html, other]: Title: Deep learning classifier of locally advanced rectal cancer treatment response from endoscopy images

Jorge Tapias Gomez, Aneesh Rangnekar, Hannah Williams, Hannah Thompson, Julio Garcia-Aguilar, Joshua Jesse Smith, Harini Veeraraghavan

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

We developed a deep learning classifier of rectal cancer response (tumor vs. no-tumor) to total neoadjuvant treatment (TNT) from endoscopic images acquired before, during, and following TNT. We further evaluated the network's ability in a near out-of-distribution (OOD) problem to identify local regrowth (LR) from follow-up endoscopy images acquired several months to years after completing TNT. We addressed endoscopic image variability by using optimal mass transport-based image harmonization. We evaluated multiple training regularization schemes to study the ResNet-50 network's in-distribution and near-OOD generalization ability. Test time augmentation resulted in the most considerable accuracy improvement. Image harmonization resulted in slight accuracy improvement for the near-OOD cases. Our results suggest that off-the-shelf deep learning classifiers can detect rectal cancer from endoscopic images at various stages of therapy for surveillance.
[381] arXiv:2405.03879 (cross-list from stat.ML) [pdf, ps, html, other]: Title: Scalable Amortized GPLVMs for Single Cell Transcriptomics Data

Sarah Zhao, Aditya Ravuri, Vidhi Lalchand, Neil D. Lawrence

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Genomics (q-bio.GN); Applications (stat.AP)

Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models lack effectiveness in clustering cell types. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs. This model matches the performance of the leading single-cell variational inference (scVI) approach on synthetic and real-world COVID datasets and effectively incorporates cell-cycle and batch information to reveal more interpretable latent structures as we demonstrate on an innate immunity dataset.
[382] arXiv:2405.03880 (cross-list from physics.med-ph) [pdf, ps, html, other]: Title: Efficient Radiation Treatment Planning based on Voxel Importance

Sebastian Mair, Anqi Fu, Jens Sjölund

Comments: 20 pages, 11 figures

Subjects: Medical Physics (physics.med-ph); Machine Learning (cs.LG)

Optimization is a time-consuming part of radiation treatment planning. We propose to reduce the optimization problem by only using a representative subset of informative voxels. This way, we improve planning efficiency while maintaining or enhancing the plan quality. To reduce the computational complexity of the optimization problem, we propose to subsample the set of voxels via importance sampling. We derive a sampling distribution based on an importance score that we obtain from pre-solving an easy optimization problem involving a simplified probing objective. By solving a reduced version of the original optimization problem using this subset, we effectively reduce the problem's size and computational demands while accounting for regions in which satisfactory dose deliveries are challenging. In contrast to other stochastic (sub-)sampling methods, our technique only requires a single sampling step to define a reduced optimization problem. This problem can be efficiently solved using established solvers. Empirical experiments on open benchmark data highlight substantially reduced optimization times, up to 50 times faster than the original ones, for intensity-modulated radiation therapy (IMRT), all while upholding plan quality comparable to traditional methods. Our approach has the potential to significantly accelerate radiation treatment planning by addressing its inherent computational challenges. We reduce the treatment planning time by reducing the size of the optimization problem rather than improving the optimization method. Our efforts are thus complementary to many of the previous developments.
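The subsampling idea can be sketched generically: draw voxels with probability proportional to an importance score and reweight each draw by the inverse of its expected count, so that a weighted sum over the subset estimates the full sum. How the paper computes its scores (by pre-solving a probing problem) is not reproduced here; the scores, function names, and seed below are illustrative assumptions.

```python
import random

def sample_voxels(scores, m, seed=0):
    """Draw m voxel indices with probability proportional to positive scores.

    Returns the indices and importance weights 1 / (m * p_i).
    """
    rng = random.Random(seed)
    total = sum(scores)
    probs = [s / total for s in scores]
    idx = rng.choices(range(len(scores)), weights=probs, k=m)
    weights = [1.0 / (m * probs[i]) for i in idx]
    return idx, weights

def weighted_objective(per_voxel_loss, idx, weights):
    """Estimate the sum of per-voxel losses from the weighted subset."""
    return sum(w * per_voxel_loss[i] for i, w in zip(idx, weights))

losses = [1.0, 2.0, 3.0, 4.0]
idx, weights = sample_voxels(scores=losses, m=5)
print(weighted_objective(losses, idx, weights), sum(losses))
```

When the scores are exactly proportional to the per-voxel losses, every weighted term equals the full sum divided by `m`, so the estimate matches the full sum regardless of which voxels are drawn. This is the usual motivation for choosing scores that track each voxel's contribution to the objective.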
[383] arXiv:2405.03913 (cross-list from q-bio.QM) [pdf, ps, html, other]: Title: Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing Process

Fuqiang Cheng, Wei Xie, Hua Zheng

Comments: 12 pages, 5 figures

Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Machine Learning (stat.ML)

Biomanufacturing innovation relies on an efficient design of experiments (DoE) to optimize processes and product quality. Traditional DoE methods, ignoring the underlying bioprocessing mechanisms, often suffer from a lack of interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach that can guide sequential DoEs for digital twin model calibration. In this study, we consider a multi-scale mechanistic model for the cell culture process, also known as Biological Systems-of-Systems (Bio-SoS), as our digital twin. This model with modular design, composed of sub-models, allows us to integrate data across various production processes. To calibrate the Bio-SoS digital twin, we evaluate the mean squared error of model prediction and develop a computational approach to quantify the impact of parameter estimation error of individual sub-models on the prediction accuracy of the digital twin, which can guide sample-efficient and interpretable DoEs.
[384] arXiv:2405.04003 (cross-list from physics.comp-ph) [pdf, ps, html, other]: Title: High Energy Density Radiative Transfer in the Diffusion Regime with Fourier Neural Operators

Joseph Farmer, Ethan Smith, William Bennett, Ryan McClarren

Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG)

Radiative heat transfer is a fundamental process in high energy density physics and inertial fusion. Accurately predicting the behavior of Marshak waves across a wide range of material properties and drive conditions is crucial for design and analysis of these systems. Conventional numerical solvers and analytical approximations often face challenges in terms of accuracy and computational efficiency. In this work, we propose a novel approach to model Marshak waves using Fourier Neural Operators (FNO). We develop two FNO-based models: (1) a base model that learns the mapping between the drive condition and material properties to a solution approximation based on the widely used analytic model by Hammer & Rosen (2003), and (2) a model that corrects the inaccuracies of the analytic approximation by learning the mapping to a more accurate numerical solution. Our results demonstrate the strong generalization capabilities of the FNOs and show significant improvements in prediction accuracy compared to the base analytic model.
[385] arXiv:2405.04022 (cross-list from math.AC) [pdf, ps, html, other]: Title: On $n$-Dimensional Sequences. I

Graham H. Norton

Comments: This is my original latex document submitted to Journal of Symbolic Computation without the typographical errors which were introduced: 'The Journal apologizes for the typographical errors in Norton (1995) introduced in the subediting process'; see this journal, (1995)20, 769-770

Journal-ref: J. Symbolic Computation (1995), 20, 71-92

Subjects: Commutative Algebra (math.AC); Symbolic Computation (cs.SC)

Let $R$ be a commutative ring and let $n \geq 1.$ We study $\Gamma(s)$, the generating function and Ann$(s)$, the ideal of characteristic polynomials of $s$, an $n$--dimensional sequence over $R$.
We express $f(X_1,\ldots,X_n) \cdot \Gamma(s)(X_1^{-1},\ldots ,X_n^{-1})$ as a partitioned sum. That is, we give (i) a $2^n$--fold ``border'' partition (ii) an explicit expression for the product as a $2^n$--fold sum; the support of each summand is contained in precisely one member of the partition. A key summand is $\beta_0(f,s)$, the ``border polynomial'' of $f$ and $s$, which is divisible by $X_1\cdots X_n$.
We say that $s$ is {\em eventually rectilinear} if the elimination ideals Ann$(s)\cap R[X_i]$ contain an $f_i(X_i)$ for $1 \leq i \leq n$. In this case, we show that $\mbox{Ann}(s)$ is the ideal quotient $(\sum_{i=1}^n(f_i)\ :\ \beta_0(f,s)/(X_1\cdots X_n)).$
When $R$ and $R[[X_1,X_2, \ldots ,X_n]]$ are factorial domains (e.g. $R$ a principal ideal domain or ${\Bbb F}[X_1,\ldots,X_n]$), we compute {\em the monic generator} $\gamma _i$ of $\mbox{Ann}(s) \cap R[X_i]$ from known $f_i \in \mbox{Ann}(s) \cap R[X_i]$ or from a finite number of $1$--dimensional linear recurring sequences over $R$. Over a field ${\Bbb F}$ this gives an $O(\prod_{i=1}^n \delta \gamma _i^3)$ algorithm to compute an ${\Bbb F}$--basis for $\mbox{Ann}(s)$.
[386] arXiv:2405.04023 (cross-list from eess.IV) [pdf, ps, other]: Title: Lumbar Spine Tumor Segmentation and Localization in T2 MRI Images Using AI

Rikathi Pal, Sudeshna Mondal, Aditi Gupta, Priya Saha, Somoballi Ghoshal, Amlan Chakrabarti, Susmita Sur-Kolay

Comments: 9 pages, 12 figures

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In medical imaging, segmentation and localization of spinal tumors in three-dimensional (3D) space pose significant computational challenges, primarily stemming from limited data availability. In response, this study introduces a novel data augmentation technique, aimed at automating spine tumor segmentation and localization through AI approaches. Leveraging a fusion of fuzzy c-means clustering and Random Forest algorithms, the proposed method achieves successful spine tumor segmentation based on predefined masks initially delineated by domain experts in medical imaging. Subsequently, a Convolutional Neural Network (CNN) architecture is employed for tumor classification. Moreover, 3D vertebral segmentation and labeling techniques are used to help pinpoint the exact location of the tumors in the lumbar spine. Results indicate a remarkable performance, with 99% accuracy for tumor segmentation, 98% accuracy for tumor classification, and 99% accuracy for tumor localization achieved with the proposed approach. These metrics surpass the efficacy of existing state-of-the-art techniques, as evidenced by superior Dice Score, Class Accuracy, and Intersection over Union (IOU) on class accuracy metrics. This innovative methodology holds promise for enhancing the diagnostic capabilities in detecting and characterizing spinal tumors, thereby facilitating more effective clinical decision-making.
[387] arXiv:2405.04026 (cross-list from stat.ML) [pdf, ps, html, other]: Title: Federated Control in Markov Decision Processes

Hao Jin, Yang Peng, Liangyu Zhang, Zhihua Zhang

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We study problems of federated control in Markov Decision Processes. To solve an MDP with large state space, multiple learning agents are introduced to collaboratively learn its optimal policy without communication of locally collected experience. In our settings, these agents have limited capabilities, which means they are restricted within different regions of the overall state space during the training process. In the face of the difference among restricted regions, we first introduce concepts of leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol that we call Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and accordingly modifies their learning problems for further training. In terms of theoretical analysis, we justify the correctness of FedQ as a communication protocol, then give a general result on sample complexity of derived algorithms FedQ-X with the RL oracle, and finally conduct a thorough study on the sample complexity of FedQ-SynQ. Specifically, FedQ-X has been shown to enjoy linear speedup in terms of sample complexity when workload is uniformly distributed among agents. Moreover, we carry out experiments in various environments to justify the efficiency of our methods.
[388] arXiv:2405.04043 (cross-list from stat.CO) [pdf, ps, other]: Title: Scalable Vertical Federated Learning via Data Augmentation and Amortized Inference

Conor Hassan, Matthew Sutton, Antonietta Mira, Kerrie Mengersen

Comments: 30 pages, 5 figures, 3 tables

Subjects: Computation (stat.CO); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

Vertical federated learning (VFL) has emerged as a paradigm for collaborative model estimation across multiple clients, each holding a distinct set of covariates. This paper introduces the first comprehensive framework for fitting Bayesian models in the VFL setting. We propose a novel approach that leverages data augmentation techniques to transform VFL problems into a form compatible with existing Bayesian federated learning algorithms. We present an innovative model formulation for specific VFL scenarios where the joint likelihood factorizes into a product of client-specific likelihoods. To mitigate the dimensionality challenge posed by data augmentation, which scales with the number of observations and clients, we develop a factorized amortized variational approximation that achieves scalability independent of the number of observations. We showcase the efficacy of our framework through extensive numerical experiments on logistic regression, multilevel regression, and a novel hierarchical Bayesian split neural net model. Our work paves the way for privacy-preserving, decentralized Bayesian inference in vertically partitioned data scenarios, opening up new avenues for research and applications in various domains.
[389] arXiv:2405.04056 (cross-list from physics.optics) [pdf, ps, other]: Title: Bidirectional Adversarial Autoencoders for the design of Plasmonic Metasurfaces

Yuansan Liu, Jeygopi Panisilvam, Peter Dower, Sejeong Kim, James Bailey

Comments: 7 pages, 5 figures

Subjects: Optics (physics.optics); Machine Learning (cs.LG)

Deep Learning has been a critical part of designing inverse design methods that are computationally efficient and accurate. An example of this is the design of photonic metasurfaces by using their photoluminescent spectrum as the input data to predict their topology. One fundamental challenge of these systems is their ability to represent nonlinear relationships between sets of data that have different dimensionalities. Existing design methods often implement a conditional Generative Adversarial Network in order to solve this problem, but in many cases the solution is unable to generate structures that provide multiple peaks when validated. It is demonstrated that in response to the target spectrum, the Bidirectional Adversarial Autoencoder is able to generate structures that provide multiple peaks on several occasions. As a result the proposed model represents an important advance towards the generation of nonlinear photonic metasurfaces that can be used in advanced metasurface design.
[390] arXiv:2405.04072 (cross-list from quant-ph) [pdf, ps, other]: Title: Quantum Unpredictability

Tomoyuki Morimae, Shogo Yamada, Takashi Yamakawa

Comments: 38 pages, 1 figure

Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)

Unpredictable functions (UPFs) play essential roles in classical cryptography, including message authentication codes (MACs) and digital signatures. In this paper, we introduce a quantum analog of UPFs, which we call unpredictable state generators (UPSGs). UPSGs are implied by pseudorandom function-like states generators (PRFSs), which are a quantum analog of pseudorandom functions (PRFs), and therefore UPSGs could exist even if one-way functions do not exist, similar to other recently introduced primitives like pseudorandom state generators (PRSGs), one-way state generators (OWSGs), and EFIs. In classical cryptography, UPFs are equivalent to PRFs, but in the quantum case, the equivalence is not clear, and UPSGs could be weaker than PRFSs. Despite this, we demonstrate that all known applications of PRFSs are also achievable with UPSGs. They include IND-CPA-secure secret-key encryption and EUF-CMA-secure MACs with unclonable tags. Our findings suggest that, for many applications, quantum unpredictability, rather than quantum pseudorandomness, is sufficient.
[391] arXiv:2405.04074 (cross-list from cond-mat.dis-nn) [pdf, ps, html, other]: Title: A simple theory for training response of deep neural networks

Kenichi Nakazato

Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Adaptation and Self-Organizing Systems (nlin.AO)

Deep neural networks give us a powerful method to model the training dataset's relationship between input and output. We can regard that as a complex adaptive system consisting of many artificial neurons that work as an adaptive memory as a whole. The network's behavior is training dynamics with a feedback loop from the evaluation of the loss function. We already know the training response can be constant or show power law-like aging in some ideal situations. However, we still have gaps between those findings and other complex phenomena, like network fragility. To fill the gap, we introduce a very simple network and analyze it. We show the training response consists of some different factors based on training stages, activation functions, or training methods. In addition, we show feature space reduction as an effect of stochastic training dynamics, which can result in network fragility. Finally, we discuss some complex phenomena of deep networks.
[392] arXiv:2405.04096 (cross-list from eess.AS) [pdf, ps, html, other]: Title: Speaker Characterization by means of Attention Pooling

Federico Costa, Miquel India, Javier Hernando

Comments: IberSpeech 2022

Journal-ref: Proc. IberSPEECH 2022, 166-170

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

State-of-the-art Deep Learning systems for speaker verification are commonly based on speaker embedding extractors. These architectures are usually composed of a feature extractor front-end together with a pooling layer to encode variable-length utterances into fixed-length speaker vectors. The authors have recently proposed the use of a Double Multi-Head Self-Attention pooling for speaker recognition, placed between a CNN-based front-end and a set of fully connected layers. This has been shown to be an excellent approach to efficiently select the most relevant features captured by the front-end from the speech signal. In this paper we show excellent experimental results by adapting this architecture to other speaker characterization tasks, such as emotion recognition, sex classification and COVID-19 detection.
[393] arXiv:2405.04119 (cross-list from math.CO) [pdf, ps, other]: Title: Diameter of the inversion graph

Frédéric Havet, Florian Hörsch, Clément Rembaud

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

In an oriented graph $\vec{G}$, the inversion of a subset $X$ of vertices consists in reversing the orientation of all arcs with both endvertices in $X$. The inversion graph of a labelled graph $G$, denoted by ${\mathcal{I}}(G)$, is the graph whose vertices are the labelled orientations of $G$ in which two labelled orientations $\vec{G}_1$ and $\vec{G}_2$ of $G$ are adjacent if and only if there is an inversion $X$ transforming $\vec{G}_1$ into $\vec{G}_2$. In this paper, we study the inversion diameter of a graph, which is the diameter of its inversion graph, denoted by $diam(\mathcal{I}(G))$. We show that the inversion diameter is tied to the star chromatic number, the acyclic chromatic number and the oriented chromatic number. Thus a graph class has bounded inversion diameter if and only if it also has bounded star chromatic number, acyclic chromatic number and oriented chromatic number. We give some upper bounds on the inversion diameter of a graph $G$ contained in one of the following graph classes: planar graphs ($diam(\mathcal{I}(G)) \leq 12$), planar graphs of girth 8 ($diam(\mathcal{I}(G)) \leq 3$), graphs with maximum degree $\Delta$ ($diam(\mathcal{I}(G)) \leq 2\Delta -1$), graphs with treewidth at most $t$ ($diam(\mathcal{I}(G)) \leq 2t$).
We also show that determining the inversion diameter of a given graph is NP-hard.
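The inversion operation and the distance it induces between two orientations can be illustrated with a small brute-force sketch. This is not the authors' method: the function names `invert` and `inversion_distance` are hypothetical, orientations are represented as sets of `(tail, head)` pairs by assumption, and the exhaustive search over all vertex subsets is only practical for very small graphs.

```python
from itertools import combinations


def invert(arcs, subset):
    """Reverse every arc whose two endvertices both lie in `subset`."""
    subset = set(subset)
    return frozenset(
        (v, u) if (u in subset and v in subset) else (u, v)
        for (u, v) in arcs
    )


def inversion_distance(start, target, vertices):
    """Fewest inversions turning `start` into `target` (breadth-first search)."""
    start, target = frozenset(start), frozenset(target)
    # Subsets with fewer than two vertices contain no arc, so skip them.
    subsets = [
        c
        for r in range(2, len(vertices) + 1)
        for c in combinations(vertices, r)
    ]
    frontier, seen, dist = {start}, {start}, 0
    while frontier:
        if target in frontier:
            return dist
        next_frontier = set()
        for orientation in frontier:
            for s in subsets:
                neighbour = invert(orientation, s)
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.add(neighbour)
        frontier = next_frontier
        dist += 1
    return None  # not reached when both inputs orient the same graph


# Toy example: two orientations of a triangle on vertices 0, 1, 2.
triangle = {(0, 1), (1, 2), (0, 2)}
goal = {(1, 0), (2, 1), (0, 2)}
distance = inversion_distance(triangle, goal, [0, 1, 2])
```

The inversion diameter would be the maximum of this distance over all pairs of orientations; the sketch computes only a single pair.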
[394] arXiv:2405.04147 (cross-list from stat.ML) [pdf, ps, other]: Title: Multiparameter regularization and aggregation in the context of polynomial functional regression

Elke R. Gizewski, Markus Holzleitner, Lukas Mayer-Suess, Sergiy Pereverzyev Jr., Sergei V. Pereverzyev

Comments: 18 pages

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Statistics Theory (math.ST)

Most of the recent results in polynomial functional regression have been focused on an in-depth exploration of single-parameter regularization schemes. In contrast, in this study we go beyond that framework by introducing an algorithm for multiple parameter regularization and presenting a theoretically grounded method for dealing with the associated parameters. This method facilitates the aggregation of models with varying regularization parameters. The efficacy of the proposed approach is assessed through evaluations on both synthetic and some real-world medical data, revealing promising results.
[395] arXiv:2405.04169 (cross-list from eess.IV) [pdf, ps, html, other]: Title: D-TrAttUnet: Toward Hybrid CNN-Transformer Architecture for Generic and Subtle Segmentation in Medical Images

Fares Bougourzi, Fadi Dornaika, Cosimo Distante, Abdelmalik Taleb-Ahmed

Comments: arXiv admin note: text overlap with arXiv:2303.15576

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Over the past two decades, machine analysis of medical imaging has advanced rapidly, opening up significant potential for several important medical applications. As complicated diseases increase and the number of cases rises, the role of machine-based imaging analysis has become indispensable. It serves as both a tool and an assistant to medical experts, providing valuable insights and guidance. A particularly challenging task in this area is lesion segmentation, a task that is challenging even for experienced radiologists. The complexity of this task highlights the urgent need for robust machine learning approaches to support medical staff. In response, we present our novel solution: the D-TrAttUnet architecture. This framework is based on the observation that different diseases often target specific organs. Our architecture includes an encoder-decoder structure with a composite Transformer-CNN encoder and dual decoders. The encoder includes two paths: the Transformer path and the Encoders Fusion Module path. The Dual-Decoder configuration uses two identical decoders, each with attention gates. This allows the model to simultaneously segment lesions and organs and integrate their segmentation losses.
To validate our approach, we performed evaluations on the Covid-19 and Bone Metastasis segmentation tasks. We also investigated the adaptability of the model by testing it without the second decoder in the segmentation of glands and nuclei. The results confirmed the superiority of our approach, especially in Covid-19 infections and the segmentation of bone metastases. In addition, the hybrid encoder showed exceptional performance in the segmentation of glands and nuclei, solidifying its role in modern medical image analysis.
[396] arXiv:2405.04172 (cross-list from math.OC) [pdf, ps, other]: Title: An efficient active-set method with applications to sparse approximations and risk minimization

Spyridon Pougkakiotis, Jacek Gondzio, Dionysis Kalogerias

Comments: arXiv admin note: substantial text overlap with arXiv:2302.14497, arXiv:2201.10211

Subjects: Optimization and Control (math.OC); Mathematical Software (cs.MS)

In this paper we present an efficient active-set method for the solution of convex quadratic programming problems with general piecewise-linear terms in the objective, with applications to sparse approximations and risk-minimization. The algorithm is derived by combining a proximal method of multipliers (PMM) with a standard semismooth Newton method (SSN), and is shown to be globally convergent under minimal assumptions. Further local linear (and potentially superlinear) convergence is shown under standard additional conditions. The major computational bottleneck of the proposed approach arises from the solution of the associated SSN linear systems. These are solved using a Krylov-subspace method, accelerated by certain novel general-purpose preconditioners which are shown to be optimal with respect to the proximal penalty parameters. The preconditioners are easy to store and invert, since they exploit the structure of the nonsmooth terms appearing in the problem's objective to significantly reduce their memory requirements. We showcase the efficiency, robustness, and scalability of the proposed solver on a variety of problems arising in risk-averse portfolio selection, $L^1$-regularized partial differential equation constrained optimization, quantile regression, and binary classification via linear support vector machines. We provide computational evidence, on real-world datasets, to demonstrate the ability of the solver to efficiently and competitively handle a diverse set of medium- and large-scale optimization instances.
[397] arXiv:2405.04230 (cross-list from physics.comp-ph) [pdf, ps, other]: Title: Unveiling the optimization process of Physics Informed Neural Networks: How accurate and competitive can PINNs be?

Jorge F. Urbán, Petros Stefanou, José A. Pons

Comments: 46 pages, 13 figures, submitted for publication in JCP

Subjects: Computational Physics (physics.comp-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This study investigates the potential accuracy boundaries of physics-informed neural networks, contrasting their approach with previous similar works and traditional numerical methods. We find that selecting improved optimization algorithms significantly enhances the accuracy of the results. Simple modifications to the loss function may also improve precision, offering an additional avenue for enhancement. Despite optimization algorithms having a greater impact on convergence than adjustments to the loss function, practical considerations often favor tweaking the latter due to ease of implementation. On a global scale, the integration of an enhanced optimizer and a marginally adjusted loss function enables a reduction in the loss function by several orders of magnitude across diverse physical problems. Consequently, our results obtained using compact networks (typically comprising 2 or 3 layers of 20-30 neurons) achieve accuracies comparable to finite difference schemes employing thousands of grid points. This study encourages the continued advancement of PINNs and associated optimization techniques for broader applications across various fields.
[398] arXiv:2405.04272 (cross-list from eess.AS) [pdf, ps, other]: Title: BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models

Eloi Moliner, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann, Vesa Välimäki

Comments: Submitted to IWAENC 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

In this paper, we present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation, based on posterior sampling with diffusion models. We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters as the speech utterance gets refined along the reverse diffusion trajectory. A measurement consistency criterion enforces the fidelity of the generated speech with the reverberant measurement, while an unconditional diffusion model implements a strong prior for clean speech generation. Without any knowledge of the room impulse response nor any coupled reverberant-anechoic data, we can successfully perform dereverberation in various acoustic scenarios. Our method significantly outperforms previous blind unsupervised baselines, and we demonstrate its increased robustness to unseen acoustic conditions in comparison to blind supervised methods. Audio samples and code are available online.
[399] arXiv:2405.04274 (cross-list from eess.IV) [pdf, ps, html, other]: Title: Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression

Zhenghao Chen, Luping Zhou, Zhihao Hu, Dong Xu

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Content-adaptive compression is crucial for enhancing the adaptability of the pre-trained neural codec for various contents. Although these methods have been very practical in neural image compression (NIC), their application in neural video compression (NVC) is still limited due to two main aspects: (1) video compression relies heavily on temporal redundancy, therefore updating just one or a few frames can lead to significant errors accumulating over time; (2) NVC frameworks are generally more complex, with many large components that are not easy to update quickly during encoding. To address the previously mentioned challenges, we have developed a content-adaptive NVC technique called Group-aware Parameter-Efficient Updating (GPU). Initially, to minimize error accumulation, we adopt a group-aware approach for updating encoder parameters. This involves adopting a patch-based Group of Pictures (GoP) training strategy to segment a video into patch-based GoPs, which will be updated to facilitate a globally optimized domain-transferable solution. Subsequently, we introduce a parameter-efficient delta-tuning strategy, which is achieved by integrating several light-weight adapters into each coding component of the encoding process in both serial and parallel configurations. Such architecture-agnostic modules stimulate the components with large parameters, thereby reducing both the update cost and the encoding time. We incorporate our GPU into the latest NVC framework and conduct comprehensive experiments, whose results showcase outstanding video compression efficiency across four video benchmarks and adaptability on one medical image benchmark.
[400] arXiv:2405.04288 (cross-list from eess.IV) [pdf, ps, html, other]: Title: BetterNet: An Efficient CNN Architecture with Residual Learning and Attention for Precision Polyp Segmentation

Owen Singh, Sandeep Singh Sengar

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Colorectal cancer contributes significantly to cancer-related mortality. Timely identification and elimination of polyps through colonoscopy screening is crucial in order to decrease mortality rates. Accurately detecting polyps in colonoscopy images is difficult because of the differences in characteristics such as size, shape, texture, and similarity to surrounding tissues. Current deep-learning methods often face difficulties in capturing long-range connections necessary for segmentation. This research presents BetterNet, a convolutional neural network (CNN) architecture that combines residual learning and attention methods to enhance the accuracy of polyp segmentation. Its main contributions are: (1) a residual decoder architecture that facilitates efficient gradient propagation and integration of multiscale features; (2) channel and spatial attention blocks within the decoder block that concentrate the learning process on the relevant areas of polyp regions; (3) state-of-the-art performance on polyp segmentation benchmarks while still ensuring computational efficiency; (4) thorough ablation tests that confirm the influence of architectural components; and (5) model code made available as open-source for further contribution. Extensive evaluations conducted on datasets such as Kvasir-SEG, CVC ClinicDB, Endoscene, EndoTect, and Kvasir-Sessile demonstrate that BetterNet outperforms current SOTA models in terms of segmentation accuracy by significant margins. The lightweight design enables real-time inference for various applications. BetterNet shows promise in integrating computer-assisted diagnosis techniques to enhance the detection of polyps and the early recognition of cancer. Link to the code: this https URL
[401] arXiv:2405.04295 (cross-list from eess.IV) [pdf, ps, other]: Title: Semi-Supervised Disease Classification based on Limited Medical Image Data

Yan Zhang, Chun Li, Zhaoxia Liu, Ming Li

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In recent years, significant progress has been made in the field of learning from positive and unlabeled examples (PU learning), particularly in the context of advancing image and text classification tasks. However, applying PU learning to semi-supervised disease classification remains a formidable challenge, primarily due to the limited availability of labeled medical images. In the realm of medical image-aided diagnosis algorithms, numerous theoretical and practical obstacles persist. The research on PU learning for medical image-assisted diagnosis holds substantial importance, as it aims to reduce the time spent by professional experts in classifying images. Unlike natural images, medical images are typically accompanied by a scarcity of annotated data, while an abundance of unlabeled cases exists. Addressing these challenges, this paper introduces a novel generative model inspired by Hölder divergence, specifically designed for semi-supervised disease classification using positive and unlabeled medical image data. In this paper, we present a comprehensive formulation of the problem and establish its theoretical feasibility through rigorous mathematical analysis. To evaluate the effectiveness of our proposed approach, we conduct extensive experiments on five benchmark datasets commonly used in PU medical learning: BreastMNIST, PneumoniaMNIST, BloodMNIST, OCTMNIST, and AMD. The experimental results clearly demonstrate the superiority of our method over existing approaches based on KL divergence. Notably, our approach achieves state-of-the-art performance on all five disease classification benchmarks.
By addressing the limitations imposed by limited labeled data and harnessing the untapped potential of unlabeled medical images, our novel generative model presents a promising direction for enhancing semi-supervised disease classification in the field of medical image analysis.
[402] arXiv:2405.04321 (cross-list from cond-mat.mtrl-sci) [pdf, ps, other]: Title: Molecular Identification via Molecular Fingerprint extraction from Atomic Force Microscopy images

Manuel González Lastre, Pablo Pou, Miguel Wiche, Daniel Ebeling, Andre Schirmeisen, Rubén Pérez

Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)

Non--Contact Atomic Force Microscopy with CO--functionalized metal tips (referred to as HR-AFM) provides access to the internal structure of individual molecules adsorbed on a surface with totally unprecedented resolution. Previous works have shown that deep learning (DL) models can retrieve the chemical and structural information encoded in a 3D stack of constant-height HR--AFM images, leading to molecular identification. In this work, we overcome their limitations by using a well-established description of the molecular structure in terms of topological fingerprints, the 1024--bit Extended Connectivity Chemical Fingerprints of radius 2 (ECFP4), that were developed for substructure and similarity searching. ECFPs provide local structural information of the molecule, each bit correlating with a particular substructure within the molecule. Our DL model is able to extract this optimized structural descriptor from the 3D HR--AFM stacks and use it, through virtual screening, to identify molecules from their predicted ECFP4 with a retrieval accuracy on theoretical images of 95.4\%. Furthermore, this approach, unlike previous DL models, assigns a confidence score, the Tanimoto similarity, to each of the candidate molecules, thus providing information on the reliability of the identification.
By construction, the number of times a certain substructure is present in the molecule is lost during the hashing process, which is necessary to make the fingerprints useful for machine learning applications. We show that it is possible to complement the fingerprint-based virtual screening with global information provided by another DL model that predicts the chemical formula from the same HR--AFM stacks, boosting the identification accuracy up to 97.6\%. Finally, we perform a limited test with experimental images, obtaining promising results towards the application of this pipeline under real conditions.
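The Tanimoto similarity used as a confidence score, and the ranking step of a fingerprint-based virtual screening, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: fingerprints are assumed to be given as sets of on-bit indices, the names `tanimoto` and `screen` and the toy library are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity |A & B| / |A | B| between two sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Assumption: two empty fingerprints are scored 0.0 rather than 1.0.
    return len(a & b) / union if union else 0.0


def screen(predicted, library):
    """Rank library molecules by similarity to a predicted fingerprint."""
    scored = [(tanimoto(predicted, fp), name) for name, fp in library.items()]
    return sorted(scored, reverse=True)


# Hypothetical toy library; real ECFP4 fingerprints have 1024 bits.
library = {
    "mol_A": {1, 2, 3},
    "mol_B": {2, 3, 4, 5},
    "mol_C": {7, 8},
}
ranking = screen({1, 2, 3, 9}, library)
best_score, best_name = ranking[0]
```

Because only the set of on bits is compared, two molecules that differ solely in how often a substructure occurs receive the same score, which is the information loss the abstract addresses with the chemical-formula model.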
[403] arXiv:2405.04380 (cross-list from math.OC) [pdf, ps, other]: Title: Preserving Nonlinear Constraints in Variational Flow Filtering Data Assimilation

Amit N. Subrahmanya, Andrey A. Popov, Reid J. Gomillion, Adrian Sandu

Subjects: Optimization and Control (math.OC); Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA)

Data assimilation aims to estimate the states of a dynamical system by optimally combining sparse and noisy observations of the physical system with uncertain forecasts produced by a computational model. The states of many dynamical systems of interest obey nonlinear physical constraints, and the corresponding dynamics is confined to a certain sub-manifold of the state space. Standard data assimilation techniques applied to such systems yield posterior states lying outside the manifold, violating the physical constraints. This work focuses on particle flow filters which use stochastic differential equations to evolve state samples from a prior distribution to samples from an observation-informed posterior distribution. The variational Fokker-Planck (VFP) -- a generic particle flow filtering framework -- is extended to incorporate non-linear, equality state constraints in the analysis. To this end, two algorithmic approaches that modify the VFP stochastic differential equation are discussed: (i) VFPSTAB, to inexactly preserve constraints with the addition of a stabilizing drift term, and (ii) VFPDAE, to exactly preserve constraints by treating the VFP dynamics as a stochastic differential-algebraic equation (SDAE). Additionally, an implicit-explicit time integrator is developed to evolve the VFPDAE dynamics. The strength of the proposed approach for constraint preservation in data assimilation is demonstrated on three test problems: the double pendulum, Korteweg-de-Vries, and the incompressible Navier-Stokes equations.
[404] arXiv:2405.04385 (cross-list from math.PR) [pdf, ps, html, other]: Title: A Random Walk Approach to Broadcasting on Random Recursive Trees

Ernst Althaus, Lisa Hartung, Rebecca Steiner

Subjects: Probability (math.PR); Discrete Mathematics (cs.DM)

In the broadcasting problem on trees, a $\{0,1\}$-message originating in an unknown node is passed along the tree with a certain error probability $q$. The goal is to estimate the original message without knowing the order in which the nodes were informed. A variation of the problem is considering this broadcasting process on a randomly growing tree, which Addario-Berry et al. have investigated for uniform and linear preferential attachment recursive trees. We extend their studies of the majority estimator to the entire group of very simple increasing trees as well as shape exchangeable trees using the connection to inhomogeneous random walks and other stochastic processes with memory effects such as Pólya Urns.
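The broadcasting process and the majority estimator described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it covers the uniform random recursive tree case, the function names and the tie-break rule are hypothetical choices for this illustration, and it does not reproduce the random-walk analysis of the paper.

```python
import random

def broadcast_on_urrt(n, q, rng):
    """Grow a uniform random recursive tree on n nodes and broadcast a bit.

    Node 0 holds the original bit. Each new node attaches to a uniformly
    random earlier node and copies that node's bit, flipping it with
    probability q. Returns (original_bit, list_of_all_bits).
    """
    root_bit = rng.randint(0, 1)
    bits = [root_bit]
    for i in range(1, n):
        parent = rng.randrange(i)      # uniform attachment to an earlier node
        bit = bits[parent]
        if rng.random() < q:           # transmission error
            bit ^= 1
        bits.append(bit)
    return root_bit, bits

def majority_estimate(bits):
    """Guess the original bit as the majority value.

    Ties resolve to 0, an arbitrary choice made for this sketch.
    The estimator ignores the order in which nodes were informed.
    """
    return 1 if 2 * sum(bits) > len(bits) else 0

def success_rate(n, q, trials, seed=0):
    """Fraction of simulated trees on which the majority guess is correct."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        root_bit, bits = broadcast_on_urrt(n, q, rng)
        hits += majority_estimate(bits) == root_bit
    return hits / trials
```

Calling `success_rate(n, q, trials)` for several values of `q` gives an empirical feel for how the error probability affects the majority estimator on this one tree model.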
[405] arXiv:2405.04393 (cross-list from stat.ML) [pdf, ps, other]: Title: Efficient Online Set-valued Classification with Bandit Feedback

Zhou Wang, Xingye Qiao

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Conformal prediction is a distribution-free method that wraps a given machine learning model and returns a set of plausible labels that contains the true label with a prescribed coverage rate. In practice, the empirical coverage achieved relies heavily on fully observed label information, both in the training phase for model fitting and in the calibration phase for quantile estimation. This dependency poses a challenge in the context of online learning with bandit feedback, where a learner only has access to the correctness of actions (i.e., pulled arms) but not the full information of the true label. In particular, when the pulled arm is incorrect, the learner only knows that the pulled one is not the true class label, but does not know which label is true. Additionally, bandit feedback further results in a smaller labeled dataset for calibration, limited to instances with correct actions, thereby affecting the accuracy of quantile estimation. To address these limitations, we propose Bandit Class-specific Conformal Prediction (BCCP), offering coverage guarantees on a class-specific granularity. Using an unbiased estimation of an estimand involving the true label, BCCP trains the model and makes set-valued inferences through stochastic gradient descent. Our approach overcomes the challenges of sparsely labeled data in each iteration and generalizes the reliability and applicability of conformal prediction to online decision-making environments.
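For orientation, the standard full-information setting that the abstract contrasts with can be sketched as split conformal prediction. This is not BCCP: it assumes every calibration example comes with its true label, which is exactly what bandit feedback does not provide. The function names and the score `1 - p(true label)` are illustrative choices, not taken from the paper.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold for target coverage 1 - alpha.

    cal_probs: list of per-class probability lists from any fitted model.
    cal_labels: the true class index for each calibration example.
    The nonconformity score is 1 - p(true label).
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf    # too few calibration points: include every label
    return scores[k - 1]

def prediction_set(probs, threshold):
    """Return every label whose score 1 - p does not exceed the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]
```

With only four calibration points, a small `alpha` pushes the threshold to infinity and the prediction set contains every label, which shows why a smaller calibration set (as under bandit feedback) makes quantile estimation harder.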
[406] arXiv:2405.04476 (cross-list from eess.AS) [pdf, ps, html, other]: Title: BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals

Lijun Wang, Yixian Lu, Ziyan Gao, Kai Li, Jianqiang Huang, Yuntao Kong, Shogo Okada

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Room acoustic parameters (RAPs) and room physical parameters (RPPs) are essential metrics for parameterizing the room acoustical characteristics (RAC) of a sound field around a listener's local environment, offering comprehensive indications for various applications. Current RAP and RPP estimation methods either fall short of covering broad real-world acoustic environments in the context of real background noise or lack universal frameworks for blindly estimating RAPs and RPPs from noisy single-channel speech signals, particularly sound source distances, direction-of-arrival (DOA) of sound sources, and occupancy levels. In this paper, we propose a novel universal blind estimation framework called the blind estimator of room acoustical and physical parameters (BERP), by introducing a new stochastic room impulse response (RIR) model, namely, the sparse stochastic impulse response (SSIR) model, and endowing the BERP with a unified encoder and multiple separate predictors to estimate RPPs and SSIR parameters in parallel. This estimation framework enables the computationally efficient and universal estimation of room parameters by solely using noisy single-channel speech signals. Finally, all the RAPs can be simultaneously derived from the RIRs synthesized from the SSIR model with the estimated parameters. To evaluate the effectiveness of the proposed BERP and SSIR models, we compile a task-specific dataset from several publicly available datasets. The results reveal that the BERP achieves state-of-the-art (SOTA) performance. Moreover, the evaluation results pertaining to the SSIR RIR model also demonstrate its efficacy. The code is available on GitHub.
[407] arXiv:2405.04514 (cross-list from quant-ph) [pdf, ps, other]: Title: Scalable Circuit Cutting and Scheduling in a Resource-constrained and Distributed Quantum System

Shuwen Kan, Zefan Du, Miguel Palma, Samuel A Stein, Chenxu Liu, Wenqi Wei, Juntao Chen, Ang Li, Ying Mao

Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC)

Despite quantum computing's rapid development, current systems remain limited in practical applications due to their limited qubit count and quality. Various technologies, such as superconducting, trapped-ion, and neutral-atom quantum computing, are progressing towards a fault-tolerant era; however, they all face a diverse set of challenges in scalability and control. Recent efforts have focused on multi-node quantum systems that connect multiple smaller quantum devices to execute larger circuits. Future demonstrations hope to use quantum channels to couple systems; however, current demonstrations can leverage classical communication with circuit cutting techniques. This involves cutting large circuits into smaller subcircuits and reconstructing them post-execution. However, existing cutting methods are hindered by lengthy search times as the number of qubits and gates increases. Additionally, they often fail to effectively utilize the resources of various worker configurations in a multi-node system. To address these challenges, we introduce FitCut, a novel approach that transforms quantum circuits into weighted graphs and utilizes a community-based, bottom-up approach to cut circuits according to resource constraints, e.g., qubit counts, on each worker. FitCut also includes a scheduling algorithm that optimizes resource utilization across workers. Implemented with Qiskit and evaluated extensively, FitCut significantly outperforms the Qiskit Circuit Knitting Toolbox, reducing time costs by factors ranging from 3 to 2000 and improving resource utilization rates by up to 3.88 times on the worker side, achieving a system-wide improvement of 2.86 times.
[408] arXiv:2405.04518 (cross-list from physics.soc-ph) [pdf, ps, other]: Title: Competition between self- and other-regarding preferences in resolving social dilemmas

Chaoqian Wang, Attila Szolnoki

Comments: 22 pages, 6 figures, accepted for publication at New Journal of Physics

Subjects: Physics and Society (physics.soc-ph); Statistical Mechanics (cond-mat.stat-mech); Computer Science and Game Theory (cs.GT)

Evolutionary game theory assumes that individuals maximize their benefits when choosing strategies. However, an alternative perspective proposes that individuals seek to maximize the benefits of others. To explore the relationship between these perspectives, we develop a model where self- and other-regarding preferences compete in public goods games. We find that other-regarding preferences are more effective in promoting cooperation, even when self-regarding preferences are more productive. Cooperators with different preferences can coexist in a new phase where two classic solutions invade each other, resulting in a dynamical equilibrium. As a consequence, a lower productivity of self-regarding cooperation can provide a higher cooperation level. Our results, which are also valid in a well-mixed population, may explain why other-regarding preferences could be a viable and frequently observed attitude in human society.

[409] arXiv:0901.1988 (replaced) [pdf, ps, html, other]: Title: Many-Help-One Problem for Gaussian Sources with a Tree Structure on their Correlation

Yasutada Oohama

Comments: 21 pages, 4 figures

Subjects: Information Theory (cs.IT)

In this paper we consider the separate coding problem for $L+1$ correlated Gaussian memoryless sources. We deal with the case where $L$ separately encoded data of sources work as side information at the decoder for the reconstruction of the remaining source. The determination problem of the rate distortion region for this system is the so-called many-help-one problem and has been known as a highly challenging problem. The author determined the rate distortion region in the case where the $L$ sources working as partial side information are conditionally independent if the remaining source we wish to reconstruct is given. This condition on the correlation is called the CI condition. In this paper we extend the author's previous result to the case where $L+1$ sources satisfy a kind of tree structure on their correlation. We call this tree structure of information sources the TS condition, which contains the CI condition as a special case. In this paper we derive an explicit outer bound of the rate distortion region when information sources satisfy the TS condition. We further derive an explicit sufficient condition for this outer bound to be tight. In particular, we determine the sum rate part of the rate distortion region for the case where information sources satisfy the TS condition. For some class of Gaussian sources with the TS condition we derive an explicit recursive formula of this sum rate part.
[410] arXiv:1811.02733 (replaced) [pdf, ps, other]: Title: Generalized prolate spheroidal functions: algorithms and analysis

Philip Greengard

Subjects: Numerical Analysis (math.NA)

Generalized prolate spheroidal functions (GPSFs) arise naturally in the study of bandlimited functions as the eigenfunctions of a certain truncated Fourier transform. In one dimension, the theory of GPSFs (typically referred to as prolate spheroidal wave functions) has a long history and is fairly complete. Furthermore, more recent work has led to the development of numerical algorithms for their computation and use in applications. In this paper we consider the more general problem, extending the one dimensional analysis and algorithms to the case of arbitrary dimension. Specifically, we introduce algorithms for efficient evaluation of GPSFs and their corresponding eigenvalues, quadrature rules for bandlimited functions, formulae for interpolation via GPSF expansion, and various analytical properties of GPSFs. We illustrate the numerical and analytical results with several numerical examples.
[411] arXiv:2004.13938 (replaced) [pdf, ps, other]: Title: Families of sequences with good family complexity and cross-correlation measure

Kenan Doğan, Murat Şahin, Oğuz Yayla

Comments: 15 pages. Comments are welcome!

Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Number Theory (math.NT)

In this paper we study pseudorandomness of a family of sequences in terms of two measures, the family complexity ($f$-complexity) and the cross-correlation measure of order $\ell$. We consider sequences not only on binary alphabet but also on $k$-symbols ($k$-ary) alphabet. We first generalize some known methods on construction of the family of binary pseudorandom sequences. We prove a bound on the $f$-complexity of a large family of binary sequences of Legendre-symbols of certain irreducible polynomials. We show that this family as well as its dual family have both a large family complexity and a small cross-correlation measure up to a rather large order. Next, we present another family of binary sequences having high $f$-complexity and low cross-correlation measure. Then we extend the results to the family of sequences on $k$-symbols alphabet.
[412] arXiv:2104.02726 (replaced) [pdf, ps, html, other]: Title: Creativity and Machine Learning: A Survey

Giorgio Franceschelli, Mirco Musolesi

Comments: 38 pages, 1 figure, 2 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

There is a growing interest in the area of machine learning and creativity. This survey presents an overview of the history and the state of the art of computational creativity theories, key machine learning techniques (including generative deep learning), and corresponding automatic evaluation methods. After presenting a critical discussion of the key contributions in this area, we outline the current research challenges and emerging opportunities in this field.
[413] arXiv:2111.06877 (replaced) [pdf, ps, html, other]: Title: Continuity, Uniqueness and Long-Term Behavior of Nash Flows Over Time

Neil Olver, Leon Sering, Laura Vargas Koch

Subjects: Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)

We consider a dynamic model of traffic that has received a lot of attention in the past few years. Users control infinitesimal flow particles aiming to travel from an origin to a destination as quickly as possible. Flow patterns vary over time, and congestion effects are modeled via queues, which form whenever the inflow into a link exceeds its capacity. Despite lots of interest, some very basic questions remain open in this model. We resolve a number of them:
- We show uniqueness of journey times in equilibria.
- We show continuity of equilibria: small perturbations to the instance or to the traffic situation at some moment cannot lead to wildly different equilibrium evolutions.
- We demonstrate that, assuming constant inflow into the network at the source, equilibria always settle down into a "steady state" in which the behavior extends forever in a linear fashion.
One of our main conceptual contributions is to show that the answers to the first two questions, on uniqueness and continuity, are intimately connected to the third. To resolve the third question, we substantially extend the approach of Cominetti et al., who show a steady-state result in the regime where the input flow rate is smaller than the network capacity.
[414] arXiv:2112.05023 (replaced) [pdf, ps, other]: Title: Polynomial XL: A Variant of the XL Algorithm Using Macaulay Matrices over Polynomial Rings

Hiroki Furue, Momonari Kudo

Comments: 35 pages, 1 figure

Journal-ref: Proceedings of PQCrypto 2024

Subjects: Symbolic Computation (cs.SC); Cryptography and Security (cs.CR); Commutative Algebra (math.AC)

Solving a system of $m$ multivariate quadratic equations in $n$ variables over finite fields (the MQ problem) is one of the important problems in the theory of computer science. The XL algorithm (XL for short) is a major approach for solving the MQ problem with linearization over a coefficient field. Furthermore, the hybrid approach with XL (h-XL) is a variant of XL guessing some variables beforehand. In this paper, we present a variant of h-XL, which we call the polynomial XL (PXL). In PXL, the whole $n$ variables are divided into $k$ variables to be fixed and the remaining $n-k$ variables as "main variables", and we generate a Macaulay matrix with respect to the $n-k$ main variables over a polynomial ring of the $k$ (sub-)variables. By eliminating some columns of the Macaulay matrix over the polynomial ring before guessing $k$ variables, the amount of operations required for each guessed value can be reduced compared with h-XL. Our complexity analysis of PXL (under some practical assumptions and heuristics) gives a new theoretical bound, and it indicates that PXL could be more efficient than other algorithms in theory on the random system with $n=m$, which is the case of general multivariate signatures. For example, on systems over the finite field with $2^8$ elements with $n=m=80$, the numbers of operations deduced from the theoretical bounds of the hybrid approaches with XL and Wiedemann XL, Crossbred, and PXL with optimal $k$ are estimated as $2^{252}$, $2^{234}$, $2^{237}$, and $2^{220}$, respectively.
[415] arXiv:2201.12981 (replaced) [pdf, ps, html, other]: Title: G$ \mathbf{^2} $VD Planner: Efficient Motion Planning With Grid-based Generalized Voronoi Diagrams

Jian Wen, Xuebo Zhang, Qingchen Bi, Hui Liu, Jing Yuan, Yongchun Fang

Comments: This paper has been accepted for publication in the IEEE Transactions on Automation Science and Engineering

Subjects: Robotics (cs.RO)

In this paper, an efficient motion planning approach with grid-based generalized Voronoi diagrams (G$ \mathbf{^2} $VD) is newly proposed for mobile robots. Different from existing approaches, the novelty of this work is twofold: 1) a new state lattice-based path searching approach is proposed, in which the search space is reduced to a novel Voronoi corridor to further improve the search efficiency; 2) an efficient quadratic programming-based path smoothing approach is presented, wherein the clearance to obstacles is considered to improve the path clearance of hard-constrained path smoothing approaches. We validate the efficiency and smoothness of our approach in various challenging simulation scenarios and outdoor environments. It is shown that the computational efficiency is improved by 17.1% in the path searching stage, and path smoothing with the proposed approach is 6.6 times faster than an advanced sparse-banded structure-based path smoothing approach and 53.3 times faster than the popular timed-elastic-band planner. A video showing outdoor navigation on our campus is available at this https URL.
[416] arXiv:2202.08081 (replaced) [pdf, ps, other]: Title: Reasoning with fuzzy and uncertain evidence using epistemic random fuzzy sets: general framework and practical models

Thierry Denoeux

Journal-ref: Fuzzy Sets and Systems, Vol. 453, Pages 1-36, 2023

Subjects: Artificial Intelligence (cs.AI); Methodology (stat.ME)

We introduce a general theory of epistemic random fuzzy sets for reasoning with fuzzy or crisp evidence. This framework generalizes both the Dempster-Shafer theory of belief functions, and possibility theory. Independent epistemic random fuzzy sets are combined by the generalized product-intersection rule, which extends both Dempster's rule for combining belief functions, and the product conjunctive combination of possibility distributions. We introduce Gaussian random fuzzy numbers and their multi-dimensional extensions, Gaussian random fuzzy vectors, as practical models for quantifying uncertainty about scalar or vector quantities. Closed-form expressions for the combination, projection and vacuous extension of Gaussian random fuzzy numbers and vectors are derived.
[417] arXiv:2203.15467 (replaced) [pdf, ps, other]: Title: Graded Monads and Behavioural Equivalence Games

Chase Ford, Harsh Beohar, Barbara König, Stefan Milius, Lutz Schröder

Subjects: Logic in Computer Science (cs.LO)

The framework of graded semantics uses graded monads to capture behavioural equivalences of varying granularity, for example as found on the linear-time/branching-time spectrum, over general system types. We describe a generic Spoiler-Duplicator game for graded semantics that is extracted from the given graded monad, and may be seen as playing out an equational proof; instances include standard pebble games for simulation and bisimulation as well as games for trace-like equivalences and coalgebraic behavioural equivalence. Considerations on an infinite variant of such games lead to a novel notion of infinite-depth graded semantics. Under reasonable restrictions, the infinite-depth graded semantics associated to a given graded equivalence can be characterized in terms of a determinization construction for coalgebras under the equivalence at hand.
[418] arXiv:2204.00435 (replaced) [pdf, ps, html, other]: Title: The higher dimensional propositional calculus

Antonio Bucciarelli, Pierre-Louis Curien, Antonio Ledda, Francesco Paoli, Antonino Salibra

Comments: arXiv admin note: text overlap with arXiv:1806.06537

Subjects: Logic in Computer Science (cs.LO)

In recent research, some of the present authors introduced the concept of an n-dimensional Boolean algebra and its corresponding propositional logic nCL, generalising the Boolean propositional calculus to n ≥ 2 perfectly symmetric truth values. This paper presents a sound and complete sequent calculus for nCL, named nLK. We provide two proofs of completeness: one syntactic and one semantic. The former implies as a corollary that nLK enjoys the cut admissibility property. The latter relies on the generalisation to the n-ary case of the classical proof based on the Lindenbaum algebra of formulas and Boolean ultrafilters.
[419] arXiv:2204.03827 (replaced) [pdf, ps, html, other]: Title: IA-GCN: Interactive Graph Convolutional Network for Recommendation

Yinan Zhang, Pei Wang, Congcong Liu, Xiwei Zhao, Hao Qi, Jie He, Junsheng Jin, Changping Peng, Zhangang Lin, Jingping Shao

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Recently, the Graph Convolutional Network (GCN) has become the new state of the art for Collaborative Filtering (CF) based Recommender Systems (RS). It is common practice to learn informative user and item representations by performing embedding propagation on a user-item bipartite graph, and then provide the users with personalized item suggestions based on the representations. Despite their effectiveness, existing algorithms neglect precious interactive features between user-item pairs in the embedding process. When predicting a user's preference for different items, they still aggregate the user tree in the same way, without emphasizing target-related information in the user neighborhood. Such a uniform aggregation scheme easily leads to suboptimal user and item representations, limiting the model expressiveness to some extent.
In this work, we address this problem by building bilateral interactive guidance between each user-item pair and proposing a new model named IA-GCN (short for InterActive GCN). Specifically, when learning the user representation from its neighborhood, we assign higher attention weights to those neighbors similar to the target item. Correspondingly, when learning the item representation, we pay more attention to those neighbors resembling the target user. This leads to interactive and interpretable features, effectively distilling target-specific information through each graph convolutional operation. Our model is built on top of LightGCN, a state-of-the-art GCN model for CF, and can be combined with various GCN-based CF architectures in an end-to-end fashion. Extensive experiments on three benchmark datasets demonstrate the effectiveness and robustness of IA-GCN.
[420] arXiv:2206.09418 (replaced) [pdf, ps, other]: Title: LordNet: An Efficient Neural Network for Learning to Solve Parametric Partial Differential Equations without Simulated Data

Xinquan Huang, Wenlei Shi, Xiaotian Gao, Xinran Wei, Jia Zhang, Jiang Bian, Mao Yang, Tie-Yan Liu

Journal-ref: Neural Networks, 2024

Subjects: Machine Learning (cs.LG)

Neural operators, as a powerful approximation to the non-linear operators between infinite-dimensional function spaces, have proved to be promising in accelerating the solution of partial differential equations (PDEs). However, they require a large amount of simulated data, which can be costly to collect. This can be avoided by learning physics from the physics-constrained loss, which we refer to as the mean squared residual (MSR) loss constructed by the discretized PDE. We investigate the physical information in the MSR loss, which we call long-range entanglements, and identify the challenge that the neural network requires the capacity to model the long-range entanglements in the spatial domain of the PDE, whose patterns vary in different PDEs. To tackle the challenge, we propose LordNet, a tunable and efficient neural network for modeling various entanglements. Inspired by the traditional solvers, LordNet models the long-range entanglements with a series of matrix multiplications, which can be seen as a low-rank approximation to the general fully-connected layers and extracts the dominant pattern with reduced computational cost. The experiments on solving Poisson's equation and (2D and 3D) Navier-Stokes equations demonstrate that the long-range entanglements from the MSR loss can be well modeled by LordNet, yielding better accuracy and generalization ability than other neural networks. The results show that LordNet can be $40\times$ faster than traditional PDE solvers. In addition, LordNet outperforms other modern neural network architectures in accuracy and efficiency with the smallest parameter size.
[421] arXiv:2207.03418 (replaced) [pdf, ps, other]: Title: Parallel Dual-Numbers Reverse AD

Tom Smeding, Matthijs Vákár

Comments: This is the journal version for JFP. For the shorter conference version (POPL'23), see arXiv:2207.03418v2. For an earlier preprint, see arXiv:2205.11368

Subjects: Programming Languages (cs.PL)

Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the use of our technique by providing a practical implementation that differentiates most of Haskell98. Where previous work on dual numbers reverse AD has required sequentialisation to construct the reverse pass, we demonstrate that we can apply our technique to task-parallel source programs and generate a task-parallel derivative computation.
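The two pairings contrasted in the first sentence of this abstract can be sketched side by side. This is a naive illustration in Python rather than Haskell, with hypothetical names (`Dual`, `Rev`, `rev_input`, `f`). It mutates a gradient dictionary for brevity and does not include the linear factoring or the other optimisations the abstract describes, so a shared subterm's backpropagator may be called many times.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Dual:
    """Forward mode: a scalar value paired with its tangent."""
    val: float
    tan: float

    def __add__(self, other):
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other):
        # Product rule on the tangent component.
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

Grad = Dict[str, float]

@dataclass
class Rev:
    """Reverse mode: a scalar value paired with a backpropagator.

    The backpropagator receives the cotangent of this value and adds the
    resulting contributions to a gradient dictionary keyed by input name.
    """
    val: float
    back: Callable[[float, Grad], None]

    def __add__(self, other):
        def back(d, grad):
            self.back(d, grad)
            other.back(d, grad)
        return Rev(self.val + other.val, back)

    def __mul__(self, other):
        def back(d, grad):
            self.back(d * other.val, grad)
            other.back(d * self.val, grad)
        return Rev(self.val * other.val, back)

def rev_input(name, val):
    """Wrap an input; its backpropagator accumulates into grad[name]."""
    def back(d, grad):
        grad[name] = grad.get(name, 0.0) + d
    return Rev(val, back)

def f(x, y):
    """Example function, written once and run in either mode."""
    return x * y + x * x

def grad_reverse(x, y):
    """Return f(x, y) and the full gradient from one backward call."""
    grad: Grad = {}
    out = f(rev_input("x", x), rev_input("y", y))
    out.back(1.0, grad)
    return out.val, grad
```

Forward mode needs one run of `f` per input direction, for example `f(Dual(3.0, 1.0), Dual(4.0, 0.0)).tan` for the derivative with respect to `x`, whereas `grad_reverse(3.0, 4.0)` yields both partial derivatives from a single call to the output's backpropagator.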
[422] arXiv:2209.13816 (replaced) [pdf, ps, other]: Title: Revisiting Few-Shot Learning from a Causal Perspective

Guoliang Lin, Yongheng Xu, Hanjiang Lai, Jian Yin

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Few-shot learning with the $N$-way $K$-shot scheme is an open challenge in machine learning. Many metric-based approaches have been proposed to tackle this problem, e.g., the Matching Networks and CLIP-Adapter. Although these approaches have shown significant progress, the mechanism of why these methods succeed has not been well explored. In this paper, we try to interpret these metric-based few-shot learning methods via causal mechanism. We show that the existing approaches can be viewed as specific forms of front-door adjustment, which can alleviate the effect of spurious correlations and thus learn the causality. This causal interpretation could provide us a new perspective to better understand these existing metric-based methods. Further, based on this causal interpretation, we introduce two causal methods for metric-based few-shot learning, which consider not only the relationship between examples but also the diversity of representations. Experimental results demonstrate the superiority of our proposed methods in few-shot classification on various benchmark datasets. Code is available at this https URL.
[423] arXiv:2209.15168 (replaced) [pdf, ps, other]: Title: Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification

Muhammad ElNokrashy, Badr AlKhamissi, Mona Diab

Comments: Accepted Oral Presentation at LREC-COLING 2024; 10 pages, 9 figures

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Language Models pretrained on large textual data have been shown to encode different types of knowledge simultaneously. Traditionally, only the features from the last layer are used when adapting to new tasks or data. We put forward that, when using or finetuning deep pretrained models, intermediate layer features that may be relevant to the downstream task are buried too deep to be used efficiently in terms of needed samples or steps. To test this, we propose a new layer fusion method: Depth-Wise Attention (DWAtt), to help re-surface signals from non-final layers. We compare DWAtt to a basic concatenation-based layer fusion method (Concat), and compare both to a deeper model baseline -- all kept within a similar parameter budget. Our findings show that DWAtt and Concat are more step- and sample-efficient than the baseline, especially in the few-shot setting. DWAtt outperforms Concat on larger data sizes. On CoNLL-03 NER, layer fusion shows 3.68--9.73% F1 gain at different few-shot sizes. The layer fusion models presented significantly outperform the baseline in various training scenarios with different data sizes, architectures, and training constraints.
[424] arXiv:2210.10651 (replaced) [pdf, ps, html, other]: Title: Fantômas: Understanding Face Anonymization Reversibility

Julian Todt, Simon Hanisch, Thorsten Strufe

Comments: Accepted to Proceedings on Privacy Enhancing Technologies (2024.4)

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Face images are a rich source of information that can be used to identify individuals and infer private information about them. To mitigate this privacy risk, anonymizations employ transformations on clear images to obfuscate sensitive information, all while retaining some utility. Albeit published with impressive claims, they sometimes are not evaluated with convincing methodology.
Reversing anonymized images to resemble their real input -- and even be identified by face recognition approaches -- represents the strongest indicator for flawed anonymization. Some recent results indeed indicate that this is possible for some approaches. It is, however, not well understood which approaches are reversible, and why. In this paper, we provide an exhaustive investigation into the phenomenon of face anonymization reversibility. Among other things, we find that 11 out of 15 tested face anonymizations are at least partially reversible and highlight how both reconstruction and inversion are the underlying processes that make reversal possible.
[425] arXiv:2211.06533 (replaced) [pdf, ps, html, other]: Title: Adaptive Joint Estimation of Temporal Vertex and Edge Signals

Yi Yan, Tian Xie, Ercan E. Kuruoglu

Subjects: Social and Information Networks (cs.SI); Signal Processing (eess.SP)

The adaptive estimation of coexisting temporal vertex (node) and edge signals on graphs is a critical task when a change in edge signals influences the temporal dynamics of the vertex signals. However, current Graph Signal Processing algorithms mostly consider only the signals existing on the graph vertices and neglect the fact that signals can reside on the edges. We propose an Adaptive Joint Vertex-Edge Estimation (AJVEE) algorithm for jointly estimating time-varying vertex and edge signals through a time-varying regression, incorporating both vertex signal filtering and edge signal filtering. Accompanying AJVEE is a newly proposed Adaptive Least Mean Square procedure based on the Hodge Laplacian (ALMS-Hodge), which is inspired by classical adaptive filters combining simplicial filtering and simplicial regression. AJVEE is able to operate jointly on the vertices and edges by merging two ALMS-Hodge algorithms specified on the vertices and edges into a unified formulation. A more generalized case extending AJVEE beyond the vertices and edges is also discussed. In experiments on real-world traffic networks and population mobility networks, we confirm that our proposed AJVEE algorithm can accurately and jointly track time-varying vertex and edge signals on graphs.
[426] arXiv:2211.06818 (replaced) [pdf, ps, html, other]: Title: CFLOBDDs: Context-Free-Language Ordered Binary Decision Diagrams

Meghana Sistla, Swarat Chaudhuri, Thomas Reps

Comments: 144 pages

Journal-ref: TOPLAS 2024

Subjects: Symbolic Computation (cs.SC)

This paper presents a new compressed representation of Boolean functions, called CFLOBDDs (for Context-Free-Language Ordered Binary Decision Diagrams). They are essentially a plug-compatible alternative to BDDs (Binary Decision Diagrams), and hence useful for representing certain classes of functions, matrices, graphs, relations, etc. in a highly compressed fashion. CFLOBDDs share many of the good properties of BDDs, but--in the best case--the CFLOBDD for a Boolean function can be exponentially smaller than any BDD for that function. Compared with the size of the decision tree for a function, a CFLOBDD--again, in the best case--can give a double-exponential reduction in size. They have the potential to permit applications to (i) execute much faster, and (ii) handle much larger problem instances than has been possible heretofore.
CFLOBDDs are a new kind of decision diagram that go beyond BDDs (and their many relatives). The key insight is a new way to reuse sub-decision-diagrams: components of CFLOBDDs are structured hierarchically, so that sub-decision-diagrams can be treated as standalone ''procedures'' and reused.
We applied CFLOBDDs to the problem of simulating quantum circuits, and found that for several standard problems the improvement in scalability--compared to simulation using BDDs--is quite dramatic. In particular, the number of qubits that could be handled using CFLOBDDs was larger, compared to BDDs, by a factor of 128x for GHZ; 1,024x for BV; 8,192x for DJ; and 128x for Grover's algorithm. (With a 15-minute timeout, the number of qubits that CFLOBDDs can handle is 65,536 for GHZ; 524,288 for BV; 4,194,304 for DJ; and 4,096 for Grover's algorithm.)
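The reported factors and qubit counts can be combined into a quick back-of-the-envelope check. The sketch below assumes that each factor is the ratio of the CFLOBDD qubit count to the BDD qubit count under the same timeout; the resulting BDD figures are an inference from the numbers above, not values stated in the abstract.

```python
# Qubit counts reported for CFLOBDDs (15-minute timeout) and the reported
# improvement factors over BDDs, both copied from the abstract.
cflobdd_qubits = {"GHZ": 65_536, "BV": 524_288, "DJ": 4_194_304, "Grover": 4_096}
improvement_factor = {"GHZ": 128, "BV": 1_024, "DJ": 8_192, "Grover": 128}


def implied_bdd_qubits(benchmark):
    """CFLOBDD qubit count divided by the factor (an inference, not a reported value)."""
    return cflobdd_qubits[benchmark] // improvement_factor[benchmark]


for name in cflobdd_qubits:
    print(f"{name}: implied BDD limit = {implied_bdd_qubits(name)} qubits")
```

Under that assumption the implied BDD limits are 512 qubits for GHZ, BV, and DJ, and 32 qubits for Grover's algorithm.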
[427] arXiv:2211.07125 (replaced) [pdf, ps, other]: Title: Global Performance Guarantees for Neural Network Models of AC Power Flow

Samuel Chevalier, Spyros Chatzivasileiadis

Comments: Submitted to IEEE IAS

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Machine learning, which can generate extremely fast and highly accurate black-box surrogate models, is increasingly being applied to a variety of AC power flow problems. Rigorously verifying the accuracy of the resulting black-box models, however, is computationally challenging. This paper develops a tractable neural network verification procedure which incorporates the ground truth of the non-linear AC power flow equations to determine worst-case neural network prediction error. Our approach, termed Sequential Targeted Tightening (STT), leverages a loosely convexified reformulation of the original verification problem, which is an intractable mixed integer quadratic program (MIQP). Using the sequential addition of targeted cuts, we iteratively tighten our formulation until either the solution is sufficiently tight or a satisfactory performance guarantee has been generated. After learning neural network models of the 14, 57, 118, and 200-bus PGLib test cases, we compare the performance guarantees generated by our STT procedure with ones generated by a state-of-the-art MIQP solver, Gurobi 11.0. We show that STT often generates performance guarantees which are far tighter than the MIQP upper bound.
[428] arXiv:2211.11926 (replaced) [pdf, ps, other]: Title: The weak Galerkin finite element method for Stokes interface problems with curved interface

Lin Yang, Qilong Zhai, Ran Zhang

Subjects: Numerical Analysis (math.NA)

In this paper, we develop a new weak Galerkin finite element scheme for the Stokes interface problem with curved interfaces. We take a unique vector-valued function at the interface and reflect the interface condition in the variational problem. Theoretical analysis and numerical experiments show that the errors can reach the optimal convergence order under the energy norm and $L^2$ norm.
[429] arXiv:2212.04083 (replaced) [pdf, ps, other]: Title: Convergence of the Fourier-Galerkin spectral method for the Boltzmann equation with uncertainties

Liu Liu, Kunlun Qi

Comments: 30 Pages. Revised Version. To appear in Communications in Mathematical Sciences

Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

It is well known that the Fourier-Galerkin spectral method has been a popular approach for the numerical approximation of the deterministic Boltzmann equation, with spectral accuracy rigorously proved. In this paper, we show that such spectral convergence of the Fourier-Galerkin spectral method also holds for the Boltzmann equation with uncertainties arising from both the collision kernel and the initial condition. Our proof is based on newly established spaces and norms that are carefully designed to take the velocity variable and the random variables, with their high regularities, into account together. For future studies, this theoretical result will provide a solid foundation for further showing the convergence of the fully discretized system, where both the velocity and random variables are discretized simultaneously.
[430] arXiv:2212.08896 (replaced) [pdf, ps, other]: Title: Human Image Generation: A Comprehensive Survey

Zhen Jia, Zhang Zhang, Liang Wang, Tieniu Tan

Comments: Under Review

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image and video synthesis has become a blooming topic in the computer vision and machine learning communities along with the development of deep generative models, due to its great academic and application value. Many researchers have devoted themselves to synthesizing high-fidelity images of humans, one of the most commonly seen object categories in daily life, and a large number of studies have been performed based on various models, task settings and applications. Thus, it is necessary to give a comprehensive overview of the various methods for human image generation. In this paper, we divide human image generation techniques into three paradigms, i.e., data-driven methods, knowledge-guided methods and hybrid methods. For each paradigm, the most representative models and the corresponding variants are presented, where the advantages and characteristics of different methods are summarized in terms of model architectures. Besides, the main public human image datasets and evaluation metrics in the literature are summarized. Furthermore, given the wide application potential, the typical downstream usages of synthesized human images are covered. Finally, the challenges and potential opportunities of human image generation are discussed to shed light on future research.
[431] arXiv:2302.03456 (replaced) [pdf, ps, html, other]: Title: 1-in-3 vs. Not-All-Equal: Dichotomy of a broken promise

Lorenzo Ciardo, Marcin Kozik, Andrei Krokhin, Tamio-Vesa Nakajima, Stanislav Živný

Comments: Full version of a LICS 2024 paper; v2 has a different title, abstract, and introduction

Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)

The 1-in-3 and Not-All-Equal satisfiability problems for Boolean CNF formulas are two well-known NP-hard problems. In contrast, the promise 1-in-3 vs. Not-All-Equal problem can be solved in polynomial time. In the present work, we investigate this constraint satisfaction problem in a regime where the promise is weakened from either side by a rainbow-free structure, and establish a complexity dichotomy for the resulting class of computational problems.
[432] arXiv:2302.13837 (replaced) [pdf, ps, html, other]: Title: Decentralized Learning Made Practical with Client Sampling

Martijn de Vos, Akash Dhasade, Anne-Marie Kermarrec, Erick Lavoie, Johan Pouwelse, Rishi Sharma

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Decentralized learning (DL) leverages edge devices for collaborative model training while avoiding coordination by a central server. Due to privacy concerns, DL has become an attractive alternative to centralized learning schemes since training data never leaves the device. In a round of DL, all nodes participate in model training and exchange their model with some other nodes. Performing DL in large-scale heterogeneous networks results in high communication costs and prolonged round durations due to slow nodes, effectively inflating the total training time. Furthermore, current DL algorithms also assume all nodes are available for training and aggregation at all times, diminishing the practicality of DL. This paper presents Plexus, an efficient, scalable, and practical DL system. Plexus (1) avoids network-wide participation by introducing a decentralized peer sampler that selects small subsets of available nodes that train the model each round and, (2) aggregates the trained models produced by nodes every round. Plexus is designed to handle joining and leaving nodes (churn). We extensively evaluate Plexus by incorporating realistic traces for compute speed, pairwise latency, network capacity, and availability of edge devices in our experiments. Our experiments on four common learning tasks empirically show that Plexus reduces time-to-accuracy by 1.2-8.3x, communication volume by 2.4-15.3x and training resources needed for convergence by 6.4-370x compared to baseline DL algorithms.
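One way a per-round sample could be agreed on without a coordinator is to rank nodes by a hash of the round number and node identifier, so that every node holding the same availability view computes the same subset. The sketch below is a hypothetical scheme for illustration only: `sample_peers` and its parameters are assumed names, and it is not claimed to be the sampler that Plexus implements.

```python
import hashlib


def sample_peers(available_nodes, round_number, sample_size):
    """Pick `sample_size` nodes for a round by ranking hashes of (round, node id).

    The ranking depends only on the round number and the node identifiers, so
    any node that knows the same set of available nodes derives the same sample.
    """
    def rank(node_id):
        return hashlib.sha256(f"{round_number}:{node_id}".encode()).hexdigest()

    return sorted(available_nodes, key=rank)[:sample_size]


# Hypothetical network of 20 nodes, 5 of which train in round 3.
nodes = [f"node-{i}" for i in range(20)]
round_3_sample = sample_peers(nodes, round_number=3, sample_size=5)
print(round_3_sample)
```

Because the hash input changes with the round number, the selected subset rotates from round to round, and nodes that leave simply drop out of `available_nodes` in later rounds.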
[433] arXiv:2302.14119 (replaced) [pdf, ps, html, other]: Title: Double-loop randomized quasi-Monte Carlo estimator for nested integration

Arved Bartuska, André Gustavo Carlon, Luis Espath, Sebastian Krumscheid, Raúl Tempone

Comments: 28 pages, 9 figures

Subjects: Numerical Analysis (math.NA)

Characterized by an outer integral connected to an inner integral through a nonlinear function, nested integration is a challenging problem in various fields, such as engineering and mathematical finance. The available numerical methods for nested integration based on Monte Carlo (MC) methods can be prohibitively expensive owing to the error propagating from the inner to the outer integral. Attempts to enhance the efficiency of these approximations using the quasi-MC (QMC) or randomized QMC (rQMC) method have focused on either the inner or outer integral approximation. This work introduces a novel nested rQMC method that simultaneously addresses the approximation of the inner and outer integrals. This method leverages the unique nested integral structure to offer a more efficient approximation mechanism. By incorporating Owen's scrambling techniques, we address integrands exhibiting infinite variation in the Hardy--Krause sense, enabling theoretically sound error estimates. As the primary contribution, we derive asymptotic error bounds for the bias and variance of our estimator, along with the regularity conditions under which these bounds can be attained. In addition, we provide nearly optimal sample sizes for the rQMC approximations underlying the numerical implementation of the proposed method. Moreover, we indicate how to combine this method with importance sampling to remedy the measure concentration arising in the inner integral. We verify the estimator quality through numerical experiments in the context of expected information gain estimation. We compare the computational efficiency of the nested rQMC method against standard nested MC integration for two case studies: one in thermomechanics and the other in pharmacokinetics. These examples highlight the computational savings and enhanced applicability of the proposed approach.
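To make the nested structure concrete, the sketch below implements the standard double-loop Monte Carlo baseline (not the rQMC estimator proposed above) for $\mathbb{E}_X[f(\mathbb{E}_Y[g(X,Y)])]$. The toy integrand, with $X, Y$ uniform on $[0,1]$, $g(x,y) = x + y$ and $f(v) = v^2$, is an assumption chosen because its exact value is $13/12$.

```python
import random


def nested_mc(f, g, n_outer, m_inner, seed=0):
    """Double-loop Monte Carlo estimate of E_X[ f( E_Y[ g(X, Y) ] ) ]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        x = rng.random()  # outer sample
        # The inner sample mean approximates the inner integral for this x.
        inner = sum(g(x, rng.random()) for _ in range(m_inner)) / m_inner
        total += f(inner)  # the nonlinear f is applied to a noisy inner estimate
    return total / n_outer


estimate = nested_mc(lambda v: v * v, lambda x, y: x + y, n_outer=2000, m_inner=50)
print(estimate, 13 / 12)
```

Because $f$ is nonlinear, the noise in the inner estimate turns into bias in the outer one: in this toy case the expected value of the estimator exceeds $13/12$ by $\mathrm{Var}(Y)/M = 1/(12M)$, which illustrates the inner-to-outer error propagation described above.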
[434] arXiv:2303.00391 (replaced) [pdf, ps, other]: Title: Grid-Forming Control Based On Emulated Synchronous Condenser Strategy Compliant With Challenging Grid Code Requirements

Julian Freytes, Antoine Rossé, Valentin Costan, Grégoire Prime

Comments: Replaced "GFo" by "GFM"

Subjects: Systems and Control (eess.SY)

Future power systems will include high shares of inverter-based generation. There is a general consensus that the Grid-Forming (GFM) control approach would be of great value in enabling this transition. This article presents a GFM control strategy which is based on the concept of an Emulated Synchronous Condenser in parallel with a controlled current source with an explicit representation of the swing equation. The advantage of this control is that it can cope with challenging grid code requirements such as severe phase jumps, balanced and unbalanced Fault Ride-Through (FRT), main grid disconnection and black start. All these scenarios can be handled with a single control structure with no further logic involved (e.g. fault detection to turn on or off different control parts, freezes, etc.). The proposed strategy is evaluated via time-domain simulations of a 2-MW Battery Energy Storage System (BESS).
[435] arXiv:2303.07169 (replaced) [pdf, ps, other]: Title: Dynamic Event-based Optical Identification and Communication

Axel von Arnim, Jules Lecomte, Naima Elosegui Borras, Stanislaw Wozniak, Angeliki Pantazi

Journal-ref: Front. Neurorobot. 18:1290965

Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Optical identification is often done with spatial or temporal visual pattern recognition and localization. Temporal pattern recognition, depending on the technology, involves a trade-off between communication frequency, range and accurate tracking. We propose a solution with light-emitting beacons that improves this trade-off by exploiting fast event-based cameras and, for tracking, sparse neuromorphic optical flow computed with spiking neurons. The system is embedded in a simulated drone and evaluated in an asset monitoring use case. It is robust to relative movements and enables simultaneous communication with, and tracking of, multiple moving beacons. Finally, in a hardware lab prototype, we demonstrate for the first time beacon tracking performed simultaneously with state-of-the-art frequency communication in the kHz range.
[436] arXiv:2303.13138 (replaced) [pdf, ps, other]: Title: Online search is more likely to lead students to validate true news than to refute false ones

Azza Bouleimen, Luca Luceri, Felipe Cardoso, Luca Botturi, Martin Hermida, Loredana Addimando, Chiara Beretta, Marzia Galloni, Silvia Giordano

Subjects: Computers and Society (cs.CY)

With the spread of high-speed Internet and portable smart devices, the way people access and consume information has drastically changed. However, this presents many challenges, including information overload, personal data leakage, and misinformation diffusion. Across the spectrum of risks that Internet users face nowadays, this work focuses on understanding how young people perceive and deal with false information. Within an experimental campaign involving 183 students, we presented six different news items to the participants and invited them to browse the Internet to assess the veracity of the presented information. Our results suggest that online search is more likely to lead students to validate true news than to refute false ones. We found that students change their opinion about a specific piece of information more often than their global idea about a broader topic. Also, our experiment reflected that most participants rely on online sources to obtain information and access the news, and those getting information from books and Internet browsing are the most accurate in assessing the veracity of a news item. This work provides a principled understanding of how young people perceive and distinguish true and false pieces of information, identifying strengths and weaknesses amidst young subjects and contributing to building tailored digital information literacy strategies for youth.
[437] arXiv:2304.01889 (replaced) [pdf, ps, html, other]: Title: Chasing Positive Bodies

Sayan Bhattacharya, Niv Buchbinder, Roie Levin, Thatchaphol Saranurak

Subjects: Data Structures and Algorithms (cs.DS)

We study the problem of chasing positive bodies in $\ell_1$: given a sequence of bodies $K_{t}=\{x^{t}\in\mathbb{R}_{+}^{n}\mid C^{t}x^{t}\geq 1,P^{t}x^{t}\leq 1\}$ revealed online, where $C^{t}$ and $P^{t}$ are nonnegative matrices, the goal is to (approximately) maintain a point $x_t \in K_t$ such that $\sum_t \|x_t - x_{t-1}\|_1$ is minimized. This captures the fully-dynamic low-recourse variant of any problem that can be expressed as a mixed packing-covering linear program and thus also the fractional version of many central problems in dynamic algorithms such as set cover, load balancing, hyperedge orientation, minimum spanning tree, and matching.
We give an $O(\log d)$-competitive algorithm for this problem, where $d$ is the maximum row sparsity of any matrix $C^t$. This bypasses and improves exponentially over the lower bound of $\sqrt{n}$ known for general convex bodies. Our algorithm is based on iterated information projections, and, in contrast to general convex body chasing algorithms, is entirely memoryless.
We also show how to round our solution dynamically to obtain the first fully dynamic algorithms with competitive recourse for all the stated problems above; i.e. their recourse is less than the recourse of every other algorithm on every update sequence, up to polylogarithmic factors. This is a significantly stronger notion than the notion of absolute recourse in the dynamic algorithms literature.
[438] arXiv:2305.04764 (replaced) [pdf, ps, other]: Title: ChatUniTest: A Framework for LLM-Based Test Generation

Yinghao Chen, Zehao Hu, Chen Zhi, Junxiao Han, Shuiguang Deng, Jianwei Yin

Comments: This shorter version is accepted by the FSE 2024 Demonstrations Track, and the previous longer version titled "ChatUniTest: a ChatGPT-based automated unit test generation tool" can be found at arXiv:2305.04764v1

Subjects: Software Engineering (cs.SE)

Unit testing is an essential yet frequently arduous task. Various automated unit test generation tools have been introduced to mitigate this challenge. Notably, methods based on large language models (LLMs) have garnered considerable attention and exhibited promising results in recent years. Nevertheless, LLM-based tools encounter limitations in generating accurate unit tests. This paper presents ChatUniTest, an LLM-based automated unit test generation framework. ChatUniTest incorporates an adaptive focal context mechanism to encompass valuable context in prompts and adheres to a generation-validation-repair mechanism to rectify errors in generated unit tests. Subsequently, we have developed ChatUniTest Core, a common library that implements core workflow, complemented by the ChatUniTest Toolchain, a suite of seamlessly integrated tools enhancing the capabilities of ChatUniTest. Our effectiveness evaluation reveals that ChatUniTest outperforms TestSpark and EvoSuite in half of the evaluated projects, achieving the highest overall line coverage. Furthermore, insights from our user study affirm that ChatUniTest delivers substantial value to various stakeholders in the software testing domain. ChatUniTest is available at this https URL, and the demo video is available at this https URL.
[439] arXiv:2305.07489 (replaced) [pdf, ps, other]: Title: Benchmarks and leaderboards for sound demixing tasks

Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Music demixing is the task of separating a single audio signal into its component tracks, such as drums, bass, vocals, and the rest of the accompaniment. Separation of sources is useful for a range of areas, including entertainment and hearing aids. In this paper, we introduce two new benchmarks for the sound source separation tasks and compare popular models for sound demixing, as well as their ensembles, on these benchmarks. For the models' assessments, we provide the leaderboard at this https URL, giving a comparison for a range of models. The new benchmark datasets are available for download. We also develop a novel approach for audio separation, based on the ensembling of different models that are suited best for the particular stem. The proposed solution was evaluated in the context of the Music Demixing Challenge 2023 and achieved top results in different tracks of the challenge. The code and the approach are open-sourced on GitHub.
[440] arXiv:2305.13551 (replaced) [pdf, ps, other]: Title: How Fragile is Relation Extraction under Entity Replacements?

Yiwei Wang, Bryan Hooi, Fei Wang, Yujun Cai, Yuxuan Liang, Wenxuan Zhou, Jing Tang, Manjuan Duan, Muhao Chen

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Relation extraction (RE) aims to extract the relations between entity names from the textual context. In principle, the textual context determines the ground-truth relation, and RE models should be able to correctly identify the relations reflected by the textual context. However, existing work has found that RE models memorize entity name patterns to make RE predictions while ignoring the textual context. This motivates us to ask: "Are RE models robust to entity replacements?" In this work, we apply random and type-constrained entity replacements to the RE instances in TACRED and evaluate state-of-the-art RE models under these replacements. We observe F1 score drops of 30%-50% for state-of-the-art RE models under entity replacements. These results suggest that more effort is needed to develop effective RE models that are robust to entity replacements. We release the source code at this https URL.
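A type-constrained entity replacement can be sketched as follows: each entity mention is swapped for another name of the same entity type while the surrounding context is left untouched. The function name, span convention, and the tiny entity pool are assumptions made for illustration; this is not the authors' released code, and it assumes the two spans do not overlap.

```python
import random


def replace_entities(tokens, subj_span, obj_span, subj_type, obj_type, entity_pool, rng):
    """Return a copy of `tokens` with both mentions replaced by same-type names.

    Spans are (start, end) token indices with `end` exclusive.
    """
    new_subj = rng.choice(entity_pool[subj_type]).split()
    new_obj = rng.choice(entity_pool[obj_type]).split()
    # Apply the later span first so the earlier span's indices stay valid.
    edits = sorted(
        [(subj_span, new_subj), (obj_span, new_obj)],
        key=lambda edit: edit[0][0],
        reverse=True,
    )
    out = list(tokens)
    for (start, end), replacement in edits:
        out[start:end] = replacement
    return out


# Hypothetical instance and entity pool.
tokens = "Bill Gates founded Microsoft in 1975 .".split()
pool = {"PERSON": ["Ada Lovelace", "Alan Turing"], "ORGANIZATION": ["Acme Corp"]}
perturbed = replace_entities(
    tokens, (0, 2), (3, 4), "PERSON", "ORGANIZATION", pool, random.Random(0)
)
print(" ".join(perturbed))
```

The relation expressed by the context ("founded") is unchanged, so a model that reads the context rather than memorizing entity names should predict the same label for the perturbed instance.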
[441] arXiv:2305.14851 (replaced) [pdf, ps, other]: Title: Sharpness-Aware Data Poisoning Attack

Pengfei He, Han Xu, Jie Ren, Yingqian Cui, Hui Liu, Charu C. Aggarwal, Jiliang Tang

Subjects: Cryptography and Security (cs.CR)

Recent research has highlighted the vulnerability of Deep Neural Networks (DNNs) against data poisoning attacks. These attacks aim to inject poisoning samples into the models' training dataset such that the trained models have inference failures. While previous studies have executed different types of attacks, one major challenge that greatly limits their effectiveness is the uncertainty of the re-training process after the injection of poisoning samples, including the re-training initialization or algorithms. To address this challenge, we propose a novel attack method called ''Sharpness-Aware Data Poisoning Attack (SAPA)''. In particular, it leverages the concept of DNNs' loss landscape sharpness to optimize the poisoning effect on the worst re-trained model. It helps enhance the preservation of the poisoning effect, regardless of the specific retraining procedure employed. Extensive experiments demonstrate that SAPA offers a general and principled strategy that significantly enhances various types of poisoning attacks.
[442] arXiv:2305.19076 (replaced) [pdf, ps, other]: Title: Approximate Bayesian Class-Conditional Models under Continuous Representation Shift

Thomas L. Lee, Amos Storkey

Comments: Published at AISTATS 2024, 9 pages

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

For models consisting of a classifier in some representation space, learning online from a non-stationary data stream often necessitates changes in the representation. So, the question arises of what is the best way to adapt the classifier to shifts in representation. Current methods only slowly change the classifier to representation shift, introducing noise into learning as the classifier is misaligned to the representation. We propose DeepCCG, an empirical Bayesian approach to solve this problem. DeepCCG works by updating the posterior of a class conditional Gaussian classifier such that the classifier adapts in one step to representation shift. The use of a class conditional Gaussian classifier also enables DeepCCG to use a log conditional marginal likelihood loss to update the representation. To perform the update to the classifier and representation, DeepCCG maintains a fixed number of examples in memory and so a key part of DeepCCG is selecting what examples to store, choosing the subset that minimises the KL divergence between the true posterior and the posterior induced by the subset. We explore the behaviour of DeepCCG in online continual learning (CL), demonstrating that it performs well against a spectrum of online CL methods and that it reduces the change in performance due to representation shift.
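The idea of adapting a class-conditional Gaussian classifier in one step can be illustrated with a heavily simplified stand-in: with equal class priors and a shared isotropic covariance, the classifier reduces to nearest class mean, and the means can be recomputed from the stored examples whenever the representation changes. The sketch below uses point estimates rather than the Bayesian posterior update described above, and every name and value in it is an assumption made for illustration.

```python
def class_means(embed, memory):
    """Per-class mean of the embedded memory examples."""
    sums, counts = {}, {}
    for x, label in memory:
        z = embed(x)
        acc = sums.setdefault(label, [0.0] * len(z))
        for i, zi in enumerate(z):
            acc[i] += zi
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(embed, means, x):
    """Nearest class mean, i.e. a Gaussian classifier with shared isotropic covariance."""
    z = embed(x)
    return min(
        means,
        key=lambda label: sum((zi - mi) ** 2 for zi, mi in zip(z, means[label])),
    )


# Hypothetical memory buffer and two versions of the representation.
memory = [((0.0,), "a"), ((1.0,), "a"), ((4.0,), "b"), ((5.0,), "b")]
embed_old = lambda x: [x[0]]
embed_new = lambda x: [-2.0 * x[0] + 3.0]  # the representation after a shift

stale_means = class_means(embed_old, memory)
fresh_means = class_means(embed_new, memory)  # one-step re-fit from memory

query = (4.2,)  # lies with class "b" in input space
print(predict(embed_new, stale_means, query), predict(embed_new, fresh_means, query))
```

With the stale means the shifted representation is misread as class "a"; after the re-fit the same query is assigned to class "b", with no gradual classifier update needed.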
[443] arXiv:2305.19872 (replaced) [pdf, ps, other]: Title: Spectral Heterogeneous Graph Convolutions via Positive Noncommutative Polynomials

Mingguo He, Zhewei Wei, Shikun Feng, Zhengjie Huang, Weibin Li, Yu Sun, Dianhai Yu

Comments: The Web Conference 2024 (12 pages)

Subjects: Machine Learning (cs.LG)

Heterogeneous Graph Neural Networks (HGNNs) have gained significant popularity in various heterogeneous graph learning tasks. However, most existing HGNNs rely on spatial domain-based methods to aggregate information, i.e., manually selected meta-paths or some heuristic modules, lacking theoretical guarantees. Furthermore, these methods cannot learn arbitrary valid heterogeneous graph filters within the spectral domain, which limits their expressiveness. To tackle these issues, we present a positive spectral heterogeneous graph convolution via positive noncommutative polynomials. Then, using this convolution, we propose PSHGCN, a novel Positive Spectral Heterogeneous Graph Convolutional Network. PSHGCN offers a simple yet effective method for learning valid heterogeneous graph filters. Moreover, we demonstrate the rationale of PSHGCN in the graph optimization framework. We conducted an extensive experimental study to show that PSHGCN can learn diverse heterogeneous graph filters and outperform all baselines on open benchmarks. Notably, PSHGCN exhibits remarkable scalability, efficiently handling large real-world graphs comprising millions of nodes and edges. Our codes are available at this https URL.
[444] arXiv:2306.00738 (replaced) [pdf, ps, other]: Title: ReFACT: Updating Text-to-Image Models by Editing the Text Encoder

Dana Arad, Hadas Orgad, Yonatan Belinkov

Comments: Accepted to NAACL 2024 (Main Conference)

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Our world is marked by unprecedented technological, global, and socio-political transformations, posing a significant challenge to text-to-image generative models. These models encode factual associations within their parameters that can quickly become outdated, diminishing their utility for end-users. To that end, we introduce ReFACT, a novel approach for editing factual associations in text-to-image models without relying on explicit input from end-users or costly re-training. ReFACT updates the weights of a specific layer in the text encoder, modifying only a tiny portion of the model's parameters and leaving the rest of the model unaffected. We empirically evaluate ReFACT on an existing benchmark, alongside a newly curated dataset. Compared to other methods, ReFACT achieves superior performance in both generalization to related concepts and preservation of unrelated concepts. Furthermore, ReFACT maintains image generation quality, making it a practical tool for updating and correcting factual information in text-to-image models.
[445] arXiv:2306.01470 (replaced) [pdf, ps, html, other]: Title: Understanding MLP-Mixer as a Wide and Sparse MLP

Tomohiro Hayase, Ryo Karakida

Comments: Accepted in ICML 2024

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The multi-layer perceptron (MLP) is a fundamental component of deep learning, and recent MLP-based architectures, especially the MLP-Mixer, have achieved significant empirical success. Nevertheless, our understanding of why and how the MLP-Mixer outperforms conventional MLPs remains limited. In this work, we reveal that sparseness is a key mechanism underlying the MLP-Mixers. First, the Mixers have an effective expression as a wider MLP with Kronecker-product weights, clarifying that the Mixers efficiently embody several sparseness properties explored in deep learning. In the case of linear layers, the effective expression elucidates an implicit sparse regularization caused by the model architecture and a hidden relation to Monarch matrices, which are also known as another form of sparse parameterization. Next, for general cases, we empirically demonstrate quantitative similarities between the Mixer and the unstructured sparse-weight MLPs. Following a guiding principle proposed by Golubeva, Neyshabur and Gur-Ari (2021), which fixes the number of connections and increases the width and sparsity, the Mixers can demonstrate improved performance.
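The link between mixing layers and Kronecker-product weights can be seen from the standard identity $\operatorname{vec}(AXB) = (B^{\top} \otimes A)\operatorname{vec}(X)$ for column-stacking vectorization. The notation below is assumed for illustration and may differ from the paper's own formulation.

```latex
% Assumed notation: X \in \mathbb{R}^{S \times C} holds S tokens with C channels,
% W \in \mathbb{R}^{S \times S} is a token-mixing weight matrix,
% V \in \mathbb{R}^{C \times C} is a channel-mixing weight matrix.
\begin{align}
  \operatorname{vec}(W X) &= (I_C \otimes W)\,\operatorname{vec}(X), \\
  \operatorname{vec}(X V) &= (V^{\top} \otimes I_S)\,\operatorname{vec}(X).
\end{align}
```

Both Kronecker products are $SC \times SC$ matrices acting on a width-$SC$ vector, yet they have at most $S^2 C$ and $S C^2$ nonzero entries respectively, which is the sense in which such a layer behaves like a wide but sparse MLP layer.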
[446] arXiv:2306.02526 (replaced) [pdf, ps, other]: Title: Fast and high-order approximation of parabolic equations using hierarchical direct solvers and implicit Runge-Kutta methods

Ke Chen, Daniel Appelö, Tracy Babb, Per-Gunnar Martinsson

Subjects: Numerical Analysis (math.NA)

An additive Runge-Kutta method is used for the time stepping, which integrates the linear stiff terms by an explicit singly diagonally implicit Runge-Kutta (ESDIRK) method and the nonlinear terms by an explicit Runge-Kutta (ERK) method. In each time step, the implicit solve is performed by the recently developed Hierarchical Poincaré-Steklov (HPS) method. This is a fast direct solver for elliptic equations that decomposes the space domain into a hierarchical tree of subdomains and builds spectral collocation solvers locally on the subdomains. These ideas are naturally combined in the presented method since the singly diagonal coefficient in ESDIRK and a fixed time step ensure that the coefficient matrix in the implicit solve of HPS remains the same for all time stages. This means that the precomputed inverse can be efficiently reused, leading to a scheme with complexity (in two dimensions) $\mathcal{O}(N^{1.5})$ for the precomputation where the solution operator to the elliptic problems is built, and then $\mathcal{O}(N \log N)$ for the solve in each time step. The stability of the method is proved for first order in time and any order in space, and numerical evidence substantiates a claim of stability for a much broader class of time discretization methods. Numerical experiments supporting the accuracy and efficiency of the method in one and two dimensions are presented.
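Why the coefficient matrix stays the same across stages can be seen by writing out one implicit stage of a generic additive Runge-Kutta scheme. The symbols below are assumed for illustration: $L$ is the discretized linear stiff operator, $N$ the nonlinear term, $a_{ij}$ and $\hat{a}_{ij}$ the implicit and explicit tableau entries, and $\gamma$ the shared diagonal entry $a_{ii} = \gamma$ of the implicit stages.

```latex
% Stage i >= 2 of a generic additive (ESDIRK + ERK) scheme; the first ESDIRK stage is explicit.
\begin{align}
  U_i &= u_n + \Delta t \sum_{j=1}^{i-1} \bigl( a_{ij} L U_j + \hat{a}_{ij} N(U_j) \bigr)
         + \Delta t\, \gamma\, L U_i, \\
  (I - \Delta t\, \gamma\, L)\, U_i &= u_n + \Delta t \sum_{j=1}^{i-1} \bigl( a_{ij} L U_j + \hat{a}_{ij} N(U_j) \bigr).
\end{align}
```

The operator $I - \Delta t\,\gamma\,L$ on the left does not depend on the stage index $i$ and, for a fixed $\Delta t$, not on the time step either, so a solution operator built once can be applied to every right-hand side.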
[447] arXiv:2306.06182 (replaced) [pdf, ps, other]: Title: The effect of approximate coarsest-level solves on the convergence of multigrid V-cycle methods

Petr Vacek, Erin Carson, Kirk M. Soodhalter

Comments: 26 pages, 8 figures

Subjects: Numerical Analysis (math.NA)

The multigrid V-cycle method is a popular method for solving systems of linear equations. It computes an approximate solution by using smoothing on fine levels and solving a system of linear equations on the coarsest level. Solving on the coarsest level depends on the size and difficulty of the problem. If the size permits, it is typical to use a direct method based on LU or Cholesky decomposition. In settings with large coarsest-level problems, approximate solvers such as iterative Krylov subspace methods, or direct methods based on low-rank approximation, are often used. The accuracy of the coarsest-level solver is typically determined based on the experience of the users with the concrete problems and methods.
In this paper we present an approach to analyzing the effects of approximate coarsest-level solves on the convergence of the V-cycle method for symmetric positive definite problems. Using these results, we derive a coarsest-level stopping criterion through which we may control the difference between the approximation computed by a V-cycle method with an approximate coarsest-level solver and the approximation which would be computed if the coarsest-level problems were solved exactly. The coarsest-level stopping criterion may thus be set up such that the V-cycle method converges to a chosen finest-level accuracy in (nearly) the same number of V-cycle iterations as the V-cycle method with an exact coarsest-level solver. We also utilize the theoretical results to discuss how the convergence of the V-cycle method may be affected by the choice of a tolerance in a coarsest-level stopping criterion based on the relative residual norm.
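A stopping criterion based on the relative residual norm, $\|b - Ax_k\| / \|b\| \le \tau$, can be sketched with a plain conjugate gradient solver standing in for the coarsest-level Krylov method. This shows only the generic criterion whose tolerance is discussed above, not the criterion derived in the paper; the matrix, right-hand side, and tolerances are assumptions chosen for illustration.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, rel_tol, max_iter=1000):
    """Solve A x = b for SPD A; stop once ||r_k|| <= rel_tol * ||b||.

    Returns the approximate solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the zero initial guess
    p = list(r)
    rs_old = dot(r, r)
    b_norm = sqrt(dot(b, b))
    for k in range(max_iter):
        if sqrt(rs_old) <= rel_tol * b_norm:  # relative residual stopping criterion
            return x, k
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Hypothetical small SPD system standing in for a coarsest-level problem.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_tight, iters_tight = conjugate_gradient(A, b, rel_tol=1e-8)
x_loose, iters_loose = conjugate_gradient(A, b, rel_tol=0.5)
print(x_tight, iters_tight, iters_loose)
```

A looser tolerance stops the solver earlier and returns a less accurate coarsest-level solution, which is the trade-off that a choice of tolerance has to balance.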
[448] arXiv:2306.07209 (replaced) [pdf, ps, other]: Title: Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow

Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)

Various industries such as finance, meteorology, and energy produce vast amounts of heterogeneous data every day. There is a natural demand for humans to manage, process, and display data efficiently. However, it necessitates labor-intensive efforts and a high level of expertise for these data-related tasks. Considering large language models (LLMs) showcase promising capabilities in semantic understanding and reasoning, we advocate that the deployment of LLMs could autonomously manage and process massive amounts of data while interacting and displaying in a human-friendly manner. Based on this, we propose Data-Copilot, an LLM-based system that connects numerous data sources on one end and caters to diverse human demands on the other end. Acting as an experienced expert, Data-Copilot autonomously transforms raw data into multi-form output that best matches the user's intent. Specifically, it first designs multiple universal interfaces to satisfy diverse data-related requests, like querying, analysis, prediction, and visualization. In real-time response, it automatically deploys a concise workflow by invoking corresponding interfaces. The whole process is fully controlled by Data-Copilot, without human assistance. We release Data-Copilot-1.0 using massive Chinese financial data, e.g., stocks, funds, and news. Experiments indicate it achieves reliable performance with lower token consumption, showing promising application prospects.
[449] arXiv:2306.17436 (replaced) [pdf, ps, html, other]: Title: LIO-GVM: an Accurate, Tightly-Coupled Lidar-Inertial Odometry with Gaussian Voxel Map

Xingyu Ji, Shenghai Yuan, Pengyu Yin, Lihua Xie

Subjects: Robotics (cs.RO)

This letter presents an accurate and robust Lidar Inertial Odometry framework. We fuse LiDAR scans with IMU data using a tightly-coupled iterative error state Kalman filter for robust and fast localization. To achieve robust correspondence matching, we represent the points as a set of Gaussian distributions and evaluate the divergence in variance for outlier rejection. Based on the fitted distributions, a new residual metric is proposed for the filter-based Lidar inertial odometry, which demonstrates an improvement from merely quantifying distance to incorporating variance disparity, further enriching the comprehensiveness and accuracy of the residual metric. Due to the strategic design of the residual metric, we propose a simple yet effective voxel-solely mapping scheme, which only necessitates the maintenance of one centroid and one covariance matrix for each voxel. Experiments on different datasets demonstrate the robustness and accuracy of our framework for various data inputs and environments. To the benefit of the robotics society, we open source the code at this https URL.
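Illustrative sketch (not the authors' code): keeping only one centroid and one covariance matrix per voxel can be done without storing the points themselves. The example below assumes a Welford-style running update; the class name `VoxelGaussian` and its methods are hypothetical.

```python
class VoxelGaussian:
    """Running centroid and covariance of the 3D points falling in one voxel."""

    def __init__(self):
        self.n = 0
        self.mean = [0.0, 0.0, 0.0]
        # Accumulated scatter matrix (sum of outer products of deviations).
        self.scatter = [[0.0] * 3 for _ in range(3)]

    def add(self, p):
        """Welford-style update with one new point p = (x, y, z)."""
        self.n += 1
        delta_old = [p[i] - self.mean[i] for i in range(3)]
        self.mean = [self.mean[i] + delta_old[i] / self.n for i in range(3)]
        delta_new = [p[i] - self.mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                self.scatter[i][j] += delta_old[i] * delta_new[j]

    def covariance(self):
        """Sample covariance; needs at least two points."""
        if self.n < 2:
            raise ValueError("need at least two points")
        return [[s / (self.n - 1) for s in row] for row in self.scatter]


voxel = VoxelGaussian()
for point in [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]:
    voxel.add(point)
print(voxel.mean)
print(voxel.covariance())
```

Each update costs a constant amount of work and memory per voxel, independent of how many points have been inserted.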
[450] arXiv:2307.02330 (replaced) [pdf, ps, other]: Title: Bibliometric Analysis of NIME References and Citations

Stefano Fasciani

Journal-ref: Proceedings of the International Conference on New Interfaces for Musical Expression (NIME) 2024

Subjects: Digital Libraries (cs.DL)

This paper presents a bibliometric analysis that examines the works cited in, as well as those citing, NIME papers; for brevity, we refer to these as `references` and `citations`. Utilizing existing tools, we have computationally extracted data from the NIME proceedings archive and retrieved metadata from an academic database, including details of associated references and citations. From this data, we computed a range of metrics and statistics, which we present in this paper. We offer quantitative insights into NIME as a scholarly publication venue, its connections to other venues, and its relationship with various fields of study and authors. Based on our data interpretations, we provide several recommendations for the community's future. In sharing the software we developed for this study, and the summarized raw data, we enable other NIME researchers to conduct more in-depth investigations and examine specific trends.
[451] arXiv:2307.04216 (replaced) [pdf, ps, html, other]: Title: Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data

Hieu Le, Jian Tao

Comments: 14 pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Lossy compression has become an important technique to reduce data size in many domains. This type of compression is especially valuable for large-scale scientific data, whose size ranges up to several petabytes. Although Autoencoder-based models have been successfully leveraged to compress images and videos, such neural networks have not widely gained attention in the scientific data domain. Our work presents a neural network that not only significantly compresses large-scale scientific data, but also maintains high reconstruction quality. The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set. Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality. 2D simulation data from the High-Resolution Community Earth System Model (CESM) Version 1.3 over 500 years are also being compressed with a compression ratio of 200 while the reconstruction error is negligible for scientific analysis.
[452] arXiv:2307.04661 (replaced) [pdf, ps, other]: Title: On the power of graph neural networks and the role of the activation function

Sammy Khalife, Amitabh Basu

Subjects: Machine Learning (cs.LG)

In this article we present new results about the expressivity of Graph Neural Networks (GNNs). We prove that for any GNN with piecewise polynomial activations, whose architecture size does not grow with the graph input sizes, there exists a pair of non-isomorphic rooted trees of depth two such that the GNN cannot distinguish their root vertex up to an arbitrary number of iterations. The proof relies on tools from the algebra of symmetric polynomials. In contrast, it was already known that unbounded GNNs (those whose size is allowed to change with the graph sizes) with piecewise polynomial activations can distinguish these vertices in only two iterations. It was also known prior to our work that with ReLU (piecewise linear) activations, bounded GNNs are weaker than unbounded GNNs [Aamand et al., 2022]. Our approach adds to this result by extending it to handle any piecewise polynomial activation function, which goes towards answering an open question formulated by Grohe [Grohe, 2021] more completely. Our second result states that if one allows activations that are not piecewise polynomial, then in two iterations a single neuron perceptron can distinguish the root vertices of any pair of non-isomorphic trees of depth two (our results hold for activations like the sigmoid, hyperbolic tangent and others). This shows how the power of graph neural networks can change drastically if one changes the activation function of the neural networks. The proof of this result utilizes the Lindemann-Weierstrass theorem from transcendental number theory.
[453] arXiv:2307.08200 (replaced) [pdf, ps, other]: Title: Ternary Stochastic Geometry Theory for Performance Analysis of RIS-Assisted UDN

Hongchi Lin, Qiyue Yu

Comments: 13 pages, 10 figures

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Statistics Theory (math.ST)

Network topology is becoming increasingly complex as the number and variety of network nodes grow, which makes network design and analysis more challenging. Most current studies are derived from binary-system stochastic geometry, overlooking the coupling and collaboration among nodes. This limitation makes it difficult to accurately analyze network systems such as reconfigurable intelligent surface (RIS) assisted ultra-dense networks (UDN). To address this issue, we propose a dual coordinate system analysis method that uses dual observation points and their established coordinates. The concept of a typical triangle that consists of a base station (BS), a RIS, and a user equipment (UE) is defined as the fundamental unit of analysis for ternary stochastic geometry. Furthermore, we extend Campbell's theorem and propose an approximate probability generating function for ternary stochastic geometry. Utilizing the theoretical framework of ternary stochastic geometry, we derive and analyze performance metrics of a RIS-assisted UDN system, such as coverage probability, area spectral efficiency, area energy efficiency, and energy coverage efficiency. Simulation results show that RIS can significantly enhance system performance, particularly for UEs with high signal-to-interference-plus-noise ratios, exhibiting a phenomenon similar to the Matthew effect.
[454] arXiv:2307.11833 (replaced) [pdf, ps, other]: Title: PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks

Zhiyuan Zhao, Xueying Ding, B. Aditya Prakash

Comments: 17 pages (including 9 pages of main text, 3 pages of references, and 5 pages of appendix), 9 figures, 7 tables

Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

Physics-Informed Neural Networks (PINNs) have emerged as a promising deep learning framework for approximating numerical solutions to partial differential equations (PDEs). However, conventional PINNs, relying on multilayer perceptrons (MLP), neglect the crucial temporal dependencies inherent in practical physics systems and thus fail to propagate the initial condition constraints globally and accurately capture the true solutions under various scenarios. In this paper, we introduce a novel Transformer-based framework, termed PINNsFormer, designed to address this limitation. PINNsFormer can accurately approximate PDE solutions by utilizing multi-head attention mechanisms to capture temporal dependencies. PINNsFormer transforms point-wise inputs into pseudo sequences and replaces point-wise PINNs loss with a sequential loss. Additionally, it incorporates a novel activation function, Wavelet, which anticipates Fourier decomposition through deep neural networks. Empirical results demonstrate that PINNsFormer achieves superior generalization ability and accuracy across various scenarios, including PINNs failure modes and high-dimensional PDEs. Moreover, PINNsFormer offers flexibility in integrating existing learning schemes for PINNs, further enhancing its performance.
[455] arXiv:2307.12732 (replaced) [pdf, ps, html, other]: Title: CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang, Junyu Bi, Xinqiang Yu, Han Yang, Boyu Diao, Yongjun Xu

Comments: CVPR-2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Contrastive Language-Image Pre-training (CLIP) has become a promising language-supervised visual pre-training framework. This paper aims to distill small CLIP models supervised by a large teacher CLIP model. We propose several distillation strategies, including relation, feature, gradient and contrastive paradigms, to examine the effectiveness of CLIP-Knowledge Distillation (KD). We show that a simple feature mimicry with Mean Squared Error loss works surprisingly well. Moreover, interactive contrastive learning across teacher and student encoders is also effective in performance improvement. We explain that the success of CLIP-KD can be attributed to maximizing the feature similarity between teacher and student. The unified method is applied to distill several student models trained on CC3M+12M. CLIP-KD improves student CLIP models consistently over zero-shot ImageNet classification and cross-modal retrieval benchmarks. When using ViT-L/14 pretrained on Laion-400M as the teacher, CLIP-KD achieves 57.5% and 55.4% zero-shot top-1 ImageNet accuracy over ViT-B/16 and ResNet-50, surpassing the original CLIP without KD by 20.5% and 20.1% margins, respectively. Our code is released on this https URL.
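Illustrative sketch (not the released code): feature mimicry with a Mean Squared Error loss amounts to penalizing the squared difference between student and teacher embeddings. The function name `feature_mimicry_loss` is hypothetical, and the example assumes both embeddings already have the same dimension; any projection needed to align dimensions is left out.

```python
def feature_mimicry_loss(student_batch, teacher_batch):
    """Mean squared error between student and teacher embeddings.

    Both arguments are lists of equal-length feature vectors; the loss is
    averaged over every entry of every vector in the batch.
    """
    if len(student_batch) != len(teacher_batch):
        raise ValueError("batches must contain the same number of samples")
    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student_batch, teacher_batch):
        if len(s_vec) != len(t_vec):
            raise ValueError("student and teacher features must share a dimension")
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


student = [[1.0, 2.0], [3.0, 4.0]]
teacher = [[1.0, 0.0], [0.0, 4.0]]
print(feature_mimicry_loss(student, teacher))
```

The loss is zero exactly when the student reproduces the teacher features, which matches the abstract's explanation in terms of maximizing feature similarity.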
[456] arXiv:2308.05199 (replaced) [pdf, ps, html, other]: Title: gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur

Comments: 12 pages, 13 figures, and 2 tables. ICS '24

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. A traditional approach is to directly integrate lossy compression into GPU-aware collectives, which can lead to serious performance issues such as underutilized GPU devices and uncontrolled data distortion. In order to address these issues, in this paper, we propose gZCCL, a first-ever general framework that designs and optimizes GPU-aware, compression-enabled collectives with an accuracy-aware design to control error propagation. To validate our framework, we evaluate the performance on up to 512 NVIDIA A100 GPUs with real-world applications and datasets. Experimental results demonstrate that our gZCCL-accelerated collectives, including both collective computation (Allreduce) and collective data movement (Scatter), can outperform NCCL as well as Cray MPI by up to 4.5X and 28.7X, respectively. Furthermore, our accuracy evaluation with an image-stacking application confirms the high reconstructed data quality of our accuracy-aware framework.
[457] arXiv:2308.05335 (replaced) [pdf, ps, html, other]: Title: Match-based solution of general parametric eigenvalue problems

Davide Pradovera, Alessandro Borghi

Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY)

We describe a novel algorithm for solving general parametric (nonlinear) eigenvalue problems. Our method has two steps: first, high-accuracy solutions of non-parametric versions of the problem are gathered at some values of the parameters; these are then combined to obtain global approximations of the parametric eigenvalues. To gather the non-parametric data, we use non-intrusive contour-integration-based methods, which, however, cannot track eigenvalues that migrate into/out of the contour as the parameter changes. Special strategies are described for performing the combination-over-parameter step despite having only partial information on such migrating eigenvalues. Moreover, we dedicate a special focus to the approximation of eigenvalues that undergo bifurcations. Finally, we propose an adaptive strategy that allows one to effectively apply our method even without any a priori information on the behavior of the sought-after eigenvalues. Numerical tests are performed, showing that our algorithm can achieve remarkably high approximation accuracy.
[458] arXiv:2308.05485 (replaced) [pdf, ps, html, other]: Title: Substitution for Non-Wellfounded Syntax with Binders through Monoidal Categories

Ralph Matthes, Kobe Wullaert, Benedikt Ahrens

Comments: v3: reworked introduction, expanded examples, changes in textual formulations; this is the full version for the upcoming FSCD'24 paper with the same title, the difference is the presence of Appendix B and Appendix D

Subjects: Programming Languages (cs.PL)

We describe a generic construction of non-wellfounded syntax involving variable binding and its monadic substitution operation. Our construction of the syntax and its substitution takes place in category theory, notably by using monoidal categories and strong functors between them. A language is specified by a multi-sorted binding signature, say $\Sigma$. First, we provide sufficient criteria for $\Sigma$ to generate a language of possibly infinite terms, through $\omega$-continuity. Second, we construct a monadic substitution operation for the language generated by $\Sigma$. A cornerstone in this construction is a mild generalization of the notion of heterogeneous substitution systems developed by Matthes and Uustalu; such a system encapsulates the necessary corecursion scheme for implementing substitution. The results are formalized in the Coq proof assistant, through the UniMath library of univalent mathematics.
[459] arXiv:2308.05739 (replaced) [pdf, ps, other]: Title: Zero Grads: Learning Local Surrogate Losses for Non-Differentiable Graphics

Michael Fischer, Tobias Ritschel

Comments: Accepted at SIGGRAPH 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

Gradient-based optimization is now ubiquitous across graphics, but unfortunately cannot be applied to problems with undefined or zero gradients. To circumvent this issue, the loss function can be manually replaced by a "surrogate" that has similar minima but is differentiable. Our proposed framework, ZeroGrads, automates this process by learning a neural approximation of the objective function, which in turn can be used to differentiate through arbitrary black-box graphics pipelines. We train the surrogate on an actively smoothed version of the objective and encourage locality, focusing the surrogate's capacity on what matters at the current training episode. The fitting is performed online, alongside the parameter optimization, and self-supervised, without pre-computed data or pre-trained models. As sampling the objective is expensive (it requires a full rendering or simulator run), we devise an efficient sampling scheme that allows for tractable run-times and competitive performance at little overhead. We demonstrate optimizing diverse non-convex, non-differentiable black-box problems in graphics, such as visibility in rendering, discrete parameter spaces in procedural modelling or optimal control in physics-driven animation. In contrast to other derivative-free algorithms, our approach scales well to higher dimensions, which we demonstrate on problems with up to 35k interlinked variables.
[460] arXiv:2308.09520 (replaced) [pdf, ps, other]: Title: Proceedings of the 2nd International Workshop on Adaptive Cyber Defense

Marco Carvalho, Damian Marriott, Mark Bilinski, Ahmad Ridley

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

The 2nd International Workshop on Adaptive Cyber Defense was held at the Florida Institute of Technology, Florida. This workshop was organized to share research that explores unique applications of Artificial Intelligence (AI) and Machine Learning (ML) as foundational capabilities for the pursuit of adaptive cyber defense. The cyber domain cannot currently be reliably and effectively defended without extensive reliance on human experts. Skilled cyber defenders are in short supply and often cannot respond fast enough to cyber threats.
Building on recent advances in AI and ML, the cyber defense research community has been motivated to develop new dynamic and sustainable defenses through the adoption of AI and ML techniques to cyber settings. Bridging critical gaps between AI and Cyber researchers and practitioners can accelerate efforts to create semi-autonomous cyber defenses that can learn to recognize and respond to cyber attacks or discover and mitigate weaknesses in cooperation with other cyber operation systems and human experts. Furthermore, these defenses are expected to be adaptive and able to evolve over time to thwart changes in attacker behavior, changes in the system health and readiness, and natural shifts in user behavior over time.
The workshop was comprised of invited keynote talks, technical presentations and a panel discussion about how AI/ML can enable autonomous mitigation of current and future cyber attacks. Workshop submissions were peer reviewed by a panel of domain experts with a proceedings consisting of six technical articles exploring challenging problems of critical importance to national and global security. Participation in this workshop offered new opportunities to stimulate research and innovation in the emerging domain of adaptive and autonomous cyber defense.
[461] arXiv:2308.09764 (replaced) [pdf, ps, html, other]: Title: The Impact of Background Removal on Performance of Neural Networks for Fashion Image Classification and Segmentation

Junhui Liang, Ying Liu, Vladimir Vlassov

Comments: 9 pages, 9 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Fashion understanding is a hot topic in computer vision, with many applications having great business value in the market. Fashion understanding remains a difficult challenge for computer vision due to the immense diversity of garments and various scenes and backgrounds. In this work, we try removing the background from fashion images to boost data quality and increase model performance. Having fashion images of evident persons in fully visible garments, we can utilize Salient Object Detection to achieve the background removal of fashion data to our expectations. A fashion image with the background removed is referred to as the "rembg" image, in contrast to the original one in the fashion dataset. We conducted extensive comparative experiments with these two types of images on multiple aspects of model training, including model architectures, model initialization, compatibility with other training tricks and data augmentations, and target task types. Our experiments show that background removal can effectively work for fashion data in simple and shallow networks that are not susceptible to overfitting. It can improve model accuracy by up to 5% in the classification on the FashionStyle14 dataset when training models from scratch. However, background removal does not perform well in deep neural networks due to incompatibility with other regularization techniques like batch normalization, pre-trained initialization, and data augmentations introducing randomness. The loss of background pixels invalidates many existing training tricks in the model training, adding the risk of overfitting for deep models.
[462] arXiv:2308.10720 (replaced) [pdf, ps, html, other]: Title: On the accuracy of interpolation based on single-layer artificial neural networks with a focus on defeating the Runge phenomenon

Ferdinando Auricchio, Maria Roberta Belardo, Gianluca Fabiani, Francesco Calabrò, Ariel F. Pascaner

Subjects: Numerical Analysis (math.NA); Artificial Intelligence (cs.AI)

In the present paper, we consider one-hidden layer ANNs with a feedforward architecture, also referred to as shallow or two-layer networks, so that the structure is determined by the number and types of neurons. The determination of the parameters that define the function, called training, is done via the resolution of the approximation problem, that is, by imposing the interpolation through a set of specific nodes. We present the case where the parameters are trained using a procedure that is referred to as Extreme Learning Machine (ELM) that leads to a linear interpolation problem. Under these hypotheses, the existence of an ANN interpolating function is guaranteed. The focus is then on the accuracy of the interpolation outside of the given sampling interpolation nodes when these are equispaced, Chebychev, or randomly selected. The study is motivated by the well-known bell-shaped Runge example, which makes it clear that the construction of a global interpolating polynomial is accurate only if trained on suitably chosen nodes, for example the Chebychev ones. In order to evaluate the behavior when growing the number of interpolation nodes, we raise the number of neurons in our network and compare it with the interpolating polynomial. We test using Runge's function and other well-known examples with different regularities. As expected, the accuracy of the approximation with a global polynomial increases only if the Chebychev nodes are considered. Instead, the error for the ANN interpolating function always decays and in most cases we observe that the convergence follows what is observed in the polynomial case on Chebychev nodes, regardless of the set of nodes used for training.
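Illustrative sketch (not from the paper): the motivating Runge example can be reproduced with a few lines of plain Python. The code interpolates Runge's function $1/(1+25x^2)$ on $[-1,1]$ with a global polynomial through 21 equispaced nodes and through 21 Chebychev nodes, then compares the maximum error on a fine grid. It covers only the polynomial baseline; the ELM-trained network studied in the paper is not reproduced, and all function names are chosen for this example.

```python
import math


def runge(x):
    """Runge's bell-shaped test function on [-1, 1]."""
    return 1.0 / (1.0 + 25.0 * x * x)


def lagrange_eval(nodes, values, x):
    """Evaluate the interpolating polynomial through (nodes, values) at x."""
    total = 0.0
    for i, xi in enumerate(nodes):
        term = values[i]
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def max_error(nodes, samples=1001):
    """Largest interpolation error on a uniform evaluation grid."""
    values = [runge(x) for x in nodes]
    grid = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(lagrange_eval(nodes, values, x) - runge(x)) for x in grid)


n = 21
equispaced = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]
chebychev = [math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]

print("equispaced nodes, max error:", max_error(equispaced))
print("Chebychev nodes,  max error:", max_error(chebychev))
```

With equispaced nodes the error near the interval ends is large, while the Chebychev nodes give a small error for the same polynomial degree.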
[463] arXiv:2308.11905 (replaced) [pdf, ps, html, other]: Title: On Using Admissible Bounds for Learning Forward Search Heuristics

Carlos Núñez-Molina, Masataro Asai, Pablo Mesejo, Juan Fernández-Olivares

Comments: 19 pages, 2 figures

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In recent years, there has been growing interest in utilizing modern machine learning techniques to learn heuristic functions for forward search algorithms. Despite this, there has been little theoretical understanding of what they should learn, how to train them, and why we do so. This lack of understanding has resulted in the adoption of diverse training targets (suboptimal vs optimal costs vs admissible heuristics) and loss functions (e.g., square vs absolute errors) in the literature. In this work, we focus on how to effectively utilize the information provided by admissible heuristics in heuristic learning. We argue that learning from poly-time admissible heuristics by minimizing mean square errors (MSE) is not the correct approach, since its result is merely a noisy, inadmissible copy of an efficiently computable heuristic. Instead, we propose to model the learned heuristic as a truncated Gaussian, where admissible heuristics are used not as training targets but as lower bounds of this distribution. This results in a different loss function from the MSE commonly employed in the literature, which implicitly models the learned heuristic as a Gaussian distribution. We conduct experiments where both MSE and our novel loss function are applied to learning a heuristic from optimal plan costs. Results show that our proposed method converges faster during training and yields better heuristics.
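Illustrative sketch (not the authors' implementation): the difference between the two losses can be seen by comparing the negative log-likelihood of a Gaussian with that of a Gaussian truncated below at a lower bound. The example assumes that `x` is the training target (for instance an optimal plan cost), `mu` and `sigma` are the predicted mean and standard deviation, and `lower` is the value of an admissible heuristic; these names and the exact parameterization are assumptions.

```python
import math


def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of x under N(mu, sigma^2).

    For fixed sigma this is the squared error up to scale and offset.
    """
    z = (x - mu) / sigma
    return math.log(sigma) + 0.5 * math.log(2.0 * math.pi) + 0.5 * z * z


def truncated_gaussian_nll(x, mu, sigma, lower):
    """Negative log-likelihood of x under N(mu, sigma^2) truncated to [lower, inf)."""
    if x < lower:
        raise ValueError("a target below the lower bound has zero density")
    # Mass of the untruncated Gaussian above the bound: 1 - Phi((lower - mu) / sigma).
    tail = 0.5 * math.erfc((lower - mu) / (sigma * math.sqrt(2.0)))
    # Renormalizing by the tail mass adds log(tail) to the Gaussian NLL.
    return gaussian_nll(x, mu, sigma) + math.log(tail)


print(gaussian_nll(1.0, 0.0, 1.0))
print(truncated_gaussian_nll(1.0, 0.0, 1.0, lower=0.0))
```

When the bound lies far below the prediction, the tail mass approaches one and the two losses coincide; the bound only matters when the prediction sits close to or below it. A production version would need a numerically stable logarithm of the tail mass for bounds far above `mu`.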
[464] arXiv:2308.15755 (replaced) [pdf, ps, other]: Title: Density Stabilization Strategies for Nonholonomic Agents on Compact Manifolds

Karthik Elamvazhuthi, Spring Berman

Comments: Extended preprint. Detailed proofs regarding properties of the generators. An additional numerical example of a density control problem in SO(3)

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

In this article, we consider the problem of stabilizing stochastic processes, which are constrained to a bounded Euclidean domain or a compact smooth manifold, to a given target probability density. Most existing works on modeling and control of robotic swarms that use PDE models assume that the robots' dynamics are holonomic, and hence, the associated stochastic processes have generators that are elliptic. We relax this assumption on the ellipticity of the generator of the stochastic processes, and consider the more practical case of the stabilization problem for a swarm of agents whose dynamics are given by a controllable driftless control-affine system. We construct state-feedback control laws that exponentially stabilize a swarm of nonholonomic agents to a target probability density that is sufficiently regular. State-feedback laws can stabilize a swarm only to target probability densities that are positive everywhere. To stabilize the swarm to probability densities that possibly have disconnected supports, we introduce a semilinear PDE model of a collection of interacting agents governed by a hybrid switching diffusion process. The interaction between the agents is modeled using a (mean-field) feedback law that is a function of the local density of the swarm, with the switching parameters as the control inputs. We show that the semilinear PDE system is globally asymptotically stable about the given target probability density. The stabilization strategy without inter-agent interactions is verified numerically for agents that evolve according to the Brockett integrator and a nonholonomic system on the special orthogonal group of 3-dimensional rotations $SO(3)$. The stabilization strategy with inter-agent interactions is verified numerically for agents that evolve according to the Brockett integrator and a holonomic system on the sphere $S^2$.
[465] arXiv:2309.00470 (replaced) [pdf, ps, other]: Title: Deep Joint Source-Channel Coding for Adaptive Image Transmission over MIMO Channels

Haotian Wu, Yulin Shao, Chenghong Bian, Krystian Mikolajczyk, Deniz Gündüz

Comments: arXiv admin note: text overlap with arXiv:2210.15347

Subjects: Information Theory (cs.IT); Image and Video Processing (eess.IV)

This paper introduces a vision transformer (ViT)-based deep joint source and channel coding (DeepJSCC) scheme for wireless image transmission over multiple-input multiple-output (MIMO) channels, denoted as DeepJSCC-MIMO. We consider DeepJSCC-MIMO for adaptive image transmission in both open-loop and closed-loop MIMO systems. The novel DeepJSCC-MIMO architecture surpasses the classical separation-based benchmarks with robustness to channel estimation errors and showcases remarkable flexibility in adapting to diverse channel conditions and antenna numbers without requiring retraining. Specifically, by harnessing the self-attention mechanism of ViT, DeepJSCC-MIMO intelligently learns feature mapping and power allocation strategies tailored to the unique characteristics of the source image and prevailing channel conditions. Extensive numerical experiments validate the significant improvements in transmission quality achieved by DeepJSCC-MIMO for both open-loop and closed-loop MIMO systems across a wide range of scenarios. Moreover, DeepJSCC-MIMO exhibits robustness to varying channel conditions, channel estimation errors, and different antenna numbers, making it an appealing solution for emerging semantic communication systems.
[466] arXiv:2309.05172 (replaced) [pdf, ps, html, other]: Title: 2-Approximation for Prize-Collecting Steiner Forest

Ali Ahmadi, Iman Gholami, MohammadTaghi Hajiaghayi, Peyman Jabbarzade, Mohammad Mahdavi

Subjects: Data Structures and Algorithms (cs.DS)

Approximation algorithms for the prize-collecting Steiner forest problem (PCSF) have been a subject of research for over three decades, starting with the seminal works of Agrawal, Klein, and Ravi and Goemans and Williamson on Steiner forest and prize-collecting problems. In this paper, we propose and analyze a natural deterministic algorithm for PCSF that achieves a $2$-approximate solution in polynomial time. This represents a significant improvement compared to the previously best known algorithm with a $2.54$-approximation factor developed by Hajiaghayi and Jain in 2006. Furthermore, Könemann, Olver, Pashkovich, Ravi, Swamy, and Vygen have established an integrality gap of at least $9/4$ for the natural LP relaxation for PCSF. However, we surpass this gap through the utilization of a combinatorial algorithm and a novel analysis technique. Since $2$ is the best known approximation guarantee for the Steiner forest problem, which is a special case of PCSF, our result matches this factor and closes the gap between the Steiner forest problem and its generalized version, PCSF.
[467] arXiv:2309.05347 (replaced) [pdf, ps, html, other]: Title: Asynchrony-Resilient Sleepy Total-Order Broadcast Protocols

Francesco D'Amato, Giuliano Losa, Luca Zanolini

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Dynamically available total-order broadcast (TOB) protocols tolerate fluctuating participation, e.g., as high as 99% of their participants going offline, which is especially useful in permissionless blockchain environments. However, dynamically available TOB protocols are synchronous protocols, and they lose their safety guarantees during periods of asynchrony. This is a major issue in practice.
In this paper, we propose a simple but effective mechanism for tolerating bounded periods of asynchrony in dynamically available TOB protocols that ensure safety deterministically. We propose to trade off assumptions limiting the online/offline churn rate in exchange for tolerating bounded asynchronous periods through the use of a configurable message-expiration period.
In practice, this allows picking a small synchrony bound $\delta$, and thereby obtaining a fast protocol in the common case, knowing that the protocol tolerates occasional periods of duration at most $\pi>\delta$ during which the bound does not hold. We show how to apply this idea to a state-of-the-art protocol to make it tolerate bounded periods of asynchrony.
[468] arXiv:2309.05950 (replaced) [pdf, ps, html, other]: Title: Language Models as Black-Box Optimizers for Vision-Language Models

Shihong Liu, Zhiqiu Lin, Samuel Yu, Ryan Lee, Tiffany Ling, Deepak Pathak, Deva Ramanan

Comments: Published at CVPR 2024. Project site: this https URL

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)

Vision-language models (VLMs) pre-trained on web-scale datasets have demonstrated remarkable capabilities on downstream tasks when fine-tuned with minimal data. However, many VLMs rely on proprietary data and are not open-source, which restricts the use of white-box approaches for fine-tuning. As such, we aim to develop a black-box approach to optimize VLMs through natural language prompts, thereby avoiding the need to access model parameters, feature embeddings, or even output logits. We propose employing chat-based LLMs to search for the best text prompt for VLMs. Specifically, we adopt an automatic hill-climbing procedure that converges to an effective prompt by evaluating the performance of current prompts and asking LLMs to refine them based on textual feedback, all within a conversational process without human-in-the-loop. In a challenging 1-shot image classification setup, our simple approach surpasses the white-box continuous prompting method (CoOp) by an average of 1.5% across 11 datasets including ImageNet. Our approach also outperforms both human-engineered and LLM-generated prompts. We highlight the advantage of conversational feedback that incorporates both positive and negative prompts, suggesting that LLMs can utilize the implicit gradient direction in textual feedback for a more efficient search. In addition, we find that the text prompts generated through our strategy are not only more interpretable but also transfer well across different VLM architectures in a black-box manner. Lastly, we apply our framework to optimize the state-of-the-art black-box VLM (DALL-E 3) for text-to-image generation, prompt inversion, and personalization.
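The hill-climbing loop described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_score` is a hypothetical stand-in for evaluating a prompt on a black-box VLM, and `toy_propose` is a hypothetical stand-in for a chat LLM that receives the best and worst prompts so far as feedback.

```python
import random

def hill_climb_prompt(score, propose, seed_prompts, rounds=10, keep=3):
    """Black-box hill climbing over text prompts.

    score(prompt) -> float       : black-box evaluation (assumed, e.g. few-shot accuracy)
    propose(best, worst) -> str  : assumed proposer that sees positive and negative prompts
    """
    scored = sorted(((score(p), p) for p in seed_prompts), reverse=True)
    for _ in range(rounds):
        best = [p for _, p in scored[:keep]]    # positive feedback
        worst = [p for _, p in scored[-keep:]]  # negative feedback
        candidate = propose(best, worst)
        if all(candidate != p for _, p in scored):
            scored.append((score(candidate), candidate))
            scored.sort(reverse=True)
    return scored[0]  # (best score, best prompt)

# Hypothetical toy stand-ins so the sketch runs without any model.
TARGET = {"photo", "of", "a", "bird"}
_rng = random.Random(0)

def toy_score(prompt):
    return len(TARGET & set(prompt.split())) / len(TARGET)

def toy_propose(best, worst):
    vocab = ["photo", "of", "a", "bird", "sketch", "blurry"]
    # Avoid words that appear only in the worst prompts.
    bad = set(" ".join(worst).split()) - set(" ".join(best).split())
    choices = [w for w in vocab if w not in bad] or vocab
    return best[0] + " " + _rng.choice(choices)

seeds = ["a bird", "blurry sketch", "photo"]
best_score, best_prompt = hill_climb_prompt(toy_score, toy_propose, seeds)
```

Because the seed prompts stay in the candidate pool, the returned score can never be worse than the best seed; whether the search improves on it depends entirely on the proposer.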
[469] arXiv:2309.11872 (replaced) [pdf, ps, html, other]: Title: Near-Field Beam Training: Joint Angle and Range Estimation with DFT Codebook

Xun Wu, Changsheng You, Jiapeng Li, Yunpu Zhang

Comments: This article has been accepted for publication in IEEE Transactions on Wireless Communications

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Prior works on near-field beam training have mostly assumed dedicated polar-domain codebook and on-grid range estimation, which, however, may suffer long training overhead and degraded estimation accuracy. To address these issues, we propose in this paper new and efficient beam training schemes with off-grid range estimation by using conventional discrete Fourier transform (DFT) codebook. Specifically, we first analyze the received beam pattern at the user when far-field beamforming vectors are used for beam scanning, and show an interesting result that this beam pattern contains useful user angle and range information. Then, we propose two efficient schemes to jointly estimate the user angle and range with the DFT codebook. The first scheme estimates the user angle based on a defined angular support and resolves the user range by leveraging an approximated angular support width, while the second scheme estimates the user range by minimizing a power ratio mean square error (MSE) to improve the range estimation accuracy. Finally, numerical simulations show that our proposed schemes greatly reduce the near-field beam training overhead and improve the range estimation accuracy as compared to various benchmark schemes.
[470] arXiv:2309.13579 (replaced) [pdf, ps, other]: Title: Seeing Is Not Always Believing: Invisible Collision Attack and Defence on Pre-Trained Models

Minghang Deng, Zhong Zhang, Junming Shao

Comments: 10 pages, 4 figures

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Large-scale pre-trained models (PTMs) such as BERT and GPT have achieved great success in diverse fields. The typical paradigm is to pre-train a big deep learning model on large-scale data sets, and then fine-tune the model on small task-specific data sets for downstream tasks. Although PTMs have rapidly progressed with wide real-world applications, they also pose significant risks of potential attacks. Existing backdoor attacks or data poisoning methods often build up the assumption that the attacker invades the computers of victims or accesses the target data, which is challenging in real-world scenarios. In this paper, we propose a novel framework for an invisible attack on PTMs with enhanced MD5 collision. The key idea is to generate two equal-size models with the same MD5 checksum by leveraging the MD5 chosen-prefix collision. Afterwards, the two "same" models will be deployed on public websites to induce victims to download the poisoned model. Unlike conventional attacks on deep learning models, this new attack is flexible, covert, and model-independent. Additionally, we propose a simple defensive strategy for recognizing the MD5 chosen-prefix collision and provide a theoretical justification for its feasibility. We extensively validate the effectiveness and stealthiness of our proposed attack and defensive method on different models and data sets.
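Since the attack relies on two files sharing an MD5 checksum, one common complementary mitigation is to verify a download against a stronger digest as well. The sketch below is not the defensive strategy proposed in the paper (which recognizes the chosen-prefix collision itself); it only shows digest checking with Python's standard `hashlib`, and the function names are hypothetical.

```python
import hashlib

def digests(data: bytes) -> dict:
    """Return the MD5 and SHA-256 hex digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def matches_published(data: bytes, published: dict) -> bool:
    """True only if every published digest matches the downloaded bytes.

    Two files with colliding MD5 checksums would still be expected to
    differ in SHA-256, so checking both exposes the substitution.
    """
    actual = digests(data)
    return all(actual[name] == value for name, value in published.items())

# Usage with the well-known digests of the empty byte string.
published = {
    "md5": "d41d8cd98f00b204e9800998ecf8427e",
    "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}
ok = matches_published(b"", published)
tampered = matches_published(b"poisoned", published)
```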
[471] arXiv:2310.01556 (replaced) [pdf, ps, html, other]: Title: Second-order exponential splittings in the presence of unbounded and time-dependent operators: construction and convergence

Karolina Kropielnicka, Juan Carlos del Valle

Comments: 16 pages, 2 figures

Subjects: Numerical Analysis (math.NA)

For linear differential equations of the form $u'(t)=[A + B(t)] u(t)$, $t\geq0$, with a possibly unbounded operator $A$, we construct and deduce error bounds for two families of second-order exponential splittings. The role of quadratures when integrating the twice-iterated Duhamel's formula is reformulated: we show that their choice defines the structure of the splitting. Furthermore, the reformulation allows us to consider quadratures based on the Birkhoff interpolation to obtain not only pure-stages splittings but also those containing derivatives of $B(t)$ and commutators of $A$ and $B(t)$. In this approach, the construction and error analysis of the splittings are carried out simultaneously. We discuss the accuracy of the members of the families. Numerical experiments are presented to complement the theoretical consideration.
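For orientation, the Duhamel (variation-of-constants) formula for this equation and its first self-substitution can be written out as follows. This is a formal sketch with $u_0 = u(0)$; the abstract does not specify how the authors count iterations or which quadratures they apply to the integrals.

```latex
u(t) = e^{tA} u_0 + \int_0^t e^{(t-s)A} B(s)\, u(s)\, ds ,
\qquad
u(t) = e^{tA} u_0
     + \int_0^t e^{(t-s)A} B(s)\, e^{sA} u_0 \, ds
     + \int_0^t \!\! \int_0^s e^{(t-s)A} B(s)\, e^{(s-r)A} B(r)\, u(r)\, dr\, ds .
```

Replacing the integrals by quadrature rules turns each term into a product of exponentials and evaluations of $B$, which is where the structure of a splitting comes from.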
[472] arXiv:2310.03967 (replaced) [pdf, ps, html, other]: Title: Sub-token ViT Embedding via Stochastic Resonance Transformers

Dong Lao, Yangchao Wu, Tian Yu Liu, Alex Wong, Stefano Soatto

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Vision Transformer (ViT) architectures represent images as collections of high-dimensional vectorized tokens, each corresponding to a rectangular non-overlapping patch. This representation trades spatial granularity for embedding dimensionality, and results in semantically rich but spatially coarsely quantized feature maps. In order to retrieve spatial details beneficial to fine-grained inference tasks we propose a training-free method inspired by "stochastic resonance". Specifically, we perform sub-token spatial transformations to the input data, and aggregate the resulting ViT features after applying the inverse transformation. The resulting "Stochastic Resonance Transformer" (SRT) retains the rich semantic information of the original representation, but grounds it on a finer-scale spatial domain, partly mitigating the coarse effect of spatial tokenization. SRT is applicable across any layer of any ViT architecture, consistently boosting performance on several tasks including segmentation, classification, depth estimation, and others by up to 14.9% without the need for any fine-tuning.
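The shift, encode, un-shift, and aggregate idea can be illustrated on a one-dimensional toy signal. This is a sketch under strong assumptions, not the authors' code: mean pooling over patches stands in for a ViT encoder, shifts are circular, and all function names are hypothetical.

```python
def roll(xs, s):
    """Circularly shift a list to the right by s positions."""
    s %= len(xs)
    return xs[-s:] + xs[:-s] if s else list(xs)

def coarse_features(xs, patch):
    """Toy 'tokenizer': one mean value per non-overlapping patch,
    repeated so the output has the same length as the input."""
    out = []
    for i in range(0, len(xs), patch):
        mean = sum(xs[i:i + patch]) / patch
        out.extend([mean] * patch)
    return out

def srt_features(xs, patch):
    """Average the coarse features over all sub-patch shifts,
    undoing each shift before aggregation."""
    acc = [0.0] * len(xs)
    for s in range(patch):
        feats = roll(coarse_features(roll(xs, s), patch), -s)
        acc = [a + f for a, f in zip(acc, feats)]
    return [a / patch for a in acc]

signal = [0.0] * 6 + [1.0] * 6   # a step edge that falls inside a patch
coarse = coarse_features(signal, patch=4)
fine = srt_features(signal, patch=4)
```

In this toy setting the aggregated map varies position by position, whereas the single-pass map is constant within each patch.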
[473] arXiv:2310.04003 (replaced) [pdf, ps, other]: Title: The Role of Federated Learning in a Wireless World with Foundation Models

Zihan Chen, Howard H. Yang, Y. C. Tay, Kai Fong Ernest Chong, Tony Q. S. Quek

Comments: 8 pages, 4 figures, 2 tables. This version has been accepted by IEEE Wireless Communications

Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Systems and Control (eess.SY)

Foundation models (FMs) are general-purpose artificial intelligence (AI) models that have recently enabled multiple brand-new generative AI applications. The rapid advances in FMs serve as an important contextual backdrop for the vision of next-generation wireless networks, where federated learning (FL) is a key enabler of distributed network intelligence. Currently, the exploration of the interplay between FMs and FL is still in its nascent stage. Naturally, FMs are capable of boosting the performance of FL, and FL could also leverage decentralized data and computing resources to assist in the training of FMs. However, the exceptionally high requirements that FMs have for computing resources, storage, and communication overhead would pose critical challenges to FL-enabled wireless networks. In this article, we explore the extent to which FMs are suitable for FL over wireless networks, including a broad overview of research challenges and opportunities. In particular, we discuss multiple new paradigms for realizing future intelligent networks that integrate FMs and FL. We also consolidate several broad research directions associated with these paradigms.
[474] arXiv:2310.04558 (replaced) [pdf, ps, html, other]: Title: VTON-IT: Virtual Try-On using Image Translation

Santosh Adhikari, Bishnu Bhusal, Prashant Ghimire, Anil Shrestha

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Virtual Try-On (trying clothes virtually) is a promising application of the Generative Adversarial Network (GAN). However, it is an arduous task to transfer the desired clothing item onto the corresponding regions of a human body because of varying body size, pose, and occlusions like hair and overlapped clothes. In this paper, we try to produce photo-realistic translated images through semantic segmentation and a generative adversarial architecture-based image translation network. We present a novel image-based Virtual Try-On application, VTON-IT, that takes an RGB image, segments the desired body part, and overlays the target cloth over the segmented body region. Most state-of-the-art GAN-based Virtual Try-On applications produce unaligned pixelated synthesis images on real-life test images. However, our approach generates high-resolution natural images with detailed textures on such variant images.
[475] arXiv:2310.05839 (replaced) [pdf, ps, other]: Title: Parameterized Complexity of MinCSP over the Point Algebra

George Osipov, Marcin Pilipczuk, Magnus Wahlström

Subjects: Data Structures and Algorithms (cs.DS)

The input in the Minimum-Cost Constraint Satisfaction Problem (MinCSP) over the Point Algebra contains a set of variables, a collection of constraints of the form $x < y$, $x = y$, $x \leq y$ and $x \neq y$, and a budget $k$. The goal is to check whether it is possible to assign rational values to the variables while breaking constraints of total cost at most $k$. This problem generalizes several prominent graph separation and transversal problems: MinCSP$(<)$ is equivalent to Directed Feedback Arc Set, MinCSP$(<,\leq)$ is equivalent to Directed Subset Feedback Arc Set, MinCSP$(=,\neq)$ is equivalent to Edge Multicut, and MinCSP$(\leq,\neq)$ is equivalent to Directed Symmetric Multicut. Apart from trivial cases, MinCSP$(\Gamma)$ for $\Gamma \subseteq \{<,=,\leq,\neq\}$ is NP-hard even to approximate within any constant factor under the Unique Games Conjecture. Hence, we study parameterized complexity of this problem under a natural parameterization by the solution cost $k$. We obtain a complete classification: if $\Gamma \subseteq \{<,=,\leq,\neq\}$ contains both $\leq$ and $\neq$, then MinCSP$(\Gamma)$ is W[1]-hard, otherwise it is fixed-parameter tractable. For the positive cases, we solve MinCSP$(<,=,\neq)$, generalizing the FPT results for Directed Feedback Arc Set and Edge Multicut as well as their weighted versions. Our algorithm works by reducing the problem into a Boolean MinCSP, which is in turn solved by flow augmentation. For the lower bounds, we prove that Directed Symmetric Multicut is W[1]-hard, solving an open problem.
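To make the problem statement concrete, here is a brute-force solver for tiny instances. It is only an exponential-time illustration of the definition, not the FPT algorithm from the paper; the tuple format for constraints is an assumption. Since only the relative order of values matters, it suffices to try integer values in `range(n)` for `n` variables.

```python
from itertools import product

def violated(rel, a, b):
    """True if the constraint 'a rel b' is broken."""
    return not {"<": a < b, "=": a == b, "<=": a <= b, "!=": a != b}[rel]

def min_cost(variables, constraints):
    """Minimum total weight of broken constraints over all assignments.

    constraints: list of (x, rel, y, weight) tuples (assumed format).
    """
    n = len(variables)
    best = None
    for values in product(range(n), repeat=n):
        assign = dict(zip(variables, values))
        cost = sum(w for x, rel, y, w in constraints
                   if violated(rel, assign[x], assign[y]))
        if best is None or cost < best:
            best = cost
    return best

# A directed cycle of strict inequalities: one constraint must be broken,
# mirroring the correspondence with Directed Feedback Arc Set.
cycle = [("x", "<", "y", 1), ("y", "<", "z", 1), ("z", "<", "x", 1)]
cycle_cost = min_cost(["x", "y", "z"], cycle)

# Contradictory equality and disequality: exactly one must be broken.
clash_cost = min_cost(["x", "y"], [("x", "=", "y", 1), ("x", "!=", "y", 1)])
```

The decision version asks whether the returned value is at most the budget $k$.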
[476] arXiv:2310.08842 (replaced) [pdf, ps, other]: Title: A Case-Based Persistent Memory for a Large Language Model

Ian Watson

Comments: 8 pages, 1 figure

Subjects: Artificial Intelligence (cs.AI)

Case-based reasoning (CBR) as a methodology for problem-solving can use any appropriate computational technique. This position paper argues that CBR researchers have somewhat overlooked recent developments in deep learning and large language models (LLMs). The underlying technical developments that have enabled the recent breakthroughs in AI have strong synergies with CBR and could be used to provide a persistent memory for LLMs to make progress towards Artificial General Intelligence.
[477] arXiv:2310.13023 (replaced) [pdf, ps, html, other]: Title: GraphGPT: Graph Instruction Tuning for Large Language Models

Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, Chao Huang

Comments: Accepted by SIGIR'2024, full paper

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Graph Neural Networks (GNNs) have evolved to understand graph structures through recursive exchanges and aggregations among nodes. To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation. Traditional methods often depend on fine-tuning with task-specific labels, limiting their effectiveness when labeled data is scarce. Our research tackles this by advancing graph model generalization in zero-shot learning environments. Inspired by the success of large language models (LLMs), we aim to create a graph-oriented LLM capable of exceptional generalization across various datasets and tasks without relying on downstream graph data. We introduce the GraphGPT framework, which integrates LLMs with graph structural knowledge through graph instruction tuning. This framework includes a text-graph grounding component to link textual and graph structures and a dual-stage instruction tuning approach with a lightweight graph-text alignment projector. These innovations allow LLMs to comprehend complex graph structures and enhance adaptability across diverse datasets and tasks. Our framework demonstrates superior generalization in both supervised and zero-shot graph learning tasks, surpassing existing benchmarks. The open-sourced model implementation of our GraphGPT is available at this https URL.
[478] arXiv:2310.16242 (replaced) [pdf, ps, html, other]: Title: ZzzGPT: An Interactive GPT Approach to Enhance Sleep Quality

Yonchanok Khaokaew, Kaixin Ji, Thuc Hanh Nguyen, Hiruni Kegalle, Marwah Alaofi, Hao Xue, Flora D. Salim

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

This paper explores the intersection of technology and sleep pattern comprehension, presenting a cutting-edge two-stage framework that harnesses the power of Large Language Models (LLMs). The primary objective is to deliver precise sleep predictions paired with actionable feedback, addressing the limitations of existing solutions. This innovative approach involves leveraging the GLOBEM dataset alongside synthetic data generated by LLMs. The results highlight significant improvements, underlining the efficacy of merging advanced machine-learning techniques with a user-centric design ethos. Through this exploration, we bridge the gap between technological sophistication and user-friendly design, ensuring that our framework yields accurate predictions and translates them into actionable insights.
[479] arXiv:2310.19453 (replaced) [pdf, ps, html, other]: Title: FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction

Hangyu Wang, Jianghao Lin, Xiangyang Li, Bo Chen, Chenxu Zhu, Ruiming Tang, Weinan Zhang, Yong Yu

Comments: Under Review

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Click-through rate (CTR) prediction serves as a core function module in various personalized online services. The traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality, which capture the collaborative signals via feature interaction modeling. But the one-hot encoding discards the semantic information contained in the original feature texts. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as inputs the sentences of textual modality obtained by hard prompt templates and adopts PLMs to extract the semantic knowledge. However, PLMs generally tokenize the input text data into subword tokens and ignore field-wise collaborative signals. Therefore, these two lines of research focus on different characteristics of the same input data (i.e., textual and tabular modalities), forming a distinct complementary relationship with each other. In this paper, we propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. We design a novel joint reconstruction pretraining task for both masked language and tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment via sufficient mutual information extraction between dual modalities. Moreover, we propose to jointly finetune the ID-based model and PLM for downstream CTR prediction tasks, thus achieving superior performance by combining the advantages of both models. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines, and is highly compatible with various ID-based models and PLMs. The code is at this https URL.
[480] arXiv:2311.00187 (replaced) [pdf, ps, other]: Title: Decodable and Sample Invariant Continuous Object Encoder

Dehao Yuan, Furong Huang, Cornelia Fermüller, Yiannis Aloimonos

Comments: ICLR2024 Conference Paper

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose Hyper-Dimensional Function Encoding (HDFE). Given samples of a continuous object (e.g. a function), HDFE produces an explicit vector representation of the given object, invariant to the sample distribution and density. Sample distribution and density invariance enables HDFE to consistently encode continuous objects regardless of their sampling, and therefore allows neural networks to receive continuous objects as inputs for machine learning tasks, such as classification and regression. Besides, HDFE does not require any training and is proved to map the object into an organized embedding space, which facilitates the training of the downstream tasks. In addition, the encoding is decodable, which enables neural networks to regress continuous objects by regressing their encodings. Therefore, HDFE serves as an interface for processing continuous objects.
We apply HDFE to function-to-function mapping, where vanilla HDFE achieves performance competitive with the state-of-the-art algorithm. We apply HDFE to point cloud surface normal estimation, where a simple replacement of PointNet with HDFE leads to immediate 12% and 15% error reductions in two benchmarks. In addition, by integrating HDFE into the PointNet-based SOTA network, we improve the SOTA baseline by 2.5% and 1.7% in the same benchmarks.
[481] arXiv:2311.00208 (replaced) [pdf, ps, html, other]: Title: What Formal Languages Can Transformers Express? A Survey

Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)

As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring such questions can help clarify the power of transformers relative to other models of computation, their fundamental capabilities and limits, and the impact of architectural choices. Work in this subarea has made considerable progress in recent years. Here, we undertake a comprehensive survey of this work, documenting the diverse assumptions that underlie different results and providing a unified framework for harmonizing seemingly contradictory findings.
[482] arXiv:2311.00651 (replaced) [pdf, ps, other]: Title: Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning

Richard Bornemann, Gautier Hamon, Eleni Nisioti, Clément Moulin-Frier

Comments: Published at the 2nd Agent Learning in Open-Endedness Workshop at NeurIPS 2023

Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)

Recent works have proven that intricate cooperative behaviors can emerge in agents trained using meta-reinforcement learning on open-ended task distributions using self-play. While the results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world: through decentralized training and over an open-ended distribution of tasks. In this work we therefore investigate the emergence of collective exploration strategies, where several agents meta-learn independent recurrent policies on an open-ended distribution of tasks. To this end we introduce a novel environment with an open-ended procedurally generated task space which dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees. We show that decentralized agents trained in our environment exhibit strong generalization abilities when confronted with novel objects at test time. Additionally, despite never being forced to cooperate during training, the agents learn collective exploration strategies which allow them to solve novel tasks never encountered during training. We further find that the agents' learned collective exploration strategies extend to an open-ended task setting, allowing them to solve task trees of twice the depth compared to the ones seen during training. Our open source code as well as videos of the agents can be found on our companion website.
[483] arXiv:2311.03036 (replaced) [pdf, ps, other]: Title: On regularized polynomial functional regression

Markus Holzleitner, Sergei Pereverzyev

Comments: 26 pages

Journal-ref: Journal of Complexity, Volume 83, August 2024, 101853

Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Statistics Theory (math.ST)

This article offers a comprehensive treatment of polynomial functional regression, culminating in the establishment of a novel finite sample bound. This bound encompasses various aspects, including general smoothness conditions, capacity conditions, and regularization techniques. In doing so, it extends and generalizes several findings from the context of linear functional regression as well. We also provide numerical evidence that using higher order polynomial terms can lead to an improved performance.
[484] arXiv:2311.04502 (replaced) [pdf, ps, html, other]: Title: TADA: Making Node-link Diagrams Accessible to Blind and Low-Vision People

Yichun Zhao, Miguel A. Nacenta, Mahadeo A. Sukhai, Sowmya Somanath

Comments: Honourable Mention Award. ACM CHI 2024. DOI: https://doi.org/10.1145/3613904.3642222

Journal-ref: In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11-16, 2024, Honolulu, HI, USA. ACM, New York, NY, USA, 20 pages

Subjects: Human-Computer Interaction (cs.HC)

Diagrams often appear as node-link representations in many contexts, such as taxonomies, mind maps and networks in textbooks. Despite their pervasiveness, they present significant accessibility challenges for blind and low-vision people. To address this challenge, we introduce Touch-and-Audio-based Diagram Access (TADA), a tablet-based interactive system that makes diagram exploration accessible through musical tones and speech. We designed and developed TADA informed by insights gained from an interview study with 15 participants who shared their challenges and strategies for accessing diagrams. TADA enables people to access a diagram by: i) engaging in open-ended touch-based explorations, ii) allowing searching of specific nodes, iii) navigating from one node to another and iv) filtering information. We evaluated TADA with 25 participants and found that it can be a useful tool for gaining different perspectives about the diagram and participants could complete several diagram-related tasks.
[485] arXiv:2311.04940 (replaced) [pdf, ps, other]: Title: Interpretable Geoscience Artificial Intelligence (XGeoS-AI): Application to Demystify Image Recognition

Jin-Jian Xu, Hao Zhang, Chao-Sheng Tang, Lin Li, Bin Shi

Comments: There are some errors in the results, and a newer revision is still in preparation

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

As Earth science enters the era of big data, artificial intelligence (AI) not only offers great potential for solving geoscience problems, but also plays a critical role in accelerating the understanding of the complex, interactive, and multiscale processes of Earth's behavior. As geoscience AI models are progressively utilized for significant predictions in crucial situations, geoscience researchers are increasingly demanding their interpretability and versatility. This study proposes an interpretable geoscience artificial intelligence (XGeoS-AI) framework to unravel the mystery of image recognition in the Earth sciences, and its effectiveness and versatility are demonstrated by taking computed tomography (CT) image recognition as an example. Inspired by the mechanism of human vision, the proposed XGeoS-AI framework generates a threshold value from a local region within the whole image to complete the recognition. Different kinds of AI methods, such as Support Vector Regression (SVR), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN), can be adopted as the AI engines of the proposed XGeoS-AI framework to efficiently complete geoscience image recognition tasks. Experimental results demonstrate that the effectiveness, versatility, and heuristics of the proposed framework have great potential in solving geoscience image recognition problems. Interpretable AI should receive more and more attention in the field of the Earth sciences, which is the key to promoting more rational and wider applications of AI in the field of Earth sciences. In addition, the proposed interpretable framework may be the forerunner of technological innovation in the Earth sciences.
[486] arXiv:2311.07427 (replaced) [pdf, ps, html, other]: Title: Boolean Variation and Boolean Logic BackPropagation

Van Minh Nguyen

Comments: v2

Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Logic in Computer Science (cs.LO); Optimization and Control (math.OC)

The notion of variation is introduced for the Boolean set, and based on it a Boolean logic backpropagation principle is developed. Using this concept, deep models can be built with weights and activations being Boolean numbers and operated with Boolean logic instead of real arithmetic. In particular, Boolean deep models can be trained directly in the Boolean domain without latent weights. No gradient but logic is synthesized and backpropagated through layers.
[487] arXiv:2311.07761 (replaced) [pdf, ps, html, other]: Title: Amodal Optical Flow

Maximilian Luz, Rohit Mohan, Ahmed Rida Sekkat, Oliver Sawade, Elmar Matthes, Thomas Brox, Abhinav Valada

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Optical flow estimation is very challenging in situations with transparent or occluded objects. In this work, we address these challenges at the task level by introducing Amodal Optical Flow, which integrates optical flow with amodal perception. Instead of only representing the visible regions, we define amodal optical flow as a multi-layered pixel-level motion field that encompasses both visible and occluded regions of the scene. To facilitate research on this new task, we extend the AmodalSynthDrive dataset to include pixel-level labels for amodal optical flow estimation. We present several strong baselines, along with the Amodal Flow Quality metric to quantify the performance in an interpretable manner. Furthermore, we propose the novel AmodalFlowNet as an initial step toward addressing this task. AmodalFlowNet consists of a transformer-based cost-volume encoder paired with a recurrent transformer decoder which facilitates recurrent hierarchical feature propagation and amodal semantic grounding. We demonstrate the tractability of amodal optical flow in extensive experiments and show its utility for downstream tasks such as panoptic tracking. We make the dataset, code, and trained models publicly available at this http URL.
[488] arXiv:2311.07790 (replaced) [pdf, ps, html, other]: Title: Leveraging Hamilton-Jacobi PDEs with time-dependent Hamiltonians for continual scientific machine learning

Paula Chen, Tingwei Meng, Zongren Zou, Jérôme Darbon, George Em Karniadakis

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

We address two major challenges in scientific machine learning (SciML): interpretability and computational efficiency. We increase the interpretability of certain learning processes by establishing a new theoretical connection between optimization problems arising from SciML and a generalized Hopf formula, which represents the viscosity solution to a Hamilton-Jacobi partial differential equation (HJ PDE) with time-dependent Hamiltonian. Namely, we show that when we solve certain regularized learning problems with integral-type losses, we actually solve an optimal control problem and its associated HJ PDE with time-dependent Hamiltonian. This connection allows us to reinterpret incremental updates to learned models as the evolution of an associated HJ PDE and optimal control problem in time, where all of the previous information is intrinsically encoded in the solution to the HJ PDE. As a result, existing HJ PDE solvers and optimal control algorithms can be reused to design new efficient training approaches for SciML that naturally coincide with the continual learning framework, while avoiding catastrophic forgetting. As a first exploration of this connection, we consider the special case of linear regression and leverage our connection to develop a new Riccati-based methodology for solving these learning problems that is amenable to continual learning applications. We also provide some corresponding numerical examples that demonstrate the potential computational and memory advantages our Riccati-based approach can provide.
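As a point of reference for the linear regression case, classical recursive least squares already has the flavor described here: each new sample updates the weights and a matrix `P` through a discrete Riccati-type recursion, with no need to revisit earlier data. The sketch below is that standard textbook technique, swapped in for illustration; it is not the authors' HJ-PDE-based method, and the function names are hypothetical.

```python
def rls_update(w, P, x, y):
    """One recursive least-squares step for the sample (x, y).

    P tracks the inverse of (lam * I + sum of x x^T) over the data seen so far.
    """
    n = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    err = y - sum(w[i] * x[i] for i in range(n))
    w_new = [w[i] + gain[i] * err for i in range(n)]
    # Riccati-type update: P <- P - (P x)(P x)^T / (1 + x^T P x)
    P_new = [[P[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return w_new, P_new

def batch_ridge_2d(data, lam):
    """Closed-form ridge solution for two features, used as a cross-check."""
    a11 = lam + sum(x[0] * x[0] for x, _ in data)
    a12 = sum(x[0] * x[1] for x, _ in data)
    a22 = lam + sum(x[1] * x[1] for x, _ in data)
    b1 = sum(x[0] * y for x, y in data)
    b2 = sum(x[1] * y for x, y in data)
    det = a11 * a22 - a12 * a12
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

lam = 0.1
data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]

# Stream the samples one at a time, starting from w = 0 and P = I / lam.
w, P = [0.0, 0.0], [[1.0 / lam, 0.0], [0.0, 1.0 / lam]]
for x, y in data:
    w, P = rls_update(w, P, x, y)

w_batch = batch_ridge_2d(data, lam)
```

After all samples are streamed, the recursive weights agree with the batch ridge solution up to rounding, which is the sense in which earlier information is carried entirely by `w` and `P`.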
[489] arXiv:2311.14114 (replaced) [pdf, ps, html, other]: Title: SySMOL: Co-designing Algorithms and Hardware for Neural Networks with Heterogeneous Precisions

Cyrus Zhou, Pedro Savarese, Vaughn Richard, Zack Hassman, Xin Yuan, Michael Maire, Michael DiBrino, Yanjing Li

Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG); Performance (cs.PF)

Recent quantization techniques have enabled heterogeneous precisions at very fine granularity, e.g., each parameter/activation can take on a different precision, resulting in compact neural networks without sacrificing accuracy. However, there is a lack of efficient architectural support for such networks, which require additional hardware to decode the precision settings for individual variables, align the variables, and provide fine-grained mixed-precision compute capabilities. The complexity of these operations introduces high overheads. Thus, the improvements in inference latency/energy of these networks are not commensurate with the compression ratio, and may be inferior to larger quantized networks with uniform precisions.
We present an end-to-end co-design approach encompassing computer architecture, training algorithm, and inference optimization to efficiently execute networks with fine-grained heterogeneous precisions. The key to our approach is a novel training algorithm designed to accommodate hardware constraints and inference operation requirements, outputting networks with input-channel-wise heterogeneous precisions and at most three precision levels. Combined with inference optimization techniques, existing architectures with low-cost enhancements can support such networks efficiently, yielding optimized tradeoffs between accuracy, compression ratio and inference latency/energy.
We demonstrate the efficacy of our approach across CPU and GPU architectures. For various representative neural networks, our approach achieves >10x improvements in both compression ratio and inference latency, with negligible degradation in accuracy compared to full-precision networks.
[490] arXiv:2311.15963 (replaced) [pdf, ps, html, other]: Title: From Pixels to Titles: Video Game Identification by Screenshots using Convolutional Neural Networks

Fabricio Breve

Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

This paper investigates video game identification through single screenshots, utilizing five convolutional neural network (CNN) architectures (MobileNet, DenseNet, EfficientNetB0, EfficientNetB2, and EfficientNetB3) across 22 home console systems, spanning from Atari 2600 to PlayStation 5, totalling 8,796 games and 170,881 screenshots. Confirming the hypothesis, CNNs autonomously extract image features, enabling the identification of game titles from screenshots without additional features. Using ImageNet pre-trained weights as initial weights, EfficientNetB3 achieves the highest average accuracy (74.51%), while DenseNet169 excels in 14 of the 22 systems. Employing alternative initial weights trained in an arcade screenshots dataset boosts accuracy for EfficientNetB2 and EfficientNetB3, with the latter reaching a peak accuracy of 76.36% and demonstrating reduced convergence epochs from 23.7 to 20.5 on average. Overall, the combination of optimal architecture and weights attains 77.67% accuracy, primarily led by EfficientNetB3 in 19 systems. These findings underscore the efficacy of CNNs in video game identification through screenshots.
[491] arXiv:2311.16114 (replaced) [pdf, ps, html, other]: Title: Learning Noise-Robust Joint Representation for Multimodal Emotion Recognition under Incomplete Data Scenarios

Qi Fan (1), Haolin Zuo (1), Rui Liu (1), Zheng Lian (2), Guanglai Gao (1) ((1) Inner Mongolia University, Hohhot, China, (2) Institute of Automation, Chinese Academy of Sciences, Beijing, China)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Multimodal emotion recognition (MER) in practical scenarios is significantly challenged by the presence of missing or incomplete data across different modalities. To overcome these challenges, researchers have aimed to simulate incomplete conditions during the training phase to enhance the system's overall robustness. Traditional methods have often involved discarding data or substituting data segments with zero vectors to approximate these incompletenesses. However, such approaches neither accurately represent real-world conditions nor adequately address the issue of noisy data availability. For instance, a blurry image still carries information, which is lost if the image is simply replaced with zero vectors. To tackle this issue and develop a more precise MER system, we introduce a novel noise-robust MER model that effectively learns robust multimodal joint representations from noisy data. This approach includes two pivotal components: firstly, a noise scheduler that adjusts the type and level of noise in the data to emulate various realistic incomplete situations. Secondly, a Variational AutoEncoder (VAE)-based module is employed to reconstruct these robust multimodal joint representations from the noisy inputs. Notably, the introduction of the noise scheduler enables the exploration of an entirely new type of incomplete data condition, which is impossible with existing methods. Extensive experimental evaluations on the benchmark datasets IEMOCAP and CMU-MOSEI demonstrate the effectiveness of the noise scheduler and the excellent performance of our proposed model.
[492] arXiv:2311.16520 (replaced) [pdf, ps, html, other]: Title: Value Approximation for Two-Player General-Sum Differential Games with State Constraints

Lei Zhang, Mukesh Ghimire, Wenlong Zhang, Zhe Xu, Yi Ren

Comments: Published to T-RO as preprint

Subjects: Robotics (cs.RO); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)

Solving Hamilton-Jacobi-Isaacs (HJI) PDEs numerically enables equilibrial feedback control in two-player differential games, yet faces the curse of dimensionality (CoD). While physics-informed neural networks (PINNs) have shown promise in alleviating CoD in solving PDEs, vanilla PINNs fall short in learning discontinuous solutions due to their sampling nature, leading to poor safety performance of the resulting policies when values are discontinuous due to state or temporal logic constraints. In this study, we explore three potential solutions to this challenge: (1) a hybrid learning method that is guided by both supervisory equilibria and the HJI PDE, (2) a value-hardening method where a sequence of HJIs are solved with increasing Lipschitz constant on the constraint violation penalty, and (3) the epigraphical technique that lifts the value to a higher dimensional state space where it becomes continuous. Evaluations through 5D and 9D vehicle and 13D drone simulations reveal that the hybrid method outperforms others in terms of generalization and safety performance by taking advantage of both the supervisory equilibrium values and costates, and the low cost of PINN loss gradients.
[493] arXiv:2311.16854 (replaced) [pdf, ps, other]: Title: A Unified Approach for Text- and Image-guided 4D Scene Generation

Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Karsten Kreis, Otmar Hilliges, Shalini De Mello

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large-scale diffusion generative models are greatly simplifying image, video and 3D asset creation from user-provided text prompts and images. However, the challenging problem of text-to-4D dynamic 3D scene generation with diffusion guidance remains largely unexplored. We propose Dream-in-4D, which features a novel two-stage approach for text-to-4D synthesis, leveraging (1) 3D and 2D diffusion guidance to effectively learn a high-quality static 3D asset in the first stage; (2) a deformable neural radiance field that explicitly disentangles the learned static asset from its deformation, preserving quality during motion learning; and (3) a multi-resolution feature grid for the deformation field with a displacement total variation loss to effectively learn motion with video diffusion guidance in the second stage. Through a user preference study, we demonstrate that our approach significantly advances image and motion quality, 3D consistency and text fidelity for text-to-4D generation compared to baseline approaches. Thanks to its motion-disentangled representation, Dream-in-4D can also be easily adapted for controllable generation where appearance is defined by one or multiple images, without the need to modify the motion learning stage. Thus, our method offers, for the first time, a unified approach for text-to-4D, image-to-4D and personalized 4D generation tasks.
[494] arXiv:2311.16953 (replaced) [pdf, ps, html, other]: Title: Local certification of geometric graph classes

Oscar Defrain, Louis Esperet, Aurélie Lagoutte, Pat Morin, Jean-Florent Raymond

Comments: 36 pages, 16 figures; v3: revised version

Subjects: Discrete Mathematics (cs.DM); Computational Geometry (cs.CG); Distributed, Parallel, and Cluster Computing (cs.DC); Combinatorics (math.CO)

The goal of local certification is to locally convince the vertices of a graph $G$ that $G$ satisfies a given property. A prover assigns short certificates to the vertices of the graph, then the vertices are allowed to check their certificates and the certificates of their neighbors, and based only on this local view, they must decide whether $G$ satisfies the given property. If the graph indeed satisfies the property, all vertices must accept the instance, and otherwise at least one vertex must reject the instance (for any possible assignment of certificates). The goal is to minimize the size of the certificates.
In this paper we study the local certification of geometric and topological graph classes. While it is known that in $n$-vertex graphs, planarity can be certified locally with certificates of size $O(\log n)$, we show that several closely related graph classes require certificates of size $\Omega(n)$. This includes penny graphs, unit-distance graphs, (induced) subgraphs of the square grid, 1-planar graphs, and unit-square graphs. These bounds are tight up to a constant factor and give the first known examples of hereditary (and even monotone) graph classes for which the certificates must have linear size. For unit-disk graphs we obtain a lower bound of $\Omega(n^{1-\delta})$ for any $\delta>0$ on the size of the certificates, and an upper bound of $O(n \log n)$. The lower bounds are obtained by proving rigidity properties of the considered graphs, which might be of independent interest.
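To make the certification model concrete, here is a minimal sketch using bipartiteness, a textbook property that is not one of the classes studied in this paper. It can be certified with 1-bit certificates: the prover hands each vertex its side of a 2-colouring, and a vertex accepts if every neighbour holds the opposite bit. The graphs and function names are illustrative assumptions, and the brute-force search over assignments is only feasible for tiny graphs.

```python
from itertools import product

def vertex_accepts(v, adj, cert):
    # Local check: v sees only its own certificate and those of its neighbours.
    return all(cert[v] != cert[u] for u in adj[v])

def all_accept(adj, cert):
    # The instance is accepted only if every vertex accepts.
    return all(vertex_accepts(v, adj, cert) for v in adj)

def some_assignment_accepted(adj):
    # Try every possible 1-bit certificate assignment (exponential, tiny graphs only).
    nodes = list(adj)
    return any(
        all_accept(adj, dict(zip(nodes, bits)))
        for bits in product((0, 1), repeat=len(nodes))
    )

# A 4-cycle (bipartite) and a triangle (not bipartite), as adjacency lists.
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
k3 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# Certificates an honest prover would assign to the 4-cycle.
honest = {0: 0, 1: 1, 2: 0, 3: 1}
```

On the 4-cycle the honest certificates make every vertex accept, while on the triangle every assignment leaves some edge with equal bits, so at least one vertex rejects. The lower bounds in the abstract say that for classes such as penny graphs no comparably short certificate can work.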
[495] arXiv:2312.00761 (replaced) [pdf, ps, other]: Title: Deep Unlearning: Fast and Efficient Training-free Approach to Class Forgetting

Sangamesh Kodge, Gobinda Saha, Kaushik Roy

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Machine unlearning is a prominent and challenging field, driven by regulatory demands for user data deletion and heightened privacy awareness. Existing approaches involve retraining the model or multiple finetuning steps for each deletion request, often constrained by computational limits and restricted data access. In this work, we introduce a novel class unlearning algorithm designed to strategically eliminate specific classes from the learned model. Our algorithm first estimates the Retain and the Forget Spaces using Singular Value Decomposition on the layerwise activations for a small subset of samples from the retain and unlearn classes, respectively. We then compute the shared information between these spaces and remove it from the forget space to isolate the class-discriminatory feature space. Finally, we obtain the unlearned model by updating the weights to suppress the class-discriminatory features from the activation spaces. We demonstrate our algorithm's efficacy on ImageNet using a Vision Transformer with only $\sim 1.5\%$ drop in retain accuracy compared to the original model while maintaining under $1\%$ accuracy on the unlearned class samples. Further, our algorithm consistently performs well when subject to Membership Inference Attacks showing $7.8\%$ improvement on average across a variety of image classification datasets and network architectures, as compared to other baselines while being $\sim 6 \times$ more computationally efficient. Our code is available at this https URL.
[496] arXiv:2312.05429 (replaced) [pdf, ps, other]: Title: Mitigating Nonlinear Algorithmic Bias in Binary Classification

Wendy Hui, Wai Kwong Lau

Comments: 5 pages, 3 figures, 12 tables. arXiv admin note: text overlap with arXiv:2310.12421

Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Applications (stat.AP)

This paper proposes the use of causal modeling to detect and mitigate algorithmic bias that is nonlinear in the protected attribute. We provide a general overview of our approach. We use the German Credit data set, which is available for download from the UC Irvine Machine Learning Repository, to develop (1) a prediction model, which is treated as a black box, and (2) a causal model for bias mitigation. In this paper, we focus on age bias and the problem of binary classification. We show that the probability of getting correctly classified as "low risk" is lowest among young people. The probability increases with age nonlinearly. To incorporate the nonlinearity into the causal model, we introduce a higher order polynomial term. Based on the fitted causal model, the de-biased probability estimates are computed, showing improved fairness with little impact on overall classification accuracy. Causal modeling is intuitive and, hence, its use can enhance explicability and promote trust among different stakeholders of AI.
[497] arXiv:2312.07661 (replaced) [pdf, ps, html, other]: Title: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

Shuyang Sun, Runjia Li, Philip Torr, Xiuye Gu, Siyang Li

Comments: To appear in CVPR 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)

Existing open-vocabulary image segmentation methods require a fine-tuning step on mask labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of categories in segmentation datasets. Consequently, the vocabulary capacity of pre-trained VLMs is severely reduced after fine-tuning. However, without fine-tuning, VLMs trained under weak image-text supervision tend to make suboptimal mask predictions. To alleviate these issues, we introduce a novel recurrent framework that progressively filters out irrelevant texts and enhances mask quality without training efforts. The recurrent unit is a two-stage segmenter built upon a frozen VLM. Thus, our model retains the VLM's broad vocabulary space and equips it with segmentation ability. Experiments show that our method outperforms not only the training-free counterparts, but also those fine-tuned with millions of data samples, and sets the new state-of-the-art records for both zero-shot semantic and referring segmentation. Concretely, we improve the current record by 28.8, 16.0, and 6.9 mIoU on Pascal VOC, COCO Object, and Pascal Context.
[498] arXiv:2312.11360 (replaced) [pdf, ps, other]: Title: Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

Comments: CVPR 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)

We present Paint-it, a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting the Score-Distillation Sampling (SDS). We observe that directly applying SDS yields undesirable texture quality due to its noisy gradients. We reveal the importance of texture parameterization when using SDS. Specifically, we propose Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization. We show that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS. In experiments, Paint-it obtains remarkable quality PBR texture maps within 15 min., given only a text description. We demonstrate the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine. Project page: this https URL
[499] arXiv:2312.11681 (replaced) [pdf, ps, html, other]: Title: Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows

Madeleine Grunde-McLaughlin, Michelle S. Lam, Ranjay Krishna, Daniel S. Weld, Jeffrey Heer

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

LLM chains enable complex tasks by decomposing work into a sequence of subtasks. Similarly, the more established techniques of crowdsourcing workflows decompose complex tasks into smaller tasks for human crowdworkers. Chains address LLM errors analogously to the way crowdsourcing workflows address human error. To characterize opportunities for LLM chaining, we survey 107 papers across the crowdsourcing and chaining literature to construct a design space for chain development. The design space covers a designer's objectives and the tactics used to build workflows. We then surface strategies that mediate how workflows use tactics to achieve objectives. To explore how techniques from crowdsourcing may apply to chaining, we adapt crowdsourcing workflows to implement LLM chains across three case studies: creating a taxonomy, shortening text, and writing a short story. From the design space and our case studies, we identify takeaways for effective chain design and raise implications for future research and development.
[500] arXiv:2312.12422 (replaced) [pdf, ps, html, other]: Title: Terrapin Attack: Breaking SSH Channel Integrity By Sequence Number Manipulation

Fabian Bäumer, Marcus Brinkmann, Jörg Schwenk

Comments: 21 pages, 7 figures; minor revision; accepted at USENIX Security 2024; also added Artifact Evaluation badges and final Artifact Appendix

Subjects: Cryptography and Security (cs.CR)

The SSH protocol provides secure access to network services, particularly remote terminal login and file transfer within organizational networks and to over 15 million servers on the open internet. SSH uses an authenticated key exchange to establish a secure channel between a client and a server, which protects the confidentiality and integrity of messages sent in either direction. The secure channel prevents message manipulation, replay, insertion, deletion, and reordering. At the network level, SSH uses the Binary Packet Protocol over TCP. In this paper, we show that the SSH Binary Packet Protocol is no longer a secure channel: SSH channel integrity (INT-PST, aINT-PTXT, and INT-sfCTF) is broken for three widely used encryption modes. This allows prefix truncation attacks where encrypted packets at the beginning of the SSH channel can be deleted without the client or server noticing it. We demonstrate several real-world applications of this attack. We show that we can fully break SSH extension negotiation (RFC 8308), such that an attacker can downgrade the public key algorithms for user authentication or turn off a new countermeasure against keystroke timing attacks introduced in OpenSSH 9.5. Further, we identify an implementation flaw in AsyncSSH that, together with prefix truncation, allows an attacker to redirect the victim's login into a shell controlled by the attacker. We also performed an internet-wide scan and found that 71.6% of SSH servers support a vulnerable encryption mode, while 63.2% even list it as their preferred choice. We identify two root causes that enable these attacks: First, the SSH handshake supports optional messages that are not authenticated. Second, SSH does not reset message sequence numbers when activating encryption keys. Based on this analysis, we propose effective and backward-compatible changes to SSH that mitigate our attacks.
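The second root cause named above, sequence numbers that are not reset when keys are activated, can be illustrated with a toy counter model. This is a heavily simplified sketch and not the SSH Binary Packet Protocol: it ignores encryption, models one direction only, and uses an HMAC over the implicit sequence number as a stand-in for the real integrity mechanism. The key, packet names, class and the `reset_on_keys` flag are all hypothetical, and the flag is a counterfactual for illustration, not necessarily the authors' proposed countermeasure.

```python
import hashlib
import hmac

KEY = b"toy-session-key"  # hypothetical key, for illustration only

def tag(seq, payload):
    # Integrity tag over the implicit sequence number and the payload.
    return hmac.new(KEY, seq.to_bytes(4, "big") + payload, hashlib.sha256).digest()

class Receiver:
    def __init__(self, reset_on_keys):
        self.seq = 0
        self.reset_on_keys = reset_on_keys
        self.delivered = []

    def handshake_packet(self, payload):
        # Unauthenticated packet before key activation: it only bumps the counter.
        self.seq += 1

    def activate_keys(self):
        if self.reset_on_keys:
            self.seq = 0

    def secure_packet(self, payload, mac):
        # Verify against the receiver's own counter, then deliver.
        if not hmac.compare_digest(mac, tag(self.seq, payload)):
            raise ValueError("integrity check failed")
        self.delivered.append(payload)
        self.seq += 1

def simulate(reset_on_keys, attack):
    rx = Receiver(reset_on_keys)
    tx_seq = 0
    for p in (b"kexinit", b"kexreply"):  # genuine handshake packets
        rx.handshake_packet(p)
        tx_seq += 1
    if attack:
        rx.handshake_packet(b"injected")  # attacker inserts one extra packet
    rx.activate_keys()
    if reset_on_keys:
        tx_seq = 0
    protected = []
    for p in (b"ext-info", b"auth-request", b"data"):
        protected.append((p, tag(tx_seq, p)))
        tx_seq += 1
    if attack:
        protected = protected[1:]  # attacker deletes the first protected packet
    try:
        for p, m in protected:
            rx.secure_packet(p, m)
    except ValueError:
        return None  # manipulation detected
    return rx.delivered
```

Calling `simulate(reset_on_keys=False, attack=True)` returns the stream with its first protected packet silently missing, because the injected handshake packet offsets the receiver's counter by exactly the one packet that is later deleted. With `reset_on_keys=True` the same manipulation fails the integrity check and the function returns `None`.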
[501] arXiv:2312.13094 (replaced) [pdf, ps, html, other]: Title: Automated MPI code generation for scalable finite-difference solvers

George Bisbas, Rhodri Nelson, Mathias Louboutin, Paul H.J. Kelly, Fabio Luporini, Gerard Gorman

Comments: 11 pages, 12 figures (18 pages with References and Appendix)

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Performance (cs.PF)

Partial differential equations (PDEs) are crucial in modelling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving these PDEs on a large scale is an intricate and time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques specifically tailored for distributed memory parallelism (DMP) to solve explicit finite-difference (FD) stencils at scale, a fundamental challenge in numerous scientific applications. These techniques are implemented and integrated into the Devito DSL and compiler framework, a well-established solution for automating the generation of FD solvers based on a high-level symbolic math input. Users benefit from modelling simulations at a high-level symbolic abstraction and effortlessly harnessing HPC-ready distributed-memory parallelism without altering their source code. This results in drastic reductions both in execution time and developer effort. While the contributions of this work are implemented and integrated within the Devito framework, the DMP concepts and the techniques applied are generally applicable to any FD solvers. A comprehensive performance evaluation of Devito's DMP via MPI demonstrates highly competitive weak and strong scaling on the Archer2 supercomputer, demonstrating the effectiveness of the proposed approach in meeting the demands of large-scale scientific simulations.
[502] arXiv:2312.15345 (replaced) [pdf, ps, html, other]: Title: RoboFiSense: Attention-Based Robotic Arm Activity Recognition with WiFi Sensing

Rojin Zandi, Kian Behzad, Elaheh Motamedi, Hojjat Salehinejad, Milad Siami

Comments: 11 pages, 11 figures

Subjects: Robotics (cs.RO); Signal Processing (eess.SP)

Despite the current surge of interest in autonomous robotic systems, robot activity recognition within restricted indoor environments remains a formidable challenge. Conventional methods for detecting and recognizing robotic arms' activities often rely on vision-based or light detection and ranging (LiDAR) sensors, which require line-of-sight (LoS) access and may raise privacy concerns, for example, in nursing facilities. This research pioneers an innovative approach harnessing channel state information (CSI) measured from WiFi signals, subtly influenced by the activity of robotic arms. We developed an attention-based network to classify eight distinct activities performed by a Franka Emika robotic arm in different situations. Our proposed bidirectional vision transformer-concatenated (BiVTC) methodology aspires to predict robotic arm activities accurately, even when trained on activities with different velocities, all without dependency on external or internal sensors or visual aids. The high dependency of CSI data on the environment motivated us to study the problem of sniffer location selection, by systematically changing the sniffer's location and collecting different sets of data. Finally, this paper also marks the first publication of the CSI data of eight distinct robotic arm activities, collectively referred to as RoboFiSense. This initiative aims to provide a benchmark dataset and baselines to the research community, fostering advancements in the field of robotics sensing.
[503] arXiv:2312.15974 (replaced) [pdf, ps, html, other]: Title: About rescaling, discretisation and linearisation of $\mathtt{RNN}$

Mariano Caruso, Cecilia Jarne

Comments: 1 Figure, 11 pages

Subjects: Neural and Evolutionary Computing (cs.NE)

We explored the mathematical foundations of Recurrent Neural Networks ($\mathtt{RNN}$s) and three fundamental procedures: temporal rescaling, discretisation and linearisation. These techniques provide essential tools for characterizing the behaviour of $\mathtt{RNN}$s, enabling insights into temporal dynamics, practical computational implementation, and linear approximations for analysis. We discuss the flexible order of application of these procedures, emphasizing their significance in modelling and analyzing $\mathtt{RNN}$s for neuroscience and machine learning applications. We explicitly describe here under what conditions these procedures can be interchangeable.
[504] arXiv:2312.17641 (replaced) [pdf, ps, html, other]: Title: Motion State: A New Benchmark Multiple Object Tracking

Yang Feng, Liao Pan, Wu Di, Liu Bo, Zhang Xingle

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the realm of video analysis, the field of multiple object tracking (MOT) assumes paramount importance, with the motion state of objects (whether static or dynamic relative to the ground) holding practical significance across diverse scenarios. However, the extant literature exhibits a notable dearth in the exploration of this aspect. Deep learning methodologies encounter challenges in accurately discerning object motion states, while conventional approaches reliant on comprehensive mathematical modeling may yield suboptimal tracking accuracy. To address these challenges, we introduce a Model-Data-Driven Motion State Judgment Object Tracking Method (MoD2T). This innovative architecture adeptly amalgamates traditional mathematical modeling with deep learning-based multi-object tracking frameworks. The integration of mathematical modeling and deep learning within MoD2T enhances the precision of object motion state determination, thereby elevating tracking accuracy. Our empirical investigations comprehensively validate the efficacy of MoD2T across varied scenarios, encompassing unmanned aerial vehicle surveillance and street-level tracking. Furthermore, to gauge the method's adeptness in discerning object motion states, we introduce the Motion State Validation F1 (MVF1) metric. This novel performance metric aims to quantitatively assess the accuracy of motion state classification, furnishing a comprehensive evaluation of MoD2T's performance. Elaborate experimental validations corroborate the rationality of MVF1. In order to holistically appraise MoD2T's performance, we meticulously annotate several renowned datasets and subject MoD2T to stringent testing. Remarkably, under conditions characterized by minimal or moderate camera motion, the achieved MVF1 values are particularly noteworthy, with exemplars including 0.774 for the KITTI dataset, 0.521 for MOT17, and 0.827 for UAVDT.
[505] arXiv:2401.01404 (replaced) [pdf, ps, other]: Title: Scalable network reconstruction in subquadratic time

Tiago P. Peixoto

Comments: 12 pages, 7 figures. Code and documentation available at this https URL

Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Computation (stat.CO); Machine Learning (stat.ML)

Network reconstruction consists in determining the unobserved pairwise couplings between $N$ nodes given only observational data on the resulting behavior that is conditioned on those couplings -- typically a time-series or independent samples from a graphical model. A major obstacle to the scalability of algorithms proposed for this problem is a seemingly unavoidable quadratic complexity of $\Omega(N^2)$, corresponding to the requirement of each possible pairwise coupling being contemplated at least once, despite the fact that most networks of interest are sparse, with a number of non-zero couplings that is only $O(N)$. Here we present a general algorithm applicable to a broad range of reconstruction problems that significantly outperforms this quadratic baseline. Our algorithm relies on a stochastic second neighbor search (Dong et al., 2011) that produces the best edge candidates with high probability, thus bypassing an exhaustive quadratic search. If we rely on the conjecture that the second-neighbor search finishes in log-linear time (Baron & Darling, 2020; 2022), we demonstrate theoretically that our algorithm finishes in subquadratic time, with a data-dependent complexity loosely upper bounded by $O(N^{3/2}\log N)$, but with a more typical log-linear complexity of $O(N\log^2N)$. In practice, we show that our algorithm achieves a performance that is many orders of magnitude faster than the quadratic baseline -- in a manner consistent with our theoretical analysis -- allows for easy parallelization, and thus enables the reconstruction of networks with hundreds of thousands and even millions of nodes and edges.
[506] arXiv:2401.01722 (replaced) [pdf, ps, other]: Title: Splitting Methods for differential equations

Sergio Blanes, Fernando Casas, Ander Murua

Comments: Review paper to be published in Acta Numerica 2024

Subjects: Numerical Analysis (math.NA)

This overview is devoted to splitting methods, a class of numerical integrators intended for differential equations that can be subdivided into different problems easier to solve than the original system. Closely connected with this class of integrators are composition methods, in which one or several low-order schemes are composed to construct higher-order numerical approximations to the exact solution. We analyze in detail the order conditions that have to be satisfied by these classes of methods to achieve a given order, and provide some insight about their qualitative properties in connection with geometric numerical integration and the treatment of highly oscillatory problems. Since splitting methods have received considerable attention in the realm of partial differential equations, we also cover this subject in the present survey, with special attention to parabolic equations and their problems. An exhaustive list of methods of different orders is collected and tested on simple examples. Finally, some applications of splitting methods in different areas, ranging from celestial mechanics to statistics, are also provided.
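A small worked example may help fix ideas. The sketch below applies two classical splitting schemes, Lie-Trotter and Strang, to the harmonic oscillator $q' = p$, $p' = -q$, split into the two exactly solvable parts $q' = p$ and $p' = -q$. The test problem, step counts and function names are illustrative choices, not taken from the survey. Halving the step size and comparing global errors gives an empirical estimate of each scheme's order.

```python
import math

def flow_a(q, p, h):
    # Exact flow of the sub-problem q' = p, p' = 0.
    return q + h * p, p

def flow_b(q, p, h):
    # Exact flow of the sub-problem q' = 0, p' = -q.
    return q, p - h * q

def lie_step(q, p, h):
    # Lie-Trotter: one full step of A followed by one full step of B.
    q, p = flow_a(q, p, h)
    return flow_b(q, p, h)

def strang_step(q, p, h):
    # Strang: half step of A, full step of B, half step of A (symmetric).
    q, p = flow_a(q, p, h / 2)
    q, p = flow_b(q, p, h)
    return flow_a(q, p, h / 2)

def global_error(step, n, t_end=1.0):
    # Integrate from (q, p) = (1, 0) and compare with the exact (cos t, -sin t).
    q, p, h = 1.0, 0.0, t_end / n
    for _ in range(n):
        q, p = step(q, p, h)
    return math.hypot(q - math.cos(t_end), p + math.sin(t_end))

# Error ratio when the step size is halved: about 2 for order 1, about 4 for order 2.
lie_ratio = global_error(lie_step, 50) / global_error(lie_step, 100)
strang_ratio = global_error(strang_step, 50) / global_error(strang_step, 100)
```

The symmetric composition in `strang_step` is what raises the order from one to two at the cost of one extra sub-step per step.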
[507] arXiv:2401.06692 (replaced) [pdf, ps, html, other]: Title: An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models

Gantavya Bhatt, Yifang Chen, Arnav M. Das, Jifan Zhang, Sang T. Truong, Stephen Mussmann, Yinglun Zhu, Jeffrey Bilmes, Simon S. Du, Kevin Jamieson, Jordan T. Ash, Robert D. Nowak

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation efforts required to produce high quality responses for instructions are becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues to increase. Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool, but its high computational cost remains a barrier to its widespread applicability in the context of LLMs. To mitigate the annotation cost of SFT and circumvent the computational bottlenecks of active learning, we propose using experimental design. Experimental design techniques select the most informative samples to label, and typically maximize some notion of uncertainty and/or diversity. In our work, we implement a framework that evaluates several existing and novel experimental design techniques and find that these methods consistently yield significant gains in label efficiency with little computational overhead. On generative tasks, our methods achieve the same generalization performance with only $50\%$ of annotation cost required by random sampling.
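As one generic example of a diversity-based selection rule, the sketch below implements greedy k-center selection over embedding vectors. The abstract does not say whether this specific rule is among the techniques evaluated, so it should be read as an illustration of the idea of picking a diverse subset to annotate. The toy 2-D embeddings and the function name are assumptions.

```python
def k_center_greedy(points, k):
    # Greedily pick k indices so that each new pick is the point
    # farthest from everything selected so far.
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    chosen = [0]  # arbitrary starting point
    min_d = [dist2(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_d.__getitem__)
        chosen.append(nxt)
        # Each point's distance to the selected set can only shrink.
        min_d = [min(d, dist2(p, points[nxt])) for d, p in zip(min_d, points)]
    return chosen

# Toy embeddings forming three tight clusters of two points each.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
selected = k_center_greedy(embeddings, k=3)
```

With a budget of three labels, the selection covers all three clusters instead of spending two labels on near-duplicate points, which is the kind of label efficiency the abstract refers to.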
[508] arXiv:2401.08661 (replaced) [pdf, ps, other]: Title: Risk-anticipatory autonomous driving strategies considering vehicles' weights, based on hierarchical deep reinforcement learning

Di Chen, Hao Li, Zhicheng Jin, Huizhao Tu, Meixin Zhu

Comments: 14 pages, 5 figures, 6 tables

Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Autonomous vehicles (AVs) have the potential to prevent accidents caused by driver errors and reduce road traffic risks. Due to the nature of heavy vehicles, whose collisions cause more serious crashes, the weights of vehicles need to be considered when making driving strategies aimed at reducing the potential risks and their consequences in the context of autonomous driving. This study develops an autonomous driving strategy based on risk anticipation, considering the weights of surrounding vehicles and using hierarchical deep reinforcement learning. A risk indicator integrating surrounding vehicles' weights, based on the risk field theory, is proposed and incorporated into autonomous driving decisions. A hybrid action space is designed to allow for left lane changes, right lane changes and car-following, which enables AVs to act more freely and realistically whenever possible. To solve the above hybrid decision-making problem, a hierarchical proximal policy optimization (HPPO) algorithm with an attention mechanism (AT-HPPO) is developed, providing great advantages in maintaining stable performance with high robustness and generalization. An indicator, potential collision energy in conflicts (PCEC), is newly proposed to evaluate the performance of the developed AV driving strategy from the perspective of the consequences of potential accidents. The performance evaluation results in simulation and on a dataset demonstrate that our model provides driving strategies that reduce both the likelihood and consequences of potential accidents, while maintaining driving efficiency. The developed method is especially meaningful for AVs driving on highways, where heavy vehicles make up a high proportion of the traffic.
[509] arXiv:2401.08818 (replaced) [pdf, ps, html, other]: Title: Link Me Baby One More Time: Social Music Discovery on Spotify

Shazia'Ayn Babul, Desislava Hristova, Antonio Lima, Renaud Lambiotte, Mariano Beguerisse-Díaz

Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Physics and Society (physics.soc-ph)

We explore the social and contextual factors that influence the outcome of person-to-person music recommendations and discovery. Specifically, we use data from Spotify to investigate how a link sent from one user to another results in the receiver engaging with the music of the shared artist. We consider several factors that may influence this process, such as the strength of the sender-receiver relationship, the user's role in the Spotify social network, their music social cohesion, and how similar the new artist is to the receiver's taste. We find that the receiver of a link is more likely to engage with a new artist when (1) they have similar music taste to the sender and the shared track is a good fit for their taste, (2) they have a stronger and more intimate tie with the sender, and (3) the shared artist is popular amongst the receiver's connections. Finally, we use these findings to build a Random Forest classifier to predict whether a shared music track will result in the receiver's engagement with the shared artist. This model elucidates which type of social and contextual features are most predictive, although peak performance is achieved when a diverse set of features are included. These findings provide new insights into the multifaceted mechanisms underpinning the interplay between music discovery and social processes.
[510] arXiv:2401.09937 (replaced) [pdf, ps, html, other]: Title: From Cash to Cashless: UPI's Impact on Spending Behavior among Indian Users

Harshal Dev, Raj Gupta, Dhruv Kumar

Comments: Accepted to ACM CHI 2024 - Late Breaking Work Track

Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

The emergence of digital payment systems has transformed how individuals conduct financial transactions, offering convenience, security, and efficiency. One groundbreaking innovation making waves in the Indian financial landscape is the Unified Payments Interface (UPI). Existing work has explored how digital payments benefit a country's economy and GDP. However, our study explores how the introduction of UPI has influenced spending behavior among Indian users on an "individual" level. We gathered 235 valid survey responses encompassing diverse demographics and interviewed 20 survey respondents. Approximately 75\% of the survey respondents reported increased spending due to UPI, with only 7\% indicating reduced spending. Significantly, 91.5\% of the respondents reported satisfaction with their UPI usage. Also, 95.2\% of the survey respondents found making payments via UPI convenient. Our research also provides suggestions for UPI applications and various stakeholders to enhance digital payment systems, enabling users to make informed decisions and fostering responsible financial management.
[511] arXiv:2401.10731 (replaced) [pdf, ps, other]: Title: Removal and Selection: Improving RGB-Infrared Object Detection via Coarse-to-Fine Fusion

Tianyi Zhao, Maoxun Yuan, Feng Jiang, Nan Wang, Xingxing Wei

Comments: 11 pages, 11 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Object detection in visible (RGB) and infrared (IR) images has been widely applied in recent years. Leveraging the complementary characteristics of RGB and IR images, the object detector provides reliable and robust object localization from day to night. Most existing fusion strategies directly input RGB and IR images into deep neural networks, leading to inferior detection performance. Because the RGB and IR features carry modality-specific noise, these strategies degrade the fused features as the noise propagates through the network. Inspired by the mechanism of the human brain processing multimodal information, in this paper, we introduce a new coarse-to-fine perspective to purify and fuse two modality features. Specifically, following this perspective, we design a Redundant Spectrum Removal module to coarsely remove interfering information within each modality and a Dynamic Feature Selection module to finely select the desired features for feature fusion. To verify the effectiveness of the coarse-to-fine fusion strategy, we construct a new object detector called the Removal and Selection Detector (RSDet). Extensive experiments on three RGB-IR object detection datasets verify the superior performance of our method.
[512] arXiv:2401.12391 (replaced) [pdf, ps, other]: Title: Approximation of Pufferfish Privacy for Gaussian Priors

Ni Ding

Comments: 11 pages, 5 figures, accepted journal version

Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

This paper studies how to approximate pufferfish privacy when the adversary's prior belief of the published data is Gaussian distributed. Using Monge's optimal transport plan, we show that $(\epsilon, \delta)$-pufferfish privacy is attained if the additive Laplace noise is calibrated to the differences in mean and variance of the Gaussian distributions conditioned on every discriminative secret pair. A typical application is the private release of the summation (or average) query, for which sufficient conditions are derived for approximating $\epsilon$-statistical indistinguishability in an individual's sensitive data. The result is then extended to arbitrary prior beliefs trained by Gaussian mixture models (GMMs): calibrating Laplace noise to a convex combination of differences in mean and variance between Gaussian components attains $(\epsilon,\delta)$-pufferfish privacy.
[513] arXiv:2401.12872 (replaced) [pdf, ps, other]: Title: FocusFlow: 3D Gaze-Depth Interaction in Virtual Reality Leveraging Active Visual Depth Manipulation

Chenyang Zhang, Tiansu Chen, Eric Shaffer, Elahe Soltanaghai

Comments: ACM CHI 2024 Paper

Subjects: Human-Computer Interaction (cs.HC)

Gaze interaction presents a promising avenue in Virtual Reality (VR) due to its intuitive and efficient user experience. Yet, the depth control inherent in our visual system remains underutilized in current methods. In this study, we introduce FocusFlow, a hands-free interaction method that capitalizes on human visual depth perception within the 3D scenes of Virtual Reality. We first develop a binocular visual depth detection algorithm to understand eye input characteristics. We then propose a layer-based user interface and introduce the concept of 'Virtual Window' that offers an intuitive and robust gaze-depth VR interaction, despite the constraints of visual depth accuracy and precision spatially at further distances. Finally, to help novice users actively manipulate their visual depth, we propose two learning strategies that use different visual cues to help users master visual depth control. Our user studies on 24 participants demonstrate the usability of our proposed virtual window concept as a gaze-depth interaction method. In addition, our findings reveal that the user experience can be enhanced through an effective learning process with adaptive visual cues, helping users to develop muscle memory for this brand-new input mechanism. We conclude the paper by discussing strategies to optimize learning and potential research topics of gaze-depth interaction.
[514] arXiv:2401.13858 (replaced) [pdf, ps, other]: Title: Graph Diffusion Transformer for Multi-Conditional Molecular Generation

Gang Liu, Jiaxin Xu, Tengfei Luo, Meng Jiang

Comments: 21 pages, 9 figures, 7 tables

Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)

Inverse molecular design with diffusion models holds great potential for advancements in material and drug discovery. Despite success in unconditional molecule generation, integrating multiple properties such as synthetic score and gas permeability as condition constraints into diffusion models remains unexplored. We present the Graph Diffusion Transformer (Graph DiT) for multi-conditional molecular generation. Graph DiT has a condition encoder to learn the representation of numerical and categorical properties and utilizes a Transformer-based graph denoiser to achieve molecular graph denoising under conditions. Unlike previous graph diffusion models that add noise separately on the atoms and bonds in the forward diffusion process, we propose a graph-dependent noise model for training Graph DiT, designed to accurately estimate graph-related noise in molecules. We extensively validate the Graph DiT for multi-conditional polymer and small molecule generation. Results demonstrate our superiority across metrics from distribution learning to condition control for molecular properties. A polymer inverse design task for gas separation with feedback from domain experts further demonstrates its practical utility.
[515] arXiv:2401.15089 (replaced) [pdf, ps, html, other]: Title: Accelerating Material Property Prediction using Generically Complete Isometry Invariants

Jonathan Balasingham, Viktor Zamaraev, Vitaliy Kurlin

Journal-ref: Sci Rep 14, 10132 (2024)

Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Computational Physics (physics.comp-ph)

Periodic material or crystal property prediction using machine learning has grown popular in recent years as it provides a computationally efficient replacement for classical simulation methods. A crucial first step for any of these algorithms is the representation used for a periodic crystal. While similar objects like molecules and proteins have a finite number of atoms and their representation can be built based upon a finite point cloud interpretation, periodic crystals are unbounded in size, making their representation more challenging. In the present work, we adapt the Pointwise Distance Distribution (PDD), a continuous and generically complete isometry invariant for periodic point sets, as a representation for our learning algorithm. The PDD distinguished all (more than 660 thousand) periodic crystals in the Cambridge Structural Database as purely periodic sets of points without atomic types. We develop a transformer model with a modified self-attention mechanism that combines PDD with compositional information via a spatial encoding method. This model is tested on the crystals of the Materials Project and Jarvis-DFT databases and shown to produce accuracy on par with state-of-the-art methods while being several times faster in both training and prediction time.
[516] arXiv:2401.15939 (replaced) [pdf, ps, html, other]: Title: Correcting a Single Deletion in Reads from a Nanopore Sequencer

Anisha Banerjee, Yonatan Yehezkeally, Antonia Wachter-Zeh, Eitan Yaakobi

Comments: Accepted at IEEE ISIT'24

Subjects: Information Theory (cs.IT)

Owing to their several merits over other DNA sequencing technologies, nanopore sequencers hold immense potential to revolutionize the efficiency of DNA storage systems. However, their higher error rates necessitate further research to devise practical and efficient coding schemes that would allow accurate retrieval of the data stored. Our work takes a step in this direction by adopting a simplified model of the nanopore sequencer inspired by Mao \emph{et al.}, which incorporates some of its physical aspects. This channel model can be viewed as a sliding window of length $\ell$ that passes over the incoming input sequence and produces the Hamming weight of the enclosed $\ell$ bits, while shifting by one position at each time step. The resulting $(\ell+1)$-ary vector, referred to as the $\ell$-\emph{read vector}, is susceptible to deletion errors due to imperfections inherent in the sequencing process. We establish that at least $\log n - \ell$ bits of redundancy are needed to correct a single deletion. An error-correcting code that is optimal up to an additive constant is also proposed. Furthermore, we find that for $\ell \geq 2$, reconstruction from two distinct noisy $\ell$-read vectors can be accomplished without any redundancy, and provide a suitable reconstruction algorithm to this effect.
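The sliding-window channel described above can be sketched in a few lines. The boundary handling is an assumption, since the abstract does not specify it: with `pad=True` the window is allowed to overhang both ends of the sequence, with missing positions treated as 0. The helper names `read_vector` and `delete_at` are hypothetical.

```python
def read_vector(bits, ell, pad=True):
    """Hamming weights of a length-ell window sliding one position at a time.

    Assumption: with pad=True the window may overhang the sequence ends and
    out-of-range positions count as 0; with pad=False only windows lying
    fully inside the sequence are used.
    """
    padding = [0] * (ell - 1) if pad else []
    padded = padding + list(bits) + padding
    return [sum(padded[i:i + ell]) for i in range(len(padded) - ell + 1)]


def delete_at(vec, idx):
    """Model a single deletion in the read vector by dropping entry idx."""
    return vec[:idx] + vec[idx + 1:]
```

Each entry lies in $\{0, 1, \ldots, \ell\}$, which is why the read vector is $(\ell+1)$-ary. For example, `read_vector([1, 0, 1, 1], 2)` gives `[1, 1, 1, 2, 1]` under the padding assumption.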
[517] arXiv:2401.17252 (replaced) [pdf, ps, other]: Title: Quantum $X$-Secure $B$-Byzantine $T$-Colluding Private Information Retrieval

Mohamed Nomeir, Alptug Aytekin, Sennur Ulukus

Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP); Quantum Physics (quant-ph)

We consider the problems arising from the presence of Byzantine servers in a quantum private information retrieval (QPIR) setting. This is the first work to precisely define what the capabilities of Byzantine servers could be in a QPIR context. We show that quantum Byzantine servers have more capabilities than their classical counterparts due to the possibilities created by quantum encoding procedures. We focus on quantum Byzantine servers that can apply any reversible operation on their individual qudits. In this case, Byzantine servers can generate any error, i.e., this covers \emph{all} possible single qudit operations that can be applied by Byzantine servers on their qudits. We design a scheme based on cross-subspace alignment (CSA) and we show that this scheme achieves superdense coding gain in some cases.
[518] arXiv:2402.00027 (replaced) [pdf, ps, html, other]: Title: Perspectives on locally weighted ensemble Kalman methods

Philipp Wacker

Subjects: Numerical Analysis (math.NA); Probability (math.PR); Computation (stat.CO)

This manuscript derives locally weighted ensemble Kalman methods from the point of view of ensemble-based function approximation. This is done by using pointwise evaluations to build up a local linear or quadratic approximation of a function, tapering off the effect of distant particles via local weighting. This introduces a candidate method (the locally weighted Ensemble Kalman method for inversion) with the motivation of combining some of the strengths of the particle filter (ability to cope with nonlinear maps and non-Gaussian distributions) and the Ensemble Kalman filter (no filter degeneracy).
[519] arXiv:2402.00906 (replaced) [pdf, ps, other]: Title: BrainLeaks: On the Privacy-Preserving Properties of Neuromorphic Architectures against Model Inversion Attacks

Hamed Poursiami, Ihsen Alouani, Maryam Parsa

Comments: 7 pages, 4 figures, 4 tables

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

With the mainstream integration of machine learning into security-sensitive domains such as healthcare and finance, concerns about data privacy have intensified. Conventional artificial neural networks (ANNs) have been found vulnerable to several attacks that can leak sensitive data. Particularly, model inversion (MI) attacks enable the reconstruction of data samples that have been used to train the model. Neuromorphic architectures have emerged as a paradigm shift in neural computing, enabling asynchronous and energy-efficient computation. However, little to no existing work has investigated the privacy of neuromorphic architectures against model inversion. Our study is motivated by the intuition that the non-differentiable aspect of spiking neural networks (SNNs) might result in inherent privacy-preserving properties, especially against gradient-based attacks. To investigate this hypothesis, we propose a thorough exploration of SNNs' privacy-preserving capabilities. Specifically, we develop novel inversion attack strategies that are comprehensively designed to target SNNs, offering a comparative analysis with their conventional ANN counterparts. Our experiments, conducted on diverse event-based and static datasets, demonstrate the effectiveness of the proposed attack strategies and therefore question the assumption of inherent privacy preservation in neuromorphic architectures.
[520] arXiv:2402.01345 (replaced) [pdf, ps, other]: Title: Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models

Zongbo Han, Zechen Bai, Haiyang Mei, Qianli Xu, Changqing Zhang, Mike Zheng Shou

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent advancements in large vision-language models (LVLMs) have demonstrated impressive capability in visual information understanding with human language. Despite these advances, LVLMs still face challenges with multimodal hallucination, such as generating text descriptions of objects that are not present in the visual information. However, the underlying fundamental reasons for multimodal hallucinations remain poorly explored. In this paper, we propose a new perspective, suggesting that the inherent biases in LVLMs might be a key factor in hallucinations. Specifically, we systematically identify a semantic shift bias related to paragraph breaks (\n\n), where the content before and after '\n\n' in the training data frequently exhibits significant semantic changes. This pattern leads the model to infer that the contents following '\n\n' should be obviously different from the preceding contents with less hallucinatory descriptions, thereby increasing the probability of hallucinatory descriptions subsequent to the '\n\n'. We have validated this hypothesis on multiple publicly available LVLMs. In addition, we find that deliberately inserting '\n\n' into the generated description can induce more hallucinations. A simple method is proposed to effectively mitigate the hallucination of LVLMs by skipping the output of '\n'.
[521] arXiv:2402.01705 (replaced) [pdf, ps, html, other]: Title: Beyond Behaviorist Representational Harms: A Plan for Measurement and Mitigation

Jennifer Chien, David Danks

Comments: 23 pages, 7 figures

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Algorithmic harms are commonly categorized as either allocative or representational. This study specifically addresses the latter, focusing on an examination of current definitions of representational harms to discern what is included and what is not. This analysis motivates our expansion beyond behavioral definitions to encompass harms to cognitive and affective states. The paper outlines high-level requirements for measurement: identifying the necessary expertise to implement this approach and illustrating it through a case study. Our work highlights the unique vulnerabilities of large language models to perpetrating representational harms, particularly when these harms go unmeasured and unmitigated. The work concludes by presenting proposed mitigations and delineating when to employ them. The overarching aim of this research is to establish a framework for broadening the definition of representational harms and to translate insights from fairness research into practical measurement and mitigation praxis.
[522] arXiv:2402.01801 (replaced) [pdf, ps, html, other]: Title: Large Language Models for Time Series: A Survey

Xiyuan Zhang, Ranak Roy Chowdhury, Rajesh K. Gupta, Jingbo Shang

Comments: GitHub repository: this https URL

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision. Going beyond text, image and graphics, LLMs present a significant potential for analysis of time series data, benefiting domains such as climate, IoT, healthcare, traffic, audio and finance. This survey paper provides an in-depth exploration and a detailed taxonomy of the various methodologies employed to harness the power of LLMs for time series analysis. We address the inherent challenge of bridging the gap between LLMs' original text data training and the numerical nature of time series data, and explore strategies for transferring and distilling knowledge from LLMs to numerical time series analysis. We detail various methodologies, including (1) direct prompting of LLMs, (2) time series quantization, (3) aligning techniques, (4) utilization of the vision modality as a bridging mechanism, and (5) the combination of LLMs with tools. Additionally, this survey offers a comprehensive overview of the existing multimodal time series and text datasets and delves into the challenges and future opportunities of this emerging field. We maintain an up-to-date GitHub repository which includes all the papers and datasets discussed in the survey.
[523] arXiv:2402.03450 (replaced) [pdf, ps, html, other]: Title: Recommendation Fairness in Social Networks Over Time

Meng Cao, Hussain Hussain, Sandipan Sikdar, Denis Helic, Markus Strohmaier, Roman Kern

Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Information Retrieval (cs.IR)

In social recommender systems, it is crucial that the recommendation models provide equitable visibility for different demographic groups, such as gender or race. Most existing research has addressed this problem by only studying individual static snapshots of networks that typically change over time. To address this gap, we study the evolution of recommendation fairness over time and its relation to dynamic network properties. We examine three real-world dynamic networks by evaluating the fairness of six recommendation algorithms and analyzing the association between fairness and network properties over time. We further study how interventions on network properties influence fairness by examining counterfactual scenarios with alternative evolution outcomes and differing network properties. Our results on empirical datasets suggest that recommendation fairness improves over time, regardless of the recommendation method. We also find that two network properties, minority ratio and homophily ratio, exhibit stable correlations with fairness over time. Our counterfactual study further suggests that an extreme homophily ratio potentially contributes to unfair recommendations even with a balanced minority ratio. Our work provides insights into the evolution of fairness within dynamic networks in social science. We believe that our findings will help system operators and policymakers to better comprehend the implications of temporal changes and interventions targeting fairness in social networks.
[524] arXiv:2402.03554 (replaced) [pdf, ps, html, other]: Title: Explicit Formula for Partial Information Decomposition

Aobo Lyu, Andrew Clark, Netanel Raviv

Subjects: Information Theory (cs.IT); Probability (math.PR)

Mutual information between two random variables is a well-studied notion, whose understanding is fairly complete. Mutual information between one random variable and a pair of other random variables, however, is a far more involved notion. Specifically, Shannon's mutual information does not capture fine-grained interactions between those three variables, resulting in limited insights in complex systems. To capture these fine-grained interactions, in 2010 Williams and Beer proposed to decompose this mutual information into information atoms, called unique, redundant, and synergistic, and proposed several operational axioms that these atoms must satisfy. In spite of numerous efforts, a general formula which satisfies these axioms has yet to be found. Inspired by Judea Pearl's do-calculus, we resolve this open problem by introducing the do-operation, an operation over the variable system which sets a certain marginal to a desired value, which is distinct from any existing approaches. Using this operation, we provide the first explicit formula for calculating the information atoms so that Williams and Beer's axioms are satisfied, as well as additional properties from subsequent studies in the field.
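For orientation, the decomposition referred to above is usually written as the following bookkeeping identities for a target $T$ and two sources $X_1, X_2$. The symbols $\mathrm{Red}$, $\mathrm{Un}_1$, $\mathrm{Un}_2$, and $\mathrm{Syn}$ are notation assumed here for the redundant, unique, and synergistic atoms; the abstract does not fix a notation, and the paper's explicit formula for the atoms is not reproduced.

```latex
\begin{align}
I(T; X_1, X_2) &= \mathrm{Red} + \mathrm{Un}_1 + \mathrm{Un}_2 + \mathrm{Syn}, \\
I(T; X_1)      &= \mathrm{Red} + \mathrm{Un}_1, \\
I(T; X_2)      &= \mathrm{Red} + \mathrm{Un}_2.
\end{align}
```

These are three equations in four unknowns, so fixing any one atom (for example the redundancy) determines the other three. This is why the lack of a general formula for a single atom leaves the whole decomposition open.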
[525] arXiv:2402.04890 (replaced) [pdf, ps, other]: Title: Marker+Codeword+Marker: A Coding Structure for Segmented Single-Insdel/-Edit Channels

Zhen Li, Xuan He, Xiaohu Tang

Comments: 20 pages

Subjects: Information Theory (cs.IT)

An insdel refers to a deletion or an insertion, and an edit refers to an insdel or a substitution. In this paper, we consider the segmented single-insdel (resp. single-edit) channel, where the channel's input bit stream is partitioned into segments of length $n$ and each segment can suffer from at most a single insdel (resp. edit) error. The value of $n$ is known to the receiver but the boundaries of segments are not. We propose to encode each segment following a marker+codeword+marker structure, where the two markers are carefully selected and the codewords are chosen from Varshamov-Tenengolts (VT) codes. In this way, we are able to construct a new class of binary codes that can correct segmented single-insdel errors. Our codes have the lowest redundancy of $\log_2(n-6)+7$ bits and are the first in the literature to have a linear-time encoder/decoder. Moreover, by enhancing the VT codes and one of the markers, we are able to construct the first class of binary codes that can correct segmented single-edit errors. This class of codes has redundancy $\log_2(n-9)+10$ bits and has a linear-time encoder/decoder.
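The VT codes mentioned above are defined by a weighted checksum: a binary word $x_1 \ldots x_n$ belongs to $\mathrm{VT}_a(n)$ when $\sum_i i\,x_i \equiv a \pmod{n+1}$. The sketch below shows only this building block, using a brute-force single-deletion decoder chosen for clarity. It is not the paper's linear-time decoder and does not model the markers or segment boundaries; the function names are hypothetical.

```python
def vt_syndrome(x):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions counted from 1."""
    n = len(x)
    return sum(i * b for i, b in enumerate(x, start=1)) % (n + 1)


def correct_single_deletion(y, n, a=0):
    """Return every length-n word with syndrome a obtainable by one insertion into y.

    Brute force over all insertion positions and bit values. If y arose from
    a VT_a(n) codeword by one deletion, the returned set has exactly one element.
    """
    if len(y) != n - 1:
        raise ValueError("y must have length n - 1")
    candidates = set()
    for pos in range(n):
        for bit in (0, 1):
            x = tuple(y[:pos]) + (bit,) + tuple(y[pos:])
            if vt_syndrome(x) == a:
                candidates.add(x)
    return candidates
```

For instance, `(1, 0, 0, 0, 0, 1)` has checksum $1 + 6 = 7 \equiv 0 \pmod 7$, so it lies in $\mathrm{VT}_0(6)$, and deleting any one of its bits still lets the decoder recover it uniquely.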
[526] arXiv:2402.06025 (replaced) [pdf, ps, html, other]: Title: Doing Experiments and Revising Rules with Natural Language and Probabilistic Reasoning

Wasu Top Piriyakulkij, Kevin Ellis

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We build a computational model of how humans actively infer hidden rules by doing experiments. The basic principle behind the model is that, even if the rule is deterministic, the learner considers a broader space of fuzzy probabilistic rules, which it represents in natural language, and updates its hypotheses online after each experiment according to approximately Bayesian principles. In the same framework we also model experiment design according to information-theoretic criteria. We find that the combination of these three principles -- explicit hypotheses, probabilistic rules, and online updates -- can explain human performance on a Zendo-style task, and that removing any of these components leaves the model unable to account for the data.
[527] arXiv:2402.07060 (replaced) [pdf, ps, other]: Title: Spectral convergence of a semi-discretized numerical system for the spatially homogeneous Boltzmann equation with uncertainties

Liu Liu, Kunlun Qi

Comments: Revised version. To appear in SIAM/ASA Journal on Uncertainty Quantification

Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

In this paper, we study the Boltzmann equation with uncertainties and prove that the spectral convergence of the semi-discretized numerical system holds in a combined velocity and random space, where the Fourier-spectral method is applied for approximation in the velocity space whereas the generalized polynomial chaos (gPC)-based stochastic Galerkin (SG) method is employed to discretize the random variable. Our proof is based on a delicate energy estimate for showing the well-posedness of the numerical solution as well as a rigorous control of its negative part in our well-designed functional space that involves high-order derivatives of both the velocity and random variables. This paper rigorously justifies the statement proposed in [Remark 4.4, J. Hu and S. Jin, J. Comput. Phys., 315 (2016), pp. 150-168].
[528] arXiv:2402.08656 (replaced) [pdf, ps, html, other]: Title: NeuroIDBench: An Open-Source Benchmark Framework for the Standardization of Methodology in Brainwave-based Authentication Research

Avinash Kumar Chaurasia, Matin Fallahi, Thorsten Strufe, Philipp Terhörst, Patricia Arias Cabarcos

Comments: 21 pages, 5 Figures, 3 tables, Submitted to the Journal of Information Security and Applications

Subjects: Cryptography and Security (cs.CR)

Biometric systems based on brain activity have been proposed as an alternative to passwords or to complement current authentication techniques. By leveraging the unique brainwave patterns of individuals, these systems offer the possibility of creating authentication solutions that are resistant to theft, hands-free, accessible, and potentially even revocable. However, despite the growing stream of research in this area, faster advance is hindered by reproducibility problems. Issues such as the lack of standard reporting schemes for performance results and system configuration, or the absence of common evaluation benchmarks, make comparability and proper assessment of different biometric solutions challenging. Further, barriers are erected to future work when, as so often, source code is not published open access. To bridge this gap, we introduce NeuroIDBench, a flexible open source tool to benchmark brainwave-based authentication models. It incorporates nine diverse datasets, implements a comprehensive set of pre-processing parameters and machine learning algorithms, enables testing under two common adversary models (known vs unknown attacker), and allows researchers to generate full performance reports and visualizations. We use NeuroIDBench to investigate the shallow classifiers and deep learning-based approaches proposed in the literature, and to test robustness across multiple sessions. We observe a 37.6% reduction in Equal Error Rate (EER) for unknown attacker scenarios (typically not tested in the literature), and we highlight the importance of session variability to brainwave authentication. All in all, our results demonstrate the viability and relevance of NeuroIDBench in streamlining fair comparisons of algorithms, thereby furthering the advancement of brainwave-based authentication through robust methodological practices.
[529] arXiv:2402.10074 (replaced) [pdf, ps, html, other]: Title: Class-Balanced and Reinforced Active Learning on Graphs

Chengcheng Yu, Jiapeng Zhu, Xiang Li

Subjects: Machine Learning (cs.LG)

Graph neural networks (GNNs) have demonstrated significant success in various applications, such as node classification, link prediction, and graph classification. Active learning for GNNs aims to query the valuable samples from the unlabeled data for annotation to maximize the GNNs' performance at a lower cost. However, most existing algorithms for reinforced active learning in GNNs may lead to a highly imbalanced class distribution, especially in highly skewed class scenarios. GNNs trained with class-imbalanced labeled data are susceptible to bias toward majority classes, and the lower performance of minority classes may lead to a decline in overall performance. To tackle this issue, we propose a novel class-balanced and reinforced active learning framework for GNNs, namely, GCBR. It learns an optimal policy to acquire class-balanced and informative nodes for annotation, maximizing the performance of GNNs trained with selected labeled nodes. GCBR designs class-balance-aware states, as well as a reward function that achieves trade-off between model performance and class balance. The reinforcement learning algorithm Advantage Actor-Critic (A2C) is employed to learn an optimal policy stably and efficiently. We further upgrade GCBR to GCBR++ by introducing a punishment mechanism to obtain a more class-balanced labeled set. Extensive experiments on multiple datasets demonstrate the effectiveness of the proposed approaches, achieving superior performance over state-of-the-art baselines.
[530] arXiv:2402.10517 (replaced) [pdf, ps, other]: Title: Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Yeonhong Park, Jake Hyun, SangLyul Cho, Bonggeun Sim, Jae W. Lee

Comments: To appear at ICML 2024. Code is available at this https URL

Subjects: Machine Learning (cs.LG)

Recently, considerable efforts have been directed towards compressing Large Language Models (LLMs), which showcase groundbreaking capabilities across diverse applications but entail significant deployment costs due to their large sizes. Meanwhile, much less attention has been given to mitigating the costs associated with deploying multiple LLMs of varying sizes despite its practical significance. Thus, this paper introduces any-precision LLM, extending the concept of any-precision DNN to LLMs. Addressing challenges in any-precision LLM, we propose a lightweight method for any-precision quantization of LLMs, leveraging a post-training quantization framework, and develop a specialized software engine for its efficient serving. As a result, our solution significantly reduces the high costs of deploying multiple, different-sized LLMs by overlaying LLMs quantized to varying bit-widths, such as 3, 4, ..., $n$ bits, into a memory footprint comparable to a single $n$-bit LLM. All the supported LLMs with varying bit-widths demonstrate state-of-the-art model quality and inference throughput, proving itself to be a compelling option for deployment of multiple, different-sized LLMs. Our code is open-sourced and available online.
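One way to picture the overlay of several bit-widths in a single $n$-bit footprint is to store only the $n$-bit code of each weight and read a lower-precision code from its most significant bits. The sketch below assumes plain uniform quantization over a fixed range; that choice, and both function names, are hypothetical illustrations, since the abstract does not describe the paper's actual scheme.

```python
def truncate_code(code_n, n_bits, k_bits):
    """Derive a k-bit code from an n-bit code by keeping the k top bits."""
    if not 0 < k_bits <= n_bits:
        raise ValueError("k_bits must be in 1..n_bits")
    return code_n >> (n_bits - k_bits)


def dequantize(code, bits, lo, hi):
    """Map a code to the centre of its uniform bin in [lo, hi] (assumed scheme)."""
    step = (hi - lo) / (1 << bits)
    return lo + (code + 0.5) * step


# One stored 8-bit code serves the 8-bit, 4-bit and 3-bit "models".
stored = 0b10110110
approximations = {
    k: dequantize(truncate_code(stored, 8, k), k, -1.0, 1.0) for k in (8, 4, 3)
}
```

Only `stored` occupies memory here; the lower-precision values are derived on the fly, which is the property the abstract attributes to the overlaid representation.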
[531] arXiv:2402.10665 (replaced) [pdf, ps, html, other]: Title: Selective Prediction for Semantic Segmentation using Post-Hoc Confidence Estimation and Its Performance under Distribution Shift

Bruno Laboissiere Camargos Borges, Bruno Machado Pacheco, Danilo Silva

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Semantic segmentation plays a crucial role in various computer vision applications, yet its efficacy is often hindered by the lack of high-quality labeled data. To address this challenge, a common strategy is to leverage models trained on data from different populations, such as publicly available datasets. This approach, however, leads to the distribution shift problem, presenting a reduced performance on the population of interest. In scenarios where model errors can have significant consequences, selective prediction methods offer a means to mitigate risks and reduce reliance on expert supervision. This paper investigates selective prediction for semantic segmentation in low-resource settings, thus focusing on post-hoc confidence estimators applied to pre-trained models operating under distribution shift. We propose a novel image-level confidence measure tailored for semantic segmentation and demonstrate its effectiveness through experiments on three medical imaging tasks. Our findings show that post-hoc confidence estimators offer a cost-effective approach to reducing the impacts of distribution shift.
[532] arXiv:2402.10977 (replaced) [pdf, ps, other]: Title: Generative AI and Process Systems Engineering: The Next Frontier

Benjamin Decardi-Nelson, Abdulelah S. Alshehri, Akshay Ajagekar, Fengqi You

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Optimization and Control (math.OC)

This article explores how emerging generative artificial intelligence (GenAI) models, such as large language models (LLMs), can enhance solution methodologies within process systems engineering (PSE). These cutting-edge GenAI models, particularly foundation models (FMs), which are pre-trained on extensive, general-purpose datasets, offer versatile adaptability for a broad range of tasks, including responding to queries, image generation, and complex decision-making. Given the close relationship between advancements in PSE and developments in computing and systems technologies, exploring the synergy between GenAI and PSE is essential. We begin our discussion with a compact overview of both classic and emerging GenAI models, including FMs, and then dive into their applications within key PSE domains: synthesis and design, optimization and integration, and process monitoring and control. In each domain, we explore how GenAI models could potentially advance PSE methodologies, providing insights and prospects for each area. Furthermore, the article identifies and discusses potential challenges in fully leveraging GenAI within PSE, including multiscale modeling, data requirements, evaluation metrics and benchmarks, and trust and safety, thereby deepening the discourse on effective GenAI integration into systems analysis, design, optimization, operations, monitoring, and control. This paper provides a guide for future research focused on the applications of emerging GenAI in PSE.
[533] arXiv:2402.11305 (replaced) [pdf, ps, other]: Title: On Good Practices for Task-Specific Distillation of Large Pretrained Visual Models

Juliette Marrie, Michael Arbel, Julien Mairal, Diane Larlus

Journal-ref: Published in Transactions on Machine Learning Research (TMLR), 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large pretrained visual models exhibit remarkable generalization across diverse recognition tasks. Yet, real-world applications often demand compact models tailored to specific problems. Variants of knowledge distillation have been devised for such a purpose, enabling task-specific compact models (the students) to learn from a generic large pretrained one (the teacher). In this paper, we show that the excellent robustness and versatility of recent pretrained models challenge common practices established in the literature, calling for a new set of optimal guidelines for task-specific distillation. To address the lack of samples in downstream tasks, we also show that a variant of Mixup based on stable diffusion complements standard data augmentation. This strategy eliminates the need for engineered text prompts and improves distillation of generic models into streamlined specialized networks.
[534] arXiv:2402.11355 (replaced) [pdf, ps, other]: Title: Natural Language Counterfactuals through Representation Surgery

Matan Avitan, Ryan Cotterell, Yoav Goldberg, Shauli Ravfogel

Comments: Preprint

Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)

Interventions targeting the representation space of language models (LMs) have emerged as an effective means to influence model behavior. Such methods are employed, for example, to eliminate or alter the encoding of demographic information such as gender within the model's representations and, in so doing, create a counterfactual representation. However, because the intervention operates within the representation space, understanding precisely what aspects of the text it modifies poses a challenge. In this paper, we give a method to convert representation counterfactuals into string counterfactuals. We demonstrate that this approach enables us to analyze the linguistic alterations corresponding to a given representation space intervention and to interpret the features utilized to encode a specific concept. Moreover, the resulting counterfactuals can be used to mitigate bias in classification through data augmentation.
[535] arXiv:2402.12078 (replaced) [pdf, ps, other]: Title: Mirroring Call-by-Need, or Values Acting Silly

Beniamino Accattoli, Adrienne Lancelot

Comments: To be published in FSCD24

Subjects: Logic in Computer Science (cs.LO)

Call-by-need evaluation for the lambda-calculus can be seen as merging the best of call-by-name and call-by-value, namely the wise erasing behaviour of the former and the wise duplicating behaviour of the latter. To better understand how duplication and erasure can be combined, we design a degenerated calculus, dubbed call-by-silly, that is symmetric to call-by-need in that it merges the worst of call-by-name and call-by-value, namely silly duplications by-name and silly erasures by-value.
We validate the design of the call-by-silly calculus via rewriting properties and multi types. In particular, we mirror the main theorem about call-by-need -- that is, its operational equivalence with call-by-name -- showing that call-by-silly and call-by-value induce the same contextual equivalence. This fact shows the blindness with respect to efficiency of call-by-value contextual equivalence. We also define a call-by-silly strategy and measure its length via tight multi types. Lastly, we prove that the call-by-silly strategy computes evaluation sequences of maximal length in the calculus.
[536] arXiv:2402.12545 (replaced) [pdf, ps, html, other]: Title: TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness

Danna Zheng, Danyang Liu, Mirella Lapata, Jeff Z. Pan

Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLM outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies due to the absence of contextual or ground truth information. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly integrate with fact-checking methods, which assess alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpassing existing reference-free metrics, and achieving results on par with reference-based metrics.
[537] arXiv:2402.12623 (replaced) [pdf, ps, other]: Title: Effective Edge Ranking via Random Walk with Restart

Renchi Yang

Comments: Incomplete theory and experiments. Will upload a new version later

Subjects: Social and Information Networks (cs.SI)

Given a network G, edge centrality is a metric used to evaluate the importance of edges in G, which is a key concept in analyzing networks and finds vast applications involving edge ranking. In spite of a wealth of research on devising edge centrality measures, they incur either prohibitively high computation costs or varied deficiencies that lead to sub-optimal ranking quality.
To overcome their limitations, this paper proposes EdgeRAKE, a new centrality measure for edge ranking that leverages the novel notion of the edgewise random walk with restart. Based thereon, we present a linear-complexity algorithm for EdgeRAKE approximation, followed by an in-depth theoretical analysis in terms of various aspects. Extensive experiments comparing EdgeRAKE against six edge centrality metrics in graph analytics tasks on real networks showcase that EdgeRAKE offers superior practical effectiveness without significantly reducing computation efficiency.
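For orientation, the classical node-level random walk with restart can be computed by power iteration, as sketched below. This is the standard formulation, not the edgewise variant that EdgeRAKE introduces; the function name, the restart probability of 0.15, and the iteration count are assumptions chosen for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=100):
    """Power iteration for node-level random walk with restart.

    adj maps each node to a non-empty list of neighbours.
    """
    nodes = list(adj)
    score = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            # With probability (1 - restart), move to a uniformly chosen neighbour.
            share = (1.0 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # With probability restart, jump back to the seed node.
        nxt[seed] += restart
        score = nxt
    return score


# A three-node path a - b - c, walking from seed "a".
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = random_walk_with_restart(path, "a")
```

Each iteration touches every edge once, so the cost per iteration is linear in the number of edges.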
[538] arXiv:2402.12670 (replaced) [pdf, ps, html, other]: Title: Towards Validation of Autonomous Vehicles Across Scales using an Integrated Digital Twin Framework

Tanmay Vilas Samak, Chinmay Vilas Samak, Venkat Narayan Krovi

Comments: Accepted at IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM) 2024

Subjects: Robotics (cs.RO)

Autonomous vehicle platforms of varying spatial scales are employed within the research and development spectrum based on space, safety and monetary constraints. However, deploying and validating autonomy algorithms across varying operational scales presents challenges due to scale-specific dynamics, sensor integration complexities, computational constraints, regulatory considerations, environmental variability, interaction with other traffic participants and scalability concerns. In such a milieu, this work focuses on developing a unified framework for modeling and simulating digital twins of autonomous vehicle platforms across different scales and operational design domains (ODDs) to help support the streamlined development and validation of autonomy software stacks. Particularly, this work discusses the development of digital twin representations of 4 autonomous ground vehicles, which span across 3 different scales and target 3 distinct ODDs. We study the adoption of these autonomy-oriented digital twins to deploy a common autonomy software stack with an aim of end-to-end map-based navigation to achieve the ODD-specific objective(s) for each vehicle. Finally, we also discuss the flexibility of the proposed framework to support virtual, hybrid as well as physical testing with seamless sim2real transfer.
[539] arXiv:2402.12875 (replaced) [pdf, ps, other]: Title: Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma

Comments: 38 pages. Accepted by ICLR 2024

Subjects: Machine Learning (cs.LG); Computational Complexity (cs.CC); Machine Learning (stat.ML)

Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT. We first show an even tighter expressiveness upper bound for constant-depth transformers with constant-bit precision, which can only solve problems in $\mathsf{AC}^0$, a proper subset of $\mathsf{TC}^0$. However, with $T$ steps of CoT, constant-depth transformers using constant-bit precision and $O(\log n)$ embedding size can solve any problem solvable by boolean circuits of size $T$. Empirically, enabling CoT dramatically improves the accuracy for tasks that are hard for parallel computation, including the composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.
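Iterated squaring, one of the tasks named above, gives a feel for what "inherently serial" means: each step consumes the previous step's result. The sketch below records every intermediate value, loosely analogous to writing out chain-of-thought steps; the function name and the example modulus are assumptions, and the code illustrates the task only, not the paper's transformer construction.

```python
def iterated_squaring(x, steps, modulus):
    """Return [x, x^2, x^4, ..., x^(2^steps)], all taken mod modulus."""
    value = x % modulus
    trace = [value]
    for _ in range(steps):
        # Each squaring depends on the value produced by the previous one.
        value = (value * value) % modulus
        trace.append(value)
    return trace


trace = iterated_squaring(3, 4, 1000)
```

The final entry of `trace` equals `pow(3, 2**4, 1000)`, reached through four dependent squarings.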
[540] arXiv:2402.13699 (replaced) [pdf, ps, html, other]: Title: Explainable Classification Techniques for Quantum Dot Device Measurements

Daniel Schug, Tyler J. Kovach, M. A. Wolfe, Jared Benson, Sanghyeok Park, J. P. Dodson, J. Corrigan, M. A. Eriksson, Justyna P. Zwolak

Comments: 5 pages, 3 figures

Journal-ref: Proceedings of the XAI4Sci: Explainable machine learning for sciences workshop at AAAI 2024, Vancouver, Canada

Subjects: Computer Vision and Pattern Recognition (cs.CV); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Machine Learning (cs.LG)

In the physical sciences, there is an increased need for robust feature representations of image data: image acquisition, in the generalized sense of two-dimensional data, is now widespread across a large number of fields, including quantum information science, which we consider here. While traditional image features are widely utilized in such cases, their use is rapidly being supplanted by Neural Network-based techniques that often sacrifice explainability in exchange for high accuracy. To ameliorate this trade-off, we propose a synthetic data-based technique that results in explainable features. We show, using Explainable Boosting Machines (EBMs), that this method offers superior explainability without sacrificing accuracy. Specifically, we show that there is a meaningful benefit to this technique in the context of quantum dot tuning, where human intervention is necessary at the current stage of development.
[541] arXiv:2402.17323 (replaced) [pdf, ps, other]: Title: SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

Junsu Kim, Hoseong Cho, Jihyeon Kim, Yihalem Yimolal Tiruneh, Seungryul Baek

Comments: Accepted to CVPR 2024. The camera-ready version

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the field of class incremental learning (CIL), generative replay has become increasingly prominent as a method to mitigate catastrophic forgetting, alongside the continuous improvements in generative models. However, its application in class incremental object detection (CIOD) has been significantly limited, primarily due to the complexities of scenes involving multiple labels. In this paper, we propose a novel approach called stable diffusion deep generative replay (SDDGR) for CIOD. Our method utilizes a diffusion-based generative model with pre-trained text-to-diffusion networks to generate realistic and diverse synthetic images. SDDGR incorporates an iterative refinement strategy to produce high-quality images encompassing old classes. Additionally, we adopt an L2 knowledge distillation technique to improve the retention of prior knowledge in synthetic images. Furthermore, our approach includes pseudo-labeling for old objects within new task images, preventing misclassification as background elements. Extensive experiments on the COCO 2017 dataset demonstrate that SDDGR significantly outperforms existing algorithms, achieving a new state-of-the-art in various CIOD scenarios. The source code will be made available to the public.
[542] arXiv:2402.18267 (replaced) [pdf, ps, other]: Title: A Survey on Neural Question Generation: Methods, Applications, and Prospects

Shasha Guo, Lizi Liao, Cuiping Li, Tat-Seng Chua

Comments: Accepted by IJCAI 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this survey, we present a detailed examination of the advancements in Neural Question Generation (NQG), a field leveraging neural network techniques to generate relevant questions from diverse inputs like knowledge bases, texts, and images. The survey begins with an overview of NQG's background, encompassing the task's problem formulation, prevalent benchmark datasets, established evaluation metrics, and notable applications. It then methodically classifies NQG approaches into three predominant categories: structured NQG, which utilizes organized data sources, unstructured NQG, focusing on more loosely structured inputs like texts or visual content, and hybrid NQG, drawing on diverse input modalities. This classification is followed by an in-depth analysis of the distinct neural network models tailored for each category, discussing their inherent strengths and potential limitations. The survey culminates with a forward-looking perspective on the trajectory of NQG, identifying emergent research trends and prospective developmental paths. Accompanying this survey is a curated collection of related research papers, datasets and codes, systematically organized on Github, providing an extensive reference for those delving into NQG.
[543] arXiv:2403.01606 (replaced) [pdf, ps, html, other]: Title: A Unified Model Selection Technique for Spectral Clustering Based Motion Segmentation

Yuxiang Huang, John Zelek

Comments: for the published version, see this https URL

Journal-ref: Journal of Computational Vision and Imaging Systems 9 (2023) 68-71

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Motion segmentation is a fundamental problem in computer vision and is crucial in various applications such as robotics, autonomous driving and action recognition. Recently, spectral clustering based methods have shown impressive results on motion segmentation in dynamic environments. These methods perform spectral clustering on motion affinity matrices to cluster objects or point trajectories in the scene into different motion groups. However, existing methods often need the number of motions present in the scene to be known, which significantly reduces their practicality. In this paper, we propose a unified model selection technique to automatically infer the number of motion groups for spectral clustering based motion segmentation methods by combining different existing model selection techniques together. We evaluate our method on the KT3DMoSeg dataset and achieve competitive results compared to the baseline where the number of clusters is given as ground truth information.
[544] arXiv:2403.02215 (replaced) [pdf, ps, html, other]: Title: Joint Parameter and Parameterization Inference with Uncertainty Quantification through Differentiable Programming

Yongquan Qu, Mohamed Aziz Bhouri, Pierre Gentine

Comments: Accepted at ICLR 2024 Workshop on AI4Differential Equations in Science

Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Chaotic Dynamics (nlin.CD); Atmospheric and Oceanic Physics (physics.ao-ph)

Accurate representations of unknown and sub-grid physical processes through parameterizations (or closure) in numerical simulations with quantified uncertainty are critical for resolving the coarse-grained partial differential equations that govern many problems ranging from weather and climate prediction to turbulence simulations. Recent advances have seen machine learning (ML) increasingly applied to model these subgrid processes, resulting in the development of hybrid physics-ML models through the integration with numerical solvers. In this work, we introduce a novel framework for the joint estimation of physical parameters and machine learning parameterizations with uncertainty quantification. Our framework incorporates online training and efficient Bayesian inference within a high-dimensional parameter space, facilitated by differentiable programming. This proof of concept underscores the substantial potential of differentiable programming in synergistically combining machine learning with differential equations, thereby enhancing the capabilities of hybrid physics-ML modeling.
[545] arXiv:2403.02234 (replaced) [pdf, ps, other]: Title: 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Shuai Yang, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

Comments: Code available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors. The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping. The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent and pixel space optimization for high-quality texture generation. To facilitate the training of the proposed system, we clean and caption the largest open-source 3D dataset, Objaverse, by combining the power of vision language models and large language models. Experimental results are reported qualitatively and quantitatively to show the performance of the proposed system. Our code and models are available at this https URL.
[546] arXiv:2403.02527 (replaced) [pdf, ps, other]: Title: A dataset of over one thousand computed tomography scans of battery cells

Amariah Condon, Bailey Buscarino, Eric Moch, William J. Sehnert, Owen Miles, Patrick K. Herring, Peter M. Attia

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Engineering, Finance, and Science (cs.CE)

Battery technology is increasingly important for global electrification efforts. However, batteries are highly sensitive to small manufacturing variations that can induce reliability or safety issues. An important technology for battery quality control is computed tomography (CT) scanning, which is widely used for non-destructive 3D inspection across a variety of clinical and industrial applications. Historically, however, the utility of CT scanning for high-volume manufacturing has been limited by its low throughput as well as the difficulty of handling its large file sizes. In this work, we present a dataset of over one thousand CT scans of as-produced commercially available batteries. The dataset spans various chemistries (lithium-ion and sodium-ion) as well as various battery form factors (cylindrical, pouch, and prismatic). We evaluate seven different battery types in total. The manufacturing variability and the presence of battery defects can be observed via this dataset. This dataset may be of interest to scientists and engineers working on battery technology, computer vision, or both.
[547] arXiv:2403.02622 (replaced) [pdf, ps, other]: Title: World Models for Autonomous Driving: An Initial Survey

Yanchen Guan, Haicheng Liao, Zhenning Li, Jia Hu, Runze Yuan, Yunjian Li, Guohui Zhang, Chengzhong Xu

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

In the rapidly evolving landscape of autonomous driving, the capability to accurately predict future events and assess their implications is paramount for both safety and efficiency, critically aiding the decision-making process. World models have emerged as a transformative approach, enabling autonomous driving systems to synthesize and interpret vast amounts of sensor data, thereby predicting potential future scenarios and compensating for information gaps. This paper provides an initial review of the current state and prospective advancements of world models in autonomous driving, spanning their theoretical underpinnings, practical applications, and the ongoing research efforts aimed at overcoming existing limitations. Highlighting the significant role of world models in advancing autonomous driving technologies, this survey aspires to serve as a foundational reference for the research community, facilitating swift access to and comprehension of this burgeoning field, and inspiring continued innovation and exploration.
[548] arXiv:2403.02654 (replaced) [pdf, ps, other]: Title: Restricted Isometry Property of Rank-One Measurements with Random Unit-Modulus Vectors

Wei Zhang, Zhenni Wang

Comments: AISTATS 2024

Subjects: Information Theory (cs.IT)

The restricted isometry property (RIP) is essential for the linear map to guarantee the successful recovery of low-rank matrices. The existing works show that the linear map generated by the measurement matrices with independent and identically distributed (i.i.d.) entries satisfies RIP with high probability. However, when dealing with non-i.i.d. measurement matrices, such as the rank-one measurements, the RIP compliance may not be guaranteed. In this paper, we show that the RIP can still be achieved with high probability, when the rank-one measurement matrix is constructed by the random unit-modulus vectors. Compared to the existing works, we first address the challenge of establishing RIP for the linear map in non-i.i.d. scenarios. As validated in the experiments, this linear map is memory-efficient, and not only satisfies the RIP but also exhibits similar recovery performance of the low-rank matrices to that of conventional i.i.d. measurement matrices.
[549] arXiv:2403.03173 (replaced) [pdf, ps, other]: Title: Solving the bongard-logo problem by modeling a probabilistic model

Ruizhuo Song, Beiming Yuan

Comments: 14 pages, 11 figures, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Abstract reasoning problems challenge the perceptual and cognitive abilities of AI algorithms, demanding deeper pattern discernment and inductive reasoning beyond explicit image features. This study introduces PMoC, a tailored probability model for the Bongard-Logo problem, achieving high reasoning accuracy by constructing independent probability models. Additionally, we present Pose-Transformer, an enhanced Transformer-Encoder designed for complex abstract reasoning tasks, including Bongard-Logo, RAVEN, I-RAVEN, and PGM. Pose-Transformer incorporates positional information learning, inspired by capsule networks' pose matrices, enhancing its focus on local positional relationships in image data processing. When integrated with PMoC, it further improves reasoning accuracy. Our approach effectively addresses reasoning difficulties associated with abstract entities' positional changes, outperforming previous models on the OIG, D3$\times$3 subsets of RAVEN, and PGM databases. This research contributes to advancing AI's capabilities in abstract reasoning and cognitive pattern recognition.
[550] arXiv:2403.03506 (replaced) [pdf, ps, html, other]: Title: Detecting AI-Generated Sentences in Realistic Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights

Zijie Zeng, Shiqi Liu, Lele Sha, Zhuang Li, Kaixun Yang, Sannyuya Liu, Dragan Gašević, Guanliang Chen

Comments: Accepted as a full paper at IJCAI 2024 (Special Track: AI and Social Good)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets. These typically involve hybrid texts with a limited number of boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Therefore, our study utilizes the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through the collaboration between human writers and an intelligent writing system in multi-turn interactions. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text where each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings highlight (1) detecting AI-generated sentences in hybrid texts is overall a challenging task because (1.1) human writers' selecting and even editing AI-generated sentences based on personal preferences adds difficulty in identifying the authorship of segments; (1.2) the frequent change of authorship between neighboring sentences within the hybrid text creates difficulties for segment detectors in identifying authorship-consistent segments; (1.3) the short length of text segments within hybrid texts provides limited stylistic cues for reliable authorship determination; (2) before embarking on the detection process, it is beneficial to assess the average length of segments within the hybrid text. This assessment aids in deciding whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments.
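Finding (2) above amounts to a simple decision rule: estimate the average segment length, then pick a detection strategy. The sketch below assumes per-sentence authorship labels are available for a representative labelled sample; the function names and the threshold of three sentences are hypothetical, as the abstract gives no cut-off.

```python
def average_segment_length(authors):
    """Mean number of consecutive sentences that share one authorship label."""
    if not authors:
        raise ValueError("need at least one sentence label")
    segments = 1
    for prev, cur in zip(authors, authors[1:]):
        # A change of label between neighbours starts a new segment.
        if cur != prev:
            segments += 1
    return len(authors) / segments


def choose_strategy(authors, threshold=3.0):
    """Pick a detection strategy; the threshold is an illustrative assumption."""
    if average_segment_length(authors) >= threshold:
        return "segmentation"
    return "sentence-by-sentence"
```

For example, `choose_strategy(["H", "H", "H", "A", "A", "A"])` selects the segmentation-based pipeline, while a text that alternates authorship on every sentence falls back to sentence-by-sentence classification.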
[551] arXiv:2403.03542 (replaced) [pdf, ps, other]: Title: DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

Zhongkai Hao, Chang Su, Songming Liu, Julius Berner, Chengyang Ying, Hang Su, Anima Anandkumar, Jian Song, Jun Zhu

Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

Pre-training has been investigated to improve the efficiency and performance of training neural operators in data-scarce settings. However, it is largely in its infancy because of the inherent complexity and diversity of partial differential equation (PDE) data, such as long trajectories, multiple scales, and varying dimensions. In this paper, we present a new auto-regressive denoising pre-training strategy, which allows for more stable and efficient pre-training on PDE data and generalizes to various downstream tasks. Moreover, by designing a flexible and scalable model architecture based on Fourier attention, we can easily scale up the model for large-scale pre-training. We train our PDE foundation model with up to 0.5B parameters on 10+ PDE datasets with more than 100k trajectories. Extensive experiments show that we achieve SOTA on these benchmarks and validate the strong generalizability of our model to significantly enhance performance on diverse downstream PDE tasks like 3D data. Code is available at this https URL.
[552] arXiv:2403.04872 (replaced) [pdf, ps, other]: Title: Code-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text

Frances A. Laureano De Leon, Harish Tayyar Madabushi, Mark Lee

Comments: Accepted for publication at Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Data and code available at this https URL

Subjects: Computation and Language (cs.CL)

Code-switching is a prevalent linguistic phenomenon in which multilingual individuals seamlessly alternate between languages. Despite its widespread use online and recent research trends in this area, research in code-switching presents unique challenges, primarily stemming from the scarcity of labelled data and available resources. In this study we investigate how pre-trained Language Models handle code-switched text in three dimensions: a) the ability of PLMs to detect code-switched text, b) variations in the structural information that PLMs utilise to capture code-switched text, and c) the consistency of semantic information representation in code-switched text. To conduct a systematic and controlled evaluation of the language models in question, we create a novel dataset of well-formed naturalistic code-switched text along with parallel translations into the source languages. Our findings reveal that pre-trained language models are effective in generalising to code-switched text, shedding light on the abilities of these models to generalise representations to CS corpora. We release all our code and data including the novel corpus at this https URL.
[553] arXiv:2403.06064 (replaced) [pdf, ps, html, other]: Title: L$^2$GC: Lorentzian Linear Graph Convolutional Networks For Node Classification

Qiuyu Liang, Weihua Wang, Feilong Bao, Guanglai Gao

Comments: Accepted by LREC-COLING 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Linear Graph Convolutional Networks (GCNs) are used to classify the nodes in graph data. However, we note that most existing linear GCN models perform neural network operations in Euclidean space, which does not explicitly capture the tree-like hierarchical structure exhibited in real-world datasets that are modeled as graphs. In this paper, we attempt to introduce hyperbolic space into linear GCN and propose a novel framework for Lorentzian linear GCN. Specifically, we map the learned features of graph nodes into hyperbolic space, and then perform a Lorentzian linear feature transformation to capture the underlying tree-like structure of data. Experimental results on standard citation network datasets with semi-supervised learning show that our approach yields new state-of-the-art accuracy of 74.7% on the Citeseer dataset and 81.3% on the PubMed dataset. Furthermore, we observe that our approach can be trained up to two orders of magnitude faster than other nonlinear GCN models on the PubMed dataset. Our code is publicly available at this https URL.
[554] arXiv:2403.06769 (replaced) [pdf, ps, html, other]: Title: Strength Lies in Differences! Towards Effective Non-collaborative Dialogues via Tailored Strategy Planning

Tong Zhang, Chen Huang, Yang Deng, Hongru Liang, Jia Liu, Zujie Wen, Wenqiang Lei, Tat-Seng Chua

Comments: V2: 20 pages, 8 figures, and 20 tables

Subjects: Computation and Language (cs.CL)

We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users, for securing a mutual agreement that leans favorably towards the system's objectives. This poses two main challenges for existing dialogue agents: 1) The inability to integrate user-specific characteristics into the strategic planning, and 2) The difficulty of training strategic planners that can be generalized to diverse users. To address these challenges, we propose Trip to enhance the capability in tailored strategic planning, incorporating a user-aware strategic planning module and a population-based training paradigm. Through experiments on benchmark non-collaborative dialogue tasks, we demonstrate the effectiveness of Trip in catering to diverse users.
[555] arXiv:2403.06898 (replaced) [pdf, ps, other]: Title: SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions

Gang Liao, Ye Liu, Yonghua Ding, Le Cai, Jianjun Chen

Comments: DaMoN 2024

Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

The ubiquity of variable-length integers in data storage and communication necessitates efficient decoding techniques. In this paper, we present SFVInt, a simple and fast approach to decode the prevalent Little Endian Base-128 (LEB128) varints. Our approach effectively utilizes the Bit Manipulation Instruction Set 2 (BMI2) in modern Intel and AMD processors, achieving significant performance improvement while maintaining simplicity and avoiding overengineering. SFVInt, with its generic design, effectively processes both 32-bit and 64-bit unsigned integers using a unified code template, marking a significant leap forward in varint decoding efficiency. We thoroughly evaluate SFVInt's performance across various datasets and scenarios, demonstrating that it achieves up to a 2x increase in decoding speed when compared to varint decoding methods used in established frameworks like Facebook Folly and Google Protobuf.
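For readers unfamiliar with the format, the following is a minimal sketch of the conventional byte-at-a-time LEB128 codec for unsigned integers. It illustrates the encoding that the abstract refers to, not the BMI2-based SFVInt technique itself; the function names `encode_leb128` and `decode_leb128` are hypothetical and chosen only for this illustration.

```python
from typing import Tuple


def encode_leb128(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 payload bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # final byte has the high bit clear
            return bytes(out)


def decode_leb128(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # little-endian groups of 7 bits
        if byte & 0x80 == 0:
            return result, pos
        shift += 7
```

The data-dependent loop exit on the continuation bit is the kind of per-byte branching that a bit-manipulation approach would presumably aim to avoid.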
[556] arXiv:2403.08411 (replaced) [pdf, ps, html, other]: Title: Robust Distributed Compression with Learned Heegard-Berger Scheme

Eyyup Tasci, Ezgi Ozyilkan, Oguzhan Kubilay Ulger, Elza Erkip

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We consider lossy compression of an information source when decoder-only side information may be absent. This setup, also referred to as the Heegard-Berger or Kaspi problem, is a special case of robust distributed source coding. Building upon previous works on neural network-based distributed compressors developed for the decoder-only side information (Wyner-Ziv) case, we propose learning-based schemes that are amenable to the availability of side information. We find that our learned compressors mimic the achievability part of the Heegard-Berger theorem and yield interpretable results operating close to information-theoretic bounds. Depending on the availability of the side information, our neural compressors recover characteristics of the point-to-point (i.e., with no side information) and the Wyner-Ziv coding strategies that include binning in the source space, although no structure exploiting knowledge of the source and side information was imposed into the design.
[557] arXiv:2403.09275 (replaced) [pdf, ps, html, other]: Title: Static Grouping Strategy Design for Beyond Diagonal Reconfigurable Intelligent Surfaces

Matteo Nerini, Shanpu Shen, Bruno Clerckx

Comments: Accepted by IEEE for publication

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Beyond diagonal reconfigurable intelligent surface (BD-RIS) extends conventional RIS through novel architectures, such as group-connected RIS, with scattering matrix not restricted to being diagonal. However, it remains unexplored how to optimally group the elements in group-connected RISs to maximize the performance while maintaining a low-complexity circuit. In this study, we propose and model BD-RIS with a static grouping strategy optimized based on the channel statistics. After formulating the corresponding problems, we design the grouping in single- and multi-user systems. Numerical results reveal the benefits of grouping optimization, i.e., up to 60% sum rate improvement, especially in highly correlated channels.
[558] arXiv:2403.10087 (replaced) [pdf, ps, other]: Title: Monkeypox disease recognition model based on improved SE-InceptionV3

Junzhuo Chen, Zonghan Lu, Shitong Kang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the wake of the global spread of monkeypox, accurate disease recognition has become crucial. This study introduces an improved SE-InceptionV3 model, embedding the SENet module and incorporating L2 regularization into the InceptionV3 framework to enhance monkeypox disease detection. Utilizing the Kaggle monkeypox dataset, which includes images of monkeypox and similar skin conditions, our model demonstrates a noteworthy accuracy of 96.71% on the test set, outperforming conventional methods and deep learning models. The SENet module's channel attention mechanism significantly elevates feature representation, while L2 regularization ensures robust generalization. Extensive experiments validate the model's superiority in precision, recall, and F1 score, highlighting its effectiveness in differentiating monkeypox lesions in diverse and complex cases. The study not only provides insights into the application of advanced CNN architectures in medical diagnostics but also opens avenues for further research in model optimization and hyperparameter tuning for enhanced disease recognition. this https URL
[559] arXiv:2403.11674 (replaced) [pdf, ps, html, other]: Title: Towards Generalizing to Unseen Domains with Few Labels

Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana, Muhammad Haris Khan

Comments: Accepted at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We approach the challenge of addressing semi-supervised domain generalization (SSDG). Specifically, our aim is to obtain a model that learns domain-generalizable features by leveraging a limited subset of labelled data alongside a substantially larger pool of unlabeled data. Existing domain generalization (DG) methods which are unable to exploit unlabeled data perform poorly compared to semi-supervised learning (SSL) methods under SSDG setting. Nevertheless, SSL methods have considerable room for performance improvement when compared to fully-supervised DG training. To tackle this underexplored, yet highly practical problem of SSDG, we make the following core contributions. First, we propose a feature-based conformity technique that matches the posterior distributions from the feature space with the pseudo-label from the model's output space. Second, we develop a semantics alignment loss to learn semantically-compatible representations by regularizing the semantic structure in the feature space. Our method is plug-and-play and can be readily integrated with different SSL-based SSDG baselines without introducing any additional parameters. Extensive experimental results across five challenging DG benchmarks with four strong SSL baselines suggest that our method provides consistent and notable gains in two different SSDG settings.
[560] arXiv:2403.15246 (replaced) [pdf, ps, other]: Title: FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)

Modern Language Models (LMs) are capable of following long and complex instructions that enable a large and diverse set of user requests. While Information Retrieval (IR) models use these LMs as the backbone of their architectures, virtually none of them allow users to provide detailed instructions alongside queries, thus limiting their ability to satisfy complex information needs. In this work, we study the use of instructions in IR systems. First, we introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions. FollowIR repurposes detailed instructions -- also known as narratives -- developed for professional assessors to evaluate retrieval systems. In particular, we build our benchmark from three collections curated for shared tasks at the Text REtrieval Conference (TREC). These collections contain hundreds to thousands of labeled documents per query, making them suitable for our exploration. Through this process, we can measure how well IR models follow instructions using a new pairwise evaluation framework. Our results indicate that existing retrieval models fail to correctly use instructions, using them for basic keywords and struggling to understand long-form information. However, we show that it is possible for IR models to learn to follow complex instructions: our new FollowIR-7B model has significant improvements after fine-tuning on our training set.
[561] arXiv:2403.15676 (replaced) [pdf, ps, html, other]: Title: AC4: Algebraic Computation Checker for Circuit Constraints in ZKPs

Hao Chen, Minyu Chen, Ruibang Liu, Guoqiang Li, Sinka Gao

Comments: 20 pages, 4 figures

Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Cryptography and Security (cs.CR)

ZKP systems have attracted surging attention and hold a fundamental role in contemporary cryptography. Zk-SNARK protocols dominate ZKP usage and are often implemented through the arithmetic circuit programming paradigm. However, underconstrained or overconstrained circuits may lead to bugs. Underconstrained circuits are circuits that lack the necessary constraints, resulting in unexpected solutions in the circuit and causing the verifier to accept a bogus witness. Overconstrained circuits are circuits that are constrained excessively, resulting in the circuit lacking necessary solutions and causing the verifier to accept no witness, rendering the circuit meaningless. This paper introduces a novel approach for pinpointing these two distinct types of bugs in ZKP circuits. The method involves encoding the arithmetic circuit constraints as polynomial equation systems and solving these systems over a finite field by algebraic computation. The classification of verification results is refined, greatly enhancing the expressive power of the system. We propose a tool, AC4, that implements this method. Experiments demonstrate that AC4 achieves a substantial 29% increase in the checked ratio compared to prior work. Within a solvable range, the checking time of AC4 also improves noticeably, by an order of magnitude compared to previous efforts.
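The two bug classes described above can be made concrete on a toy circuit. The sketch below is a brute-force illustration over a tiny prime field, not AC4's algebraic method; the modulus `P`, the IsZero-style constraint sets, and the `classify` helper are all assumptions introduced for this example. It treats a circuit as underconstrained when some input admits more than one output, and as overconstrained when some input admits none.

```python
from itertools import product

P = 7  # toy prime modulus (assumption); real circuits use very large fields


def is_zero_full(inp: int, inv: int, out: int) -> bool:
    """Intended circuit: out = 1 if inp == 0 else 0, with helper signal inv."""
    return (out - (1 - inp * inv)) % P == 0 and (inp * out) % P == 0


def is_zero_missing(inp: int, inv: int, out: int) -> bool:
    """Same circuit with the constraint inp * out == 0 left out."""
    return (out - (1 - inp * inv)) % P == 0


def is_zero_extra(inp: int, inv: int, out: int) -> bool:
    """Same circuit with an excessive constraint inp * inv == 1 added."""
    return is_zero_full(inp, inv, out) and (inp * inv - 1) % P == 0


def classify(constraints) -> str:
    """Enumerate every witness and report which outputs each input admits."""
    status = "ok"
    for inp in range(P):
        outputs = {
            out
            for inv, out in product(range(P), repeat=2)
            if constraints(inp, inv, out)
        }
        if not outputs:
            return "overconstrained"      # no witness exists for this input
        if len(outputs) > 1:
            status = "underconstrained"   # output not uniquely determined
    return status
```

Exhaustive enumeration is only feasible because the field here has seven elements; for realistic field sizes a symbolic or algebraic approach such as the one the abstract describes is needed.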
[562] arXiv:2403.16689 (replaced) [pdf, ps, html, other]: Title: Synapse: Learning Preferential Concepts from Visual Demonstrations

Sadanand Modak, Noah Patton, Isil Dillig, Joydeep Biswas

Comments: 25 pages, 7 tables, 9 figures; Preprint; Updated figures and appendix, added VLM ablations

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Programming Languages (cs.PL)

This paper addresses the problem of preference learning, which aims to learn user-specific preferences (e.g., "good parking spot", "convenient drop-off location") from visual input. Despite its similarity to learning factual concepts (e.g., "red cube"), preference learning is a fundamentally harder problem due to its subjective nature and the paucity of person-specific training data. We address this problem using a new framework called Synapse, which is a neuro-symbolic approach designed to efficiently learn preferential concepts from limited demonstrations. Synapse represents preferences as neuro-symbolic programs in a domain-specific language (DSL) that operates over images, and leverages a novel combination of visual parsing, large language models, and program synthesis to learn programs representing individual preferences. We evaluate Synapse through extensive experimentation including a user case study focusing on mobility-related concepts in mobile robotics and autonomous driving. Our evaluation demonstrates that Synapse significantly outperforms existing baselines as well as its own ablations. The code and other details can be found on the project website at this https URL.
[563] arXiv:2403.17691 (replaced) [pdf, ps, html, other]: Title: Not All Similarities Are Created Equal: Leveraging Data-Driven Biases to Inform GenAI Copyright Disputes

Uri Hacohen, Adi Haviv, Shahar Sarfaty, Bruria Friedman, Niva Elkin-Koren, Roi Livni, Amit H Bermano

Comments: Presented at ACM CSLAW 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

The advent of Generative Artificial Intelligence (GenAI) models, including GitHub Copilot, OpenAI GPT, and Stable Diffusion, has revolutionized content creation, enabling non-professionals to produce high-quality content across various domains. This transformative technology has led to a surge of synthetic content and sparked legal disputes over copyright infringement. To address these challenges, this paper introduces a novel approach that leverages the learning capacity of GenAI models for copyright legal analysis, demonstrated with GPT2 and Stable Diffusion models. Copyright law distinguishes between original expressions and generic ones (Scènes à faire), protecting the former and permitting reproduction of the latter. However, this distinction has historically been challenging to make consistently, leading to over-protection of copyrighted works. GenAI offers an unprecedented opportunity to enhance this legal analysis by revealing shared patterns in preexisting works. We propose a data-driven approach to identify the genericity of works created by GenAI, employing "data-driven bias" to assess the genericity of expressive compositions. This approach aids in copyright scope determination by utilizing the capabilities of GenAI to identify and prioritize expressive elements and rank them according to their frequency in the model's dataset. The potential implications of measuring expressive genericity for copyright law are profound. Such scoring could assist courts in determining copyright scope during litigation, inform the registration practices of Copyright Offices, allowing registration of only highly original synthetic works, and help copyright owners signal the value of their works and facilitate fairer licensing deals. More generally, this approach offers valuable insights to policymakers grappling with adapting copyright law to the challenges posed by the era of GenAI.
[564] arXiv:2404.00306 (replaced) [pdf, ps, other]: Title: Leveraging Intelligent Recommender system as a first step resilience measure -- A data-driven supply chain disruption response framework

Yang Hu

Comments: Manuscript submitted for WSC2024 Conference

Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)

Interest in the potential of digital technologies to increase supply chain resilience (SCRes) is growing in light of Industry 4.0 and the global pandemic. The use of recommender systems (RS) as a supply chain (SC) resilience measure has been neglected, although RS is a capable tool for enhancing SC resilience from a reactive aspect. To address this problem, this research proposes a novel data-driven supply chain disruption response framework based on intelligent recommender system techniques and validates the conceptual model through a practical use case. Results show that our framework can be implemented as an effective SC disruption mitigation measure in the very first response phase and can help SC participants achieve better reaction performance after an SC disruption.
[565] arXiv:2404.00506 (replaced) [pdf, ps, html, other]: Title: Label-Agnostic Forgetting: A Supervision-Free Unlearning in Deep Models

Shaofei Shen, Chenhao Zhang, Yawen Zhao, Alina Bialkowski, Weitong Tony Chen, Miao Xu

Subjects: Machine Learning (cs.LG)

Machine unlearning aims to remove information derived from forgotten data while preserving that of the remaining dataset in a well-trained model. With the increasing emphasis on data privacy, several approaches to machine unlearning have emerged. However, these methods typically rely on complete supervision throughout the unlearning process. Unfortunately, obtaining such supervision, whether for the forgetting or remaining data, can be impractical due to the substantial cost associated with annotating real-world datasets. This challenge prompts us to propose a supervision-free unlearning approach that operates without the need for labels during the unlearning process. Specifically, we introduce a variational approach to approximate the distribution of representations for the remaining data. Leveraging this approximation, we adapt the original model to eliminate information from the forgotten data at the representation level. To further address the issue of lacking supervision information, which hinders alignment with ground truth, we introduce a contrastive loss to facilitate the matching of representations between the remaining data and those of the original model, thus preserving predictive performance. Experimental results across various unlearning tasks demonstrate the effectiveness of our proposed method, Label-Agnostic Forgetting (LAF) without using any labels, which achieves comparable performance to state-of-the-art methods that rely on full supervision information. Furthermore, our approach excels in semi-supervised scenarios, leveraging limited supervision information to outperform fully supervised baselines. This work not only showcases the viability of supervision-free unlearning in deep models but also opens up a new possibility for future research in unlearning at the representation level.
[566] arXiv:2404.00552 (replaced) [pdf, ps, html, other]: Title: Comparison of Methods in Skin Pigment Decomposition

Hao Gong, Michel Desvignes

Comments: 5 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Decomposition of skin pigment plays an important role in medical fields. Human skin can be decomposed into two primitive components, hemoglobin and melanin. It is our goal to apply these results to the diagnosis of skin cancer. In this paper, various methods for skin pigment decomposition are reviewed comparatively, and the performance of each method is evaluated both theoretically and experimentally. In addition, isometric feature mapping (Isomap) is introduced in order to improve the dimensionality reduction performance in the context of skin pigment decomposition.
[567] arXiv:2404.01568 (replaced) [pdf, ps, html, other]: Title: A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM)

Dehao Yuan, Cornelia Fermüller, Tahseen Rabbani, Furong Huang, Yiannis Aloimonos

Comments: ICML2024 Conference Paper

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG)

We propose VecKM, a local point cloud geometry encoder that is descriptive and efficient to compute. VecKM leverages a unique approach by vectorizing a kernel mixture to represent the local point cloud. Such representation's descriptiveness is supported by two theorems that validate its ability to reconstruct and preserve the similarity of the local shape. Unlike existing encoders downsampling the local point cloud, VecKM constructs the local geometry encoding using all neighboring points, producing a more descriptive encoding. Moreover, VecKM is efficient to compute and scalable to large point cloud inputs: VecKM reduces the memory cost from $(n^2+nKd)$ to $(nd+np)$; and reduces the major runtime cost from computing $nK$ MLPs to $n$ MLPs, where $n$ is the size of the point cloud, $K$ is the neighborhood size, $d$ is the encoding dimension, and $p$ is a marginal factor. The efficiency is due to VecKM's unique factorizable property that eliminates the need of explicitly grouping points into neighbors. In the normal estimation task, VecKM demonstrates not only 100x faster inference speed but also highest accuracy and strongest robustness. In classification and segmentation tasks, integrating VecKM as a preprocessing module achieves consistently better performance than the PointNet, PointNet++, and point transformer baselines, and runs consistently faster by up to 10 times.
[568] arXiv:2404.03216 (replaced) [pdf, ps, html, other]: Title: Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption

Jianming Tong, Jingtian Dang, Anupam Golder, Callie Hao, Arijit Raychowdhury, Tushar Krishna

Comments: Proceedings of the 5th MLSys Conference, Santa Clara, CA, USA, 2024. Copyright 2024 by the author(s)

Subjects: Cryptography and Security (cs.CR)

As machine learning (ML) permeates fields like healthcare, facial recognition, and blockchain, the need to protect sensitive data intensifies. Fully Homomorphic Encryption (FHE) allows inference on encrypted data, preserving the privacy of both data and the ML model. However, it is up to five orders of magnitude slower than non-secure inference, a root cause being the replacement of non-polynomial operators (ReLU and MaxPooling) with high-degree Polynomial Approximated Functions (PAFs). We propose SmartPAF, a framework to replace non-polynomial operators with low-degree PAFs and then recover the accuracy of the PAF-approximated model through four techniques: (1) Coefficient Tuning (CT) -- adjust PAF coefficients based on the input distributions before training, (2) Progressive Approximation (PA) -- progressively replace one non-polynomial operator at a time, followed by fine-tuning, (3) Alternate Training (AT) -- alternate the training between PAFs and other linear operators in a decoupled manner, and (4) Dynamic Scale (DS) / Static Scale (SS) -- dynamically scale the PAF input value within (-1, 1) in training, and fix the scale as the running max value in FHE deployment. The synergistic effect of CT, PA, AT, and DS/SS enables SmartPAF to enhance the accuracy of the various models approximated by PAFs with various low degrees under multiple datasets. For ResNet-18 under ImageNet-1k, the Pareto frontier found by SmartPAF in the latency-accuracy tradeoff space achieves 1.42x ~ 13.64x accuracy improvement and 6.79x ~ 14.9x speedup over prior works. Further, SmartPAF enables a 14-degree PAF ($f_1^2 \circ g_1^2$) to achieve 7.81x speedup compared to the 27-degree PAF obtained by minimax approximation with the same 69.4% post-replacement accuracy. Our code is available at this https URL.
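To illustrate what replacing ReLU with a polynomial looks like, the sketch below uses the identity ReLU(x) = x(1 + sign(x))/2 and approximates sign(x) on [-1, 1] by repeatedly applying the odd cubic (3x - x^3)/2. This is a textbook-style composite approximation chosen as an assumption for illustration; it is not SmartPAF's tuned PAF, and the names `f1`, `approx_sign`, `approx_relu`, and the `depth` parameter are hypothetical.

```python
def f1(x: float) -> float:
    """Odd cubic that pushes values in (-1, 1) toward -1 or +1."""
    return (3.0 * x - x ** 3) / 2.0


def approx_sign(x: float, depth: int = 4) -> float:
    """Approximate sign(x) for x in [-1, 1] by composing f1 `depth` times."""
    for _ in range(depth):
        x = f1(x)
    return x


def approx_relu(x: float, depth: int = 4) -> float:
    """Polynomial surrogate for ReLU built from ReLU(x) = x * (1 + sign(x)) / 2."""
    return x * (1.0 + approx_sign(x, depth)) / 2.0


if __name__ == "__main__":
    # Compare the surrogate with the exact ReLU on a few points in [-1, 1].
    for x in (-1.0, -0.5, -0.1, 0.0, 0.1, 0.5, 1.0):
        print(f"x={x:+.2f}  relu={max(x, 0.0):.4f}  approx={approx_relu(x):.4f}")
```

Only additions and multiplications are used, which is what makes such a surrogate evaluable under FHE. Inputs must be scaled into [-1, 1] first, which matches the abstract's motivation for the Dynamic Scale and Static Scale steps.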
[569] arXiv:2404.04057 (replaced) [pdf, ps, html, other]: Title: Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang

Comments: ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By reformulating forward diffusion processes as semi-implicit distributions, we leverage three score-related identities to create an innovative loss mechanism. This mechanism achieves rapid FID reduction by training the generator using its own synthesized images, eliminating the need for real data or reverse-diffusion-based generation, all accomplished within significantly shortened generation time. Upon evaluation across four benchmark datasets, the SiD algorithm demonstrates high iteration efficiency during distillation and surpasses competing distillation approaches, whether they are one-step or few-step, data-free, or dependent on training data, in terms of generation quality. This achievement not only redefines the benchmarks for efficiency and effectiveness in diffusion distillation but also in the broader field of diffusion-based generation. The PyTorch implementation is available at this https URL
[570] arXiv:2404.04059 (replaced) [pdf, ps, other]: Title: On the Quest for Effectiveness in Human Oversight: Interdisciplinary Perspectives

Sarah Sterz, Kevin Baum, Sebastian Biewer, Holger Hermanns, Anne Lauber-Rönsberg, Philip Meinel, Markus Langer

Comments: 13 pages, 1 figure, 1 table; ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) 2024

Subjects: Computers and Society (cs.CY)

Human oversight is currently discussed as a potential safeguard to counter some of the negative aspects of high-risk AI applications. This prompts a critical examination of the role and conditions necessary for what is prominently termed effective or meaningful human oversight of these systems. This paper investigates effective human oversight by synthesizing insights from psychological, legal, philosophical, and technical domains. Based on the claim that the main objective of human oversight is risk mitigation, we propose a viable understanding of effectiveness in human oversight: for human oversight to be effective, the oversight person has to have (a) sufficient causal power with regard to the system and its effects, (b) suitable epistemic access to relevant aspects of the situation, (c) self-control, and (d) fitting intentions for their role. Furthermore, we argue that this is equivalent to saying that an oversight person is effective if and only if they are morally responsible and have fitting intentions. Against this backdrop, we suggest facilitators and inhibitors of effectiveness in human oversight when striving for practical applicability. We discuss factors in three domains, namely, the technical design of the system, individual factors of oversight persons, and the environmental circumstances in which they operate. Finally, this paper scrutinizes the upcoming AI Act of the European Union -- in particular Article 14 on Human Oversight -- as an exemplary regulatory framework in which we study the practicality of our understanding of effective human oversight. By analyzing the provisions and implications of the European AI Act proposal, we pinpoint how far that proposal aligns with our analyses regarding effective human oversight as well as how it might get enriched by our conceptual understanding of effectiveness in human oversight.
[571] arXiv:2404.06407 (replaced) [pdf, ps, other]: Title: Rethinking How to Evaluate Language Model Jailbreak

Hongyu Cai, Arjun Arunasalam, Leo Y. Lin, Antonio Bianchi, Z. Berkay Celik

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Large language models (LLMs) have become increasingly integrated with various applications. To ensure that LLMs do not generate unsafe responses, they are aligned with safeguards that specify what content is restricted. However, such alignment can be bypassed to produce prohibited content using a technique commonly referred to as jailbreak. Different systems have been proposed to perform the jailbreak automatically. These systems rely on evaluation methods to determine whether a jailbreak attempt is successful. However, our analysis reveals that current jailbreak evaluation methods have two limitations. (1) Their objectives lack clarity and do not align with the goal of identifying unsafe responses. (2) They oversimplify the jailbreak result as a binary outcome, successful or not. In this paper, we propose three metrics, safeguard violation, informativeness, and relative truthfulness, to evaluate language model jailbreak. Additionally, we demonstrate how these metrics correlate with the goal of different malicious actors. To compute these metrics, we introduce a multifaceted approach that extends the natural language generation evaluation method after preprocessing the response. We evaluate our metrics on a benchmark dataset produced from three malicious intent datasets and three jailbreak systems. The benchmark dataset is labeled by three annotators. We compare our multifaceted approach with three existing jailbreak evaluation methods. Experiments demonstrate that our multifaceted evaluation outperforms existing methods, with F1 scores improving on average by 17% compared to existing baselines. Our findings motivate the need to move away from the binary view of the jailbreak problem and incorporate a more comprehensive evaluation to ensure the safety of the language model.
[572] arXiv:2404.08232 (replaced) [pdf, ps, other]: Title: Navigating Quantum Security Risks in Networked Environments: A Comprehensive Study of Quantum-Safe Network Protocols

Yaser Baseri, Vikas Chouhan, Abdelhakim Hafid

Journal-ref: Computers & Security, Volume 142, July 2024, 103883

Subjects: Cryptography and Security (cs.CR)

The emergence of quantum computing poses a formidable security challenge to network protocols traditionally safeguarded by classical cryptographic algorithms. This paper provides an exhaustive analysis of vulnerabilities introduced by quantum computing in a diverse array of widely utilized security protocols across the layers of the TCP/IP model, including TLS, IPsec, SSH, PGP, and more. Our investigation focuses on precisely identifying vulnerabilities susceptible to exploitation by quantum adversaries at various migration stages for each protocol while also assessing the associated risks and consequences for secure communication. We delve deep into the impact of quantum computing on each protocol, emphasizing potential threats posed by quantum attacks and scrutinizing the effectiveness of post-quantum cryptographic solutions. Through carefully evaluating vulnerabilities and risks that network protocols face in the post-quantum era, this study provides invaluable insights to guide the development of appropriate countermeasures. Our findings contribute to a broader comprehension of quantum computing's influence on network security and offer practical guidance for protocol designers, implementers, and policymakers in addressing the challenges stemming from the advancement of quantum computing. This comprehensive study is a crucial step toward fortifying the security of networked environments in the quantum age.
[573] arXiv:2404.09173 (replaced) [pdf, ps, other]: Title: TransformerFAM: Feedback attention is working memory

Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

Comments: 26 pages, 12 figures, 14 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, allowing it to process indefinitely long sequences. TransformerFAM requires no additional weights, enabling seamless integration with pre-trained models. Our experiments show that TransformerFAM significantly improves Transformer performance on long-context tasks across various model sizes (1B, 8B, and 24B). These results showcase the potential to empower Large Language Models (LLMs) to process sequences of unlimited length.
[574] arXiv:2404.09236 (replaced) [pdf, ps, html, other]: Title: The complexity of convexity number and percolation time in the cycle convexity

Carlos V.G.C. Lima, Thiago Marcilon, Pedro Paulo de Medeiros

Subjects: Computational Complexity (cs.CC)

The subject of graph convexity is well explored in the literature, the so-called interval convexities above all. In this work, we explore the cycle convexity, an interval convexity whose interval function is $I(S) = S \cup \{u \mid G[S \cup \{u\}] \text{ has a cycle containing } u\}$. In this convexity, we prove that determining whether the convexity number of a graph $G$ is at least $k$ is NP-complete and W[1]-hard when parameterized by the size of the solution when $G$ is a thick spider, but polynomial when $G$ is an extended $P_4$-laden graph. We also prove that determining whether the percolation time of a graph is at least $k$ is NP-complete even for fixed $k \geq 9$, but polynomial for cacti or for fixed $k \leq 2$.
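As an illustration of the interval function $I(S)$ defined above, the following minimal Python sketch computes it for a simple undirected graph. It relies on the observation that, for a vertex $u$ outside $S$, the induced subgraph $G[S \cup \{u\}]$ has a cycle through $u$ exactly when $u$ has two neighbors in the same connected component of $G[S]$. The adjacency-dictionary representation and the function names are assumptions made for this sketch; they do not come from the paper.

```python
def components(adj, S):
    """Label each vertex of S with a representative of its component in G[S]."""
    label = {}
    for start in S:
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w in S and w not in label:
                    label[w] = start
                    stack.append(w)
    return label


def cycle_interval(adj, S):
    """Return I(S) for the cycle convexity on a simple undirected graph.

    adj: dict mapping each vertex to an iterable of its neighbors (hypothetical format).
    """
    S = set(S)
    label = components(adj, S)
    result = set(S)
    for u in adj:
        if u in S:
            continue
        seen = set()
        for w in adj[u]:
            if w not in S:
                continue
            if label[w] in seen:
                # Two neighbors of u lie in one component of G[S]: a cycle through u exists.
                result.add(u)
                break
            seen.add(label[w])
    return result


# A 4-cycle 0-1-2-3-0 used as a toy input.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

On the 4-cycle, `cycle_interval(square, {0, 1, 2})` adds vertex 3, whereas `cycle_interval(square, {0, 2})` adds nothing because vertices 0 and 2 lie in different components of the induced subgraph.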
[575] arXiv:2404.10362 (replaced) [pdf, ps, html, other]: Title: 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers

Sarah Fakhoury, Markus Kuppe, Shuvendu K. Lahiri, Tahina Ramananandro, Nikhil Swamy

Subjects: Software Engineering (cs.SE)

Improper parsing of attacker-controlled input is a leading source of software security vulnerabilities, especially when programmers transcribe informal format descriptions in RFCs into efficient parsing logic in low-level, memory unsafe languages. Several researchers have proposed formal specification languages for data formats from which efficient code can be extracted. However, distilling informal requirements into formal specifications is challenging and, despite their benefits, new, formal languages are hard for people to learn and use.
In this work, we present 3DGen, a framework that makes use of AI agents to transform mixed informal input, including natural language documents (i.e., RFCs) and example inputs into format specifications in a language called 3D. To support humans in understanding and trusting the generated specifications, 3DGen uses symbolic methods to also synthesize test inputs that can be validated against an external oracle. Symbolic test generation also helps in distinguishing multiple plausible solutions. Through a process of repeated refinement, 3DGen produces a 3D specification that conforms to a test suite, and which yields safe, efficient, provably correct, parsing code in C.
We have evaluated 3DGen on 20 Internet standard formats, demonstrating the potential for AI agents to produce formally verified C code at a non-trivial scale. A key enabler is the use of a domain-specific language to limit AI outputs to a class for which automated, symbolic analysis is tractable.
[576] arXiv:2404.10550 (replaced) [pdf, ps, other]: Title: Analytical Approximation of the ELBO Gradient in the Context of the Clutter Problem

Roumen Nikolaev Popov

Comments: 19 pages, 5 figures, supporting code available at this https URL

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose an analytical solution for approximating the gradient of the Evidence Lower Bound (ELBO) in variational inference problems where the statistical model is a Bayesian network consisting of observations drawn from a mixture of a Gaussian distribution embedded in unrelated clutter, known as the clutter problem. The method employs the reparameterization trick to move the gradient operator inside the expectation and relies on the assumption that, because the likelihood factorizes over the observed data, the variational distribution is generally more compactly supported than the Gaussian distribution in the likelihood factors. This allows efficient local approximation of the individual likelihood factors, which leads to an analytical solution for the integral defining the gradient expectation. We integrate the proposed gradient approximation as the expectation step in an EM (Expectation Maximization) algorithm for maximizing ELBO and test against classical deterministic approaches in Bayesian inference, such as the Laplace approximation, Expectation Propagation and Mean-Field Variational Inference. The proposed method demonstrates good accuracy and rate of convergence together with linear computational complexity.
[577] arXiv:2404.10966 (replaced) [pdf, ps, other]: Title: Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation

Yeonguk Yu, Sungho Shin, Seunghyeok Back, Minhwan Ko, Sangjun Noh, Kyoobin Lee

Comments: Accepted at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Test-time adaptation (TTA) aims to adapt a pre-trained model to a new test domain without access to source data after deployment. Existing approaches typically rely on self-training with pseudo-labels since ground truth cannot be obtained from test data. Although the quality of pseudo-labels is important for stable and accurate long-term adaptation, it has not been previously addressed. In this work, we propose DPLOT, a simple yet effective TTA framework that consists of two components: (1) domain-specific block selection and (2) pseudo-label generation using paired-view images. Specifically, we select blocks that involve domain-specific feature extraction and train these blocks by entropy minimization. After the blocks are adjusted for the current test domain, we generate pseudo-labels by averaging given test images and their corresponding flipped counterparts. By simply using flip augmentation, we prevent a decrease in the quality of the pseudo-labels, which can be caused by the domain gap resulting from strong augmentation. Our experimental results demonstrate that DPLOT outperforms previous TTA methods on the CIFAR10-C, CIFAR100-C, and ImageNet-C benchmarks, reducing error by up to 5.4%, 9.1%, and 2.9%, respectively. We also provide an extensive analysis to demonstrate the effectiveness of our framework. Code is available at this https URL.
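The paired-view idea described above can be sketched in a few lines of Python. This sketch assumes that the averaging is applied to the class probabilities predicted for an image and for its horizontal flip; the abstract does not spell out this detail. The names `hflip`, `paired_view_pseudo_label`, and `toy_predict` are hypothetical, and `toy_predict` is only a stand-in for a real model.

```python
def hflip(image):
    """Horizontally flip an image given as a list of pixel rows."""
    return [row[::-1] for row in image]


def paired_view_pseudo_label(predict, image):
    """Average class probabilities over an image and its flipped view.

    predict: callable mapping an image to a list of class probabilities
             (assumed interface, not the paper's API).
    Returns the averaged probabilities and the index of the most likely class.
    """
    p_orig = predict(image)
    p_flip = predict(hflip(image))
    avg = [(a + b) / 2.0 for a, b in zip(p_orig, p_flip)]
    label = max(range(len(avg)), key=avg.__getitem__)
    return avg, label


def toy_predict(image):
    """Hypothetical flip-sensitive classifier used only to exercise the sketch."""
    if image[0][0] > image[0][-1]:
        return [0.9, 0.1]
    return [0.6, 0.4]


avg_probs, pseudo_label = paired_view_pseudo_label(toy_predict, [[3, 1], [3, 1]])
```

With the toy classifier, the two views disagree on confidence (0.9 versus 0.6 for class 0), and the averaged prediction smooths this out while keeping class 0 as the pseudo-label.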
[578] arXiv:2404.12149 (replaced) [pdf, ps, html, other]: Title: AccidentBlip2: Accident Detection With Multi-View MotionBlip2

Yihua Shao, Hongyi Cai, Xinwei Long, Weiyi Lang, Zhe Wang, Haoran Wu, Yan Wang, Jiayi Yin, Yang Yang, Yisheng Lv, Zhen Lei

Subjects: Artificial Intelligence (cs.AI)

Intelligent vehicles have demonstrated excellent capabilities in many transportation scenarios. The inference capabilities of neural networks using cameras limit the accuracy of accident detection in complex transportation systems. This paper presents AccidentBlip2, a pure vision-based multi-modal large model Blip2 for accident detection. Our method first processes the multi-view images through ViT-14g and sends the multi-view features into the cross-attention layer of Q-Former. Different from Blip2's Q-Former, our Motion Q-Former extends the self-attention layer with the temporal-attention layer. In the inference process, the queries generated from previous frames are input into Motion Q-Former to aggregate temporal information. Queries are updated with an auto-regressive strategy and are sent to an MLP to detect whether there is an accident in the surrounding environment. Our AccidentBlip2 can be extended to a multi-vehicle cooperative system by deploying Motion Q-Former on each vehicle and simultaneously fusing the generated queries into the MLP for auto-regressive inference. Our approach outperforms existing video large language models in detection accuracy in both single-vehicle and multi-vehicle systems.
[579] arXiv:2404.12489 (replaced) [pdf, ps, html, other]: Title: Grammatical Error Correction for Code-Switched Sentences by Learners of English

Kelvin Wey Han Chan, Christopher Bryant, Li Nguyen, Andrew Caines, Zheng Yuan

Subjects: Computation and Language (cs.CL)

Code-switching (CSW) is a common phenomenon among multilingual speakers where multiple languages are used in a single discourse or utterance. Mixed-language utterances may still contain grammatical errors, however, and most existing Grammatical Error Correction (GEC) systems have been trained on monolingual data and not developed with CSW in mind. In this work, we conduct the first exploration into the use of GEC systems on CSW text. Through this exploration, we propose a novel method of generating synthetic CSW GEC datasets by translating different spans of text within existing GEC corpora. We then investigate different methods of selecting these spans based on CSW ratio, switch-point factor and linguistic constraints, and identify how they affect the performance of GEC systems on CSW text. Our best model achieves an average increase of 1.57 $F_{0.5}$ across 3 CSW test sets (English-Chinese, English-Korean and English-Japanese) without affecting the model's performance on a monolingual dataset. We furthermore discovered that models trained on one CSW language generalise relatively well to other typologically similar CSW languages.
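For readers unfamiliar with the $F_{0.5}$ score reported above, the standard definition is $F_\beta = (1 + \beta^2) \cdot P \cdot R / (\beta^2 \cdot P + R)$, where $P$ is precision and $R$ is recall; with $\beta = 0.5$, precision is weighted more heavily than recall. The small Python sketch below implements that general formula. The function name `f_beta` is an assumption for illustration and is not taken from the paper or from any GEC scoring tool.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        # Convention used in this sketch: the score is 0 when both inputs are 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom
```

For example, a system with perfect precision but 50% recall scores about 0.83 under $F_{0.5}$, which is higher than the roughly 0.67 it would score under the balanced $F_1$.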
[580] arXiv:2404.12917 (replaced) [pdf, ps, other]: Title: Zero-Shot Stitching in Reinforcement Learning using Relative Representations

Antonio Pio Ricciardi, Valentino Maiorca, Luca Moschella, Riccardo Marin, Emanuele Rodolà

Comments: 13 pages, 10 figures, 4 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Visual Reinforcement Learning is a popular and powerful framework that takes full advantage of the Deep Learning breakthrough. However, it is also known that variations in the input (e.g., different colors of the panorama due to the season of the year) or the task (e.g., changing the speed limit for a car to respect) could require complete retraining of the agents. In this work, we leverage recent developments in unifying latent representations to demonstrate that it is possible to combine the components of an agent, rather than retrain it from scratch. We build upon the recent relative representations framework and adapt it for Visual RL. This allows us to create completely new agents capable of handling environment-task combinations never seen during training. Our work paves the road toward a more accessible and flexible use of reinforcement learning.
[581] arXiv:2404.12953 (replaced) [pdf, ps, other]: Title: Low-Depth Spatial Tree Algorithms

Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski

Comments: to appear at IPDPS 2024

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Contemporary accelerator designs exhibit a high degree of spatial localization, wherein two-dimensional physical distance determines communication costs between processing elements. This situation presents considerable algorithmic challenges, particularly when managing sparse data, a pivotal component in progressing data science. The spatial computer model quantifies communication locality by weighting processor communication costs by distance, introducing a term named energy. Moreover, it integrates depth, a widely-utilized metric, to promote high parallelism. We propose and analyze a framework for efficient spatial tree algorithms within the spatial computer model. Our primary method constructs a spatial tree layout that optimizes the locality of the neighbors in the compute grid. This approach thereby enables locality-optimized messaging within the tree. Our layout achieves a polynomial factor improvement in energy compared to utilizing a PRAM approach. Using this layout, we develop energy-efficient treefix sum and lowest common ancestor algorithms, which are both fundamental building blocks for other graph algorithms. With high probability, our algorithms exhibit near-linear energy and poly-logarithmic depth. Our contributions augment a growing body of work demonstrating that computations can have both high spatial locality and low depth. Moreover, our work constitutes an advancement in the spatial layout of irregular and sparse computations.
[582] arXiv:2404.13288 (replaced) [pdf, ps, other]: Title: PoseINN: Realtime Visual-based Pose Regression and Localization with Invertible Neural Networks

Zirui Zang, Ahmad Amine, Rahul Mangharam

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Estimating ego-pose from cameras is an important problem in robotics with applications ranging from mobile robotics to augmented reality. While SOTA models are becoming increasingly accurate, they can still be unwieldy due to high computational costs. In this paper, we propose to solve the problem by using invertible neural networks (INN) to find the mapping between the latent space of images and poses for a given scene. Our model achieves similar performance to the SOTA while being faster to train and only requiring offline rendering of low-resolution synthetic data. By using normalizing flows, the proposed method also provides uncertainty estimation for the output. We also demonstrated the efficiency of this method by deploying the model on a mobile robot.
[583] arXiv:2404.13376 (replaced) [pdf, ps, html, other]: Title: Cross-Forming Control and Fault Current Limiting for Grid-Forming Inverters

Xiuqiang He, Maitraya Avadhut Desai, Linbin Huang, Florian Dörfler

Subjects: Systems and Control (eess.SY)

We propose a novel "cross-forming" control concept for grid-forming inverters operating against grid faults. Cross-forming refers to voltage angle forming and current magnitude forming, differing from classical grid-forming and grid-following concepts, i.e., voltage magnitude-and-angle forming and current magnitude-and-angle forming, respectively. Unlike purely grid-forming or grid-following paradigms, the cross-forming concept is motivated by device security requirements for fault current limitation together with grid code requirements for preserving voltage angle forming. We propose two feasible cross-forming control implementations, enabling inverters to quickly limit fault currents at a prescribed level and preserve voltage angle forming for grid-forming synchronization and dynamic ancillary services provision, during symmetrical or asymmetrical fault ride-through. Moreover, the cross-forming control yields an equivalent system featuring a constant virtual impedance and a "normal form" of representation, allowing us to extend previously established transient stability results to encompass scenarios of current saturation. Simulations and hardware experiments validate the efficacy of the proposed cross-forming control.
[584] arXiv:2404.13904 (replaced) [pdf, ps, html, other]: Title: Deep Regression Representation Learning with Topology

Shihao Zhang, Kenji Kawaguchi, Angela Yao

Comments: ICML 2024

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Most works studying representation learning focus only on classification and neglect regression. Yet, the learning objectives and therefore the representation topologies of the two tasks are fundamentally different: classification targets class separation, leading to disconnected representations, whereas regression requires ordinality with respect to the target, leading to continuous representations. We thus wonder how the effectiveness of a regression representation is influenced by its topology, with evaluation based on the Information Bottleneck (IB) principle.
The IB principle is an important framework that provides principles for learning effective representations. We establish two connections between it and the topology of regression representations. The first connection reveals that a lower intrinsic dimension of the feature space implies a reduced complexity of the representation Z. This complexity can be quantified as the conditional entropy of Z on the target space Y and serves as an upper bound on the generalization error. The second connection suggests that learning a feature space that is topologically similar to the target space will better align with the IB principle. Based on these two connections, we introduce PH-Reg, a regularizer specific to regression that matches the intrinsic dimension and topology of the feature space with the target space. Experiments on synthetic and real-world regression tasks demonstrate the benefits of PH-Reg.
[585] arXiv:2404.15867 (replaced) [pdf, ps, other]: Title: A Generalization of Relative Entropy to Count Vectors and its Concentration Property

Kostas N. Oikonomou

Comments: 38 pages, fixed a typo on p. 3 top

Subjects: Information Theory (cs.IT)

We introduce a new generalization of relative entropy to non-negative vectors with sums $> 1$. We show in a purely combinatorial setting, with no probabilistic considerations, that in the presence of linear constraints defining a convex polytope, a concentration phenomenon arises for this generalized relative entropy, and we quantify the concentration precisely. We also present a probabilistic formulation, and extend the concentration results to it. In addition, we provide a number of simplifications and improvements to our previous work, notably in dualizing the optimization problem, in the concentration with respect to $\ell_{\infty}$ distance, and in the relationship to generalized KL-divergence. A number of our results apply to general compact convex sets, not necessarily polyhedral.
[586] arXiv:2404.15888 (replaced) [pdf, ps, other]: Title: Near-Optimal Wafer-Scale Reduce

Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler

Comments: To appear at HPDC 2024

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

Efficient Reduce and AllReduce communication collectives are a critical cornerstone of high-performance computing (HPC) applications. We present the first systematic investigation of Reduce and AllReduce on the Cerebras Wafer-Scale Engine (WSE). This architecture has been shown to achieve unprecedented performance both for machine learning workloads and other computational problems like FFT. We introduce a performance model to estimate the execution time of algorithms on the WSE and validate our predictions experimentally for a wide range of input sizes. In addition to existing implementations, we design and implement several new algorithms specifically tailored to the architecture. Moreover, we establish a lower bound for the runtime of a Reduce operation on the WSE. Based on our model, we automatically generate code that achieves near-optimal performance across the whole range of input sizes. Experiments demonstrate that our new Reduce and AllReduce algorithms outperform the current vendor solution by up to 3.27x. Additionally, our model predicts performance with less than 4% error. The proposed communication collectives increase the range of HPC applications that can benefit from the high throughput of the WSE. Our model-driven methodology demonstrates a disciplined approach that can lead the way to further algorithmic advancements on wafer-scale architectures.
[587] arXiv:2404.16044 (replaced) [pdf, ps, other]: Title: Toward the Categorical Data Map

Frederik L. Dennig, Lucas Joos, Patrick Paetzold, Daniela Blumberg, Oliver Deussen, Daniel A. Keim, Maximilian T. Fischer

Comments: 12 pages, 10 figures, LaTeX; formatting

Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)

Categorical data does not have an intrinsic definition of distance or order, and therefore, established visualization techniques for categorical data only allow for a set-based or frequency-based analysis, e.g., through Euler diagrams or Parallel Sets, and do not support a similarity-based analysis. We present a novel dimensionality reduction-based visualization for categorical data, which is based on defining the distance of two data items as the number of varying attributes. Our technique enables users to pre-attentively detect groups of similar data items and observe the properties of the projection, such as attributes strongly influencing the embedding. Our prototype visually encodes data properties in an enhanced scatterplot-like visualization, encoding attributes in the background to show the distribution of categories. In addition, we propose two graph-based measures to quantify the plot's visual quality, which rank attributes according to their contribution to cluster cohesion. To demonstrate the capabilities of our similarity-based approach, we compare it to Euler diagrams and Parallel Sets regarding visual scalability and show its benefits through an expert study with five data scientists analyzing the Titanic and Mushroom datasets with up to 23 attributes and 8124 category combinations. Our results indicate that the Categorical Data Map offers an effective analysis method, especially for large datasets with a high number of category combinations.
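The distance described above, the number of attributes in which two data items differ, is a Hamming distance over attribute tuples. The following Python sketch computes it together with a pairwise distance matrix of the kind a dimensionality-reduction method could take as input. The function names and the toy records are assumptions made for this illustration and do not reproduce the authors' prototype.

```python
def attribute_distance(a, b):
    """Number of attributes in which two equally long categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(items):
    """Symmetric matrix of pairwise attribute distances."""
    n = len(items)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = attribute_distance(items[i], items[j])
            dist[i][j] = d
            dist[j][i] = d
    return dist


# Toy records with three categorical attributes each (hypothetical data).
records = [
    ("first", "female", "adult"),
    ("first", "male", "adult"),
    ("third", "male", "child"),
]
matrix = distance_matrix(records)
```

Here the first two records differ in one attribute, while the first and third differ in all three, so items that share most of their category values end up close together in any embedding built from `matrix`.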
[588] arXiv:2404.16436 (replaced) [pdf, ps, other]: Title: Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics

Ben Williams, Bart van Merriënboer, Vincent Dumoulin, Jenny Hamer, Eleni Triantafillou, Abram B. Fleishman, Matthew McKown, Jill E. Munger, Aaron N. Rice, Ashlee Lillis, Clemency E. White, Catherine A. D. Hobbs, Tries B. Razak, Kate E. Jones, Tom Denton

Comments: 18 pages, 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Machine learning has the potential to revolutionize passive acoustic monitoring (PAM) for ecological assessments. However, high annotation and compute costs limit the field's efficacy. Generalizable pretrained networks can overcome these costs, but high-quality pretraining requires vast annotated libraries, limiting its current applicability primarily to bird taxa. Here, we identify the optimum pretraining strategy for a data-deficient domain using coral reef bioacoustics. We assemble ReefSet, a large annotated library of reef sounds, though modest compared to bird libraries at 2% of the sample count. Through testing few-shot transfer learning performance, we observe that pretraining on bird audio provides notably superior generalizability compared to pretraining on ReefSet or unrelated audio alone. However, our key findings show that cross-domain mixing which leverages bird, reef and unrelated audio during pretraining maximizes reef generalizability. SurfPerch, our pretrained network, provides a strong foundation for automated analysis of marine PAM data with minimal annotation and compute costs.
[589] arXiv:2404.16571 (replaced) [pdf, ps, html, other]: Title: MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images

Zhiwei Wang, Ying Zhou, Shiquan He, Ting Li, Fan Huang, Qiang Ding, Xinxia Feng, Mei Liu, Qiang Li

Comments: 11 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto a target view using estimated depth and pose, and then minimizing the difference between the warped and target images. However, the endoscopic built-in light causes significant brightness fluctuations, and thus makes the photometric constraint unreliable. Previous efforts only mitigate this by relying on extra models to calibrate image brightness. In this paper, we propose MonoPCC to address the brightness inconsistency radically by reshaping the photometric constraint into a cycle form. Instead of only warping the source image, MonoPCC constructs a closed loop consisting of two opposite forward-backward warping paths: from target to source and then back to target. Thus, the target image finally receives an image cycle-warped from itself, which naturally makes the constraint invariant to brightness changes. Moreover, MonoPCC transplants the source image's phase-frequency into the intermediate warped image to avoid structure loss, and also stabilizes the training via an exponential moving average (EMA) strategy to avoid frequent changes in the forward warping. The comprehensive and extensive experimental results on four endoscopic datasets demonstrate that our proposed MonoPCC shows great robustness to the brightness inconsistency, and exceeds other state-of-the-art methods by reducing the absolute relative error by at least 7.27%, 9.38%, 9.90% and 3.17%, respectively.
[590] arXiv:2404.16687 (replaced) [pdf, ps, other]: Title: NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, Jianquan Yang, Weigang Wang, Xi Fang, Xiaoxin Lv, Jun Yan, Tianwu Zhi, Yabin Zhang, Yaohui Li, Yang Li, Jingwen Xu, Jianzhao Liu, Yiting Liao, Junlin Li, Zihao Yu, Yiting Lu, Xin Li, Hossein Motamednia, S.Farhad Hosseini-Benvidi, Fengbin Guan, Ahmad Mahmoudi-Aznaveh, Azadeh Mansouri, Ganzorig Gankhuyag, Kihwan Yoon, Yifang Xu, Haotian Fan, Fangyuan Kong, Shiling Zhao, Weifeng Dong, Haibing Yin, Li Zhu, Zhiling Wang, Bingchen Huang, Avinab Saha, Sandeep Mishra, Shashank Gupta, Rajesh Sureddi, Oindrila Saha, Luigi Celona, Simone Bianco, Paolo Napoletano, Raimondo Schettini, Junfeng Yang, Jing Fu, Wei Zhang, Wenzhi Cao, Limei Liu, Han Peng, Weijun Yuan, Zhan Li, Yihang Cheng, Yifan Deng, Haohui Li, Bowen Qu, Yao Li, Shuqing Luo, Shunzhou Wang, Wei Gao, Zihao Lu, Marcos V.Conde, Xinrui Wang, Zhibo Chen, Ruling Liao, Yan Ye, Qiulin Wang, Bing Li, Zhaokun Zhou, Miao Geng, Rui Chen, Xin Tao, Xiaoyu Liang, Shangkun Sun, Xingyuan Ma, Jiaze Li, Mengduo Yang, Haoran Xu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. The challenge addresses a major problem in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions were received in the development phase, and 221 submissions were received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions were received in the development phase, and 185 submissions were received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC.
[591] arXiv:2404.17039 (replaced) [pdf, ps, html, other]: Title: Differentiating Through Linear Solvers

Paul Hovland, Jan Hückelheim

Subjects: Mathematical Software (cs.MS); Numerical Analysis (math.NA)

Computer programs containing calls to linear solvers are a known challenge for automatic differentiation. Previous publications advise against differentiating through the low-level solver implementation, and instead advocate for high-level approaches that express the derivative in terms of a modified linear system that can be solved with a separate solver call. Despite this ubiquitous advice, we are not aware of prior work comparing the accuracy of both approaches. With this article we thus empirically study a simple question: What happens if we ignore common wisdom, and differentiate through linear solvers?
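To make the high-level approach concrete, here is a minimal sketch in plain Python, assuming a 2x2 system A(t) x = b(t) that depends on a scalar parameter t. Differentiating A x = b gives the modified system A dx = db - dA x, which is solved with a second solver call. The helper names (`solve2`, `solve_with_tangent`, `system`) and the example matrices are hypothetical and are not taken from the paper.

```python
def solve2(A, b):
    """Solve a 2x2 system A x = b by Cramer's rule (illustrative solver only)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


def solve_with_tangent(A, b, dA, db):
    """Return x and dx/dt, where dx solves the modified system A dx = db - dA x."""
    x = solve2(A, b)
    rhs = [db[i] - sum(dA[i][j] * x[j] for j in range(2)) for i in range(2)]
    dx = solve2(A, rhs)  # separate solver call with the same matrix
    return x, dx


def system(t):
    """Hypothetical parameterised system A(t), b(t), used only for this example."""
    return [[2.0 + t, 1.0], [1.0, 3.0]], [1.0, 2.0 * t]


t0 = 0.5
A, b = system(t0)
dA = [[1.0, 0.0], [0.0, 0.0]]   # dA/dt for the system above
db = [0.0, 2.0]                 # db/dt for the system above
x, dx = solve_with_tangent(A, b, dA, db)

# Central finite difference as an independent check of dx
h = 1e-6
xp = solve2(*system(t0 + h))
xm = solve2(*system(t0 - h))
fd = [(xp[i] - xm[i]) / (2 * h) for i in range(2)]
```

Differentiating through the solver itself would instead propagate derivatives through every arithmetic operation inside `solve2`; the sketch covers only the high-level variant.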
[592] arXiv:2404.17589 (replaced) [pdf, ps, other]: Title: An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems

Peng Liu, Cong Xu, Ming Zhao, Jiawei Zhu, Bin Wang, Yi Ren

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

As the last critical stage of recommender systems (RSs), Multi-Task Fusion (MTF) is responsible for combining multiple scores outputted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which determines the ultimate recommendation results. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) has been used for MTF in the industry. However, the off-policy RL algorithms used for MTF so far have the following severe problems: 1) to avoid the out-of-distribution (OOD) problem, their constraints are overly strict, which seriously damages their performance; 2) they are unaware of the exploration policy used for producing training data and never interact with the real environment, so only a suboptimal policy can be learned; 3) the traditional exploration policies are inefficient and hurt user experience. To solve the above problems, we propose a novel method named IntegratedRL-MTF customized for MTF in large-scale RSs. IntegratedRL-MTF integrates an off-policy RL model with our online exploration policy to relax overstrict and complicated constraints, which significantly improves its performance. We also design an extremely efficient exploration policy, which eliminates low-value exploration space and focuses on exploring potential high-value state-action pairs. Moreover, we adopt a progressive training mode to further enhance our model's performance with the help of our exploration policy. We conduct extensive offline and online experiments in the short video channel of Tencent News. The results demonstrate that our model outperforms other models remarkably. IntegratedRL-MTF has been fully deployed in our RS and other large-scale RSs in Tencent, which have achieved significant improvements.
[593] arXiv:2404.17626 (replaced) [pdf, ps, other]: Title: Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank

Thomas Le Menestrel, Erin Craig, Robert Tibshirani, Trevor Hastie, Manuel Rivas

Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Applications (stat.AP); Computation (stat.CO)

Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of individuals of non-European descent, underscoring a critical gap in genetic research. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data. We evaluate the performance of Group-LASSO INTERaction-NET (glinternet) and pretrained lasso in disease prediction focusing on diverse ancestries in the UK Biobank. Models were trained on data from White British and other ancestries and validated across a cohort of over 96,000 individuals for 8 diseases. Out of 96 models trained, we report 16 with statistically significant incremental predictive performance in terms of ROC-AUC scores (p-value < 0.05), found for diabetes, arthritis, gall stones, cystitis, asthma and osteoarthritis. For the interaction and pretrained models that outperformed the baseline, the PRS score was the primary driver behind prediction. Our findings indicate that both interaction terms and pre-training can enhance prediction accuracy, but only for a limited set of diseases and with moderate improvements in accuracy.
[594] arXiv:2404.18255 (replaced) [pdf, ps, other]: Title: PatentGPT: A Large Language Model for Intellectual Property

Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, Weilei Wang, Changyang Tu

Comments: 19 pages, 9 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, and processing of extremely long text in this field. In this technical report, we present for the first time a low-cost, standardized procedure for training IP-oriented LLMs, meeting the unique requirements of the IP domain. Using this standard process, we have trained the PatentGPT series models based on open-source pretrained models. When evaluated on the open-source IP-oriented benchmark MOZIP, our domain-specific LLMs outperform GPT-4, indicating the effectiveness of the proposed training procedure and the expertise of the PatentGPT models in the IP domain. Remarkably, our model surpassed GPT-4 on the 2019 China Patent Agent Qualification Examination, scoring 65 and matching human expert levels. Additionally, the PatentGPT model, which utilizes the SMoE architecture, achieves performance comparable to that of GPT-4 in the IP domain and demonstrates a better cost-performance ratio on long-text tasks, potentially serving as an alternative to GPT-4 within the IP domain.
[595] arXiv:2404.18465 (replaced) [pdf, ps, other]: Title: M3oE: Multi-Domain Multi-Task Mixture-of-Experts Recommendation Framework

Zijian Zhang, Shuchang Liu, Jiaao Yu, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Ziru Liu, Qidong Liu, Hongwei Zhao, Lantao Hu, Peng Jiang, Kun Gai

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Multi-domain recommendation and multi-task recommendation have demonstrated their effectiveness in leveraging common information from different domains and objectives for comprehensive user modeling. Nonetheless, the practical recommendation usually faces multiple domains and tasks simultaneously, which cannot be well-addressed by current methods. To this end, we introduce M3oE, an adaptive multi-domain multi-task mixture-of-experts recommendation framework. M3oE integrates multi-domain information, maps knowledge across domains and tasks, and optimizes multiple objectives. We leverage three mixture-of-experts modules to learn common, domain-aspect, and task-aspect user preferences respectively to address the complex dependencies among multiple domains and tasks in a disentangled manner. Additionally, we design a two-level fusion mechanism for precise control over feature extraction and fusion across diverse domains and tasks. The framework's adaptability is further enhanced by applying AutoML technique, which allows dynamic structure optimization. To the best of the authors' knowledge, our M3oE is the first effort to solve multi-domain multi-task recommendation self-adaptively. Extensive experiments on two benchmark datasets against diverse baselines demonstrate M3oE's superior performance. The implementation code is available to ensure reproducibility.
[596] arXiv:2404.18539 (replaced) [pdf, ps, html, other]: Title: Enhancing Boundary Segmentation for Topological Accuracy with Skeleton-based Methods

Chuni Liu, Boyuan Ma, Xiaojuan Ban, Yujie Xie, Hao Wang, Weihua Xue, Jingchao Ma, Ke Xu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Topological consistency plays a crucial role in the task of boundary segmentation for reticular images, such as cell membrane segmentation in neuron electron microscopic images, grain boundary segmentation in material microscopic images and road segmentation in aerial images. In these fields, topological changes in segmentation results have a serious impact on the downstream tasks, which can even exceed the impact of misalignment of the boundary itself. To enhance the topological accuracy of segmentation results, we propose the Skea-Topo Aware loss, which is a novel loss function that takes into account the shape of each object and the topological significance of the pixels. It consists of two components. First, a skeleton-aware weighted loss improves the segmentation accuracy by better modeling the object geometry with skeletons. Second, a boundary rectified term effectively identifies and emphasizes topologically critical pixels in the prediction errors using both foreground and background skeletons in the ground truth and predictions. Experiments show that our method improves topological consistency by up to 7 points in VI compared to 13 state-of-the-art methods, based on objective and subjective assessments across three different boundary segmentation datasets. The code is available at this https URL.
[597] arXiv:2404.18572 (replaced) [pdf, ps, other]: Title: Learning Governing Equations of Unobserved States in Dynamical Systems

Gevik Grigorian, Sandip V. George, Simon Arridge

Subjects: Machine Learning (cs.LG)

Data-driven modelling and scientific machine learning have been responsible for significant advances in determining suitable models to describe data. Within dynamical systems, neural ordinary differential equations (ODEs), where the system equations are set to be governed by a neural network, have become a popular tool for this challenge in recent years. However, less emphasis has been placed on systems that are only partially-observed. In this work, we employ a hybrid neural ODE structure, where the system equations are governed by a combination of a neural network and domain-specific knowledge, together with symbolic regression (SR), to learn governing equations of partially-observed dynamical systems. We test this approach on two case studies: A 3-dimensional model of the Lotka-Volterra system and a 5-dimensional model of the Lorenz system. We demonstrate that the method is capable of successfully learning the true underlying governing equations of unobserved states within these systems, with robustness to measurement noise.
[598] arXiv:2404.18810 (replaced) [pdf, ps, other]: Title: Unknown Script: Impact of Script on Cross-Lingual Transfer

Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

Comments: Paper accepted to NAACL Student Research Workshop (SRW) 2024

Subjects: Computation and Language (cs.CL)

Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often overlooked aspect in this domain: the influence of the source language of a language model on language transfer performance. We consider a case where the target language and its script are not part of the pre-trained model. We conduct a series of experiments on monolingual and multilingual models that are pre-trained on different tokenization methods to determine factors that affect cross-lingual transfer to a new language with a unique script. Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size.
[599] arXiv:2404.19290 (replaced) [pdf, ps, html, other]: Title: Efficient inverse $Z$-transform and Wiener-Hopf factorization

Svetlana Boyarchenko, Sergei Levendorskiĭ

Subjects: Numerical Analysis (math.NA); Computational Finance (q-fin.CP)

We suggest new closely related methods for numerical inversion of the $Z$-transform and Wiener-Hopf factorization of functions on the unit circle, based on sinh-deformations of the contours of integration, corresponding changes of variables and the simplified trapezoid rule. As applications, we consider evaluation of high moments of probability distributions and construction of causal filters. Programs in Matlab running on a Mac with moderate characteristics achieve the precision E-14 in several dozen microseconds and E-11 in several milliseconds, respectively.
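As a baseline illustration only, the sketch below inverts a $Z$-transform with the plain trapezoid rule on a circle |z| = r. It does not implement the sinh-deformation that the abstract describes; it assumes the convention U(z) = sum_k u_k z^{-k} and a real-valued sequence, and the function name `inverse_z` and the parameters `r` and `M` are hypothetical.

```python
import cmath
import math


def inverse_z(U, n, r=1.0, M=64):
    """Approximate u_n from U(z) = sum_k u_k z^{-k} using an M-point trapezoid
    rule on the circle |z| = r (no contour deformation)."""
    total = 0j
    for k in range(M):
        z = r * cmath.exp(2j * math.pi * k / M)
        total += U(z) * z ** n
    return (total / M).real  # the sequence is assumed to be real-valued


# Example: u_k = 0.5**k has Z-transform U(z) = z / (z - 0.5) for |z| > 0.5
U = lambda z: z / (z - 0.5)
u5 = inverse_z(U, 5)
```

On a circle the trapezoid rule has an aliasing error that decays geometrically in `M`, so this smooth example is already accurate at modest `M`.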
[600] arXiv:2404.19475 (replaced) [pdf, ps, other]: Title: TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models

Teng Zhou, Yongchuan Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion models have emerged as effective tools for generating diverse and high-quality content. However, their capability in high-resolution image generation, particularly for panoramic images, still faces challenges such as visible seams and incoherent transitions. In this paper, we propose TwinDiffusion, an optimized framework designed to address these challenges through two key innovations: Crop Fusion for quality enhancement and Cross Sampling for efficiency optimization. We introduce a training-free optimizing stage to refine the similarity of the adjacent image areas, as well as an interleaving sampling strategy to yield dynamic patches during the cropping process. A comprehensive evaluation is conducted to compare TwinDiffusion with the existing methods, considering factors including coherence, fidelity, compatibility, and efficiency. The results demonstrate the superior performance of our approach in generating seamless and coherent panoramas, setting a new standard in quality and efficiency for panoramic image generation.
[601] arXiv:2404.19705 (replaced) [pdf, ps, html, other]: Title: When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively

Tiziano Labruna, Jon Ander Campos, Gorka Azkune

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

In this paper, we demonstrate how Large Language Models (LLMs) can effectively learn to use an off-the-shelf information retrieval (IR) system specifically when additional context is required to answer a given question. Given the performance of IR systems, the optimal strategy for question answering does not always entail external information retrieval; rather, it often involves leveraging the parametric memory of the LLM itself. Prior research has identified this phenomenon in the PopQA dataset, wherein the most popular questions are effectively addressed using the LLM's parametric memory, while less popular ones require IR system usage. Following this, we propose a tailored training approach for LLMs, leveraging existing open-domain question answering datasets. Here, LLMs are trained to generate a special token, <RET>, when they do not know the answer to a question. Our evaluation of the Adaptive Retrieval LLM (Adapt-LLM) on the PopQA dataset showcases improvements over the same LLM under three configurations: (i) retrieving information for all the questions, (ii) always using the parametric memory of the LLM, and (iii) using a popularity threshold to decide when to use a retriever. Through our analysis, we demonstrate that Adapt-LLM is able to generate the <RET> token when it determines that it does not know how to answer a question, indicating the need for IR, while it achieves notably high accuracy levels when it chooses to rely only on its parametric memory.
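The inference-time control flow implied by the <RET> token can be sketched as follows. This is a minimal illustration, assuming the model is exposed as a text-in, text-out callable; `adaptive_answer`, `toy_llm`, `toy_retriever`, and the prompt template are hypothetical stand-ins, not the authors' implementation.

```python
RET = "<RET>"


def adaptive_answer(question, llm, retriever):
    """Ask the model first; retrieve context only if it emits the <RET> token."""
    first = llm(question)
    if RET not in first:
        return first, False          # answered from parametric memory
    context = retriever(question)
    # Second pass with retrieved context (prompt template is an assumption)
    return llm(f"Context: {context}\nQuestion: {question}"), True


# Stub model and retriever standing in for a fine-tuned LLM and an IR system
def toy_llm(prompt):
    if "Context:" in prompt:
        return "Answer based on retrieved context"
    return "Paris" if "France" in prompt else RET


def toy_retriever(question):
    return "retrieved passage for: " + question


popular = adaptive_answer("What is the capital of France?", toy_llm, toy_retriever)
rare = adaptive_answer("Who directed an obscure 1931 short film?", toy_llm, toy_retriever)
```

The second tuple element records whether retrieval was used, which makes it easy to compare against the always-retrieve and never-retrieve configurations listed in the abstract.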
[602] arXiv:2404.19733 (replaced) [pdf, ps, other]: Title: Iterative Reasoning Preference Optimization

Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. We train using a modified DPO loss (Rafailov et al., 2023) with an additional negative log-likelihood term, which we find to be crucial. We show reasoning improves across repeated iterations of this scheme. While only relying on examples in the training set, our approach results in increasing accuracy on GSM8K, MATH, and ARC-Challenge for Llama-2-70B-Chat, outperforming other Llama-2-based models not relying on additionally sourced datasets. For example, we see a large improvement from 55.6% to 81.6% on GSM8K and an accuracy of 88.7% with majority voting out of 32 samples.
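The combination of a DPO term with a negative log-likelihood term can be sketched for a single preference pair as below. The abstract does not say how the two terms are weighted or normalised, so `alpha`, `beta`, and the length normalisation of the NLL term are assumptions made for illustration, and `dpo_nll_loss` is a hypothetical name.

```python
import math


def dpo_nll_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, len_w, beta=0.1, alpha=1.0):
    """DPO term on a (winning, losing) pair plus a weighted NLL term on the winner.

    Inputs are summed token log-probabilities of each sequence under the current
    model (logp_*) and under the frozen reference model (ref_logp_*).
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    dpo = math.log1p(math.exp(-margin))   # equals -log(sigmoid(margin))
    nll = -logp_w / len_w                 # length-normalised NLL (assumption)
    return dpo + alpha * nll


# When the model equals the reference, the DPO term is log(2)
loss_at_reference = dpo_nll_loss(-12.0, -15.0, -12.0, -15.0, len_w=10)
# Raising the winner's log-probability lowers both terms
loss_improved = dpo_nll_loss(-10.0, -15.0, -12.0, -15.0, len_w=10)
```

Unlike the DPO term, which depends only on the margin between winner and loser, the NLL term directly rewards assigning higher probability to the winning sequence.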
[603] arXiv:2405.00016 (replaced) [pdf, ps, other]: Title: HPX with Spack and Singularity Containers: Evaluating Overheads for HPX/Kokkos using an astrophysics application

Patrick Diehl, Steven R. Brandt, Gregor Daiß, Hartmut Kaiser

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Instrumentation and Methods for Astrophysics (astro-ph.IM)

Cloud computing for high performance computing resources is an emerging topic. This service is of interest to researchers who care about reproducible computing, for software packages with complex installations, and for companies or researchers who need the compute resources only occasionally or do not want to run and maintain a supercomputer on their own. The connection between HPC and containers is exemplified by the fact that Microsoft Azure's Eagle cloud service machine is number three on the November 2023 Top 500 list. For cloud services, the HPC application and dependencies are installed in containers, e.g. Docker, Singularity, or something else, and these containers are executed on the physical hardware. Although containerization leverages the existing Linux kernel and should not impose overheads on the computation, there is the possibility that machine-specific optimizations might be lost, particularly machine-specific installs of commonly used packages. In this paper, we will use an astrophysics application using HPX-Kokkos and measure overheads on homogeneous resources, e.g. Supercomputer Fugaku, using CPUs only and on heterogeneous resources, e.g. LSU's hybrid CPU and GPU system. We will report on challenges in compiling, running, and using the containers as well as performance differences.
[604] arXiv:2405.00285 (replaced) [pdf, ps, html, other]: Title: iMTSP: Solving Min-Max Multiple Traveling Salesman Problem with Imperative Learning

Yifan Guo, Zhongqiang Ren, Chen Wang

Comments: 8 pages, 3 figures, 3 tables

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

This paper considers a Min-Max Multiple Traveling Salesman Problem (MTSP), where the goal is to find a set of tours, one for each agent, to collectively visit all the cities while minimizing the length of the longest tour. Though MTSP has been widely studied, obtaining near-optimal solutions for large-scale problems is still challenging due to its NP-hardness. Recent efforts in data-driven methods face challenges of the need for hard-to-obtain supervision and issues with high variance in gradient estimations, leading to slow convergence and highly suboptimal solutions. We address these issues by reformulating MTSP as a bilevel optimization problem, using the concept of imperative learning (IL). This involves introducing an allocation network that decomposes the MTSP into multiple single-agent traveling salesman problems (TSPs). The longest tour from these TSP solutions is then used to self-supervise the allocation network, resulting in a new self-supervised, bilevel, end-to-end learning framework, which we refer to as imperative MTSP (iMTSP). Additionally, to tackle the high-variance gradient issues during the optimization, we introduce a control variate-based gradient estimation algorithm. Our experiments showed that these innovative designs enable our gradient estimator to converge 20% faster than the advanced reinforcement learning baseline and find up to 80% shorter tour length compared with Google OR-Tools MTSP solver, especially in large-scale problems (e.g. 1000 cities and 15 agents).
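The min-max objective can be illustrated with a short sketch. It evaluates a given allocation of cities to agents by the length of the longest tour, visiting cities in the listed order instead of calling a TSP solver; the depot location, the function names, and the example allocations are all hypothetical.

```python
import math


def tour_length(cities, depot=(0.0, 0.0)):
    """Length of a depot -> cities (in the given order) -> depot tour."""
    path = [depot] + list(cities) + [depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def min_max_objective(allocation, depot=(0.0, 0.0)):
    """Min-max MTSP objective: length of the longest tour across all agents."""
    return max(tour_length(c, depot) for c in allocation)


# Two hypothetical allocations of the same four cities to two agents
balanced = [[(1.0, 0.0), (2.0, 0.0)], [(-1.0, 0.0), (-2.0, 0.0)]]
lopsided = [[(1.0, 0.0), (2.0, 0.0), (-1.0, 0.0), (-2.0, 0.0)], []]

balanced_cost = min_max_objective(balanced)
lopsided_cost = min_max_objective(lopsided)
```

In the framework described above, this longest-tour value is the signal used to self-supervise the allocation network; the sketch covers only the evaluation step.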
[605] arXiv:2405.00322 (replaced) [pdf, ps, html, other]: Title: Characterizing Information Seeking Processes with Multiple Physiological Signals

Kaixin Ji, Danula Hettiachchi, Flora D. Salim, Falk Scholer, Damiano Spina

Journal-ref: In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, Washington, DC, USA. ACM, New York, NY, USA, 12 pages

Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)

Information access systems are getting complex, and our understanding of user behavior during information seeking processes is mainly drawn from qualitative methods, such as observational studies or surveys. Leveraging the advances in sensing technologies, our study aims to characterize user behaviors with physiological signals, particularly in relation to cognitive load, affective arousal, and valence. We conduct a controlled lab study with 26 participants, and collect data including Electrodermal Activities, Photoplethysmogram, Electroencephalogram, and Pupillary Responses. This study examines informational search with four stages: the realization of Information Need (IN), Query Formulation (QF), Query Submission (QS), and Relevance Judgment (RJ). We also include different interaction modalities to represent modern systems, e.g., QS by text-typing or verbalizing, and RJ with text or audio information. We analyze the physiological signals across these stages and report outcomes of pairwise non-parametric repeated-measure statistical tests. The results show that participants experience significantly higher cognitive loads at IN with a subtle increase in alertness, while QF requires higher attention. QS involves more demanding cognitive loads than QF. Affective responses are more pronounced at RJ than QS or IN, suggesting greater interest and engagement as knowledge gaps are resolved. To the best of our knowledge, this is the first study that explores user behaviors in a search process employing a more nuanced quantitative analysis of physiological signals. Our findings offer valuable insights into user behavior and emotional responses in information seeking processes. We believe our proposed methodology can inform the characterization of more complex processes, such as conversational information seeking.
[606] arXiv:2405.00423 (replaced) [pdf, ps, html, other]: Title: $\alpha$-leakage by Rényi Divergence and Sibson Mutual Information

Ni Ding, Mohammad Amin Zarrabian, Parastoo Sadeghi

Comments: This paper has been reduced greatly from its original sole author version due to a peculiar authorship issue

Subjects: Information Theory (cs.IT)

For $\tilde{f}(t) = \exp(\frac{\alpha-1}{\alpha}t)$, this paper proposes a $\tilde{f}$-mean information gain measure. Rényi divergence is shown to be the maximum $\tilde{f}$-mean information gain incurred at each elementary event $y$ of channel output $Y$ and Sibson mutual information is the $\tilde{f}$-mean of this $Y$-elementary information gain. Both are proposed as $\alpha$-leakage measures, indicating the most information an adversary can obtain on sensitive data. It is shown that the existing $\alpha$-leakage by Arimoto mutual information can be expressed as $\tilde{f}$-mean measures by a scaled probability. Further, Sibson mutual information is interpreted as the maximum $\tilde{f}$-mean information gain over all estimation decisions applied to channel output. This reveals that the existing generalized Blahut-Arimoto method for computing Rényi capacity (or Gallager's error exponent) in fact maximizes a $\tilde{f}$-mean information gain iteratively over estimation decision and channel input. This paper also derives a decomposition of $\tilde{f}$-mean information gain, analogous to the Sibson identity for Rényi divergence.
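For readers unfamiliar with the two quantities named above, the sketch below computes them from their standard textbook definitions for finite alphabets. It does not implement the $\tilde{f}$-mean construction of the paper; the function names are hypothetical, and it assumes $\alpha > 0$, $\alpha \neq 1$, and strictly positive $q$.

```python
import math


def renyi_divergence(p, q, alpha):
    """D_alpha(P || Q) = 1/(alpha-1) * log sum_x p(x)^alpha * q(x)^(1-alpha)."""
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)


def sibson_mi(px, channel, alpha):
    """Sibson mutual information of order alpha, with channel[x][y] = P(y | x):
    alpha/(alpha-1) * log sum_y (sum_x P(x) P(y|x)^alpha)^(1/alpha)."""
    ny = len(channel[0])
    total = sum(
        sum(px[x] * channel[x][y] ** alpha for x in range(len(px))) ** (1 / alpha)
        for y in range(ny)
    )
    return alpha / (alpha - 1) * math.log(total)


# A noiseless binary channel with uniform input leaks log(2) nats
noiseless = sibson_mi([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], alpha=2.0)
# A channel whose output ignores the input leaks nothing
useless = sibson_mi([0.4, 0.6], [[0.3, 0.7], [0.3, 0.7]], alpha=2.0)
```

The two examples bracket the range of leakage: the full input entropy for a noiseless channel and zero when the output is independent of the input.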
[607] arXiv:2405.00596 (replaced) [pdf, ps, html, other]: Title: Unbundle-Rewrite-Rebundle: Runtime Detection and Rewriting of Privacy-Harming Code in JavaScript Bundles

Mir Masood Ali, Peter Snyder, Chris Kanich, Hamed Haddadi

Subjects: Cryptography and Security (cs.CR)

This work presents Unbundle-Rewrite-Rebundle (URR), a system for detecting privacy-harming portions of bundled JavaScript code, and rewriting that code at runtime to remove the privacy-harming behavior without breaking the surrounding code or overall application. URR is a novel solution to the problem of JavaScript bundles, where websites pre-compile multiple code units into a single file, making it impossible for content filters and ad-blockers to differentiate between desired and unwanted resources. Where traditional content filtering tools rely on URLs, URR analyzes the code at the AST level, and replaces harmful AST sub-trees with privacy-and-functionality maintaining alternatives.
We present an open-sourced implementation of URR as a Firefox extension, and evaluate it against JavaScript bundles generated by the most popular bundling system (Webpack) deployed on the Tranco 10k. We measure performance in terms of precision (1.00), recall (0.95), and speed (0.43s per script) when detecting and rewriting three representative privacy-harming libraries often included in JavaScript bundles, and find URR to be an effective approach to a large-and-growing blind spot unaddressed by current privacy tools.
[608] arXiv:2405.01273 (replaced) [pdf, ps, html, other]: Title: Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration

Praveen Kumar Chandaliya, Kiran Raja, Raghavendra Ramachandra, Zahid Akhtar, Christoph Busch

Comments: 8 Pages

Journal-ref: Automatic Face and Gesture Recognition 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Numerous studies have shown that existing Face Recognition Systems (FRS), including commercial ones, often exhibit biases toward certain ethnicities due to under-represented data. In this work, we explore ethnicity alteration and skin tone modification using synthetic face image generation methods to increase the diversity of datasets. We conduct a detailed analysis by first constructing a balanced face image dataset representing three ethnicities: Asian, Black, and Indian. We then make use of existing Generative Adversarial Network-based (GAN) image-to-image translation and manifold learning models to alter the ethnicity from one to another. A systematic analysis is further conducted to assess the suitability of such datasets for FRS by studying the realistic skin-tone representation using Individual Typology Angle (ITA). Further, we also analyze the quality characteristics using existing Face image quality assessment (FIQA) approaches. We then provide a holistic FRS performance analysis using four different systems. Our findings pave the way for future research works in (i) developing both specific ethnicity and general (any to any) ethnicity alteration models, (ii) expanding such approaches to create databases with diverse skin tones, (iii) creating datasets representing various ethnicities which further can help in mitigating bias while addressing privacy concerns.
[609] arXiv:2405.01507 (replaced) [pdf, ps, other]: Title: Accelerating Convergence in Bayesian Few-Shot Classification

Tianjun Ke, Haoqun Cao, Feng Zhou

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper seamlessly integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent achieves accelerated convergence by providing the steepest descent direction along the corresponding manifold. It also exhibits the parameterization invariance property concerning the variational distribution. Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared to baseline models. Additionally, we investigate the impact of hyperparameters and components. Code is publicly available at this https URL.
[610] arXiv:2405.01510 (replaced) [pdf, ps, html, other]: Title: Reverse Influential Community Search Over Social Networks (Technical Report)

Qi Wen, Nan Zhang, Yutong Ye, Xiang Lian, Mingsong Chen

Subjects: Social and Information Networks (cs.SI); Databases (cs.DB)

As an important fundamental task of numerous real-world applications such as social network analysis and online advertising/marketing, several prior works studied influential community search, which retrieves a community with high structural cohesiveness and maximum influences on other users in social networks. However, previous works usually considered the influences of the community on arbitrary users in social networks, rather than specific groups (e.g., customer groups, or senior communities). Inspired by this, we propose a novel Reverse Influential Community Search (RICS) problem, which obtains a seed community with the maximum influence on a user-specified target community, satisfying both structural and keyword constraints. To efficiently tackle the RICS problem, we design effective pruning strategies to filter out false alarms of candidate seed communities, and propose an effective index mechanism to facilitate the community retrieval. We also formulate and tackle an RICS variant, named Relaxed Reverse Influential Community Search (R2ICS), which returns a subgraph with the relaxed structural constraints and having the maximum influence on a user-specified target community. Comprehensive experiments have been conducted to verify the efficiency and effectiveness of our RICS and R2ICS approaches on both real-world and synthetic social networks under various parameter settings.
[611] arXiv:2405.01649 (replaced) [pdf, ps, html, other]: Title: Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

Tianle Xia, Liang Ding, Guojia Wan, Yibing Zhan, Bo Du, Dacheng Tao

Comments: arXiv admin note: text overlap with arXiv:2305.01157, arXiv:2212.09567 by other authors

Subjects: Computation and Language (cs.CL)

Answering complex logical queries over incomplete knowledge graphs (KGs) is challenging. Most previous works have focused on learning entity/relation embeddings and simulating first-order logic operators with various neural networks. However, they are bottlenecked by the inability to share world knowledge to improve logical reasoning, thus resulting in suboptimal performance. In this paper, we propose a complex logical reasoning schema over knowledge graphs upon large language models (LLMs), containing a curriculum-based logic-aware instruction tuning framework, named LACT. Specifically, we augment the arbitrary first-order logical queries via binary tree decomposition, to stimulate the reasoning capability of LLMs. To address the difficulty gap among different types of complex queries, we design a simple and flexible logic-aware curriculum learning framework. Experiments across widely used datasets demonstrate that LACT has substantial improvements (an average gain of +5.5% in MRR score) over advanced methods, achieving the new state-of-the-art. Our code and model will be released on GitHub and Hugging Face soon.
[612] arXiv:2405.01714 (replaced) [pdf, ps, html, other]: Title: Interpretable Vital Sign Forecasting with Model Agnostic Attention Maps

Yuwei Liu, Chen Dan, Anubhav Bhatti, Bingjie Shen, Divij Gupta, Suraj Parmar, San Lee

Comments: 8 pages, 4 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Sepsis is a leading cause of mortality in intensive care units (ICUs), representing a substantial medical challenge. The complexity of analyzing diverse vital signs to predict sepsis further aggravates this issue. While deep learning techniques have been advanced for early sepsis prediction, their 'black-box' nature obscures the internal logic, impairing interpretability in critical settings like ICUs. This paper introduces a framework that combines a deep learning model with an attention mechanism that highlights the critical time steps in the forecasting process, thus improving model interpretability and supporting clinical decision-making. We show that the attention mechanism could be adapted to various black box time series forecasting models such as N-HiTS and N-BEATS. Our method preserves the accuracy of conventional deep learning models while enhancing interpretability through attention-weight-generated heatmaps. We evaluated our model on the eICU-CRD dataset, focusing on forecasting vital signs for sepsis patients. We assessed its performance using mean squared error (MSE) and dynamic time warping (DTW) metrics. We explored the attention maps of N-HiTS and N-BEATS, examining the differences in their performance and identifying crucial factors influencing vital sign forecasting.
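The two evaluation metrics named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the DTW shown is the textbook dynamic-programming form with an absolute-difference local cost, and the paper may use a different variant or library.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def dtw_distance(x, y):
    """Classic dynamic time warping distance (assumed local cost: |a - b|)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best alignment cost of x[:i] against y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of: step in x only, step in y only, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

MSE penalizes pointwise errors at each time step, whereas DTW tolerates small temporal shifts between the forecast and the observed signal, so the two metrics can disagree on forecasts that are correct in shape but slightly misaligned in time.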
[613] arXiv:2405.01937 (replaced) [pdf, ps, other]: Title: An Attention Based Pipeline for Identifying Pre-Cancer Lesions in Head and Neck Clinical Images

Abdullah Alsalemi, Anza Shakeel, Mollie Clark, Syed Ali Khurram, Shan E Ahmed Raza

Comments: 5 pages, 3 figures, accepted in ISBI 2024, update: corrected typos

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Early detection of cancer can help improve patient prognosis by early intervention. Head and neck cancer is diagnosed in specialist centres after a surgical biopsy; however, there is a potential for these to be missed, leading to delayed diagnosis. To overcome these challenges, we present an attention based pipeline that identifies suspected lesions, segments them, and classifies them as non-dysplastic, dysplastic and cancerous lesions. We propose (a) a vision transformer based Mask R-CNN network for lesion detection and segmentation of clinical images, and (b) a Multiple Instance Learning (MIL) based scheme for classification. Current results show that the segmentation model produces segmentation masks and bounding boxes with up to 82% overlap accuracy score on unseen external test data, surpassing reviewed segmentation benchmarks. The classification scheme achieves an F1-score of 85% on the internal cohort test set. An app has been developed to perform lesion segmentation on images taken via a smart device. Future work involves employing endoscopic video data for precise early detection and prognosis.
[614] arXiv:2405.01975 (replaced) [pdf, ps, html, other]: Title: Introducing a microstructure-embedded autoencoder approach for reconstructing high-resolution solution field data from a reduced parametric space

Rasoul Najafi Koopas, Shahed Rezaei, Natalie Rauter, Richard Ostwald, Rolf Lammering

Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

In this study, we develop a novel multi-fidelity deep learning approach that transforms low-fidelity solution maps into high-fidelity ones by incorporating parametric space information into a standard autoencoder architecture. This method's integration of parametric space information significantly reduces the need for training data to effectively predict high-fidelity solutions from low-fidelity ones. In this study, we examine a two-dimensional steady-state heat transfer analysis within a highly heterogeneous materials microstructure. The heat conductivity coefficients for two different materials are condensed from a 101 x 101 grid to smaller grids. We then solve the boundary value problem on the coarsest grid using a pre-trained physics-informed neural operator network known as Finite Operator Learning (FOL). The resulting low-fidelity solution is subsequently upscaled back to a 101 x 101 grid using a newly designed enhanced autoencoder. The novelty of the developed enhanced autoencoder lies in the concatenation of heat conductivity maps of different resolutions to the decoder segment in distinct steps. Hence the developed algorithm is named microstructure-embedded autoencoder (MEA). We compare the MEA outcomes with those from finite element methods, the standard U-Net, and various other upscaling techniques, including interpolation functions and feedforward neural networks (FFNN). Our analysis shows that MEA outperforms these methods in terms of computational efficiency and error on test cases. As a result, the MEA serves as a potential supplement to neural operator networks, effectively upscaling low-fidelity solutions to high fidelity while preserving critical details often lost in traditional upscaling methods, particularly at sharp interfaces like those seen with interpolation.
[615] arXiv:2405.02180 (replaced) [pdf, ps, html, other]: Title: A Flow-Based Model for Conditional and Probabilistic Electricity Consumption Profile Generation and Prediction

Weijie Xia, Chenguang Wang, Peter Palensky, Pedro P. Vergara

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Residential Load Profile (RLP) generation and prediction are critical for the operation and planning of distribution networks, particularly as diverse low-carbon technologies are increasingly integrated. This paper introduces a novel flow-based generative model, termed Full Convolutional Profile Flow (FCPFlow), which is uniquely designed for both conditional and unconditional RLP generation, and for probabilistic load forecasting. By introducing two new layers--the invertible linear layer and the invertible normalization layer--the proposed FCPFlow architecture shows three main advantages compared to traditional statistical and contemporary deep generative models: 1) it is well-suited for RLP generation under continuous conditions, such as varying weather and annual electricity consumption, 2) it shows superior scalability in different datasets compared to traditional statistical models, and 3) it also demonstrates better modeling capabilities in capturing the complex correlation of RLPs compared with deep generative models.
[616] arXiv:2405.02299 (replaced) [pdf, ps, other]: Title: Deep Reinforcement Learning for Modelling Protein Complexes

Ziqi Gao, Tao Feng, Jiaxuan You, Chenyi Zi, Yan Zhou, Chen Zhang, Jia Li

Comments: International Conference on Learning Representations (ICLR 2024)

Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

AlphaFold can be used for both single-chain and multi-chain protein structure prediction, while the latter becomes extremely challenging as the number of chains increases. In this work, by taking each chain as a node and assembly actions as edges, we show that an acyclic undirected connected graph can be used to predict the structure of multi-chain protein complexes (a.k.a. protein complex modelling, PCM). However, there are still two challenges: 1) The huge combinatorial optimization space of $N^{N-2}$ ($N$ is the number of chains) for the PCM problem can easily lead to high computational cost. 2) The scales of protein complexes exhibit distribution shift due to variance in chain numbers, which calls for generalization in modelling complexes of various scales. To address these challenges, we propose GAPN, a Generative Adversarial Policy Network powered by domain-specific rewards and adversarial loss through policy gradient for automatic PCM prediction. Specifically, GAPN learns to efficiently search through the immense assembly space and optimize the direct docking reward through policy gradient. Importantly, we design an adversarial reward function to enhance the receptive field of our model. In this way, GAPN will simultaneously focus on a specific batch of complexes and the global assembly rules learned from complexes with varied chain numbers. Empirically, we have achieved both significant accuracy (measured by RMSD and TM-Score) and efficiency improvements compared to leading PCM software.
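The $N^{N-2}$ search-space size quoted above matches Cayley's formula for the number of labeled trees on $N$ nodes, which is consistent with the acyclic connected graph formulation. The sketch below is an illustration only, not code from the paper: it decodes every Prüfer sequence into a tree and counts the distinct results for small $N \geq 2$. The function names are hypothetical.

```python
from itertools import product


def prufer_to_tree(seq, n):
    """Decode a Prüfer sequence (length n-2, labels 0..n-1) into a set of n-1 edges."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        # attach the smallest-labelled remaining leaf to v
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((min(leaf, v), max(leaf, v)))
        degree[leaf] -= 1
        degree[v] -= 1
    # exactly two nodes of degree 1 remain; they form the last edge
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.append((u, w))
    return frozenset(edges)


def count_assembly_trees(n):
    """Count distinct labeled trees on n >= 2 nodes by brute-force enumeration."""
    trees = {prufer_to_tree(seq, n) for seq in product(range(n), repeat=n - 2)}
    return len(trees)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_assembly_trees(n), n ** (n - 2))
```

The count grows too quickly for enumeration beyond small $N$, which is one way to see why a learned search policy is attractive for larger complexes.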
[617] arXiv:2405.02349 (replaced) [pdf, ps, other]: Title: Explainable Multi-Label Classification of MBTI Types

Siana Kong, Marina Sokolova

Comments: 22 pages, 12 tables, 2 figures

Subjects: Machine Learning (cs.LG)

In this study, we aim to identify the most effective machine learning model for accurately classifying Myers-Briggs Type Indicator (MBTI) types from Reddit posts and a Kaggle data set. We apply multi-label classification using the Binary Relevance method. We use an Explainable Artificial Intelligence (XAI) approach to highlight the transparency and understandability of the process and result. To achieve this, we experiment with glass-box learning models, i.e. models designed for simplicity, transparency, and interpretability. We selected k-Nearest Neighbour, Multinomial Naive Bayes, and Logistic Regression for the glass-box models. We show that Multinomial Naive Bayes and k-Nearest Neighbour perform better if classes with Observer (S) traits are excluded, whereas Logistic Regression obtains its best results when all classes have > 550 entries.
[618] arXiv:2405.02358 (replaced) [pdf, ps, other]: Title: A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model

Jiexia Ye, Weiqi Zhang, Ke Yi, Yongzi Yu, Ziyue Li, Jia Li, Fugee Tsung

Comments: 5 figures, 6 tables, 41 pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Time series data are ubiquitous across various domains, making time series analysis critically important. Traditional time series models are task-specific, featuring singular functionality and limited generalization capacity. Recently, large language foundation models have unveiled their remarkable capabilities for cross-task transferability, zero-shot/few-shot learning, and decision-making explainability. This success has sparked interest in the exploration of foundation models to solve multiple time series challenges simultaneously. There are two main research lines, namely pre-training foundation models from scratch for time series and adapting large language foundation models for time series. They both contribute to the development of a unified model that is highly generalizable, versatile, and comprehensible for time series analysis. This survey offers a 3E analytical framework for comprehensive examination of related research. Specifically, we examine existing works from three dimensions, namely Effectiveness, Efficiency and Explainability. In each dimension, we focus on discussing how related works devise tailored solutions by considering unique challenges in the realm of time series. Furthermore, we provide a domain taxonomy to help followers keep up with the domain-specific advancements. In addition, we introduce extensive resources to facilitate the field's development, including datasets and open-source time series libraries. A GitHub repository is also maintained for resource updates (this https URL).
[619] arXiv:2405.02431 (replaced) [pdf, ps, other]: Title: Delphi: Efficient Asynchronous Approximate Agreement for Distributed Oracles

Akhil Bandarupalli, Adithya Bhat, Saurabh Bagchi, Aniket Kate, Chen-Da Liu-Zhang, Michael K. Reiter

Comments: 14 pages, 8 figures, Accepted to DSN 2024

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

Agreement protocols are crucial in various emerging applications, spanning from distributed (blockchain) oracles to fault-tolerant cyber-physical systems. In scenarios where sensor/oracle nodes measure a common source, maintaining output within the convex range of correct inputs, known as convex validity, is imperative. Present asynchronous convex agreement protocols employ either randomization, incurring substantial computation overhead, or approximate agreement techniques, leading to high $\mathcal{\tilde{O}}(n^3)$ communication for an $n$-node system.
This paper introduces Delphi, a deterministic protocol with $\mathcal{\tilde{O}}(n^2)$ communication and minimal computation overhead. Delphi assumes that honest inputs are bounded, except with negligible probability, and integrates agreement primitives from literature with a novel weighted averaging technique. Experimental results highlight Delphi's superior performance, showcasing a significantly lower latency compared to state-of-the-art protocols. Specifically, for an $n=160$-node system, Delphi achieves an 8x and 3x improvement in latency within CPS and AWS environments, respectively.
[620] arXiv:2405.02801 (replaced) [pdf, ps, html, other]: Title: Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models

Tianze Xu, Jiajun Li, Xuesong Chen, Xinrui Yao, Shuchang Liu

Comments: 7 pages, 2 figures, submitted to ACM MM 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements, facilitating the generation of music, images, and other forms of artistic expression across various industries. However, research on general multi-modal music generation models remains scarce. To fill this gap, we propose a multi-modal music generation framework, Mozart's Touch. It can generate music aligned with cross-modality inputs, such as images, videos and text. Mozart's Touch is composed of three main components: Multi-modal Captioning Module, Large Language Model (LLM) Understanding & Bridging Module, and Music Generation Module. Unlike traditional approaches, Mozart's Touch requires no training or fine-tuning of pre-trained models, offering efficiency and transparency through clear, interpretable prompts. We also introduce the "LLM-Bridge" method to resolve the heterogeneous representation problems between descriptive texts of different modalities. We conduct a series of objective and subjective evaluations on the proposed model, and results indicate that our model surpasses the performance of current state-of-the-art models. Our code and examples are available at: this https URL
[621] arXiv:2405.02824 (replaced) [pdf, ps, other]: Title: Adaptive Guidance Learning for Camouflaged Object Detection

Zhennan Chen, Xuying Zhang, Tian-Zhu Xiang, Ying Tai

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Camouflaged object detection (COD) aims to segment objects visually embedded in their surroundings, which is a very challenging task due to the high similarity between the objects and the background. To address it, most methods incorporate additional information (e.g., boundary, texture, and frequency clues) to guide feature learning for better detecting camouflaged objects from the background. Although progress has been made, these methods are basically individually tailored to specific auxiliary cues, thus lacking adaptability and not consistently achieving high segmentation performance. To this end, this paper proposes an adaptive guidance learning network, dubbed AGLNet, which is a unified end-to-end learnable model for exploring and adapting different additional cues in CNN models to guide accurate camouflaged feature learning. Specifically, we first design a straightforward additional information generation (AIG) module to learn additional camouflaged object cues, which can be adapted for the exploration of effective camouflaged features. Then we present a hierarchical feature combination (HFC) module to deeply integrate additional cues and image features to guide camouflaged feature learning in a multi-level fusion manner. Followed by a recalibration decoder (RD), different features are further aggregated and refined for accurate object prediction. Extensive experiments on three widely used COD benchmark datasets demonstrate that the proposed method achieves significant performance improvements under different additional cues, and outperforms 20 recent state-of-the-art methods by a large margin. Our code will be made publicly available at: this https URL.
[622] arXiv:2405.02929 (replaced) [pdf, ps, other]: Title: Unified Dynamic Scanpath Predictors Outperform Individually Trained Neural Models

Fares Abawi, Di Fu, Stefan Wermter

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Previous research on scanpath prediction has mainly focused on group models, disregarding the fact that the scanpaths and attentional behaviors of individuals are diverse. The disregard of these differences is especially detrimental to social human-robot interaction, whereby robots commonly emulate human gaze based on heuristics or predefined patterns. However, human gaze patterns are heterogeneous and varying behaviors can significantly affect the outcomes of such human-robot interactions. To fill this gap, we developed a deep learning-based social cue integration model for saliency prediction to instead predict scanpaths in videos. Our model learned scanpaths by recursively integrating fixation history and social cues through a gating mechanism and sequential attention. We evaluated our approach on gaze datasets of dynamic social scenes, observed under the free-viewing condition. The introduction of fixation history into our models makes it possible to train a single unified model rather than the resource-intensive approach of training individual models for each set of scanpaths. We observed that the late neural integration approach surpasses early fusion when training models on a large dataset, in comparison to a smaller dataset with a similar distribution. Results also indicate that a single unified model, trained on all the observers' scanpaths, performs on par with or better than individually trained models. We hypothesize that this outcome is a result of the group saliency representations instilling universal attention in the model, while the supervisory signal and fixation history guide it to learn personalized attentional behaviors, providing the unified model a benefit over individual models due to its implicit representation of universal attention.
[623] arXiv:2405.02937 (replaced) [pdf, ps, other]: Title: Unraveling the Dominance of Large Language Models Over Transformer Models for Bangla Natural Language Inference: A Comprehensive Study

Fatema Tuj Johora Faria, Mukaffi Bin Moin, Asif Iftekher Fahim, Pronay Debnath, Faisal Muhammad Shah

Comments: Accepted in 4th International Conference on Computing and Communication Networks (ICCCNet-2024)

Subjects: Computation and Language (cs.CL)

Natural Language Inference (NLI) is a cornerstone of Natural Language Processing (NLP), providing insights into the entailment relationships between text pairings. It is a critical component of Natural Language Understanding (NLU), demonstrating the ability to extract information from spoken or written interactions. NLI is mainly concerned with determining the entailment relationship between two statements, known as the premise and hypothesis. When the premise logically implies the hypothesis, the pair is labeled "entailment". If the hypothesis contradicts the premise, the pair receives the "contradiction" label. When there is insufficient evidence to establish a connection, the pair is described as "neutral". Despite the success of Large Language Models (LLMs) in various tasks, their effectiveness in NLI remains constrained by issues like low-resource domain accuracy, model overconfidence, and difficulty in capturing human judgment disagreements. This study addresses the underexplored area of evaluating LLMs in low-resourced languages such as Bengali. Through a comprehensive evaluation, we assess the performance of prominent LLMs and state-of-the-art (SOTA) models in Bengali NLP tasks, focusing on natural language inference. Utilizing the XNLI dataset, we conduct zero-shot and few-shot evaluations, comparing LLMs like GPT-3.5 Turbo and Gemini 1.5 Pro with models such as BanglaBERT, Bangla BERT Base, DistilBERT, mBERT, and sahajBERT. Our findings reveal that while LLMs can achieve comparable or superior performance to fine-tuned SOTA models in few-shot scenarios, further research is necessary to enhance our understanding of LLMs in languages with modest resources like Bengali. This study underscores the importance of continued efforts in exploring LLM capabilities across diverse linguistic contexts.
[624] arXiv:2405.02968 (replaced) [pdf, ps, html, other]: Title: CoverLib: Classifiers-equipped Experience Library by Iterative Problem Distribution Coverage Maximization for Domain-tuned Motion Planning

Hirokazu Ishida, Naoki Hiraoka, Kei Okada, Masayuki Inaba

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Library-based methods are known to be very effective for fast motion planning by adapting an experience retrieved from a precomputed library. This article presents CoverLib, a principled approach for constructing and utilizing such a library. CoverLib iteratively adds an experience-classifier-pair to the library, where each classifier corresponds to an adaptable region of the experience within the problem space. This iterative process is an active procedure, as it selects the next experience based on its ability to effectively cover the uncovered region. During the query phase, these classifiers are utilized to select an experience that is expected to be adaptable for a given problem. Experimental results demonstrate that CoverLib effectively mitigates the trade-off between plannability and speed observed in global (e.g. sampling-based) and local (e.g. optimization-based) methods. As a result, it achieves both fast planning and high success rates over the problem domain. Moreover, due to its adaptation-algorithm-agnostic nature, CoverLib seamlessly integrates with various adaptation methods, including nonlinear programming-based and sampling-based algorithms.
[625] arXiv:2405.03101 (replaced) [pdf, ps, other]: Title: Double Self-Sustainable Reconfigurable Intelligent Surfaces Aided Wireless Communications

Ji Wang, Suhong Luo, Yixuan Li, Wenwu Xie, Xingwang Li, Arumugam Nallanathan

Subjects: Information Theory (cs.IT)

A double self-sustainable reconfigurable intelligent surfaces (RISs) assisted multi-user multiple input multiple output (MIMO) system is investigated. Two RISs are equipped with energy harvesting circuits to achieve self-sustainable transmission. The aim is to minimize the transmission power at the base station (BS), while guaranteeing the quality of service (QoS) requirements of the users and meeting the power consumption requirements of the RISs. A block coordinate descent (BCD) algorithm based on the penalty-based method and successive convex approximation (SCA) is employed to alternately optimize the active beamforming at the BS and the phase shifts, as well as the amplitude coefficients, of the two RISs. Simulation results show that the required power consumption at the BS for the proposed double self-sustainable RISs system is significantly reduced compared to conventional RIS systems.
[626] arXiv:2405.03267 (replaced) [pdf, ps, html, other]: Title: Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory

Rongxin Cheng, Yifan Peng, Xingda Wei, Hongrui Xie, Rong Chen, Sijie Shen, Haibo Chen

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Information Retrieval (cs.IR)

Vector searches on large-scale datasets are critical to modern online services like web search and RAG, which necessitates storing the datasets and their indexes on secondary storage such as SSDs. In this paper, we are the first to characterize the trade-off between performance and index size in existing SSD-based graph and cluster indexes: to improve throughput by 5.7$\times$ and 1.7$\times$, these indexes have to pay a storage amplification of 5.8$\times$ and 7.7$\times$ with respect to the dataset size, respectively. The root cause is that the coarse-grained access of SSD mismatches the fine-grained random read required by vector indexes with small amplification.
This paper argues that second-tier memory, such as remote DRAM/NVM connected via RDMA or CXL, is a powerful storage for addressing the problem from a system's perspective, thanks to its fine-grained access granularity. However, putting existing indexes -- primarily designed for SSD -- directly on second-tier memory cannot fully utilize its power. Meanwhile, second-tier memory still behaves more like storage, so using it as DRAM is also inefficient. To this end, we build a graph and cluster index that centers around the performance features of second-tier memory. With careful execution engine and index layout designs, we show that vector indexes can achieve optimal performance with orders of magnitude smaller index amplification, on a variety of second-tier memory devices.
Based on our improved graph and vector indexes on second-tier memory, we further conduct a systematic study between them to facilitate developers choosing the right index for their workloads. Interestingly, the findings on the second-tier memory contradict the ones on SSDs.
[627] arXiv:2405.03357 (replaced) [pdf, ps, html, other]: Title: A Game Theoretic Analysis of Validator Strategies in Ethereum 2.0

Chien-Chih Chen, Wojciech Golab

Comments: This work has been accepted for publication in BSCI 2024

Subjects: Computer Science and Game Theory (cs.GT)

Ethereum 2.0 is the second-largest cryptocurrency by market capitalization and a widely used smart contract platform. Therefore, examining the reliability of Ethereum 2.0's incentive mechanism is crucial, particularly its effectiveness in encouraging validators to adhere to the Ethereum 2.0 protocol. This paper studies the incentive mechanism of Ethereum 2.0 and evaluates its robustness by analyzing the interaction between block proposers and attesters in a single slot. To this end, we use Bayesian games to model the strategies of block proposers and attesters and calculate their expected utilities. Our results demonstrate that the Ethereum 2.0 incentive mechanism is incentive-compatible and promotes cooperation among validators. We prove that a Bayesian Nash equilibrium and an ex ante dominant strategy exist between the block proposer and attesters in a single slot. Our research provides a solid foundation for further analysis of Ethereum 2.0's incentive mechanism and insights for individuals considering participation as a validator in Ethereum 2.0.
[628] arXiv:2405.03452 (replaced) [pdf, ps, other]: Title: Large Language Models (LLMs) as Agents for Augmented Democracy

Jairo Gudiño-Rosero, Umberto Grandi, César A. Hidalgo

Comments: 15 pages main manuscript with 3 figures. 12 pages of supplementary material

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We explore the capabilities of an augmented democracy system built on off-the-shelf LLMs fine-tuned on data summarizing individual preferences across 67 policy proposals collected during the 2022 Brazilian presidential elections. We use a train-test cross-validation setup to estimate the accuracy with which the LLMs predict both a subject's individual political choices and the aggregate preferences of the full sample of participants. At the individual level, the accuracy of the out-of-sample predictions lies in the range 69%-76%, and the predictions are significantly better for the preferences of liberal and college-educated participants. At the population level, we aggregate preferences using an adaptation of the Borda score and compare the ranking of policy proposals obtained from a probabilistic sample of participants and from data augmented using LLMs. We find that the augmented data predicts the preferences of the full population of participants better than probabilistic samples alone when these represent less than 30% to 40% of the total population. These results indicate that LLMs are potentially useful for the construction of systems of augmented democracy.
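For readers unfamiliar with Borda aggregation, the following minimal Python sketch shows the classic Borda count, not the adaptation used in the paper (whose details are not given in the abstract). The function names and the proposal labels "A", "B", "C" are hypothetical.

```python
from collections import defaultdict


def borda_scores(rankings):
    """Classic Borda count: in a ranking of m items, position p (0-based) earns m-1-p points."""
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for position, proposal in enumerate(ranking):
            scores[proposal] += m - 1 - position
    return dict(scores)


def borda_ranking(rankings):
    """Order proposals by descending Borda score, breaking ties alphabetically."""
    scores = borda_scores(rankings)
    return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    # three hypothetical participants ranking three hypothetical proposals
    ballots = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
    print(borda_scores(ballots))
    print(borda_ranking(ballots))
```

Comparing the aggregate ranking computed from a sample against the one computed from the full population is the kind of check the abstract describes at the population level.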
[629] arXiv:2405.03555 (replaced) [pdf, ps, other]: Title: A Comprehensive Overview and Survey of O-RAN: Exploring Slicing-aware Architecture, Deployment Options, and Use Cases

Khurshid Alam, Mohammad Asif Habibi, Matthias Tammen, Dennis Krummacker, Walid Saad, Marco Di Renzo, Tommaso Melodia, Xavier Costa-Pérez, Mérouane Debbah, Ashutosh Dutta, Hans D. Schotten

Comments: 45 pages, 12 figures, 4 tables, submitted to the IEEE for possible publication

Subjects: Networking and Internet Architecture (cs.NI)

Open-radio access network (O-RAN) seeks to establish principles of openness, programmability, automation, intelligence, and hardware-software disaggregation with interoperable interfaces. It advocates for multi-vendorism and multi-stakeholderism within a cloudified and virtualized wireless infrastructure, aimed at enhancing the deployment, operation, and maintenance of RAN architecture. This enhancement promises increased flexibility, performance optimization, service innovation, energy efficiency, and cost efficiency in fifth-generation (5G), sixth-generation (6G), and future networks. One of the key features of the O-RAN architecture is its support for network slicing, which entails interaction with other slicing domains within a mobile network, notably the transport network (TN) domain and the core network (CN) domain, to realize end-to-end (E2E) network slicing. The study of this feature requires exploring the stances and contributions of diverse standards development organizations (SDOs). In this context, we note that despite the ongoing industrial deployments and standardization efforts, the research and standardization communities have yet to comprehensively address network slicing in O-RAN. To address this gap, this survey paper provides a comprehensive exploration of network slicing in O-RAN through an in-depth review of specification documents from O-RAN Alliance and research papers from leading industry and academic institutions. The paper commences with an overview of the ongoing standardization efforts and open-source contributions associated with O-RAN, subsequently delving into the latest O-RAN architecture with an emphasis on its slicing aspects. Further, the paper explores deployment scenarios for network slicing within O-RAN, examining options for the deployment and orchestration of O-RAN and TN network slice subnets...
[630] arXiv:2405.03562 (replaced) [pdf, ps, html, other]: Title: ID-centric Pre-training for Recommendation

Yiqing Wu, Ruobing Xie, Zhao Zhang, Fuzhen Zhuang, Xu Zhang, Leyu Lin, Zhanhui Kang, Yongjun Xu

Subjects: Information Retrieval (cs.IR)

Classical sequential recommendation models generally adopt ID embeddings to store knowledge learned from user historical behaviors and represent items. However, these unique IDs are difficult to transfer to new domains. With the rise of pre-trained language models (PLMs), some pioneering works adopt PLMs for pre-trained recommendation, where modality information (e.g., text) is considered universal across domains via PLMs. Unfortunately, the behavioral information in ID embeddings is still verified to dominate over modality information in PLM-based recommendation models, and thus limits these models' performance. In this work, we propose a novel ID-centric recommendation pre-training paradigm (IDP), which directly transfers informative ID embeddings learned in pre-training domains to item representations in new domains. Specifically, in the pre-training stage, besides the ID-based sequential model for recommendation, we also build a Cross-domain ID-matcher (CDIM) learned by both behavioral and modality information. In the tuning stage, modality information of new domain items is regarded as a cross-domain bridge built by CDIM. We first leverage the textual information of downstream domain items to retrieve behaviorally and semantically similar items from pre-training domains using CDIM. Next, these retrieved pre-trained ID embeddings, rather than certain textual embeddings, are directly adopted to generate downstream new items' embeddings. Through extensive experiments on real-world datasets, both in cold and warm settings, we demonstrate that our proposed model significantly outperforms all baselines. Code will be released upon acceptance.
[631] arXiv:2405.03573 (replaced) [pdf, ps, other]: Title: Understanding Read-Write Wait-Free Coverings in the Fully-Anonymous Shared-Memory Model

Giuliano Losa, Eli Gafni

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In the fully-anonymous (shared-memory) model, inspired by a biological setting, processors have no identifiers and memory locations are anonymous. This means that there is no pre-existing agreement among processors on any naming of the memory locations. In this work, we ask fundamental questions about the fully-anonymous model in the hope of obtaining a better understanding of the role of naming and anonymity in distributed computing.
First, we ask what it means to solve a task under processor anonymity. With tasks such as renaming, the traditional notion obviously does not apply. Instead of restricting ourselves to colorless tasks, we propose using the notion of group solvability, which allows transferring any task to processor-anonymous models.
Second, the difficulty with anonymity is that processors can hardly avoid covering and then overwriting each other's writes, erasing information written by their predecessors. To get to the bottom of this phenomenon, we ask what system configurations are stable when processors keep reading and writing ad infinitum. Resolving this question leads us to a wait-free solution to the snapshot task, which then allows us to solve renaming and obstruction-free consensus.
[632] arXiv:2405.03654 (replaced) [pdf, ps, other]: Title: Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent

Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures. We detail two implementations under this framework: "Obscure Intention" and "Create Ambiguity", which manipulate query complexity and ambiguity to evade malicious intent detection effectively. We empirically validate the effectiveness of the IntentObfuscator method across several models, including ChatGPT-3.5, ChatGPT-4, Qwen and Baichuan, achieving an average jailbreak success rate of 69.21\%. Notably, our tests on ChatGPT-3.5, which claims 100 million weekly active users, achieved a remarkable success rate of 83.65\%. We also extend our validation to diverse types of sensitive content like graphic violence, racism, sexism, political sensitivity, cybersecurity threats, and criminal skills, further proving the substantial impact of our findings on enhancing 'Red Team' strategies against LLM content security frameworks.
[633] arXiv:2008.08718 (replaced) [pdf, ps, other]: Title: Minimum discrepancy principle strategy for choosing $k$ in $k$-NN regression

Yaroslav Averyanov, Alain Celisse

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We present a novel data-driven strategy to choose the hyperparameter $k$ in the $k$-NN regression estimator without using any hold-out data. We treat the problem of choosing the hyperparameter as an iterative procedure (over $k$) and propose an easy-to-implement strategy based on the idea of early stopping and the minimum discrepancy principle. This model selection strategy is proven to be minimax-optimal, under the fixed-design assumption on covariates, over some smoothness function classes, for instance, the class of Lipschitz functions on a bounded domain. The novel method often improves statistical performance on artificial and real-world data sets in comparison to other model selection strategies, such as the hold-out method, 5-fold cross-validation, and the AIC criterion. The novelty of the strategy comes from reducing the computational time of the model selection procedure while preserving the statistical (minimax) optimality of the resulting estimator. More precisely, given a sample of size $n$, if one must choose $k$ among $\left\{ 1, \ldots, n \right\}$, and $\left\{ f^1, \ldots, f^n \right\}$ are the estimators of the regression function, the minimum discrepancy principle requires computing only a fraction of the estimators, whereas this is not the case for generalized cross-validation, Akaike's AIC criterion, or the Lepskii principle.
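The early-stopping idea in the abstract above can be illustrated with a small sketch. This is not the authors' procedure: it assumes the noise variance `sigma2` is known, scans $k$ upward from 1, and stops at the first $k$ whose mean squared residual on the training sample exceeds `sigma2`. The scan direction, the stopping convention, and the function names `neighbor_order` and `choose_k_by_discrepancy` are illustrative assumptions.

```python
import numpy as np

def neighbor_order(X):
    """For each point, indices of all points sorted by distance (self first)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(sq_dist, axis=1, kind="stable")

def choose_k_by_discrepancy(X, y, sigma2):
    """Largest k reached before the training residual exceeds sigma2.

    Assumption: sigma2 (noise variance) is known; the paper's exact
    rule and iteration order may differ from this simplified scan.
    """
    order = neighbor_order(X)
    n = len(y)
    chosen = 1
    for k in range(1, n + 1):
        # k-NN fitted values at the design points: average over k neighbours.
        fitted = y[order[:, :k]].mean(axis=1)
        residual = np.mean((y - fitted) ** 2)
        if residual > sigma2:
            break  # early stop: estimators for larger k are never computed
        chosen = k
    return chosen

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
k_small = choose_k_by_discrepancy(X, y, sigma2=0.0)
k_large = choose_k_by_discrepancy(X, y, sigma2=10.0)
```

Because the loop breaks as soon as the residual crosses the threshold, only the estimators up to the stopping index are evaluated, which is the computational saving the abstract refers to.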
[634] arXiv:2205.07739 (replaced) [pdf, ps, other]: Title: The Role of Pseudo-labels in Self-training Linear Classifiers on High-dimensional Gaussian Mixture Data

Takashi Takahashi

Comments: 65 pages, 13 figures

Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Statistics Theory (math.ST)

Self-training (ST) is a simple yet effective semi-supervised learning method. However, why and how ST improves generalization performance by using potentially erroneous pseudo-labels is still not well understood. To deepen the understanding of ST, we derive and analyze a sharp characterization of the behavior of iterative ST when training a linear classifier by minimizing the ridge-regularized convex loss on binary Gaussian mixtures, in the asymptotic limit where input dimension and data size diverge proportionally. The results show that ST improves generalization in different ways depending on the number of iterations. When the number of iterations is small, ST improves generalization performance by fitting the model to relatively reliable pseudo-labels and updating the model parameters by a large amount at each iteration. This suggests that ST works intuitively. On the other hand, with many iterations, ST can gradually improve the direction of the classification plane by updating the model parameters incrementally, using soft labels and small regularization. It is argued that this is because the small update of ST can extract information from the data in an almost noiseless way. However, in the presence of label imbalance, ST underperforms supervised learning with true labels. To overcome this, two heuristics are proposed to enable ST to achieve performance nearly comparable to supervised learning even with significant label imbalance.
[635] arXiv:2212.11246 (replaced) [pdf, ps, html, other]: Title: Material Property Prediction using Graphs based on Generically Complete Isometry Invariants

Jonathan Balasingham, Viktor Zamaraev, Vitaliy Kurlin

Comments: 8 pages, 5 tables, 5 figures (4 single column, 1 double column)

Journal-ref: Integrating Materials and Manufacturing Innovation, 1-14 (2024)

Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)

The structure-property hypothesis says that the properties of all materials are determined by an underlying crystal structure. The main obstacle was the ambiguity of conventional crystal representations based on incomplete or discontinuous descriptors that allow false negatives or false positives. This ambiguity was resolved by the ultra-fast Pointwise Distance Distribution (PDD), which distinguished all periodic structures in the world's largest collection of real materials (Cambridge Structural Database). The state-of-the-art results in property predictions were previously achieved by graph neural networks based on various graph representations of periodic crystals, including the Crystal Graph with vertices at all atoms in a crystal unit cell. This work adapts the Pointwise Distance Distribution for a simpler graph whose vertex set is not larger than the asymmetric unit of a crystal structure. The new Distribution Graph reduces mean-absolute-error by 0.6\%-12\% while having 44\%-88\% of the number of vertices when compared to the Crystal Graph when applied to the Materials Project and Jarvis-DFT datasets using CGCNN and ALIGNN. Methods for hyper-parameter selection for the graph are backed by the theoretical results of the Pointwise Distance Distribution and are then experimentally justified.
[636] arXiv:2301.00326 (replaced) [pdf, ps, html, other]: Title: Yuille-Poggio's Flow and Global Minimizer of Polynomials through Convexification by Heat Evolution

Qiao Wang

Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV)

This study examines the convexification version of the backward differential flow algorithm for the global minimization of polynomials, introduced by O. Arikan et al. in \cite{ABK}. It investigates why this approach might fail with high-degree polynomials yet succeeds with quartic polynomials. We employ the heat evolution method for convexification combined with Gaussian filtering, which acts as a cumulative form of Steklov's regularization. In this context, we apply the fingerprint theory from computer vision. Originally developed by A.L. Yuille and T. Poggio in the 1980s for computer vision, the fingerprint theory, particularly the fingerprint trajectory equation, is used to illustrate the scaling (temporal) evolution of minimizers. In the case of general polynomials, our research has led to the creation of the Yuille-Poggio flow and a broader interpretation of the fingerprint concepts; in particular, we establish a condition that is both necessary and sufficient for the convexified backward differential flow algorithms to successfully achieve global minimization. For quartic polynomials, our analysis not only reflects the results of O. Arikan et al. \cite{ABK} but also presents a significantly simpler version of Newton's method that can always globally minimize quartic polynomials without convexification.
[637] arXiv:2302.11454 (replaced) [pdf, ps, other]: Title: Quantum complexity of the Kronecker coefficients

Sergey Bravyi, Anirban Chowdhury, David Gosset, Vojtech Havlicek, Guanyu Zhu

Comments: Added Journal Reference

Journal-ref: PRX Quantum, vol. 5, issue 1, (Feb 2024), pp 010329

Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Combinatorics (math.CO)

Whether or not the Kronecker coefficients of the symmetric group count some set of combinatorial objects is a longstanding open question. In this work, we show that a given Kronecker coefficient is proportional to the rank of a projector that can be measured efficiently using a quantum computer. In other words, a Kronecker coefficient counts the dimension of the vector space spanned by the accepting witnesses of a QMA verifier, where QMA is the quantum analogue of NP. This implies that approximating the Kronecker coefficients to within a given relative error is not harder than a certain natural class of quantum approximate counting problems that captures the complexity of estimating thermal properties of quantum many-body systems. A second consequence is that deciding positivity of Kronecker coefficients is contained in QMA, complementing a recent NP-hardness result of Ikenmeyer, Mulmuley and Walter. We obtain similar results for the related problem of approximating row sums of the character table of the symmetric group. Finally, we discuss an efficient quantum algorithm that approximates normalized Kronecker coefficients to inverse-polynomial additive error.
[638] arXiv:2303.11833 (replaced) [pdf, ps, html, other]: Title: Materials Discovery with Extreme Properties via Reinforcement Learning-Guided Combinatorial Chemistry

Hyunseung Kim (1), Haeyeon Choi (2), Dongju Kang (1), Won Bo Lee (1), Jonggeol Na (2) ((1) Seoul National University, (2) Ewha Womans University)

Comments: 18 pages, 8 figures

Journal-ref: Chemical Science, 2024

Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)

The goal of most materials discovery is to discover materials that are superior to those currently known. Fundamentally, this is close to extrapolation, which is a weak point for most machine learning models that learn the probability distribution of data. Herein, we develop reinforcement learning-guided combinatorial chemistry, which is a rule-based molecular designer driven by a trained policy for selecting subsequent molecular fragments to obtain a target molecule. Since our model has the potential to generate all possible molecular structures that can be obtained from combinations of molecular fragments, unknown molecules with superior properties can be discovered. We theoretically and empirically demonstrate that our model is more suitable for discovering better compounds than probability distribution-learning models. In an experiment aimed at discovering molecules that hit seven extreme target properties, our model discovered 1,315 molecules that hit all seven targets and 7,629 molecules that hit five targets out of 100,000 trials, whereas the probability distribution-learning models failed. Moreover, it has been confirmed that every molecule generated under the binding rules of molecular fragments is 100% chemically valid. To illustrate the performance in actual problems, we also demonstrate that our models work well on two practical applications: discovering protein docking molecules and HIV inhibitors.
[639] arXiv:2304.12238 (replaced) [pdf, ps, html, other]: Title: On polynomials associated to Voronoi diagrams of point sets and crossing numbers

Mercè Claverol, Andrea de las Heras-Parrilla, David Flores-Peñaloza, Clemens Huemer, David Orden

Comments: 18 pages, 3 figures, to appear in Discrete Mathematics and Theoretical Computer Science

Subjects: Combinatorics (math.CO); Computational Geometry (cs.CG); Discrete Mathematics (cs.DM)

Three polynomials are defined for given sets $S$ of $n$ points in general position in the plane: The Voronoi polynomial with coefficients the numbers of vertices of the order-$k$ Voronoi diagrams of $S$, the circle polynomial with coefficients the numbers of circles through three points of $S$ enclosing $k$ points of $S$, and the $E_{\leq k}$ polynomial with coefficients the numbers of (at most $k$)-edges of $S$. We present several formulas for the rectilinear crossing number of $S$ in terms of these polynomials and their roots. We also prove that the roots of the Voronoi polynomial lie on the unit circle if, and only if, $S$ is in convex position. Further, we present bounds on the location of the roots of these polynomials.
[640] arXiv:2306.11023 (replaced) [pdf, ps, other]: Title: PINQI: An End-to-End Physics-Informed Approach to Learned Quantitative MRI Reconstruction

Felix F Zimmermann, Christoph Kolbitsch, Patrick Schuenke, Andreas Kofler

Comments: This work has been accepted for publication in IEEE Transactions on Computational Imaging. Changes were made to this version by the publisher before publication. IEEE Transactions on Computational Imaging (2024)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph)

Quantitative Magnetic Resonance Imaging (qMRI) enables the reproducible measurement of biophysical parameters in tissue. The challenge lies in solving a nonlinear, ill-posed inverse problem to obtain the desired tissue parameter maps from acquired raw data. While various learned and non-learned approaches have been proposed, the existing learned methods fail to fully exploit the prior knowledge about the underlying MR physics, i.e. the signal model and the acquisition model. In this paper, we propose PINQI, a novel qMRI reconstruction method that integrates the knowledge about the signal, acquisition model, and learned regularization into a single end-to-end trainable neural network. Our approach is based on unrolled alternating optimization, utilizing differentiable optimization blocks to solve inner linear and non-linear optimization tasks, as well as convolutional layers for regularization of the intermediate qualitative images and parameter maps. This design enables PINQI to leverage the advantages of both the signal model and learned regularization. We evaluate the performance of our proposed network by comparing it with recently published approaches in the context of highly undersampled $T_1$-mapping, using a simulated brain dataset as well as real scanner data acquired from a physical phantom and in-vivo data from healthy volunteers. The results demonstrate the superiority of our proposed solution over existing methods and highlight the effectiveness of our method in real-world scenarios.
[641] arXiv:2306.14207 (replaced) [pdf, ps, html, other]: Title: Sampling discretization of the uniform norm and applications

E. D. Kosov, V. N. Temlyakov

Journal-ref: Journal of Mathematical Analysis and Applications, 538(2), 2024, 128431

Subjects: Functional Analysis (math.FA); Numerical Analysis (math.NA)

Discretization of the uniform norm of functions from a given finite dimensional subspace of continuous functions is studied. Previously known results show that for any $N$-dimensional subspace of the space of continuous functions it is sufficient to use $e^{CN}$ sample points for an accurate upper bound for the uniform norm by the discrete norm and that one cannot improve on the exponential growth of the number of sampling points for a good discretization theorem in the uniform norm. In this paper we focus on two types of results, which allow us to obtain good discretization of the uniform norm with a number of points that is polynomial in $N$. In the first approach, we weaken the discretization inequality by allowing a bound of the uniform norm by the discrete norm multiplied by an extra factor, which may depend on $N$. In the second approach, we impose restrictions on the finite dimensional subspace under consideration. In particular, we prove a general result, which connects the upper bound on the number of sampling points in the discretization theorem for the uniform norm with the best $m$-term bilinear approximation of the Dirichlet kernel associated with the given subspace.
[642] arXiv:2307.06700 (replaced) [pdf, ps, html, other]: Title: Redicolouring digraphs: directed treewidth and cycle-degeneracy

Nicolas Nisse, Lucas Picasarri-Arrieta, Ignasi Sau

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

Given a digraph $D=(V,A)$ on $n$ vertices and a vertex $v\in V$, the cycle-degree of $v$ is the minimum size of a set $S \subseteq V(D) \setminus \{v\}$ intersecting every directed cycle of $D$ containing $v$. From this definition of cycle-degree, we define the $c$-degeneracy (or cycle-degeneracy) of $D$, which we denote by $\delta^*_c(D)$. It appears to be a nice generalisation of the undirected degeneracy. In this work, using this new definition of cycle-degeneracy, we extend several pieces of evidence for Cereceda's conjecture to digraphs. The $k$-dicolouring graph of $D$, denoted by $\mathcal{D}_k(D)$, is the undirected graph whose vertices are the $k$-dicolourings of $D$ and in which two $k$-dicolourings are adjacent if they differ on the colour of exactly one vertex.
We show that $\mathcal{D}_k(D)$ has diameter at most $O_{\delta^*_c(D)}(n^{\delta^*_c(D) + 1})$ (respectively $O(n^2)$ and $(\delta^*_c(D)+1)n$) when $k$ is at least $\delta^*_c(D)+2$ (respectively $\frac{3}{2}(\delta^*_c(D)+1)$ and $2(\delta^*_c(D)+1)$). This improves known results on digraph redicolouring (Bousquet et al.).
Next, we extend a result due to Feghali to digraphs, showing that $\mathcal{D}_{d+1}(D)$ has diameter at most $O_{d,\epsilon}(n(\log n)^{d-1})$ when $D$ has maximum average cycle-degree at most $d-\epsilon$. We then show that two proofs of Bonamy and Bousquet for undirected graphs can be extended to digraphs. The first one uses the digrundy number of a digraph and the second one uses the $\mathscr{D}$-width.
Finally, we give a general theorem which makes a connection between the recolourability of a digraph $D$ and the recolourability of its underlying graph $UG(D)$. This result directly extends a number of results on planar graph recolouring to planar digraph redicolouring.
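The definition of the $k$-dicolouring graph $\mathcal{D}_k(D)$ given in the abstract above can be made concrete on a tiny example. The sketch below assumes the standard notion of a $k$-dicolouring (each colour class induces an acyclic subdigraph), which the abstract itself does not spell out, and enumerates $\mathcal{D}_k(D)$ by brute force. The function names are hypothetical and the enumeration is only practical for very small digraphs.

```python
from itertools import product

def induces_acyclic(vertices, arcs):
    """True if the subdigraph induced by `vertices` has no directed cycle."""
    vs = set(vertices)
    indegree = {v: 0 for v in vs}
    successors = {v: [] for v in vs}
    for u, w in arcs:
        if u in vs and w in vs:
            successors[u].append(w)
            indegree[w] += 1
    # Kahn's algorithm: repeatedly remove vertices with no incoming arcs.
    ready = [v for v in vs if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for w in successors[u]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == len(vs)

def dicolouring_graph(n, arcs, k):
    """Vertices and edges of the k-dicolouring graph of a digraph on 0..n-1."""
    colourings = [
        c for c in product(range(k), repeat=n)
        if all(
            induces_acyclic([v for v in range(n) if c[v] == colour], arcs)
            for colour in range(k)
        )
    ]
    # Two dicolourings are adjacent when they differ on exactly one vertex.
    edges = [
        (a, b)
        for i, a in enumerate(colourings)
        for b in colourings[i + 1:]
        if sum(x != y for x, y in zip(a, b)) == 1
    ]
    return colourings, edges

# Directed triangle with k = 2: only the two monochromatic colourings fail.
triangle = [(0, 1), (1, 2), (2, 0)]
nodes, edges = dicolouring_graph(3, triangle, 2)
```

For the directed triangle with two colours, the six valid dicolourings form a single 6-cycle, so any dicolouring can be transformed into any other by recolouring one vertex at a time.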
[643] arXiv:2307.16262 (replaced) [pdf, ps, html, other]: Title: Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges

Debesh Jha, Vanshali Sharma, Debapriya Banik, Debayan Bhattacharya, Kaushiki Roy, Steven A. Hicks, Nikhil Kumar Tomar, Vajira Thambawita, Adrian Krenzer, Ge-Peng Ji, Sahadev Poudel, George Batchkala, Saruar Alam, Awadelrahman M. A. Ahmed, Quoc-Huy Trinh, Zeshan Khan, Tien-Phat Nguyen, Shruti Shrestha, Sabari Nathan, Jeonghwan Gwak, Ritika K. Jha, Zheyuan Zhang, Alexander Schlaefer, Debotosh Bhattacharjee, M.K. Bhuyan, Pradip K. Das, Deng-Ping Fan, Sravanthi Parsa, Sharib Ali, Michael A. Riegler, Pål Halvorsen, Thomas De Lange, Ulas Bagci

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Automatic analysis of colonoscopy images has been an active field of research motivated by the importance of early detection of precancerous polyps. However, detecting polyps during the live examination can be challenging due to various factors such as variation of skills and experience among the endoscopists, lack of attentiveness, and fatigue leading to a high polyp miss-rate. Deep learning has emerged as a promising solution to this challenge as it can assist endoscopists in detecting and classifying overlooked polyps and abnormalities in real time. In addition to the algorithm's accuracy, transparency and interpretability are crucial to explaining the whys and hows of the algorithm's prediction. Further, most algorithms are developed on private data, closed source, or proprietary software, and methods lack reproducibility. Therefore, to promote the development of efficient and transparent methods, we have organized the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image Segmentation (MedAI 2021)" competitions. We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of translating such methods into the clinic. For the transparency task, a multi-disciplinary team, including expert gastroenterologists, assessed each submission and evaluated the team based on open-source practices, failure case analysis, ablation studies, usability and understandability of evaluations to gain a deeper understanding of the models' credibility for clinical deployment. Through the comprehensive analysis of the challenge, we not only highlight the advancements in polyp and surgical instrument segmentation but also encourage qualitative evaluation for building more transparent and understandable AI-based colonoscopy systems.
[644] arXiv:2311.16190 (replaced) [pdf, ps, html, other]: Title: Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas

Hanrui Wang, Daniel Bochen Tan, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jason Cong, Song Han

Comments: 10 pages, 16 figures; Published as a conference paper at DAC 2024

Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET)

Neutral atom arrays have become a promising platform for quantum computing, especially the field programmable qubit array (FPQA) endowed with the unique capability of atom movement. This feature allows dynamic alterations in qubit connectivity during runtime, which can reduce the cost of executing long-range gates and improve parallelism. However, this added flexibility introduces new challenges in circuit compilation. Inspired by the placement and routing strategies for FPGAs, we propose to map all data qubits to fixed atoms while utilizing movable atoms to route for 2-qubit gates between data qubits. Coined flying ancillas, these mobile atoms function as ancilla qubits, dynamically generated and recycled during execution. We present Q-Pilot, a scalable compiler for FPQA employing flying ancillas to maximize circuit parallelism. For two important quantum applications, quantum simulation and the Quantum Approximate Optimization Algorithm (QAOA), we devise domain-specific routing strategies. In comparison to alternative technologies such as superconducting devices or fixed atom arrays, Q-Pilot effectively harnesses the flexibility of FPQA, achieving reductions of 1.4x, 27.7x, and 6.3x in circuit depth for 100-qubit random, quantum simulation, and QAOA circuits, respectively.
[645] arXiv:2312.03919 (replaced) [pdf, ps, html, other]: Title: Indivisibility and uniform computational strength

Kenneth Gill

Comments: 22 pages, 4 figures. This work extends the results of Sections 1.2 and 1.3 of the author's Ph.D. thesis at Penn State University. Version 5: various corrections and improvements

Subjects: Logic (math.LO); Logic in Computer Science (cs.LO); Combinatorics (math.CO)

A countable structure is indivisible if for every coloring with finite range there is a monochromatic isomorphic subcopy of the structure. Each indivisible structure naturally corresponds to an indivisibility problem which outputs such a subcopy given a presentation and coloring. We investigate the Weihrauch complexity of the indivisibility problems for two structures: the rational numbers $\mathbb{Q}$ as a linear order, and the equivalence relation $\mathscr{E}$ with countably many equivalence classes each having countably many members. We separate the Weihrauch degrees of both corresponding indivisibility problems from several benchmarks, showing in particular that the indivisibility problem for $\mathbb{Q}$ cannot solve the problem of finding a monochromatic rational interval given a coloring for which there is one; and that the Weihrauch degree of the indivisibility problem for $\mathscr{E}$ is strictly between those of $\mathsf{RT}^2$ and $\mathsf{SRT}^2$, two widely studied variants of Ramsey's theorem for pairs whose reverse-mathematical separation was open until recently.
[646] arXiv:2312.07186 (replaced) [pdf, ps, other]: Title: Towards Optimal Sobolev Norm Rates for the Vector-Valued Regularized Least-Squares Algorithm

Zhu Li, Dimitri Meunier, Mattes Mollenhauer, Arthur Gretton

Comments: arXiv admin note: text overlap with arXiv:2208.01711

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We present the first optimal rates for infinite-dimensional vector-valued ridge regression on a continuous scale of norms that interpolate between $L_2$ and the hypothesis space, which we consider as a vector-valued reproducing kernel Hilbert space. These rates allow us to treat the misspecified case in which the true regression function is not contained in the hypothesis space. We combine standard assumptions on the capacity of the hypothesis space with a novel tensor product construction of vector-valued interpolation spaces in order to characterize the smoothness of the regression function. Our upper bound not only attains the same rate as real-valued kernel ridge regression, but also removes the assumption that the target regression function is bounded. For the lower bound, we reduce the problem to the scalar setting using a projection argument. We show that these rates are optimal in most cases and independent of the dimension of the output space. We illustrate our results for the special case of vector-valued Sobolev spaces.
[647] arXiv:2312.08927 (replaced) [pdf, ps, other]: Title: Limit Order Book Dynamics and Order Size Modelling Using Compound Hawkes Process

Konark Jain, Nick Firoozye, Jonathan Kochems, Philip Treleaven

Comments: Presented at Market Microstructure 2023, Accepted at Quantitative Finance Workshop 2024. To be submitted for publication to a journal

Subjects: Trading and Market Microstructure (q-fin.TR); Computational Engineering, Finance, and Science (cs.CE); Computational Finance (q-fin.CP); Applications (stat.AP)

The Hawkes process has been used to model Limit Order Book (LOB) dynamics in several ways in the literature; however, the focus has been limited to capturing the inter-event times, while the order size is usually assumed to be constant. We propose a novel methodology of using a Compound Hawkes Process for the LOB where each event has an order size sampled from a calibrated distribution. The process is formulated in a novel way such that the spread of the process always remains positive. Further, we condition the model parameters on time of day to support empirical observations. We make use of an enhanced non-parametric method to calibrate the Hawkes kernels and allow for inhibitory cross-excitation kernels. We showcase the results and quality of fits for an equity stock's LOB on the NASDAQ exchange and compare them against several baselines. Finally, we conduct a market impact study of the simulator and show that the empirical observation of a concave market impact function is indeed replicated.
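To make the idea of a compound Hawkes process concrete, the following is a minimal univariate sketch: event times are drawn by Ogata's thinning algorithm with an exponential excitation kernel, and each event carries a random order size. It is not the model of the abstract above, which is multivariate, uses non-parametrically calibrated kernels, and enforces a positive spread. The exponential kernel, the parameters `mu`, `alpha`, `beta`, and the order-size distribution are all illustrative assumptions.

```python
import math
import random

def simulate_compound_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate (time, size) events of a univariate compound Hawkes process.

    Intensity: mu + sum_i alpha * exp(-beta * (t - t_i)) over past events t_i.
    Order sizes come from a hypothetical distribution chosen for illustration.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of the self-exciting part of the intensity
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound for the thinning step.
        upper = mu + excitation
        wait = rng.expovariate(upper)
        if t + wait >= horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= mu + excitation:
            size = 1 + int(rng.expovariate(0.5))  # hypothetical order size
            events.append((t, size))
            excitation += alpha  # each accepted event raises the intensity
    return events

# alpha / beta < 1 keeps the process stationary.
events = simulate_compound_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=50.0)
```

Swapping the size sampler for a distribution calibrated on observed order sizes is the step that turns a plain Hawkes simulation into a compound one.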
[648] arXiv:2312.09663 (replaced) [pdf, ps, html, other]: Title: Toward Deep Drum Source Separation

Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, Augusto Sarti

Comments: 9 pages, 2 figures, 3 tables. Published in Pattern Recognition Letters, 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this manuscript, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances using ten real-sounding acoustic drum kits. Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to significantly outperform state-of-the-art nonnegative spectro-temporal factorization methods.
[649] arXiv:2401.00097 (replaced) [pdf, ps, html, other]: Title: Recursive identification with regularization and on-line hyperparameters estimation

Bernard Vau, Tudor-Bogdan Airimitoaie

Comments: https://hal.science/hal-04337419

Subjects: Methodology (stat.ME); Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper presents a regularized recursive identification algorithm with simultaneous on-line estimation of both the model parameters and the algorithm's hyperparameters. A new kernel is proposed to facilitate the algorithm development. The performance of this novel scheme is compared with that of the recursive least squares algorithm in simulation.
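For orientation, the baseline named in the abstract above, recursive least squares (RLS), can be sketched in a few lines. This is the textbook RLS update with an optional forgetting factor, not the regularized algorithm or the kernel proposed in the paper. The function name and the defaults `forgetting=1.0` and `p0=1e6` are assumptions made for illustration.

```python
import numpy as np

def recursive_least_squares(regressors, outputs, forgetting=1.0, p0=1e6):
    """Textbook RLS estimate of theta in y_t = phi_t . theta + noise."""
    dim = regressors.shape[1]
    theta = np.zeros(dim)
    # A large initial covariance means a weak prior on theta; 1 / p0 acts
    # like a small ridge penalty pulling the estimate toward zero.
    P = p0 * np.eye(dim)
    for phi, y in zip(regressors, outputs):
        P_phi = P @ phi
        gain = P_phi / (forgetting + phi @ P_phi)
        # Correct the estimate by the gain times the prediction error.
        theta = theta + gain * (y - phi @ theta)
        # Covariance update; P stays symmetric, so phi^T P equals (P phi)^T.
        P = (P - np.outer(gain, P_phi)) / forgetting
    return theta

rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0])
Phi = rng.normal(size=(50, 2))
Y = Phi @ true_theta  # noiseless data, so the estimate should be near exact
estimate = recursive_least_squares(Phi, Y)
```

With `forgetting=1.0` this reproduces the batch least-squares solution with a ridge term of `1 / p0`; values below 1 discount old samples, which matters when the parameters drift over time.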
[650] arXiv:2401.15235 (replaced) [pdf, ps, other]: Title: CascadedGaze: Efficiency in Global Context Extraction for Image Restoration

Amirhosein Ghasemabadi, Muhammad Kamran Janjua, Mohammad Salameh, Chunhua Zhou, Fengyu Sun, Di Niu

Comments: Published in Transactions on Machine Learning Research (TMLR), 2024. 20 pages

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Image restoration tasks traditionally rely on convolutional neural networks. However, given the local nature of the convolutional operator, they struggle to capture global information. The promise of attention mechanisms in Transformers is to circumvent this problem, but it comes at the cost of intensive computational overhead. Many recent studies in image restoration have focused on solving the challenge of balancing performance and computational cost via Transformer variants. In this paper, we present CascadedGaze Network (CGNet), an encoder-decoder architecture that employs Global Context Extractor (GCE), a novel and efficient way to capture global information for image restoration. The GCE module leverages small kernels across convolutional layers to learn global dependencies, without requiring self-attention. Extensive experimental results show that our computationally efficient approach performs competitively with a range of state-of-the-art methods on synthetic image denoising and single image deblurring tasks, and pushes the performance boundary further on the real image denoising task.
[651] arXiv:2402.01810 (replaced) [pdf, ps, html, other]: Title: Parameter uncertainties for imperfect surrogate models in the low-noise regime

Thomas D Swinburne, Danny Perez

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)

Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, the loss ignores misspecification, where models are imperfect. Parameter uncertainties from Bayesian regression are thus significantly underestimated and vanish in the large data limit. This is particularly problematic when building models of low-noise, or near-deterministic, calculations, as the main source of uncertainty is neglected. We analyze the generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show posterior distributions must cover every training point to avoid a divergent generalization error and design an ansatz that respects this constraint, which for linear models incurs minimal overhead. This is demonstrated on model problems before application to thousand-dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors where existing schemes fail, allowing this important source of uncertainty to be incorporated in computational workflows.
[652] arXiv:2402.10980 (replaced) [pdf, ps, html, other]: Title: ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback

Henry W. Sprueill, Carl Edwards, Khushbu Agarwal, Mariefel V. Olarte, Udishnu Sanyal, Conrad Johnston, Hongbin Liu, Heng Ji, Sutanay Choudhury

Comments: 8 pages, accepted by ICML 2024

Subjects: Chemical Physics (physics.chem-ph); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. We introduce planning methods that automatically guide the exploration without human input, providing competitive performance against expert-enumerated chemical descriptor-based implementations. By integrating language-guided reasoning with computational chemistry feedback, our work pioneers AI-accelerated, trustworthy catalyst discovery.
[653] arXiv:2404.06087 (replaced) [pdf, ps, html, other]: Title: The Overlap Gap Property limits limit swapping in QAOA

Mark Xin Hong Goh

Comments: 22 pages, 2 figures

Subjects: Quantum Physics (quant-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Data Structures and Algorithms (cs.DS)

The Quantum Approximate Optimization Algorithm (QAOA) is a quantum algorithm designed for Combinatorial Optimization Problems (COPs). We show that if a COP with an underlying Erdős--Rényi hypergraph exhibits the Overlap Gap Property (OGP), then a random regular hypergraph exhibits it as well. Given that Max-$q$-XORSAT on an Erdős--Rényi hypergraph is known to exhibit the OGP, and since the performance of QAOA for the pure $q$-spin model asymptotically matches that for Max-$q$-XORSAT on large-girth regular hypergraphs, we show that the average-case value obtained by QAOA for the pure $q$-spin model for even $q\ge 4$ is bounded away from optimality even when the algorithm runs indefinitely. This suggests that a necessary condition for the validity of limit swapping in QAOA is the absence of the OGP in a given combinatorial optimization problem. Furthermore, the results suggest that, even when sub-optimised, QAOA on spin glasses matches the performance of classical algorithms in solving the mean-field spin glass problem, providing further evidence for the conjecture that the exact solution is obtained under limit swapping for the Sherrington--Kirkpatrick model.
[654] arXiv:2404.14322 (replaced) [pdf, ps, other]: Title: A Novel Approach to Chest X-ray Lung Segmentation Using U-net and Modified Convolutional Block Attention Module

Mohammad Ali Labbaf Khaniki, Mohammad Manthouri

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Lung segmentation in chest X-ray images is of paramount importance as it plays a crucial role in the diagnosis and treatment of various lung diseases. This paper presents a novel approach for lung segmentation in chest X-ray images by integrating U-net with attention mechanisms. The proposed method enhances the U-net architecture by incorporating a Convolutional Block Attention Module (CBAM), which unifies three distinct attention mechanisms: channel attention, spatial attention, and pixel attention. The channel attention mechanism enables the model to concentrate on the most informative features across various channels. The spatial attention mechanism enhances the model's precision in localization by focusing on significant spatial locations. Lastly, the pixel attention mechanism empowers the model to focus on individual pixels, further refining the model's focus and thereby improving the accuracy of segmentation. The adoption of the proposed CBAM in conjunction with the U-net architecture marks a significant advancement in the field of medical imaging, with potential implications for improving diagnostic precision and patient outcomes. The efficacy of this method is validated against contemporary state-of-the-art techniques, showcasing its superiority in segmentation performance.
[655] arXiv:2405.00230 (replaced) [pdf, ps, other]: Title: A decomposition-based approach for large-scale pickup and delivery problems

G. Hiermann, M. Schiffer

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

With the advent of self-driving cars, experts envision autonomous mobility-on-demand services in the near future to cope with overloaded transportation systems in cities worldwide. Efficient operations are imperative to unlock such a system's maximum improvement potential. Existing approaches either consider a narrow planning horizon or ignore essential characteristics of the underlying problem. In this paper, we develop an algorithmic framework that allows the study of very large-scale pickup and delivery routing problems with more than 20 thousand requests, which arise in the context of integrated request pooling and vehicle-to-request dispatching. We conduct a computational study and present comparative results showing the characteristics of the developed approaches. Furthermore, we apply our algorithm to related benchmark instances from the literature to show the efficacy. Finally, we solve very large-scale instances and derive insights on upper-bound improvements regarding fleet sizing and customer delay acceptance from a practical perspective.
[656] arXiv:2405.01015 (replaced) [pdf, ps, other]: Title: Network reconstruction via the minimum description length principle

Tiago P. Peixoto

Comments: 17 pages, 10 figures. Code and documentation are available at this https URL

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Populations and Evolution (q-bio.PE)

A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting and produces an inferred network with a statistically justifiable number of edges. The status quo in this context is based on $L_{1}$ regularization combined with cross-validation. However, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity with weight "shrinkage". This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster to employ, as it requires a single fit to the complete data. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of edges to be known in advance. We also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving on the order of $10^{4}$ to $10^{5}$ species, and demonstrate how the inferred model can be used to predict the outcome of interventions in the system.
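The coupling between sparsity and shrinkage under $L_{1}$ regularization that this abstract criticizes can be seen in the soft-thresholding operator, the proximal operator of the $L_{1}$ penalty. The sketch below illustrates only that standard operator, not the paper's MDL-based scheme; the name `soft_threshold` and the example weights are assumptions.

```python
def soft_threshold(w, lam):
    """Proximal operator of lam * |w|: zero out small weights, shrink the rest."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


# Made-up weights: two strong edges and two weak ones.
weights = [3.0, -0.4, 0.8, -2.5]
for lam in (0.5, 1.0):
    shrunk = [soft_threshold(w, lam) for w in weights]
    kept = sum(1 for w in shrunk if w != 0.0)
    print(lam, shrunk, kept)
```

Raising `lam` removes more weak weights, but every surviving weight is also pulled toward zero by the same amount, which is the bias-versus-sparsity trade-off the abstract refers to.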
[657] arXiv:2405.03141 (replaced) [pdf, ps, other]: Title: Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation

Yihao Zhou, Timothy Tin-Yan Lee, Kelly Ka-Lee Lai, Chonglin Wu, Hin Ting Lau, De Yang, Chui-Yi Chan, Winnie Chiu-Wing Chu, Jack Chun-Yiu Cheng, Tsz-Ping Lam, Yong-Ping Zheng

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of measuring spinal curvature is still carried out manually. Consequently, there is a considerable demand for a fully automatic system that can locate bony landmarks and perform angle measurements. To this end, we introduce an estimation model for automatic ultrasound curve angle (UCA) measurement. The model employs a dual-branch network to detect candidate landmarks and perform vertebra segmentation on ultrasound coronal images. An affinity clustering strategy is utilized within the vertebral segmentation area to illustrate the affinity relationship between candidate landmarks. Subsequently, we can efficiently perform line delineation from a clustered affinity map for UCA measurement. As our method is specifically designed for UCA calculation, this method outperforms other state-of-the-art methods for landmark and line detection tasks. The high correlation between the automatic UCA and Cobb angle (R$^2$=0.858) suggests that our proposed method can potentially replace manual UCA measurement in ultrasound scoliosis assessment.


Computer Science

New submissions for Wednesday, 8 May 2024 (showing 371 of 371 entries)

Cross submissions for Wednesday, 8 May 2024 (showing 37 of 37 entries)

Replacement submissions for Wednesday, 8 May 2024 (showing 249 of 249 entries)