Computer Science

New submissions

[ total of 1195 entries: 1-1195 ]

New submissions for Tue, 7 May 24

[1]  arXiv:2405.02288 [pdf, other]
Title: Prospective Role of Foundation Models in Advancing Autonomous Vehicles
Comments: 36 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, CLIP, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can help enhance scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret various elements in a driving scene, and provide cognitive reasoning to give linguistic and action commands for driving decisions and planning. Furthermore, FMs can augment data based on their understanding of driving scenarios to provide feasible scenes of those rare occurrences in the long-tail distribution that are unlikely to be encountered during routine driving and data collection. This enhancement can subsequently lead to improvements in the accuracy and reliability of autonomous driving systems. Another testament to the potential of FM applications lies in the development of World Models, exemplified by the DREAMER series, which showcase the ability to comprehend physical laws and dynamics. Learning from massive data under the paradigm of self-supervised learning, World Models can generate unseen yet plausible driving environments, facilitating improved prediction of road users' behavior and the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain.

[2]  arXiv:2405.02289 [pdf, other]
Title: TSDiT: Traffic Scene Diffusion Models With Transformers
Authors: Chen Yang, Tianyu Shi
Subjects: Robotics (cs.RO)

In this paper, we introduce a novel approach to trajectory generation for autonomous driving, combining the strengths of diffusion models and Transformers. First, we use the historical trajectory data for efficient preprocessing and generate action latents using a diffusion model with DiT (Diffusion with Transformers) blocks to increase scene diversity and the stochasticity of agent actions. Then, we combine the action latents, historical trajectories, and HD map features and feed them into different transformer blocks. Finally, we use a trajectory decoder to generate future trajectories of agents in the traffic scene. The method exhibits superior performance in generating smooth turning trajectories, enhancing the model's capability to fit complex steering patterns. The experimental results demonstrate the effectiveness of our method in producing realistic and diverse trajectories, showcasing its potential for application in autonomous vehicle navigation systems.

[3]  arXiv:2405.02290 [pdf, ps, other]
Title: Wheel Odometry-Based Localization for Autonomous Wheelchair
Comments: 6 pages, 10 figures, 3 tables
Journal-ref: 2023 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications (ICRAMET), Bandung, Indonesia, 2023, pp. 357-362
Subjects: Robotics (cs.RO)

Localization is a fundamental requirement for an autonomous vehicle system. One of the most often used systems for autonomous vehicle localization is the global positioning system (GPS). Nevertheless, the functionality of GPS is strongly dependent on the availability of satellites, making it unreliable in some situations. As a result, autonomous vehicles must possess autonomous self-localization capabilities to ensure their independent operation. Odometry techniques are employed to achieve vehicle localization by predicting the vehicle position and orientation based on sensor measurements of the vehicle motion. One of the approaches employed in odometry is known as wheel odometry. Wheel odometry has a lower degree of reliance on the surrounding environment than visual odometry and laser odometry. This study aims to evaluate the performance of wheel odometry implementation for an autonomous wheelchair in the context of the localization process. The differential drive kinematic model is employed to determine the predicted pose of a wheelchair. This prediction is derived from the measurement of the linear and angular velocity of the wheelchair. Several experiments have been conducted to evaluate the performance of wheel odometry-based localization. Prior to experimenting, calibration procedures have also been performed to ensure accurate measurements of the sensor.
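The pose prediction described in this abstract can be illustrated with a minimal sketch of the differential drive kinematic model. This is not the authors' implementation: the function names, the simple Euler integration step, and the `track_width` parameter are assumptions introduced here for illustration only.

```python
import math

def wheel_to_body(v_left, v_right, track_width):
    """Convert left/right wheel speeds (m/s) to body linear and angular velocity.

    track_width (m) is the distance between the two drive wheels (hypothetical name).
    """
    v = (v_right + v_left) / 2.0
    omega = (v_right - v_left) / track_width
    return v, omega

def integrate_pose(x, y, theta, v, omega, dt):
    """Advance the pose (x, y, theta) by one Euler step of the kinematic model."""
    x_new = x + v * math.cos(theta) * dt
    y_new = y + v * math.sin(theta) * dt
    theta_new = theta + omega * dt
    # Wrap the heading into [-pi, pi] so it does not grow without bound.
    theta_new = math.atan2(math.sin(theta_new), math.cos(theta_new))
    return x_new, y_new, theta_new

# Usage: equal wheel speeds drive straight along the current heading.
v, omega = wheel_to_body(1.0, 1.0, 0.5)
pose = integrate_pose(0.0, 0.0, 0.0, v, omega, 0.1)
```

Because each step adds the measured motion to the previous estimate, measurement errors accumulate over time, which is one reason sensor calibration matters for this kind of localization.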

[4]  arXiv:2405.02291 [pdf, other]
Title: Bundling and Tumbling in Bacterial-inspired Bi-flagellated Soft Robots for Attitude Adjustment
Subjects: Robotics (cs.RO)

We create a mechanism inspired by bacterial swimmers, featuring two flexible flagella with individual control over rotation speed and direction in viscous fluid environments. Using readily available materials, we design and fabricate silicone-based helical flagella. To simulate the robot's motion, we develop a physics-based computational tool, drawing inspiration from computer graphics. The framework incorporates the Discrete Elastic Rod method, modeling the flagella as Kirchhoff's elastic rods, and couples it with the Regularized Stokeslet Segments method for hydrodynamics, along with the Implicit Contact Model to handle contact. This approach effectively captures polymorphic phenomena like bundling and tumbling. Our study reveals how these emergent behaviors affect the robot's attitude angles, demonstrating its ability to self-reorient in both simulations and experiments. We anticipate that this framework will enhance our understanding of the directional change capabilities of flagellated robots, potentially stimulating further exploration on microscopic robot mobility.

[5]  arXiv:2405.02292 [pdf, other]
Title: ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation
Comments: Project website: aloha-2.github.io
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Diverse demonstration datasets have powered significant advances in robot learning, but the dexterity and scale of such data can be limited by the hardware cost, the hardware robustness, and the ease of teleoperation. We introduce ALOHA 2, an enhanced version of ALOHA that has greater performance, ergonomics, and robustness compared to the original design. To accelerate research in large-scale bimanual manipulation, we open source all hardware designs of ALOHA 2 with a detailed tutorial, together with a MuJoCo model of ALOHA 2 with system identification. See the project website at aloha-2.github.io.

[6]  arXiv:2405.02293 [pdf, ps, other]
Title: Modified OSD Algorithm with Reduced Gaussian Elimination
Comments: 2 figures
Subjects: Information Theory (cs.IT)

In this paper, the OSD algorithm is modified to perform a limited GE with $O(N^3 \min\{R, 1-R\}^3)$ complexity for an $(N,K)$ linear block code of rate $R=K/N$.

[7]  arXiv:2405.02295 [pdf, other]
Title: Neural Additive Image Model: Interpretation through Interpolation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Understanding how images influence the world, interpreting which effects their semantics have on various quantities and exploring the reasons behind changes in image-based predictions are highly difficult yet extremely interesting problems. By adopting a holistic modeling approach utilizing Neural Additive Models in combination with Diffusion Autoencoders, we can effectively identify the latent hidden semantics of image effects and achieve full intelligibility of additional tabular effects. Our approach offers a high degree of flexibility, empowering us to comprehensively explore the impact of various image characteristics. We demonstrate that the proposed method can precisely identify complex image effects in an ablation study. To further showcase the practical applicability of our proposed model, we conduct a case study in which we investigate how the distinctive features and attributes captured within host images exert influence on the pricing of Airbnb rentals.

[8]  arXiv:2405.02296 [pdf, other]
Title: Möbius Transform for Mitigating Perspective Distortions in Representation Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is a challenging task that prevents synthesizing perspective distortion. Non-availability of dedicated training data poses a critical barrier to developing robust computer vision methods. Additionally, distortion correction methods make other computer vision tasks a multi-step approach and lack performance. In this work, we propose mitigating perspective distortion (MPD) by employing a fine-grained parameter control on a specific family of Möbius transforms to model real-world distortion without estimating camera intrinsic and extrinsic parameters and without the need for actual distorted data. Also, we present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, to benchmark the robustness of deep learning models against this new dataset. The proposed method achieves superior performance on the existing benchmarks ImageNet-E and ImageNet-X. Additionally, it significantly improves performance on ImageNet-PD while performing consistently on the standard data distribution. Further, our method shows improved performance on three PD-affected real-world applications: crowd counting, fisheye image recognition, and person re-identification. We will release source code, dataset, and models to foster further research.

[9]  arXiv:2405.02297 [pdf, other]
Title: Employing Universal Voting Schemes for Improved Visual Place Recognition Performance
Comments: arXiv admin note: substantial text overlap with arXiv:2305.05705
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual Place Recognition (VPR) has been the subject of many endeavours utilizing different ensemble approaches to improve VPR performance. Ideas like multi-process fusion, Fly-Inspired Voting Units, SwitchHit or Switch-Fuse involve combining different VPR techniques together, utilizing different strategies. However, a major aspect common to many of these strategies is voting. Voting is an extremely relevant topic to explore in terms of its application and significance for any ensemble VPR setup. This paper analyses several voting schemes to maximise the place detection accuracy of a VPR ensemble setup and determine the optimal voting schemes for selection. We take inspiration from a variety of voting schemes that are widely employed in fields such as politics and sociology, and it is evident from empirical data that the selection of the voting method influences the results drastically. The paper tests a wide variety of voting schemes to present the improvement in the VPR results for several data sets. We aim to determine whether a single optimal voting scheme exists or, much like in other fields of research, the selection of a voting technique is relative to its application and environment. We propose a ranking of these different voting methods from best to worst, which allows for better selection. We present our results in terms of each voting method's performance bounds, in the form of radar charts and PR curves that showcase the difference in performance, together with a comparison methodology using a McNemar test variant to determine the statistical significance of the differences. This test is performed to further confirm the reliability of outcomes and draw comparisons for better and informed selection of a voting technique.
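To make concrete how the choice of voting method can change an ensemble's answer, here is a minimal sketch contrasting two classic schemes, plurality and Borda count. The abstract does not say which schemes the paper evaluates, so treating these two as representative is an assumption, as are the function names and the toy place labels.

```python
from collections import Counter

def plurality(rankings):
    """Each voter (VPR technique) gives one vote to its top-ranked candidate."""
    counts = Counter(r[0] for r in rankings)
    # Ties are broken by the lexicographically smallest label.
    return max(sorted(counts), key=lambda c: counts[c])

def borda(rankings):
    """Each voter gives n-1 points to its first choice, n-2 to its second, and so on."""
    scores = Counter()
    for r in rankings:
        n = len(r)
        for pos, cand in enumerate(r):
            scores[cand] += n - 1 - pos
    return max(sorted(scores), key=lambda c: scores[c])

# Four hypothetical VPR techniques each rank three candidate places.
rankings = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "B", "A"],
]
print(plurality(rankings), borda(rankings))
```

On this toy input plurality selects place A (two first-place votes), while Borda selects place B (5 points against 4 for A), which shows how the same set of rankings can yield different matches under different schemes.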

[10]  arXiv:2405.02298 [pdf, ps, other]
Title: Development and Validation of an Artificial Neural Network for the Recognition of Custom Dataset with YOLOv4
Subjects: Computational Engineering, Finance, and Science (cs.CE)

The expanding applications, utilized by more users, enhance hardware performance and further develop cloud systems for big data processing. This leads to numerous unexplored deep learning applications, especially in advanced computer vision for object recognition. Deep learning in image processing encompasses varied tasks from recognizing elements with diverse shapes and sizes to complex element classification, coping with varying backgrounds and lighting conditions, and text recognition. Its advantages lie in robust setup and high performance for recognizing complex elements. This work aims to develop a deep learning-based detection system for automated recognition of assembly components differing in geometry, size, contour, or color. Implementing the YOLOv4 algorithm, the system detects components based on their characteristics. Testing with 13 components involves capturing them in different orientations, numbers, individual parts, or assembled groups using a Raspberry Pi microcontroller and camera. Evaluation focuses on correct object recognition, confidence values, different compositions, distances between objects, and environmental factors affecting system quality. Results show positive object recognition across all scenarios, irrespective of orientation or number of objects. Even densely packed objects are correctly recognized with high confidence (97-100%). Lighting conditions don't significantly impact results, and all objects are properly labeled. The developed system is suitable for real-time two-dimensional component detection, with potential for extension to three-dimensional analysis using multiple cameras with varied positioning and views.

[11]  arXiv:2405.02299 [pdf, other]
Title: Deep Reinforcement Learning for Modelling Protein Complexes
Comments: International Conference on Learning Representations (ICLR 2024)
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

AlphaFold can be used for both single-chain and multi-chain protein structure prediction, while the latter becomes extremely challenging as the number of chains increases. In this work, by taking each chain as a node and assembly actions as edges, we show that an acyclic undirected connected graph can be used to predict the structure of multi-chain protein complexes (a.k.a., protein complex modelling, PCM). However, there are still two challenges: 1) The huge combinatorial optimization space of $N^{N-2}$ ($N$ is the number of chains) for the PCM problem can easily lead to high computational cost. 2) The scales of protein complexes exhibit distribution shift due to variance in chain numbers, which calls for generalization in modelling complexes of various scales. To address these challenges, we propose GAPN, a Generative Adversarial Policy Network powered by domain-specific rewards and adversarial loss through policy gradient for automatic PCM prediction. Specifically, GAPN learns to efficiently search through the immense assembly space and optimize the direct docking reward through policy gradient. Importantly, we design an adversarial reward function to enhance the receptive field of our model. In this way, GAPN will simultaneously focus on a specific batch of complexes and the global assembly rules learned from complexes with varied chain numbers. Empirically, we have achieved both significant accuracy (measured by RMSD and TM-Score) and efficiency improvements compared to leading PCM software. GAPN outperforms the state-of-the-art method (MoLPC) with up to 27% improvement in TM-Score, with a speed-up of 600 times. Our code is released at https://github.com/ft2023/GAPN.
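The $N^{N-2}$ figure quoted in this abstract matches Cayley's formula for the number of labelled trees on $N$ nodes, which is what an acyclic undirected connected graph over $N$ chains is. The short sketch below shows how quickly that search space grows; the function name is hypothetical and not taken from the GAPN code.

```python
def num_assembly_trees(n_chains):
    """Number of labelled trees on n_chains nodes (Cayley's formula, n^(n-2))."""
    if n_chains < 1:
        raise ValueError("n_chains must be a positive integer")
    if n_chains <= 2:
        # One node or one edge: exactly one tree in either case.
        return 1
    return n_chains ** (n_chains - 2)

# Growth of the assembly space with the number of chains.
for n in (3, 5, 10, 20):
    print(n, num_assembly_trees(n))
```

Even at 10 chains there are already 100,000,000 candidate trees, so exhaustive enumeration becomes impractical and a learned search policy is a natural alternative.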

[12]  arXiv:2405.02301 [pdf, other]
Title: TFCounter: Polishing Gems for Training-Free Object Counting
Comments: 14 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Object counting is a challenging task with broad application prospects in security surveillance, traffic management, and disease diagnosis. Existing object counting methods face a tri-fold challenge: achieving superior performance, maintaining high generalizability, and minimizing annotation costs. We develop a novel training-free class-agnostic object counter, TFCounter, which is prompt-context-aware via the cascade of the essential elements in large-scale foundation models. This approach employs an iterative counting framework with a dual prompt system to recognize a broader spectrum of objects varying in shape, appearance, and size. In addition, it introduces an innovative context-aware similarity module incorporating background context to enhance accuracy within cluttered scenes. To demonstrate cross-domain generalizability, we collect a novel counting dataset named BIKE-1000, comprising 1000 exclusive images of shared bicycles from Meituan. Extensive experiments on the FSC-147, CARPK, and BIKE-1000 datasets demonstrate that TFCounter outperforms existing leading training-free methods and exhibits competitive results compared to trained counterparts.

[13]  arXiv:2405.02302 [pdf, other]
Title: The Democratization of Wealth Management: Hedged Mutual Fund Blockchain Protocol
Authors: Ravi Kashyap
Subjects: Cryptography and Security (cs.CR); Computational Finance (q-fin.CP); Portfolio Management (q-fin.PM); Risk Management (q-fin.RM); Trading and Market Microstructure (q-fin.TR)

We develop several innovations designed to bring the best practices of traditional investment funds to the blockchain landscape. Our innovations combine the superior mechanisms of mutual funds and hedge funds. Specifically, we illustrate how fund prices can be updated regularly like mutual funds and performance fees can be charged like hedge funds. We show how mutually hedged blockchain investment funds can operate with investor protection schemes - high water marks - and measures to offset trading slippage when redemptions happen. We provide detailed steps - including mathematical formulations and instructive pointers - to implement these ideas as blockchain smart contracts. We discuss how our designs overcome several blockchain bottlenecks and how we can make smart contracts smarter. We provide numerical illustrations of several scenarios related to the mechanisms we have tailored for blockchain implementation.
The concepts we have developed for blockchain implementation can also be useful in traditional financial funds to calculate performance fees in a simplified manner. We highlight two main issues with the operation of mutual funds and hedge funds and show how blockchain technology can alleviate those concerns. The ideas developed here illustrate on one hand, how blockchain can solve many issues faced by the traditional world and on the other hand, how many innovations from traditional finance can benefit decentralized finance and speed its adoption. This becomes an example of symbiosis between decentralized and traditional finance - bringing these two realms closer and breaking down barriers between such artificial distinctions - wherein the future will be about providing better risk adjusted wealth appreciation opportunities to end customers through secure, reliable, accessible and transparent services - without getting too caught up about how such services are being rendered.
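As an illustration of the high water mark mechanism mentioned in this abstract, the sketch below charges a performance fee only on gains above the previous peak price. It is a generic textbook formulation, not the protocol's actual contract logic: the function name, the flat `fee_rate`, and the choice to reset the mark to the pre-fee price are all assumptions.

```python
def performance_fee(nav_per_share, high_water_mark, fee_rate):
    """Return (fee per share, updated high water mark).

    The fee applies only to the part of the price above the previous peak,
    so investors are not charged twice for recovering earlier losses.
    """
    gain = max(0.0, nav_per_share - high_water_mark)
    fee = fee_rate * gain
    # Assumption: the mark tracks the pre-fee price peak.
    new_hwm = max(high_water_mark, nav_per_share)
    return fee, new_hwm

# Usage: price rises above the mark, then dips, then partially recovers.
hwm = 100.0
for nav in (120.0, 110.0, 115.0):
    fee, hwm = performance_fee(nav, hwm, 0.2)
    print(nav, fee, hwm)
```

In this run only the first period incurs a fee, because the later prices of 110 and 115 stay below the mark of 120 set in the first period.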

[14]  arXiv:2405.02303 [pdf, other]
Title: Enhanced Thermal Management in High-Temperature Applications: Design and Optimization of a Water-Cooled Forced Convection System in a Hollow Cuboid Vapour Chamber Using COMSOL and MATLAB
Comments: 18 pages, 52 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)

This report details the design and optimisation of a water-cooled forced convection heat dissipation system for use in high-temperature applications (in the range of 700 K to 1000 K). A hollow cuboid vapour chamber model was investigated. The space within the hollow cuboid was used as the design space. COMSOL, a FEM software product, was used to solve for the physical parameters of each geometry for the heat dissipation system design space. COMSOL in conjunction with MATLAB was used for the parametric and density-based topology optimisation of the geometric design in the design space. The goal of the optimisation is the minimisation of the temperature gradient over the design space. This allows the heat to be evenly spread throughout the designed mesh, which allows for more effective cooling. To reduce the computational time needed to solve and optimise each geometry in 3D, a 2D representation was created for the front and rear faces of the hollow cuboid setup. These 2D face designs were then extrapolated into 3D over the length of the hollow cuboid and COMSOL was used to find a solution for each model. This report also proposes a use case for this system wherein it would be used in conjunction with MGA and thermometric technology within coal-fired power stations for the extraction and storage of waste heat for later use.

[15]  arXiv:2405.02305 [pdf, ps, other]
Title: Inserting Faces inside Captions: Image Captioning with Attention Guided Merging
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Image and Video Processing (eess.IV)

Image captioning models are widely used to describe recent and archived pictures with the objective of improving their accessibility and retrieval. Yet, these approaches tend to be inefficient and biased at retrieving people's names. In this work we introduce AstroCaptions, a dataset for the image captioning task. This dataset specifically contains thousands of public figures that are complex to identify for a traditional model. We also propose a novel post-processing method to insert identified people's names inside the caption using explainable AI tools and the grounding capabilities of vision-language models. The results obtained with this method show significant improvements in caption quality and a potential for reducing hallucinations. Up to 93.2% of the persons detected can be inserted in the image captions, leading to improvements in the BLEU, ROUGE, CIDEr and METEOR scores of each captioning model.

[16]  arXiv:2405.02310 [pdf, other]
Title: Simulating the aftermath of Northern European Enclosure Dam (NEED) break and flooding of European coast
Comments: 24 pages, 32 figures, 1 table
Subjects: Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA); Atmospheric and Oceanic Physics (physics.ao-ph)

The Northern European Enclosure Dam (NEED) is a hypothetical project to prevent flooding in European countries following the rising ocean level due to melting arctic glaciers. This project involves the construction of two large dams between Scotland and Norway, as well as England and France. The anticipated cost of this project is 250 to 500 billion euros. In this paper, we present the simulation of the aftermath of flooding on the European coastline caused by a catastrophic break of this hypothetical dam. From our simulation results, we can observe that there is a traveling wave after the accident, with a velocity of around 10 kilometers per hour, raising the sea level permanently inside the dammed region. This observation implies a need to construct additional dams or barriers protecting the northern coastline of the Netherlands and the interior of the Baltic Sea. Our simulations have been obtained using the following building blocks. First, a graph transformation model was applied to generate an adaptive mesh approximating the topography of the Earth. We employ the composition graph grammar model for breaking triangular elements in the mesh without the generation of hanging nodes. Second, the wave equation is formulated in a spherical latitude-longitude system of coordinates and solved by a high-order time integration scheme using the generalized $\alpha$ method.

[17]  arXiv:2405.02312 [pdf, ps, other]
Title: YOLOv5 vs. YOLOv8 in Marine Fisheries: Balancing Class Detection and Instance Count
Comments: 12 pages, 25 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

This paper presents a comparative study of object detection using YOLOv5 and YOLOv8 for three distinct classes: artemia, cyst, and excrement. In this comparative study, we analyze the performance of these models in terms of accuracy, precision, recall, etc. where YOLOv5 often performed better in detecting Artemia and cysts with excellent precision and accuracy. However, when it came to detecting excrement, YOLOv5 faced notable challenges and limitations. This suggests that YOLOv8 offers greater versatility and adaptability in detection tasks while YOLOv5 may struggle in difficult situations and may need further fine-tuning or specialized training to enhance its performance. The results show insights into the suitability of YOLOv5 and YOLOv8 for detecting objects in challenging marine environments, with implications for applications such as ecological research.

[18]  arXiv:2405.02314 [pdf, ps, other]
Title: A geometric framework for interstellar discourse on fundamental physical structures
Comments: 15 pages, 2 figures
Subjects: Information Theory (cs.IT); Computation and Language (cs.CL)

This paper considers the possibility that abstract thinking and advanced synthesis skills might encourage extraterrestrial civilizations to accept communication with mankind on Earth. For this purpose, a notation not relying upon the use of alphabet and numbers is proposed, in order to denote just some basic geometric structures of current physical theories: vector fields, one-form fields, and tensor fields of arbitrary order. An advanced civilization might appreciate the way here proposed to achieve a concise description of electromagnetism and general relativity, and hence it might accept the challenge of responding to our signals. The abstract symbols introduced in this paper to describe the basic structures of physical theories are encoded into black and white bitmap images that can be easily converted into short bit sequences and modulated on a carrier wave for radio transmission.
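The final step described in this abstract, converting a black-and-white bitmap into a short bit sequence, can be sketched as a row-by-row flattening. The abstract does not specify the encoding, so the row-major order, the `#`/`.` text representation, and the function names below are assumptions made purely for illustration.

```python
def bitmap_to_bits(rows):
    """Flatten a bitmap given as strings of '#' (black) and '.' (white) into a bit string."""
    return "".join("1" if px == "#" else "0" for row in rows for px in row)

def bits_to_bitmap(bits, width):
    """Rebuild the bitmap rows from a bit string, given the image width in pixels."""
    return [
        "".join("#" if b == "1" else "." for b in bits[i:i + width])
        for i in range(0, len(bits), width)
    ]

# Usage: a 3x3 plus sign round-trips through a 9-bit sequence.
symbol = [".#.", "###", ".#."]
bits = bitmap_to_bits(symbol)
print(bits)
print(bits_to_bitmap(bits, 3) == symbol)
```

A receiver needs to know or infer the image width to undo the flattening, which is why the width is an explicit argument in this sketch.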

[19]  arXiv:2405.02316 [pdf, ps, other]
Title: A Cloud-Edge Framework for Energy-Efficient Event-Driven Control: An Integration of Online Supervised Learning, Spiking Neural Networks and Local Plasticity Rules
Comments: 13 pages, 19 figures
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

This paper presents a novel cloud-edge framework for addressing computational and energy constraints in complex control systems. Our approach centers around a learning-based controller using Spiking Neural Networks (SNN) on physical plants. By integrating a biologically plausible learning method with local plasticity rules, we harness the efficiency, scalability, and low latency of SNNs. This design replicates control signals from a cloud-based controller directly on the plant, reducing the need for constant plant-cloud communication. The plant updates weights only when errors surpass predefined thresholds, ensuring efficiency and robustness in various conditions. Applied to linear workbench systems and satellite rendezvous scenarios, including obstacle avoidance, our architecture dramatically lowers normalized tracking error by 96% with increased network size. The event-driven nature of SNNs minimizes energy consumption, utilizing only about 111 nJ (0.3% of conventional computing requirements). The results demonstrate the system's adjustment to changing work environments and its efficient use of computational and energy resources, with a moderate increase in energy consumption of 27.2% and 37% for static and dynamic obstacles, respectively, compared to non-obstacle scenarios.

[20]  arXiv:2405.02317 [pdf, other]
Title: Long-term Human Participation Assessment In Collaborative Learning Environments Using Dynamic Scene Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

The paper develops datasets and methods to assess student participation in real-life collaborative learning environments. In collaborative learning environments, students are organized into small groups where they are free to interact within their group. Thus, students can move around freely, causing issues with strong pose variation, move out of and re-enter the camera scene, or face away from the camera. We formulate the problem of assessing student participation as two subproblems: (i) student group detection against strong background interference from other groups, and (ii) dynamic participant tracking within the group. A massive independent testing dataset of 12,518,250 student label instances, with a total duration of 21 hours and 22 minutes of real-life videos, is used for evaluating the performance of our proposed method for student group detection. The proposed method of using multiple image representations is shown to perform as well as or better than YOLO on all video instances. Over the entire dataset, the proposed method achieved an F1 score of 0.85 compared to 0.80 for YOLO. Following student group detection, the paper presents the development of a dynamic participant tracking system for assessing student group participation through long video sessions. The proposed dynamic participant tracking system is shown to perform exceptionally well, missing a student in just one out of 35 testing videos. In comparison, a state-of-the-art method fails to track students in 14 out of the 35 testing videos. The proposed method achieves 82.3% accuracy on an independent set of long, real-life collaborative videos.

[21]  arXiv:2405.02318 [pdf, other]
Title: NL2FOL: Translating Natural Language to First-Order Logic for Logical Fallacy Detection
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)

Logical fallacies are common errors in reasoning that undermine the logic of an argument. Automatically detecting logical fallacies has important applications in tracking misinformation and validating claims. In this paper, we design a process to reliably detect logical fallacies by translating natural language to First-order Logic (FOL) step-by-step using Large Language Models (LLMs). We then utilize Satisfiability Modulo Theory (SMT) solvers to reason about the validity of the formula and classify inputs as either a fallacy or valid statement. Our model also provides a novel means of utilizing LLMs to interpret the output of the SMT solver, offering insights into the counter-examples that illustrate why a given sentence is considered a logical fallacy. Our approach is robust, interpretable and does not require training data or fine-tuning. We evaluate our model on a mixed dataset of fallacies and valid sentences. The results demonstrate improved performance compared to end-to-end LLMs, with our classifier achieving an F1-score of 71% on the Logic dataset. The approach is able to generalize effectively, achieving an F1-score of 73% on the challenge set, LogicClimate, outperforming state-of-the-art models by 21% despite its much smaller size.

[22]  arXiv:2405.02320 [pdf, other]
Title: A SER-based Device Selection Mechanism in Multi-bits Quantization Federated Learning
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)

The quality of wireless communication directly affects the performance of federated learning (FL), so this paper analyzes the influence of wireless communication on FL through the symbol error rate (SER). In an FL system, non-orthogonal multiple access (NOMA) can be used as the basic communication framework to reduce the communication congestion and interference caused by multiple users, taking advantage of the superposition characteristics of wireless channels. The Minimum Mean Square Error (MMSE) based serial interference cancellation (SIC) technology is used to recover the gradient of each terminal node one by one at the receiving end. In this paper, the gradient parameters are quantized into multiple bits to retain as much gradient information as possible and to improve the tolerance of transmission errors. On this basis, we designed the SER-based device selection mechanism (SER-DSM) to ensure that the learning performance is not affected by users with bad communication conditions, while accommodating as many users as possible in the learning process, which makes it inclusive to a certain extent. The experiments show the influence of multi-bit quantization of the gradient on FL and the necessity and superiority of the proposed SER-based device selection mechanism.
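To illustrate what quantizing a gradient value into multiple bits can look like, here is a minimal sketch of a uniform quantizer. The abstract does not describe the paper's quantization scheme, so the uniform grid, the clipping bound `g_max`, and the function names are assumptions for illustration.

```python
def quantize(g, bits, g_max):
    """Map a gradient value to an integer index in [0, 2**bits - 1] on a uniform grid."""
    step = 2.0 * g_max / (2 ** bits - 1)
    clipped = max(-g_max, min(g_max, g))  # values outside [-g_max, g_max] saturate
    return round((clipped + g_max) / step)

def dequantize(index, bits, g_max):
    """Map an integer index back to the grid value it represents."""
    step = 2.0 * g_max / (2 ** bits - 1)
    return -g_max + index * step

# Usage: more bits give a finer grid and a smaller reconstruction error.
g = 0.37
for bits in (1, 2, 4, 8):
    g_hat = dequantize(quantize(g, bits, 1.0), bits, 1.0)
    print(bits, g_hat, abs(g_hat - g))
```

For values inside the clipping range the reconstruction error is at most half a grid step, so each extra bit roughly halves the worst-case error, at the cost of sending more symbols over the channel.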

[23]  arXiv:2405.02321 [pdf, other]
Title: Accelerating Medical Knowledge Discovery through Automated Knowledge Graph Generation and Enrichment
Comments: 18 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Knowledge graphs (KGs) serve as powerful tools for organizing and representing structured knowledge. While their utility is widely recognized, challenges persist in their automation and completeness. Despite efforts in automation and the utilization of expert-created ontologies, gaps in connectivity remain prevalent within KGs. In response to these challenges, we propose an innovative approach termed "Medical Knowledge Graph Automation (M-KGA)". M-KGA leverages user-provided medical concepts and enriches them semantically using BioPortal ontologies, thereby enhancing the completeness of knowledge graphs through the integration of pre-trained embeddings. Our approach introduces two distinct methodologies for uncovering hidden connections within the knowledge graph: a cluster-based approach and a node-based approach. Through rigorous testing involving 100 frequently occurring medical concepts in Electronic Health Records (EHRs), our M-KGA framework demonstrates promising results, indicating its potential to address the limitations of existing knowledge graph automation techniques.

[24]  arXiv:2405.02323 [pdf, ps, other]
Title: CNN-Based Equalization for Communications: Achieving Gigabit Throughput with a Flexible FPGA Hardware Architecture
Comments: The article was submitted to the International Journal of Parallel Programming (IJPP) and is currently under review
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG); Signal Processing (eess.SP)

To satisfy the growing throughput demand of data-intensive applications, the performance of optical communication systems has increased dramatically in recent years. With higher throughput, more advanced equalizers are crucial to compensate for impairments caused by inter-symbol interference (ISI). The latest research shows that artificial neural network (ANN)-based equalizers are promising candidates to replace traditional algorithms for high-throughput communications. On the other hand, not only throughput but also flexibility is a main objective of beyond-5G and 6G communication systems. Field programmable gate arrays (FPGAs) are a platform able to satisfy the strict throughput and flexibility requirements of modern communication systems. Thus, in this work, we present a high-performance FPGA implementation of an ANN-based equalizer, which meets the throughput requirements of modern optical communication systems. Further, our architecture is highly flexible since it includes a variable degree of parallelism (DOP) and therefore can also be applied to low-cost or low-power applications, which is demonstrated for a magnetic recording channel. The implementation is based on a cross-layer design approach featuring optimizations from the algorithm down to the hardware architecture, including a detailed quantization analysis. Moreover, we present a framework to reduce the latency of the ANN-based equalizer under given throughput constraints. As a result, the bit error ratio (BER) of our equalizer for the optical fiber channel is around four times lower than that of a conventional one, while the corresponding FPGA implementation achieves a throughput of more than 40 GBd, outperforming a high-performance graphics processing unit (GPU) by three orders of magnitude for a similar batch size.

[25]  arXiv:2405.02324 [pdf, ps, other]
Title: Combined Compromise for Ideal Solution (CoCoFISo): a multi-criteria decision-making based on the CoCoSo method algorithm
Authors: Rôlin Gabriel Rasoanaivo (IRIT, UT Capitole, IRIT-ADRIA), Morteza Yazdani (UIV), Pascale Zaraté (IRIT, UT Capitole, IRIT-ADRIA), Amirhossein Fateh (UPV)
Comments: Expert Systems with Applications, 2024
Subjects: Artificial Intelligence (cs.AI)

Each decision-making tool should be tested and validated in real case studies to be practical and fit to global problems. The application of multi-criteria decision-making methods (MCDM) is currently a trend to rank alternatives. In the literature, there are several multi-criteria decision-making methods according to their classification. During our experimentation on the Combined Compromise Solution (CoCoSo) method, we encountered its limits for real cases. The authors examined the applicability of the CoCoFISo method (improved version of combined compromise solution), by a real case study in a university campus and compared the obtained results to other MCDMs such as Preference Ranking Organisation Method for Enrichment Evaluations (PROMETHEE), Weighted Sum Method (WSM) and Technique for Order Preference by Similarity to the Ideal Solution (TOPSIS). Our research finding indicates that CoCoSo is an applied method that has been developed to solve complex multi variable assessment problems, while CoCoFISo can improve the shortages observed in CoCoSo and deliver stable outcomes compared to other developed tools. The findings imply that application of CoCoFISo is suggested to decision makers, experts and researchers while they are facing practical challenges and sensitive questions regarding the utilization of a reliable decision-making method. Unlike many prior studies, the current version of CoCoSo is unique, original and is presented for the first time. Its performance was approved using several strategies and examinations.

[26]  arXiv:2405.02325 [pdf, ps, other]
Title: Meat Meets Machine! Multiscale Competency Enables Causal Learning
Comments: Definitions shared with arXiv:2404.07227, arXiv:2302.00843
Subjects: Artificial Intelligence (cs.AI)

Biological intelligence uses a "multiscale competency architecture" (MCA). It exhibits adaptive, goal directed behaviour at all scales, from cells to organs to organisms. In contrast, machine intelligence is only adaptive and goal directed at a high level. Learned policies are passively interpreted using abstractions (e.g. arithmetic) embodied in static interpreters (e.g. x86). Biological intelligence excels at causal learning. Machine intelligence does not. Previous work showed causal learning follows from weak policy optimisation, which is hindered by presupposed abstractions in silico. Here we formalise MCAs as nested "agentic abstraction layers", to understand how they might learn causes. We show that weak policy optimisation at low levels enables weak policy optimisation at high. This facilitates what we call "multiscale causal learning" and high level goal directed behaviour. We argue that by engineering human abstractions in silico we disconnect high level goal directed behaviour from the low level goal directed behaviour that gave rise to it. This inhibits causal learning, and we speculate this is one reason why human recall would be accompanied by feeling, and in silico recall not.

[27]  arXiv:2405.02326 [pdf, other]
Title: Evaluating LLMs for Hardware Design and Test
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)

Large Language Models (LLMs) have demonstrated capabilities for producing code in Hardware Description Languages (HDLs). However, most of the focus remains on their abilities to write functional code, not test code. The hardware design process consists of both design and test, and so eschewing validation and verification leaves considerable potential benefit unexplored, given that a design and test framework may allow for progress towards full automation of the digital design pipeline. In this work, we perform one of the first studies exploring how an LLM can both design and test hardware modules from provided specifications. Using a suite of 8 representative benchmarks, we examined the capabilities and limitations of state-of-the-art conversational LLMs when producing Verilog for functional and verification purposes. We taped out the benchmarks on a Skywater 130nm shuttle and received the functional chip.

[28]  arXiv:2405.02327 [pdf, other]
Title: CausalDisco: Causal discovery using knowledge graph link prediction
Comments: 9 pages, 8 figures
Subjects: Artificial Intelligence (cs.AI)

Causal discovery is a process of discovering new causal relations from observational data. Traditional causal discovery methods often suffer from issues related to missing data. To address these issues, this paper presents a novel approach called CausalDisco that formulates causal discovery as a knowledge graph completion problem. More specifically, the task of discovering causal relations is mapped to the task of knowledge graph link prediction. CausalDisco supports two types of discovery: causal explanation and causal prediction. The causal relations have weights representing the strength of the causal association between entities in the knowledge graph. An evaluation of this approach uses a benchmark dataset of simulated videos for causal reasoning, CLEVRER-Humans, and compares the performance of multiple knowledge graph embedding algorithms. In addition, two distinct dataset splitting approaches are utilized within the evaluation: (1) random-based split, which is the method typically used to evaluate link prediction algorithms, and (2) Markov-based split, a novel data split technique for evaluating link prediction that utilizes the Markovian property of the causal relation. Results show that using weighted causal relations improves causal discovery over the baseline without weighted relations.

[29]  arXiv:2405.02329 [pdf, ps, other]
Title: Digital ASIC Design with Ongoing LLMs: Strategies and Prospects
Comments: 8 pages, 2 figures, 1 table
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The escalating complexity of modern digital systems has imposed significant challenges on integrated circuit (IC) design, necessitating tools that can simplify the IC design flow. The advent of Large Language Models (LLMs) has been seen as a promising development, with the potential to automate the generation of Hardware Description Language (HDL) code, thereby streamlining digital IC design. However, the practical application of LLMs in this area faces substantial hurdles. Notably, current LLMs often generate HDL code with small but critical syntax errors and struggle to accurately convey the high-level semantics of circuit designs. These issues significantly undermine the utility of LLMs for IC design, leading to misinterpretations and inefficiencies.
In response to these challenges, this paper presents targeted strategies to harness the capabilities of LLMs for digital ASIC design. We outline approaches that improve the reliability and accuracy of HDL code generation by LLMs. As a practical demonstration of these strategies, we detail the development of a simple three-phase Pulse Width Modulation (PWM) generator. This project, part of the "Efabless AI-Generated Open-Source Chip Design Challenge," successfully passed the Design Rule Check (DRC) and was fabricated, showcasing the potential of LLMs to enhance digital ASIC design. This work underscores the feasibility and benefits of integrating LLMs into the IC design process, offering a novel approach to overcoming the complexities of modern digital systems.

[30]  arXiv:2405.02330 [pdf, other]
Title: Adaptive Semantic Token Selection for AI-native Goal-oriented Communications
Comments: 5 pages
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper, we propose a novel design for AI-native goal-oriented communications, exploiting transformer neural networks under dynamic inference constraints on bandwidth and computation. Transformers have become the standard architecture for pretraining large-scale vision and text models, and preliminary results have shown promising performance also in deep joint source-channel coding (JSCC). Here, we consider a dynamic model where communication happens over a channel with variable latency and bandwidth constraints. Leveraging recent works on conditional computation, we exploit the structure of the transformer blocks and the multihead attention operator to design a trainable semantic token selection mechanism that learns to select relevant tokens (e.g., image patches) from the input signal. This is done dynamically, on a per-input basis, with a rate that can be chosen as an additional input by the user. We show that our model improves over state-of-the-art token selection mechanisms, exhibiting high accuracy for a wide range of latency and bandwidth constraints, without the need for deploying multiple architectures tailored to each constraint. Last, but not least, the proposed token selection mechanism helps extract powerful semantics that are easy to understand and explain, paving the way for interpretable-by-design models for the next generation of AI-native communication systems.

[31]  arXiv:2405.02332 [pdf, other]
Title: Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models
Journal-ref: Generative Models for Computer Vision - CVPR 2024 Workshop, Jun 2024, Seattle, United States
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Image classifiers should be used with caution in the real world. Performance evaluated on a validation set may not reflect performance in the real world. In particular, classifiers may perform well for conditions that are frequently encountered during training, but poorly for other infrequent conditions. In this study, we hypothesize that recent advances in text-to-image generative models make them valuable for benchmarking computer vision models such as image classifiers: they can generate images conditioned by textual prompts that cause classifier failures, allowing failure conditions to be described with textual attributes. However, their generation cost becomes an issue when a large number of synthetic images need to be generated, which is the case when many different attribute combinations need to be tested. We propose an image classifier benchmarking method as an iterative process that alternates image generation, classifier evaluation, and attribute selection. This method efficiently explores the attributes that ultimately lead to poor behavior detection.

[32]  arXiv:2405.02333 [pdf, ps, other]
Title: Speech Technology Services for Oral History Research
Comments: 5 pages plus references, 3 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Oral history is about oral sources of witnesses and commentators on historical events. Speech technology is an important instrument to process such recordings in order to obtain transcription and further enhancements to structure the oral account. In this contribution we address the transcription portal and the webservices associated with speech processing at BAS, speech solutions developed at LINDAT, how to do it yourself with Whisper, remaining challenges, and future developments.

[33]  arXiv:2405.02334 [pdf, other]
Title: Rad4XCNN: a new agnostic method for post-hoc global explanation of CNN-derived features by means of radiomics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In recent years, artificial intelligence (AI) in clinical decision support systems (CDSS) has played a key role in harnessing machine learning and deep learning architectures. Despite their promising capabilities, the lack of transparency and explainability of AI models poses significant challenges, particularly in medical contexts where reliability is a mandatory aspect. Achieving transparency without compromising predictive accuracy remains a key challenge. This paper presents a novel method, namely Rad4XCNN, to enhance the predictive power of CNN-derived features with the interpretability inherent in radiomic features. Rad4XCNN diverges from conventional methods based on saliency maps by associating intelligible meaning to CNN-derived features by means of Radiomics, offering new perspectives on explanation methods beyond visualization maps. Using a breast cancer classification task as a case study, we evaluated Rad4XCNN on ultrasound imaging datasets, including an online dataset and two in-house datasets for internal and external validation. Some key results are: i) CNN-derived features guarantee more robust accuracy when compared against ViT-derived and radiomic features; ii) conventional visualization map methods for explanation present several pitfalls; iii) Rad4XCNN does not sacrifice model accuracy for its explainability; iv) Rad4XCNN provides global explanation insights enabling the physician to analyze the model outputs and findings. In addition, we highlight the importance of integrating interpretability into AI models for enhanced trust and adoption in clinical practice, emphasizing how our method can mitigate some concerns related to explainable AI methods.

[34]  arXiv:2405.02335 [pdf, other]
Title: sDAC -- Semantic Digital Analog Converter for Semantic Communications
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

In this paper, we propose a novel semantic digital analog converter (sDAC) for the compatibility of semantic communications and digital communications. Most of the current semantic communication systems are based on the analog modulations, ignoring their incorporation with digital communication systems, which are more common in practice. In fact, quantization methods in traditional communication systems are not appropriate for use in the era of semantic communication as these methods do not consider the semantic information inside symbols. In this case, any bit flip caused by channel noise can lead to a great performance drop. To address this challenge, sDAC is proposed. It is a simple yet efficient and generative module used to realize digital and analog bi-directional conversion. On the transmitter side, continuous values from the encoder are converted to binary bits and then can be modulated by any existing methods. After transmitting through the noisy channel, these bits get demodulated by paired methods and converted back to continuous values for further semantic decoding. The whole progress does not depend on any specific semantic model, modulation methods, or channel conditions. In the experiment section, the performance of sDAC is tested across different semantic models, semantic tasks, modulation methods, channel conditions and quantization orders. Test results show that the proposed sDAC has great generative properties and channel robustness.
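
The claim that a single bit flip can cause a large performance drop under conventional quantization can be seen in a small worked example. This is a minimal sketch assuming plain uniform 8-bit quantization of a value in [0, 1]; it does not implement sDAC, and the function names are hypothetical.

```python
def quantize(x, bits):
    """Map x in [0, 1] to an integer code using uniform quantization."""
    levels = (1 << bits) - 1
    return round(x * levels)

def dequantize(code, bits):
    """Map an integer code back to a value in [0, 1]."""
    levels = (1 << bits) - 1
    return code / levels

def flip_bit(code, position):
    """Simulate channel noise flipping one bit of the transmitted code."""
    return code ^ (1 << position)

bits = 8
x = 0.2
code = quantize(x, bits)

# Error after flipping the least significant bit versus the most significant bit.
lsb_error = abs(dequantize(flip_bit(code, 0), bits) - x)
msb_error = abs(dequantize(flip_bit(code, bits - 1), bits) - x)

print(f"code={code:08b} lsb_error={lsb_error:.4f} msb_error={msb_error:.4f}")
```

Flipping the most significant bit shifts the recovered value by about half of the full range, while flipping the least significant bit barely changes it, which is why bit positions that carry no semantic weighting make the decoded features fragile.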

[35]  arXiv:2405.02336 [pdf, other]
Title: Artificial General Intelligence (AGI)-Native Wireless Systems: A Journey Beyond 6G
Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)

Building future wireless systems that support services like digital twins (DTs) is challenging to achieve through advances to conventional technologies like meta-surfaces. While artificial intelligence (AI)-native networks promise to overcome some limitations of wireless technologies, developments still rely on AI tools like neural networks. Such tools struggle to cope with the non-trivial challenges of the network environment and the growing demands of emerging use cases. In this paper, we revisit the concept of AI-native wireless systems, equipping them with the common sense necessary to transform them into artificial general intelligence (AGI)-native systems. These systems acquire common sense by exploiting different cognitive abilities such as perception, analogy, and reasoning, that enable them to generalize and deal with unforeseen scenarios. Towards developing the components of such a system, we start by showing how the perception module can be built through abstracting real-world elements into generalizable representations. These representations are then used to create a world model, founded on principles of causality and hyper-dimensional (HD) computing, that aligns with intuitive physics and enables analogical reasoning, that define common sense. Then, we explain how methods such as integrated information theory play a role in the proposed intent-driven and objective-driven planning methods that maneuver the AGI-native network to take actions. Next, we discuss how an AGI-native network can enable use cases related to human and autonomous agents: a) analogical reasoning for next-generation DTs, b) synchronized and resilient experiences for cognitive avatars, and c) brain-level metaverse experiences like holographic teleportation. Finally, we conclude with a set of recommendations to build AGI-native systems. Ultimately, we envision this paper as a roadmap for the beyond 6G era.

[36]  arXiv:2405.02337 [pdf, ps, other]
Title: Design Fiction as Breaching Experiment: An Interdisciplinary Methodology for Understanding the Acceptability and Adoption of Future Technologies
Comments: 22 pages plus references
Subjects: Human-Computer Interaction (cs.HC)

HCI is fundamentally occupied with the problem of the future and understanding the acceptability and adoption challenges that future and emerging technologies face from the viewpoint of their being situated in everyday life. This paper explicates an interdisciplinary approach towards addressing the problem and understanding acceptability and adoption challenges that leverages design fiction as breaching experiment. Design fiction is an arts based approach to exploring the future, breaching experiments a social science method for explicating common sense reasoning and surfacing the taken for granted expectations society's members have and hold about situated action and how it should work. Both approaches have previously been employed in HCI, but this is the first time they have been combined to enable HCI researchers to provoke through design the acceptability and adoption challenges that confront future and emerging technologies.

[37]  arXiv:2405.02338 [pdf, other]
Title: Mixed or Misperceived Reality?
Subjects: Human-Computer Interaction (cs.HC)

"Surrealism Me" delves into Vilém Flusser's critique of media as mediators that often distort human perception of reality through an interactive virtual-embodying MR experience. It examines the obfuscating nature of media and reveals the constructed nature of media-projected realities, prompting a reevaluation of media's role and influence on our perception.

[38]  arXiv:2405.02341 [pdf, other]
Title: Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square errors (MSEs); secondly, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL type optimizers.
In this work, we tackle these issues by introducing a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP noise. Unlike previous approaches, our accounting algorithm directly operates in $L_2$ geometry, yielding MSEs that fast converge to those of the uncompressed Gaussian mechanism. Additionally, we extend the sparsification scheme to the matrix factorization framework under streaming DP and provide a precise accountant tailored for DP-FTRL type optimizers. Empirically, our method demonstrates at least a 100x improvement of compression for DP-SGD across various FL tasks.

[39]  arXiv:2405.02342 [pdf, other]
Title: The Birkhoff completion of finite lattices
Subjects: Discrete Mathematics (cs.DM); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

We introduce the Birkhoff completion as the smallest distributive lattice in which a given finite lattice can be embedded as semi-lattice. We discuss its relationship to implicational theories, in particular to R. Wille's simply-implicational theories. By an example, we show how the Birkhoff completion can be used as a tool for ordinal data science.

[40]  arXiv:2405.02344 [pdf, other]
Title: Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Attribution methods compute importance scores for input features to explain the output predictions of deep models. However, accurate assessment of attribution methods is challenged by the lack of benchmark fidelity for attributing model predictions. Moreover, other confounding factors in attribution estimation, including the setup choices of post-processing techniques and explained model predictions, further compromise the reliability of the evaluation. In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill, thereby facilitating a systematic assessment of attribution benchmarks. Next, we introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria. We theoretically establish the superiority of our approach over the existing benchmarks for well-founded attribution evaluation. With extensive analysis, we also identify a setup for a consistent and fair benchmarking of attribution methods across different underlying methodologies. This setup is ultimately employed for a comprehensive comparison of existing methods using our BackX benchmark. Finally, our analysis also provides guidance for defending against backdoor attacks with the help of attribution methods.

[41]  arXiv:2405.02345 [pdf, other]
Title: Exploring the Capabilities of Large Language Models for Generating Diverse Design Solutions
Comments: preprint of journal paper
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Access to large amounts of diverse design solutions can support designers during the early stage of the design process. In this paper, we explore the efficacy of large language models (LLM) in producing diverse design solutions, investigating the level of impact that parameter tuning and various prompt engineering techniques can have on the diversity of LLM-generated design solutions. Specifically, LLMs are used to generate a total of 4,000 design solutions across five distinct design topics, eight combinations of parameters, and eight different types of prompt engineering techniques, comparing each combination of parameter and prompt engineering method across four different diversity metrics. LLM-generated solutions are compared against 100 human-crowdsourced solutions in each design topic using the same set of diversity metrics. Results indicate that human-generated solutions consistently have greater diversity scores across all design topics. Using a post hoc logistic regression analysis we investigate whether these differences primarily exist at the semantic level. Results show that there is a divide in some design topics between humans and LLM-generated solutions, while others have no clear divide. Taken together, these results contribute to the understanding of LLMs' capabilities in generating a large volume of diverse design solutions and offer insights for future research that leverages LLMs to generate diverse design solutions for a broad range of design tasks (e.g., inspirational stimuli).

[42]  arXiv:2405.02346 [pdf, other]
Title: Temporal assessment of malicious behaviors: application to turnout field data monitoring
Comments: To be published in the International Conference on Control, Automation and Diagnosis (ICCAD24)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Systems and Control (eess.SY)

Monitored data collected from railway turnouts are vulnerable to cyberattacks: attackers may either conceal failures or trigger unnecessary maintenance actions. To address this issue, a cyberattack investigation method is proposed based on predictions made from the temporal evolution of the turnout behavior. These predictions are then compared to the field acquired data to detect any discrepancy. This method is illustrated on a collection of real-life data.

[43]  arXiv:2405.02347 [pdf, other]
Title: COPAL: Continual Pruning in Large Language Generative Models
Comments: Accepted to ICML2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Adapting pre-trained large language models to different domains in natural language processing requires two key considerations: high computational demands and the model's inability to adapt continually. To simultaneously address both issues, this paper presents COPAL (COntinual Pruning in Adaptive Language settings), an algorithm developed for pruning large language generative models under a continual model adaptation setting. While avoiding resource-heavy finetuning or retraining, our pruning process is guided by the proposed sensitivity analysis. The sensitivity effectively measures the model's ability to withstand perturbations introduced by the new dataset and finds the model's weights that are relevant for all encountered datasets. As a result, COPAL allows seamless model adaptation to new domains while enhancing the resource efficiency. Our empirical evaluation on LLMs of various sizes shows that COPAL outperforms baseline models, demonstrating its efficacy in efficiency and adaptability.

[44]  arXiv:2405.02349 [pdf, ps, other]
Title: Explainable Muti-Label Classification of MBTI Types
Comments: 22 pages, 12 tables, 2 figures
Subjects: Machine Learning (cs.LG)

In this study, we aim to identify the most effective machine learning model for accurately classifying Myers-Briggs Type Indicator (MBTI) types from Reddit posts and a Kaggle data set. We apply multi-label classification using the Binary Relevance method. We use Explainable Artificial Intelligence (XAI) approach to highlight the transparency and understandability of the process and result. To achieve this, we experiment with glass-box learning models, i.e. models designed for simplicity, transparency, and interpretability. We selected k-Nearest Neighbour, Multinomial Naive Bayes, and Logistic Regression for the glass-box models. We show that Multinomial Naive Bayes and k-Nearest Neighbour perform better if classes with Observer (S) traits are excluded, whereas Logistic Regression obtains its best results when all classes have > 550 entries.
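
Binary Relevance, mentioned above, turns one multi-label problem into several independent binary problems, one per label. The following is a minimal sketch of that decomposition only. It assumes, as an illustration not confirmed by the abstract, that the labels are the four MBTI dichotomies; the names `AXES`, `to_binary_labels`, and `binary_relevance_targets` are hypothetical.

```python
# Assumed label set: one binary label per MBTI dichotomy (illustrative only).
AXES = ["IE", "NS", "TF", "JP"]

def to_binary_labels(mbti_type):
    """Encode a type such as 'INTJ' as four bits (1 = first letter of each axis)."""
    return [1 if letter == axis[0] else 0 for letter, axis in zip(mbti_type, AXES)]

def binary_relevance_targets(types):
    """Split multi-label targets into one binary target list per label."""
    rows = [to_binary_labels(t) for t in types]
    return {axis: [row[i] for row in rows] for i, axis in enumerate(AXES)}

targets = binary_relevance_targets(["INTJ", "ESFP", "ENTP"])
for axis, column in targets.items():
    print(axis, column)
```

Each resulting column would then be fitted by its own binary classifier, such as one of the glass-box models named in the abstract, and the per-label predictions are concatenated to form the final multi-label output.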

[45]  arXiv:2405.02350 [pdf, ps, other]
Title: What makes Models Compositional? A Theoretical View: With Supplement
Comments: Extended version of the original IJCAI 2024 paper with detailed supplementary materials (27 pages, 7 figures)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Compositionality is thought to be a key component of language, and various compositional benchmarks have been developed to empirically probe the compositional generalization of existing sequence processing models. These benchmarks often highlight failures of existing models, but it is not clear why these models fail in this way. In this paper, we seek to theoretically understand the role the compositional structure of the models plays in these failures and how this structure relates to their expressivity and sample complexity. We propose a general neuro-symbolic definition of compositional functions and their compositional complexity. We then show how various existing general and special purpose sequence processing models (such as recurrent, convolution and attention-based ones) fit this definition and use it to analyze their compositional complexity. Finally, we provide theoretical guarantees for the expressivity and systematic generalization of compositional models that explicitly depend on our proposed definition and highlight factors which drive poor empirical performance.

[46]  arXiv:2405.02351 [pdf, other]
Title: Towards General Neural Surrogate Solvers with Specialized Neural Accelerators
Comments: 8 pages, 7 Figures, to be published in ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Optics (physics.optics)

Surrogate neural network-based partial differential equation (PDE) solvers have the potential to solve PDEs in an accelerated manner, but they are largely limited to systems featuring fixed domain sizes, geometric layouts, and boundary conditions. We propose Specialized Neural Accelerator-Powered Domain Decomposition Methods (SNAP-DDM), a DDM-based approach to PDE solving in which subdomain problems containing arbitrary boundary conditions and geometric parameters are accurately solved using an ensemble of specialized neural operators. We tailor SNAP-DDM to 2D electromagnetics and fluidic flow problems and show how innovations in network architecture and loss function engineering can produce specialized surrogate subdomain solvers with near unity accuracy. We utilize these solvers with standard DDM algorithms to accurately solve freeform electromagnetics and fluids problems featuring a wide range of domain sizes.

[47]  arXiv:2405.02353 [pdf, other]
Title: Early Transformers: A study on Efficient Training of Transformer Models through Early-Bird Lottery Tickets
Authors: Shravan Cheekati
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

The training of Transformer models has revolutionized natural language processing and computer vision, but it remains a resource-intensive and time-consuming process. This paper investigates the applicability of the early-bird ticket hypothesis to optimize the training efficiency of Transformer models. We propose a methodology that combines iterative pruning, masked distance calculation, and selective retraining to identify early-bird tickets in various Transformer architectures, including ViT, Swin-T, GPT-2, and RoBERTa. Our experimental results demonstrate that early-bird tickets can be consistently found within the first few epochs of training or fine-tuning, enabling significant resource optimization without compromising performance. The pruned models obtained from early-bird tickets achieve comparable or even superior accuracy to their unpruned counterparts while substantially reducing memory usage. Furthermore, our comparative analysis highlights the generalizability of the early-bird ticket phenomenon across different Transformer models and tasks. This research contributes to the development of efficient training strategies for Transformer models, making them more accessible and resource-friendly. By leveraging early-bird tickets, practitioners can accelerate the progress of natural language processing and computer vision applications while reducing the computational burden associated with training Transformer models.
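
The mask-distance idea behind early-bird ticket detection can be sketched roughly as follows: prune by weight magnitude at each epoch, compare consecutive pruning masks, and stop once they barely change. This is a simplified sketch, not the paper's implementation; the function names, the pruning ratio, the threshold, and the weight snapshots are all hypothetical.

```python
def magnitude_mask(weights, prune_ratio):
    """Return a 0/1 mask that prunes the smallest-magnitude fraction of weights."""
    n_prune = int(len(weights) * prune_ratio)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def mask_distance(a, b):
    """Normalized Hamming distance between two masks of equal length."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def find_early_bird(weight_snapshots, prune_ratio=0.5, threshold=0.1):
    """Return (epoch, mask) at the first epoch whose mask is close to the previous one."""
    prev = None
    for epoch, weights in enumerate(weight_snapshots):
        mask = magnitude_mask(weights, prune_ratio)
        if prev is not None and mask_distance(prev, mask) < threshold:
            return epoch, mask
        prev = mask
    return None


# Hypothetical flattened weights recorded after each of three epochs.
snapshots = [
    [0.1, 0.9, 0.5, 0.2],
    [0.8, 0.1, 0.6, 0.3],
    [0.9, 0.05, 0.7, 0.2],
]
ticket = find_early_bird(snapshots)
```

In practice the snapshots would come from a real model's parameters, and the stable mask would then be used to prune the network before continuing training.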

[48]  arXiv:2405.02354 [pdf, ps, other]
Title: Heterogeneous network and graph attention auto-encoder for LncRNA-disease association prediction
Comments: 10 pages, 8 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Emerging research shows that lncRNAs are associated with a series of complex human diseases. However, most existing methods have limitations in identifying nonlinear lncRNA-disease associations (LDAs), and it remains a huge challenge to predict new LDAs. Therefore, the accurate identification of LDAs is very important for the warning and treatment of diseases. In this work, multiple sources of biomedical data are fully utilized to construct characteristics of lncRNAs and diseases, and linear and nonlinear characteristics are effectively integrated. Furthermore, a novel deep learning model based on a graph attention auto-encoder is proposed, called HGATELDA. To begin with, the linear characteristics of lncRNAs and diseases are created from the miRNA-lncRNA interaction matrix and the miRNA-disease interaction matrix. Following this, the nonlinear features of diseases and lncRNAs are extracted using a graph attention auto-encoder, which largely retains the critical information and effectively aggregates the neighborhood information of nodes. In the end, LDAs can be predicted by fusing the linear and nonlinear characteristics of diseases and lncRNAs. The HGATELDA model achieves an AUC value of 0.9692 when evaluated using 5-fold cross-validation, indicating its superior performance in comparison to several recent prediction models. Meanwhile, the effectiveness of HGATELDA in identifying novel LDAs is further demonstrated by case studies. The HGATELDA model appears to be a viable computational model for predicting LDAs.

[49]  arXiv:2405.02355 [pdf, other]
Title: CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Using large language models to generate code has shown promise for transforming software development. Despite the intelligence shown by general large language models, their specificity in code generation can still be improved because of the syntactic gap and mismatched vocabulary between natural language and different programming languages. In addition, programming languages are inherently logical and complex, making them hard to generate correctly. Existing methods rely on multiple prompts to the large language model to explore better solutions, which is expensive. In this paper, we propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance the performance of LLMs in single-round code generation tasks. CodeGRAG extracts and summarizes the control flow and data flow of code blocks to fill the gap between programming languages and natural language. The extracted external structural knowledge models the inherent flows of code blocks, which can help LLMs better understand code syntax and serve as a bridge among different programming languages. CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gain for cross-lingual code generation, e.g., C++ for Python.

[50]  arXiv:2405.02356 [pdf, other]
Title: Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Nonlinearities are crucial for capturing complex input-output relationships especially in deep neural networks. However, nonlinear functions often incur various hardware and compute overheads. Meanwhile, stochastic computing (SC) has emerged as a promising approach to tackle this challenge by trading output precision for hardware simplicity. To this end, this paper proposes a first-of-its-kind stochastic multivariate universal-radix finite-state machine (SMURF) that harnesses SC for hardware-simplistic multivariate nonlinear function generation at high accuracy. We present the finite-state machine (FSM) architecture for SMURF, as well as analytical derivations of sampling gate coefficients for accurately approximating generic nonlinear functions. Experiments demonstrate the superiority of SMURF, requiring only 16.07% area and 14.45% power consumption of Taylor-series approximation, and merely 2.22% area of look-up table (LUT) schemes.

[51]  arXiv:2405.02357 [pdf, other]
Title: Large Language Models for Mobility in Transportation Systems: A Survey on Forecasting Tasks
Comments: 9 pages
Subjects: Machine Learning (cs.LG)

Mobility analysis is a crucial element in the research area of transportation systems. Forecasting traffic information offers a viable solution to address the conflict between increasing transportation demands and the limitations of transportation infrastructure. Predicting human travel is significant in aiding various transportation and urban management tasks, such as taxi dispatch and urban planning. Machine learning and deep learning methods are favored for their flexibility and accuracy. Nowadays, with the advent of large language models (LLMs), many researchers have combined these models with previous techniques or applied LLMs to directly predict future traffic information and human travel behaviors. However, there is a lack of comprehensive studies on how LLMs can contribute to this field. This survey explores existing approaches using LLMs for mobility forecasting problems. We provide a literature review concerning the forecasting applications within transportation systems, elucidating how researchers utilize LLMs, showcasing recent state-of-the-art advancements, and identifying the challenges that must be overcome to fully leverage LLMs in this domain.

[52]  arXiv:2405.02358 [pdf, other]
Title: A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Mode
Comments: 5 figures, 6 tables, 41 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Time series data are ubiquitous across various domains, making time series analysis critically important. Traditional time series models are task-specific, featuring singular functionality and limited generalization capacity. Recently, large language foundation models have unveiled their remarkable capabilities for cross-task transferability, zero-shot/few-shot learning, and decision-making explainability. This success has sparked interest in the exploration of foundation models to solve multiple time series challenges simultaneously. There are two main research lines, namely pre-training foundation models from scratch for time series and adapting large language foundation models for time series. They both contribute to the development of a unified model that is highly generalizable, versatile, and comprehensible for time series analysis. This survey offers a 3E analytical framework for comprehensive examination of related research. Specifically, we examine existing works from three dimensions, namely Effectiveness, Efficiency and Explainability. In each dimension, we focus on discussing how related works devise tailored solutions by considering unique challenges in the realm of time series. Furthermore, we provide a domain taxonomy to help followers keep up with the domain-specific advancements. In addition, we introduce extensive resources to facilitate the field's development, including datasets and open-source time series libraries. A GitHub repository is also maintained for resource updates (https://github.com/start2020/Awesome-TimeSeries-LLM-FM).

[53]  arXiv:2405.02359 [pdf, other]
Title: CVTGAD: Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Unsupervised graph-level anomaly detection (UGAD) has achieved remarkable performance in various critical disciplines, such as chemistry analysis and bioinformatics. Existing UGAD paradigms often adopt data augmentation techniques to construct multiple views, and then employ different strategies to obtain representations from different views for jointly conducting UGAD. However, most previous works only considered the relationship between nodes/graphs from a limited receptive field, resulting in some key structure patterns and feature information being neglected. In addition, most existing methods consider different views separately in a parallel manner, and so cannot explore the inter-relationship across different views directly. Thus, a method with a larger receptive field that can explore the inter-relationship across different views directly is needed. In this paper, we propose a novel Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection, namely, CVTGAD. To increase the receptive field, we construct a simplified transformer-based module, exploiting the relationship between nodes/graphs from both intra-graph and inter-graph perspectives. Furthermore, we design a cross-view attention mechanism to directly exploit the view co-occurrence between different views, bridging the inter-view gap at node level and graph level. To the best of our knowledge, this is the first work to apply transformer and cross attention to UGAD, which realizes graph neural network and transformer working collaboratively. Extensive experiments on 15 real-world datasets of 3 fields demonstrate the superiority of CVTGAD on the UGAD task. The code is available at https://github.com/jindongli-Ai/CVTGAD.

[54]  arXiv:2405.02360 [pdf, other]
Title: Holistic Evaluation Metrics: Use Case Sensitive Evaluation Metrics for Federated Learning
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

A large number of federated learning (FL) algorithms have been proposed for different applications and from varying perspectives. However, the evaluation of such approaches often relies on a single metric (e.g., accuracy). Such a practice fails to account for the unique demands and diverse requirements of different use cases. Thus, how to comprehensively evaluate an FL algorithm and determine the most suitable candidate for a designated use case remains an open question. To mitigate this research gap, we introduce the Holistic Evaluation Metrics (HEM) for FL in this work. Specifically, we collectively focus on three primary use cases, which are Internet of Things (IoT), smart devices, and institutions. The evaluation metric encompasses various aspects including accuracy, convergence, computational efficiency, fairness, and personalization. We then assign a respective importance vector for each use case, reflecting their distinct performance requirements and priorities. The HEM index is finally generated by integrating these metric components with their respective importance vectors. Through evaluating different FL algorithms in these three prevalent use cases, our experimental results demonstrate that HEM can effectively assess and identify the FL algorithms best suited to particular scenarios. We anticipate this work sheds light on the evaluation process for pragmatic FL algorithms in real-world applications.
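
One plausible way to combine metric components with a use-case importance vector is a normalized weighted sum, sketched below. The abstract does not specify the exact aggregation, so the formula, the metric names as dictionary keys, and all numeric values here are assumptions for illustration rather than the paper's definition of the HEM index.

```python
METRICS = ["accuracy", "convergence", "efficiency", "fairness", "personalization"]


def hem_index(scores, importance):
    """Weighted average of metric scores (assumed to lie in [0, 1]).

    Assumption: the index is a weighted sum normalized by the total weight.
    """
    total = sum(importance[m] for m in METRICS)
    return sum(scores[m] * importance[m] for m in METRICS) / total


# Hypothetical scores for one FL algorithm, each already scaled to [0, 1].
scores = {
    "accuracy": 0.9,
    "convergence": 0.6,
    "efficiency": 0.4,
    "fairness": 0.7,
    "personalization": 0.5,
}

# Hypothetical importance vectors for two use cases.
iot = {"accuracy": 1, "convergence": 2, "efficiency": 4, "fairness": 1, "personalization": 1}
institution = {"accuracy": 4, "convergence": 1, "efficiency": 1, "fairness": 2, "personalization": 1}

iot_index = hem_index(scores, iot)
institution_index = hem_index(scores, institution)
```

With these made-up numbers the same algorithm scores lower for the efficiency-heavy use case than for the accuracy-heavy one, which is the kind of use-case sensitivity a single metric cannot express.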

[55]  arXiv:2405.02363 [pdf, other]
Title: LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

The distribution of subpopulations is an important property hidden within a dataset. Uncovering and analyzing the subpopulation distribution within datasets provides a comprehensive understanding of the datasets, standing as a powerful tool beneficial to various downstream tasks, including Dataset Subpopulation Organization, Subpopulation Shift, and Slice Discovery. Despite its importance, there has been no work that systematically explores the subpopulation distribution of datasets to our knowledge. To address the limitation and solve all the mentioned tasks in a unified way, we introduce a novel concept of subpopulation structures to represent, analyze, and utilize subpopulation distributions within datasets. To characterize the structures in an interpretable manner, we propose the Subpopulation Structure Discovery with Large Language Models (SSD-LLM) framework, which employs world knowledge and instruction-following capabilities of Large Language Models (LLMs) to linguistically analyze informative image captions and summarize the structures. Furthermore, we propose complete workflows to address downstream tasks, named Task-specific Tuning, showcasing the application of the discovered structure to a spectrum of subpopulation-related tasks, including dataset subpopulation organization, subpopulation shift, and slice discovery.

[56]  arXiv:2405.02364 [pdf, other]
Title: A Survey on Contribution Evaluation in Vertical Federated Learning
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Vertical Federated Learning (VFL) has emerged as a critical approach in machine learning to address privacy concerns associated with centralized data storage and processing. VFL facilitates collaboration among multiple entities with distinct feature sets on the same user population, enabling the joint training of predictive models without direct data sharing. A key aspect of VFL is the fair and accurate evaluation of each entity's contribution to the learning process. This is crucial for maintaining trust among participating entities, ensuring equitable resource sharing, and fostering a sustainable collaboration framework. This paper provides a thorough review of contribution evaluation in VFL. We categorize the vast array of contribution evaluation techniques along the VFL lifecycle, granularity of evaluation, privacy considerations, and core computational methods. We also explore various tasks in VFL that involve contribution evaluation and analyze their required evaluation properties and relation to the VFL lifecycle phases. Finally, we present a vision for the future challenges of contribution evaluation in VFL. By providing a structured analysis of the current landscape and potential advancements, this paper aims to guide researchers and practitioners in the design and implementation of more effective, efficient, and privacy-centric VFL solutions. Relevant literature and open-source resources have been compiled and are being continuously updated at the GitHub repository: https://github.com/cuiyuebing/VFL_CE.

[57]  arXiv:2405.02365 [pdf, other]
Title: Adaptive and robust watermark against model extraction attack
Subjects: Cryptography and Security (cs.CR)

Large language models have boosted Large Models as a Service (LMaaS) into a thriving business sector. But even when model owners offer only API access while keeping model parameters and internal workings private, their Intellectual Property (IP) is still at risk of theft through model extraction attacks. To safeguard the IP of these models and mitigate unfair competition in the language model market, watermarking technology serves as an efficient post-hoc solution for identifying IP infringements. However, existing IP protection watermarking methods either explicitly alter the original output of the language model or implant watermark signals in the model logits. These methods forcefully distort the original distribution of the language model and impact the sampling process, leading to a decline in the quality of the generated text. Existing methods also fail to achieve end-to-end adaptive watermark embedding and lack robustness verification in complex scenarios where watermark detection is subject to interference. To overcome these challenges, we propose PromptShield, a plug-and-play IP protection watermarking method to resist model extraction attacks without training additional modules. Leveraging the self-reminding properties inherent in large language models, we encapsulate the user's query with a watermark self-generated instruction, nudging the LLMs to automatically generate watermark words in their output without compromising generation quality. Our method does not require access to the model's internal logits and minimizes alterations to the model's distribution using prompt-guided cues. Comprehensive experimental results consistently demonstrate the effectiveness, harmlessness, and robustness of our watermark. Moreover, our watermark detection method remains robust and retains high detection sensitivity even when subjected to interference.

[58]  arXiv:2405.02367 [pdf, other]
Title: Enhancing Social Media Post Popularity Prediction with Visual Content
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Our study presents a framework for predicting image-based social media content popularity that focuses on addressing complex image information and a hierarchical data structure. We utilize the Google Cloud Vision API to effectively extract key image and color information from users' postings, achieving 6.8% higher accuracy compared to using non-image covariates alone. For prediction, we explore a wide range of prediction models, including Linear Mixed Model, Support Vector Regression, Multi-layer Perceptron, Random Forest, and XGBoost, with linear regression as the benchmark. Our comparative study demonstrates that models that are capable of capturing the underlying nonlinear interactions between covariates outperform other methods.

[59]  arXiv:2405.02369 [pdf, other]
Title: No One-Size-Fits-All Neurons: Task-based Neurons for Artificial Neural Networks
Comments: 12 pages, 4 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Biologically, the brain does not rely on a single type of neuron that universally functions in all aspects. Instead, it acts as a sophisticated designer of task-based neurons. In this study, we address the following question: since the human brain is a task-based neuron user, can the artificial network design go from the task-based architecture design to the task-based neuron design? Since methodologically there are no one-size-fits-all neurons, given the same structure, task-based neurons can enhance the feature representation ability relative to the existing universal neurons due to the intrinsic inductive bias for the task. Specifically, we propose a two-step framework for prototyping task-based neurons. First, symbolic regression is used to identify optimal formulas that fit input data by utilizing base functions such as logarithmic, trigonometric, and exponential functions. We introduce vectorized symbolic regression that stacks all variables in a vector and regularizes each input variable to perform the same computation, which can expedite the regression speed, facilitate parallel computation, and avoid overfitting. Second, we parameterize the acquired elementary formula to make parameters learnable, which serves as the aggregation function of the neuron. The activation functions such as ReLU and the sigmoidal functions remain the same because they have proven to be good. Empirically, experimental results on synthetic data, classic benchmarks, and real-world applications show that the proposed task-based neuron design is not only feasible but also delivers competitive performance over other state-of-the-art models.

[60]  arXiv:2405.02370 [pdf, other]
Title: Neuromorphic Correlates of Artificial Consciousness
Authors: Anwaar Ulhaq
Comments: 13 Pages, 8 Figures
Subjects: Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

The concept of neural correlates of consciousness (NCC), which suggests that specific neural activities are linked to conscious experiences, has gained widespread acceptance. This acceptance is based on a wealth of evidence from experimental studies, brain imaging techniques such as fMRI and EEG, and theoretical frameworks like integrated information theory (IIT) within neuroscience and the philosophy of mind. This paper explores the potential for artificial consciousness by merging neuromorphic design and architecture with brain simulations. It proposes the Neuromorphic Correlates of Artificial Consciousness (NCAC) as a theoretical framework. While the debate on artificial consciousness remains contentious due to our incomplete grasp of consciousness, this work may raise eyebrows and invite criticism. Nevertheless, this optimistic and forward-thinking approach is fueled by insights from the Human Brain Project, advancements in brain imaging like EEG and fMRI, and recent strides in AI and computing, including quantum and neuromorphic designs. Additionally, this paper outlines how machine learning can play a role in crafting artificial consciousness, aiming to realise machine consciousness and awareness in the future.

[61]  arXiv:2405.02371 [pdf, ps, other]
Title: Architecture of a Cortex Inspired Hierarchical Event Recaller
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Machine Learning (cs.LG)

This paper proposes a new approach to Machine Learning (ML) that focuses on unsupervised continuous context-dependent learning of complex patterns. Although the proposal is partly inspired by some of the current knowledge about the structural and functional properties of the mammalian brain, we do not claim that biological systems work in an analogous way (nor the opposite). Based on some properties of the cerebellar cortex and adjacent structures, a proposal suitable for practical problems is presented. A synthetic structure capable of identifying and predicting complex temporal series will be defined and experimentally tested. The system relies heavily on prediction to help identify and learn patterns based on previously acquired contextual knowledge. As a proof of concept, the proposed system is shown to be able to learn, identify and predict a remarkably complex temporal series such as human speech, with no prior knowledge. From raw data, without any adaptation in the core algorithm, the system is able to identify certain speech structures from a set of Spanish sentences. Unlike conventional ML, the proposal can learn with a reduced training set. Although the idea can be applied to a constrained problem, such as the detection of unknown vocabulary in a speech, it could be used in more applications, such as vision, or (by incorporating the missing biological periphery) fit into other ML techniques. Given the trivial computational primitives used, a potential hardware implementation will be remarkably frugal. Coincidentally, the proposed model not only conforms to a plausible functional framework for biological systems but may also explain many elusive cognitive phenomena.

[62]  arXiv:2405.02375 [pdf, other]
Title: The Sparse Tsetlin Machine: Sparse Representation with Active Literals
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)

This paper introduces the Sparse Tsetlin Machine (STM), a novel Tsetlin Machine (TM) that processes sparse data efficiently. Traditionally, the TM does not consider data characteristics such as sparsity, commonly seen in NLP applications and other bag-of-word-based representations. Consequently, a TM must initialize, store, and process a significant number of zero values, resulting in excessive memory usage and computational time. Previous attempts at creating a sparse TM have predominantly been unsuccessful, primarily due to their inability to identify which literals are sufficient for TM training. By introducing Active Literals (AL), the STM can focus exclusively on literals that actively contribute to the current data representation, significantly decreasing memory footprint and computational time while demonstrating competitive classification performance.

[63]  arXiv:2405.02377 [pdf, other]
Title: Robustness of Decentralised Learning to Nodes and Data Disruption
Comments: Supported by the H2020 HumaneAI Net (952026), CHIST-ERA-19-XAI010 SAI, PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 FAIR, PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000001 RESTART
Subjects: Machine Learning (cs.LG)

In the vibrant landscape of AI research, decentralised learning is gaining momentum. Decentralised learning allows individual nodes to keep data locally where they are generated and to share knowledge extracted from local data among themselves through an interactive process of collaborative refinement. This paradigm supports scenarios where data cannot leave local nodes due to privacy or sovereignty reasons or real-time constraints imposing proximity of models to locations where inference has to be carried out. The distributed nature of decentralised learning implies significant new research challenges with respect to centralised learning. Among them, in this paper, we focus on robustness issues. Specifically, we study the effect of nodes' disruption on the collective learning process. Assuming a given percentage of "central" nodes disappear from the network, we focus on different cases, characterised by (i) different distributions of data across nodes and (ii) different times when disruption occurs with respect to the start of the collaborative learning task. Through these configurations, we are able to show the non-trivial interplay between the properties of the network connecting nodes, the persistence of knowledge acquired collectively before disruption or lack thereof, and the effect of data availability pre- and post-disruption. Our results show that decentralised learning processes are remarkably robust to network disruption. As long as even minimum amounts of data remain available somewhere in the network, the learning process is able to recover from disruptions and achieve significant classification accuracy. This clearly varies depending on the remaining connectivity after disruption, but we show that even nodes that remain completely isolated can retain significant knowledge acquired before the disruption.

[64]  arXiv:2405.02378 [pdf, other]
Title: Combining Crown Structures for Vulnerability Measures
Subjects: Data Structures and Algorithms (cs.DS)

Over the past decades, various metrics have emerged in graph theory to grasp the complex nature of network vulnerability. In this paper, we study two specific measures: (weighted) vertex integrity (wVI) and (weighted) component order connectivity (wCOC). These measures not only evaluate the number of vertices required to decompose a graph into fragments, but also take into account the size of the largest remaining component. The main focus of our paper is on kernelization algorithms tailored to both measures. We capitalize on the structural attributes inherent in different crown decompositions, strategically combining them to introduce novel kernelization algorithms that advance the current state of the field. In particular, we extend the scope of the balanced crown decomposition provided by Casel et al. [7] and expand the applicability of crown decomposition techniques.
In summary, we improve the vertex kernel of VI from $p^3$ to $p^2$, and of wVI from $p^3$ to $3(p^2 + p^{1.5} p_{\ell})$, where $p_{\ell} < p$ represents the weight of the heaviest component after removing a solution. For wCOC we improve the vertex kernel from $\mathcal{O}(k^2W + kW^2)$ to $3\mu(k + \sqrt{\mu}W)$, where $\mu = \max(k,W)$. We also give a combinatorial algorithm that provides a $2kW$ vertex kernel in FPT-runtime when parameterized by $r$, where $r \leq k$ is the size of a maximum $(W+1)$-packing. We further show that the algorithm computing the $2kW$ vertex kernel for COC can be transformed into a polynomial algorithm for two special cases, namely when $W=1$, which corresponds to the well-known vertex cover problem, and for claw-free graphs. In particular, we show a new way to obtain a $2k$ vertex kernel (or to obtain a 2-approximation) for the vertex cover problem by only using crown structures.
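
To make the two measures concrete, the sketch below evaluates their unweighted versions by brute force on a tiny graph: vertex integrity as the minimum over vertex sets S of |S| plus the size of the largest component left after removing S, and component order connectivity as the check that some S with |S| <= k leaves only components of size at most W. It only illustrates the definitions and has nothing to do with the kernelization algorithms of the paper; the function names and the example graph are hypothetical.

```python
from itertools import combinations


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


def vertex_integrity(adj):
    """Brute-force minimum of |S| + largest component of G - S (exponential time)."""
    nodes = list(adj)
    return min(
        len(S) + largest_component(adj, S)
        for r in range(len(nodes) + 1)
        for S in combinations(nodes, r)
    )


def is_coc_solution(adj, S, k, W):
    """True if S has at most k vertices and G - S has no component larger than W."""
    return len(S) <= k and largest_component(adj, S) <= W


# Example: the path 0-1-2-3-4 as an adjacency list.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
vi = vertex_integrity(path)
```

Setting W = 1 in `is_coc_solution` turns the check into a vertex cover test, matching the special case mentioned in the abstract.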

[65]  arXiv:2405.02384 [pdf, other]
Title: CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on the PC theory, emulating its two core mechanisms: correcting predictions from residuals and hierarchical learning. However, these models do not show the enhancement of prediction skills on real-world forecasting tasks and ignore the Precision Weighting mechanism of PC theory. The precision weighting mechanism posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains. This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM), which demonstrate the connection between diffusion probabilistic models and PC theory. CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and weights the guidance with precision weights estimated by the inherent property of diffusion models. We experimentally show that the precision weights effectively estimate the data predictability. We apply CogDPM to real-world prediction tasks using the United Kingdom precipitation and ERA surface wind datasets. Our results demonstrate that CogDPM outperforms both existing domain-specific operational models and general deep prediction models by providing more proficient forecasting.

[66]  arXiv:2405.02385 [pdf, other]
Title: Efficient Deep Learning with Decorrelated Backpropagation
Subjects: Machine Learning (cs.LG)

The backpropagation algorithm remains the dominant and most successful method for training deep neural networks (DNNs). At the same time, training DNNs at scale comes at a significant computational cost and therefore a high carbon footprint. Converging evidence suggests that input decorrelation may speed up deep learning. However, to date, this has not yet translated into substantial improvements in training efficiency in large-scale DNNs. This is mainly caused by the challenge of enforcing fast and stable network-wide decorrelation. Here, we show for the first time that much more efficient training of very deep neural networks using decorrelated backpropagation is feasible. To achieve this goal we made use of a novel algorithm which induces network-wide input decorrelation using minimal computational overhead. By combining this algorithm with careful optimizations, we obtain a more than two-fold speed-up and higher test accuracy compared to backpropagation when training an 18-layer deep residual network. This demonstrates that decorrelation provides exciting prospects for efficient deep learning at scale.

[67]  arXiv:2405.02386 [pdf, other]
Title: Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids
Comments: SIGGRAPH 2024, Project page: this https URL , Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Despite significant advancements in Neural Radiance Fields (NeRFs), the renderings may still suffer from aliasing and blurring artifacts, since it remains a fundamental challenge to effectively and efficiently characterize anisotropic areas induced by the cone-casting procedure. This paper introduces a Ripmap-Encoded Platonic Solid representation to precisely and efficiently featurize 3D anisotropic areas, achieving high-fidelity anti-aliasing renderings. Central to our approach are two key components: Platonic Solid Projection and Ripmap encoding. The Platonic Solid Projection factorizes the 3D space onto the unparalleled faces of a certain Platonic solid, such that the anisotropic 3D areas can be projected onto planes with distinguishable characterization. Meanwhile, each face of the Platonic solid is encoded by the Ripmap encoding, which is constructed by anisotropically pre-filtering a learnable feature grid, to enable featurizing the projected anisotropic areas both precisely and efficiently by the anisotropic area-sampling. Extensive experiments on both well-established synthetic datasets and a newly captured real-world dataset demonstrate that our Rip-NeRF attains state-of-the-art rendering quality, particularly excelling in the fine details of repetitive structures and textures, while maintaining relatively swift training times.

[68]  arXiv:2405.02411 [pdf, other]
Title: The Call for Socially Aware Language Technologies
Subjects: Computation and Language (cs.CL)

Language technologies have made enormous progress, especially with the introduction of large language models (LLMs). On traditional tasks such as machine translation and sentiment analysis, these models perform at near-human level. These advances can, however, exacerbate a variety of issues that models have traditionally struggled with, such as bias, evaluation, and risks. In this position paper, we argue that many of these issues share a common core: a lack of awareness of the factors, context, and implications of the social environment in which NLP operates, which we call social awareness. While NLP is getting better at solving the formal linguistic aspects, limited progress has been made in adding the social awareness required for language applications to work in all situations for all users. Integrating social awareness into NLP models will make applications more natural, helpful, and safe, and will open up new possibilities. Thus we argue that substantial challenges remain for NLP to develop social awareness and that we are just at the beginning of a new era for the field.

[69]  arXiv:2405.02412 [pdf, other]
Title: Deep Learning and Transfer Learning Architectures for English Premier League Player Performance Forecasting
Comments: 10 pages
Subjects: Machine Learning (cs.LG)

This paper presents a groundbreaking model for forecasting English Premier League (EPL) player performance using convolutional neural networks (CNNs). We evaluate Ridge regression, LightGBM and CNNs on the task of predicting upcoming player FPL score based on historical FPL data over the previous weeks. Our baseline models, Ridge regression and LightGBM, achieve solid performance and emphasize the importance of recent FPL points, influence, creativity, threat, and playtime in predicting EPL player performances. Our optimal CNN architecture achieves better performance with fewer input features and even outperforms the best previous EPL player performance forecasting models in the literature. The optimal CNN architecture also achieves very strong Spearman correlation with player rankings, indicating its strong implications for supporting the development of FPL artificial intelligence (AI) Agents and providing analysis for FPL managers. We additionally perform transfer learning experiments on soccer news data collected from The Guardian, for the same task of predicting upcoming player score, but do not identify a strong predictive signal in natural language news texts, achieving worse performance compared to both the CNN and baseline models. Overall, our CNN-based approach marks a significant advancement in EPL player performance forecasting and lays the foundation for transfer learning to other EPL prediction tasks such as win-loss odds for sports betting and the development of cutting-edge FPL AI Agents.

[70]  arXiv:2405.02413 [pdf, other]
Title: A Unified Framework for Human-Allied Learning of Probabilistic Circuits
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Probabilistic Circuits (PCs) have emerged as an efficient framework for representing and learning complex probability distributions. Nevertheless, the existing body of research on PCs predominantly concentrates on data-driven parameter learning, often neglecting the potential of knowledge-intensive learning, a particular issue in data-scarce/knowledge-rich domains such as healthcare. To bridge this gap, we propose a novel unified framework that can systematically integrate diverse domain knowledge into the parameter learning process of PCs. Experiments on several benchmarks as well as real world datasets show that our proposed framework can both effectively and efficiently leverage domain knowledge to achieve superior performance compared to purely data-driven learning approaches.

[71]  arXiv:2405.02417 [pdf, other]
Title: Hierarchies define the scalability of robot swarms
Comments: 31 Pages, 7 Figures. Supplementary material attached to the paper
Subjects: Robotics (cs.RO)

The emerging behaviors of swarms have fascinated scientists and gathered significant interest in the field of robotics. Traditionally, swarms are viewed as egalitarian, with robots sharing identical roles and capabilities. However, recent findings highlight the importance of hierarchy for deploying robot swarms more effectively in diverse scenarios. Despite nature's preference for hierarchies, the robotics field has clung to the egalitarian model, partly due to a lack of empirical evidence for the conditions favoring hierarchies. Our research demonstrates that while egalitarian swarms excel in environments proportionate to their collective sensing abilities, they struggle in larger or more complex settings. Hierarchical swarms, conversely, extend their sensing reach efficiently, proving successful in larger, more unstructured environments with fewer resources. We validated these concepts through simulations and physical robot experiments, using a complex radiation cleanup task. This study paves the way for developing adaptable, hierarchical swarm systems applicable in areas like planetary exploration and autonomous vehicles. Moreover, these insights could deepen our understanding of hierarchical structures in biological organisms.

[72]  arXiv:2405.02420 [pdf, ps, other]
Title: Inductive Reasoning with Equality Predicates, Contextual Rewriting and Variant-Based Simplification
Authors: Jose Meseguer
Comments: Submitted for publication
Subjects: Logic in Computer Science (cs.LO)

An inductive inference system for proving validity of formulas in the initial algebra $T_{\mathcal{E}}$ of an order-sorted equational theory $\mathcal{E}$ is presented. It has 20 inference rules, but only 9 of them require user interaction; the remaining 11 can be automated as simplification rules. In this way, a substantial fraction of the proof effort can be automated. The inference rules are based on advanced equational reasoning techniques, including: equationally defined equality predicates, narrowing, constructor variant unification, variant satisfiability, order-sorted congruence closure, contextual rewriting, ordered rewriting, and recursive path orderings. All these techniques work modulo axioms $B$, for $B$ any combination of associativity and/or commutativity and/or identity axioms. Most of these inference rules have already been implemented in Maude's NuITP inductive theorem prover.

[73]  arXiv:2405.02421 [pdf, other]
Title: What does the Knowledge Neuron Thesis Have to do with Knowledge?
Comments: ICLR 2024 (Spotlight)
Subjects: Computation and Language (cs.CL)

We reassess the Knowledge Neuron (KN) Thesis: an interpretation of the mechanism underlying the ability of large language models to recall facts from a training corpus. This nascent thesis proposes that facts are recalled from the training corpus through the MLP weights in a manner resembling key-value memory, implying in effect that "knowledge" is stored in the network. Furthermore, by modifying the MLP modules, one can control the language model's generation of factual information. The plausibility of the KN thesis has been demonstrated by the success of KN-inspired model editing methods (Dai et al., 2022; Meng et al., 2022).
We find that this thesis is, at best, an oversimplification. Not only have we found that we can edit the expression of certain linguistic phenomena using the same model editing methods but, through a more comprehensive evaluation, we have found that the KN thesis does not adequately explain the process of factual expression. While it is possible to argue that the MLP weights store complex patterns that are interpretable both syntactically and semantically, these patterns do not constitute "knowledge." To gain a more comprehensive understanding of the knowledge representation process, we must look beyond the MLP weights and explore recent models' complex layer structures and attention mechanisms.

[74]  arXiv:2405.02425 [pdf, other]
Title: Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially-observable, multi-agent domain. We rely on large-scale, simulation-based data generation to obtain complex behaviors from egocentric vision which can be successfully transferred to physical robots using low-cost sensors. To achieve adequate visual realism, our simulation combines rigid-body physics with learned, realistic rendering via multiple Neural Radiance Fields (NeRFs). We combine teacher-based multi-agent RL and cross-experiment data reuse to enable the discovery of sophisticated soccer strategies. We analyze active-perception behaviors including object tracking and ball seeking that emerge when simply optimizing perception-agnostic soccer play. The agents display levels of performance and agility equivalent to those of policies with access to privileged, ground-truth state. To our knowledge, this paper constitutes a first demonstration of end-to-end training for multi-agent robot soccer, mapping raw pixel observations to joint-level actions, that can be deployed in the real world. Videos of the game-play and analyses can be seen on our website https://sites.google.com/view/vision-soccer .

[75]  arXiv:2405.02429 [pdf, other]
Title: CALRec: Contrastive Alignment of Generative LLMs For Sequential Recommendation
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Traditional recommender systems such as matrix factorization methods rely on learning a shared dense embedding space to represent both items and user preferences. Sequence models such as RNNs, GRUs, and, recently, Transformers have also excelled in the task of sequential recommendation. This task requires understanding the sequential structure present in users' historical interactions to predict the next item they may like. Building upon the success of Large Language Models (LLMs) in a variety of tasks, researchers have recently explored using LLMs that are pretrained on vast corpora of text for sequential recommendation. To use LLMs in sequential recommendations, both the history of user interactions and the model's prediction of the next item are expressed in text form. We propose CALRec, a two-stage LLM finetuning framework that finetunes a pretrained LLM in a two-tower fashion using a mixture of two contrastive losses and a language modeling loss: the LLM is first finetuned on a data mixture from multiple domains followed by another round of target domain finetuning. Our model significantly outperforms many state-of-the-art baselines (+37% in Recall@1 and +24% in NDCG@10) and systematic ablation studies reveal that (i) both stages of finetuning are crucial, and, when combined, we achieve improved performance, and (ii) contrastive alignment is effective among the target domains explored in our experiments.
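
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of Recall@k and NDCG@k. It assumes the common next-item setting with exactly one held-out target item per user. The abstract does not describe the evaluation protocol, so this is an illustration of the metrics and not the authors' evaluation code. The function names are hypothetical.

```python
import math

def recall_at_k(ranked_items, target, k):
    """Return 1.0 if the single held-out target is among the top-k items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with one relevant item: the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)

def mean_metric(metric, predictions, targets, k):
    """Average a per-user metric over all users (assumed: one ranked list per user)."""
    return sum(metric(p, t, k) for p, t in zip(predictions, targets)) / len(targets)
```

With a single relevant item, Recall@1 is simply top-1 accuracy, while NDCG@10 additionally rewards placing the target nearer the top of the first ten positions.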

[76]  arXiv:2405.02430 [pdf, ps, other]
Title: How to generate all possible rational Wilf-Zeilberger forms?
Subjects: Symbolic Computation (cs.SC)

Wilf-Zeilberger pairs are fundamental in the algorithmic theory of Wilf and Zeilberger for computer-generated proofs of combinatorial identities. Wilf-Zeilberger forms are their high-dimensional generalizations, which can be used for proving and discovering convergence acceleration formulas. This paper presents a structural description of all possible rational such forms, which can be viewed as an additive analog of the classical Ore-Sato theorem. Based on this analog, we show a structural decomposition of so-called multivariate hyperarithmetic terms, which extend multivariate hypergeometric terms to the additive setting.

[77]  arXiv:2405.02431 [pdf, other]
Title: Delphi: Efficient Asynchronous Approximate Agreement for Distributed Oracles
Comments: 14 pages, 8 figures, Accepted to DSN 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

Agreement protocols are crucial in various emerging applications, spanning from distributed (blockchain) oracles to fault-tolerant cyber-physical systems. In scenarios where sensor/oracle nodes measure a common source, maintaining output within the convex range of correct inputs, known as convex validity, is imperative. Present asynchronous convex agreement protocols employ either randomization, incurring substantial computation overhead, or approximate agreement techniques, leading to high $\mathcal{\tilde{O}}(n^3)$ communication for an $n$-node system.
This paper introduces Delphi, a deterministic protocol with $\mathcal{\tilde{O}}(n^2)$ communication and minimal computation overhead. Delphi assumes that honest inputs are bounded, except with negligible probability, and integrates agreement primitives from literature with a novel weighted averaging technique. Experimental results highlight Delphi's superior performance, showcasing a significantly lower latency compared to state-of-the-art protocols. Specifically, for an $n=160$-node system, Delphi achieves an 8x and 3x improvement in latency within CPS and AWS environments, respectively.

[78]  arXiv:2405.02435 [pdf, other]
Title: Bridging the Gap: A Study of AI-based Vulnerability Management between Industry and Academia
Comments: Accepted by IEEE/IFIP International Conference on Dependable Systems and Networks, Industry Track, 2024
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Recent research advances in Artificial Intelligence (AI) have yielded promising results for automated software vulnerability management. AI-based models are reported to greatly outperform traditional static analysis tools, indicating a substantial workload relief for security engineers. However, the industry remains very cautious and selective about integrating AI-based techniques into their security vulnerability management workflow. To understand the reasons, we conducted a discussion-based study, anchored in the authors' extensive industrial experience and keen observations, to uncover the gap between research and practice in this field. We empirically identified three main barriers preventing the industry from adopting academic models, namely, complicated requirements of scalability and prioritization, limited customization flexibility, and unclear financial implications. Meanwhile, research works are significantly impacted by the lack of extensive real-world security data and expertise. We proposed a set of future directions to help better understand industry expectations, improve the practical usability of AI-based security vulnerability research, and drive a synergistic relationship between industry and academia.

[79]  arXiv:2405.02437 [pdf, other]
Title: FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

We study the problem of privacy-preserving $k$-means clustering in the horizontally federated setting. Existing federated approaches using secure computation suffer from substantial overheads and do not offer output privacy. At the same time, differentially private (DP) $k$-means algorithms assume a trusted central curator and do not extend to federated settings. Naively combining the secure and DP solutions results in a protocol with impractical overhead. Instead, our work provides enhancements to both the DP and secure computation components, resulting in a design that is faster, more private, and more accurate than previous work. By utilizing the computational DP model, we design a lightweight, secure aggregation-based approach that achieves four orders of magnitude speed-up over state-of-the-art related work. Furthermore, we not only maintain the utility of the state-of-the-art in the central model of DP, but we improve the utility further by taking advantage of constrained clustering techniques.

[80]  arXiv:2405.02438 [pdf, other]
Title: ROS2swarm - A ROS 2 Package for Swarm Robot Behaviors
Comments: published in 2022 International Conference on Robotics and Automation (ICRA)
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)

Developing reusable software for mobile robots is still challenging. Even more so for swarm robots, despite the desired simplicity of the robot controllers. Prototyping and experimenting are difficult due to the multi-robot setting and often require robot-robot communication. Also, the diversity of swarm robot hardware platforms increases the need for hardware-independent software concepts. The main advantages of the commonly used robot software architecture ROS 2 are modularity and platform independence. We propose a new ROS 2 package, ROS2swarm, for applications of swarm robotics that provides a library of ready-to-use swarm behavioral primitives. We show the successful application of our approach on three different platforms, the TurtleBot3 Burger, the TurtleBot3 Waffle Pi, and the Jackal UGV, and with a set of different behavioral primitives, such as aggregation, dispersion, and collective decision-making. The proposed approach is easy to maintain, extendable, and has good potential for simplifying swarm robotics experiments in future applications.

[81]  arXiv:2405.02441 [pdf, ps, other]
Title: Learning minimal volume uncertainty ellipsoids
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider the problem of learning uncertainty regions for parameter estimation problems. The regions are ellipsoids that minimize the average volumes subject to a prescribed coverage probability. As expected, under the assumption of jointly Gaussian data, we prove that the optimal ellipsoid is centered around the conditional mean and shaped as the conditional covariance matrix. In more practical cases, we propose a differentiable optimization approach for approximately computing the optimal ellipsoids using a neural network with proper calibration. Compared to existing methods, our network requires less storage and fewer computations at inference time, leading to accurate yet smaller ellipsoids. We demonstrate these advantages on four real-world localization datasets.
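
As a worked illustration of the Gaussian case mentioned above, the ellipsoid can be written out using standard facts about the multivariate normal. The notation is an assumption made here and is not taken from the paper: $x \in \mathbb{R}^d$ is the parameter, $y$ is the observation, the pair is jointly Gaussian with means $\mu_x, \mu_y$ and covariance blocks $\Sigma_{xx}, \Sigma_{xy}, \Sigma_{yy}$, and $\Sigma_{yy}$ and $\Sigma_{x|y}$ are assumed invertible.

```latex
% Conditional mean and covariance of x given y (standard Gaussian conditioning)
\mu_{x|y} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y), \qquad
\Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}

% Ellipsoid centered at the conditional mean, shaped by the conditional covariance
\mathcal{E}(y) = \left\{ x : (x - \mu_{x|y})^{\top} \Sigma_{x|y}^{-1} (x - \mu_{x|y}) \le \chi^2_{d,\,1-\alpha} \right\}

% Coverage and volume (V_d is the volume of the unit ball in R^d)
\Pr\left( x \in \mathcal{E}(y) \mid y \right) = 1 - \alpha, \qquad
\operatorname{vol}\,\mathcal{E}(y) = V_d \left( \chi^2_{d,\,1-\alpha} \right)^{d/2} \sqrt{\det \Sigma_{x|y}}
```

Here $\chi^2_{d,1-\alpha}$ denotes the $(1-\alpha)$ quantile of the chi-square distribution with $d$ degrees of freedom. The coverage statement follows because the quadratic form is chi-square distributed given $y$.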

[82]  arXiv:2405.02447 [pdf, other]
Title: Active flux methods for hyperbolic conservation laws -- flux vector splitting and bound-preservation: One-dimensional case
Comments: 27 pages, 12 figures
Subjects: Numerical Analysis (math.NA)

The active flux (AF) method is a compact high-order finite volume method that evolves cell averages and point values at cell interfaces independently. Within the method of lines framework, the point value can be updated based on Jacobian splitting (JS), incorporating the upwind idea. However, such JS-based AF methods encounter transonic issues for nonlinear problems due to inaccurate upwind direction estimation. This paper proposes to use flux vector splitting for the point value update, offering a natural and uniform remedy to the transonic issue. To improve robustness, this paper also develops bound-preserving (BP) AF methods for one-dimensional hyperbolic conservation laws. Two cases are considered: preservation of the maximum principle for the scalar case, and preservation of positive density and pressure for the compressible Euler equations. The update of the cell average in high-order AF methods is rewritten as a convex combination of using the original high-order fluxes and robust low-order (local Lax-Friedrichs or Rusanov) fluxes, and the desired bounds are enforced by choosing the right amount of low-order fluxes. A similar blending strategy is used for the point value update. Several challenging benchmark tests are conducted to verify the accuracy, BP properties, and shock-capturing ability of the methods.

[83]  arXiv:2405.02454 [pdf, other]
Title: What is Sentiment Meant to Mean to Language Models?
Authors: Michael Burnham
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Sentiment analysis is one of the most widely used techniques in text analysis. Recent advancements with Large Language Models have made it more accurate and accessible than ever, allowing researchers to classify text with only a plain English prompt. However, "sentiment" entails a wide variety of concepts depending on the domain and tools used. It has been used to mean emotion, opinions, market movements, or simply a general "good-bad" dimension. This raises a question: What exactly are language models doing when prompted to label documents by sentiment? This paper first overviews how sentiment is defined across different contexts, highlighting that it is a confounded measurement construct in that it entails multiple variables, such as emotional valence and opinion, without disentangling them. I then test three language models across two data sets with prompts requesting sentiment, valence, and stance classification. I find that sentiment labels most strongly correlate with valence labels. I further find that classification improves when researchers more precisely specify their dimension of interest rather than using the less well-defined concept of sentiment. I conclude by encouraging researchers to move beyond "sentiment" when feasible and use a more precise measurement construct.

[84]  arXiv:2405.02458 [pdf, ps, other]
Title: Controlled Query Evaluation through Epistemic Dependencies
Subjects: Artificial Intelligence (cs.AI)

In this paper, we propose the use of epistemic dependencies to express data protection policies in Controlled Query Evaluation (CQE), which is a form of confidentiality-preserving query answering over ontologies and databases. The resulting policy language goes significantly beyond those proposed in the literature on CQE so far, allowing for very rich and practically interesting forms of data protection rules. We show the expressive abilities of our framework and study the data complexity of CQE for (unions of) conjunctive queries when ontologies are specified in the Description Logic DL-Lite_R. Interestingly, while we show that the problem is in general intractable, we prove tractability for the case of acyclic epistemic dependencies by providing a suitable query rewriting algorithm. The latter result paves the way towards the implementation and practical application of this new approach to CQE.

[85]  arXiv:2405.02463 [pdf, other]
Title: Knowledge Graph Extension by Entity Type Recognition
Authors: Daqian Shi
Comments: PhD thesis
Subjects: Artificial Intelligence (cs.AI)

Knowledge graphs have emerged as a sophisticated advancement and refinement of semantic networks, and their deployment is one of the critical methodologies in contemporary artificial intelligence. The construction of knowledge graphs is a multifaceted process involving various techniques, where researchers aim to extract the knowledge from existing resources for the construction since building from scratch entails significant labor and time costs. However, due to the pervasive issue of heterogeneity, the description diversity across different knowledge graphs can lead to mismatches between concepts, thereby impacting the efficacy of knowledge extraction. This Ph.D. study focuses on automatic knowledge graph extension, i.e., properly extending the reference knowledge graph by extracting and integrating concepts from one or more candidate knowledge graphs. We propose a novel knowledge graph extension framework based on entity type recognition. The framework aims to achieve high-quality knowledge extraction by aligning the schemas and entities across different knowledge graphs, thereby enhancing the performance of the extension. This paper elucidates three major contributions: (i) we propose an entity type recognition method exploiting machine learning and property-based similarities to enhance knowledge extraction; (ii) we introduce a set of assessment metrics to validate the quality of the extended knowledge graphs; (iii) we develop a platform for knowledge graph acquisition, management, and extension to benefit knowledge engineers practically. Our evaluation comprehensively demonstrated the feasibility and effectiveness of the proposed extension framework and its functionalities through quantitative experiments and case studies.

[86]  arXiv:2405.02466 [pdf, ps, other]
Title: ProFLingo: A Fingerprinting-based Copyright Protection Scheme for Large Language Models
Comments: This is the author's pre-print version of the work. It is posted here for your personal use. Not for redistribution
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Large language models (LLMs) have attracted significant attention in recent years. Due to their "Large" nature, training LLMs from scratch consumes immense computational resources. Since several major players in the artificial intelligence (AI) field have open-sourced their original LLMs, an increasing number of individual researchers and smaller companies are able to build derivative LLMs based on these open-sourced models at much lower costs. However, this practice opens up possibilities for unauthorized use or reproduction that may not comply with licensing agreements, and deriving models can change the model's behavior, thus complicating the determination of model ownership. Current copyright protection schemes for LLMs are either designed for white-box settings or require additional modifications to the original model, which restricts their use in real-world settings.
In this paper, we propose ProFLingo, a black-box fingerprinting-based copyright protection scheme for LLMs. ProFLingo generates adversarial examples (AEs) that can represent the unique decision boundary characteristics of an original model, thereby establishing unique fingerprints. Our scheme checks the effectiveness of these adversarial examples on a suspect model to determine whether it has been derived from the original model. ProFLingo offers a non-invasive approach, which neither requires knowledge of the suspect model nor modifications to the base model or its training process. To the best of our knowledge, our method represents the first black-box fingerprinting technique for copyright protection for LLMs. Our source code and generated AEs are available at: https://github.com/hengvt/ProFLingo_arXiv.

[87]  arXiv:2405.02472 [pdf, other]
Title: Semantic Scaling: Bayesian Ideal Point Estimates with Large Language Models
Authors: Michael Burnham
Subjects: Computation and Language (cs.CL)

This paper introduces "Semantic Scaling," a novel method for ideal point estimation from text. I leverage large language models to classify documents based on their expressed stances and extract survey-like data. I then use item response theory to scale subjects from these data. Semantic Scaling significantly improves on existing text-based scaling methods, and allows researchers to explicitly define the ideological dimensions they measure. This represents the first scaling approach that allows such flexibility outside of survey instruments and opens new avenues of inquiry for populations difficult to survey. Additionally, it works with documents of varying length, and produces valid estimates of both mass and elite ideology. I demonstrate that the method can differentiate between policy preferences and in-group/out-group affect. Among the public, Semantic Scaling outperforms Tweetscores according to human judgement; in Congress, it recaptures the first dimension of DW-NOMINATE while allowing for greater flexibility in resolving construct validity challenges.

[88]  arXiv:2405.02475 [pdf, other]
Title: Generalizing Orthogonalization for Models with Non-linearities
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation (stat.CO); Methodology (stat.ME)

The complexity of black-box algorithms can lead to various challenges, including the introduction of biases. These biases present immediate risks in the algorithms' application. It was, for instance, shown that neural networks can deduce racial information solely from a patient's X-ray scan, a task beyond the capability of medical experts. If this fact is not known to the medical expert, automatic decision-making based on this algorithm could lead to prescribing a treatment (purely) based on racial information. While current methodologies allow for the "orthogonalization" or "normalization" of neural networks with respect to such information, existing approaches are grounded in linear models. Our paper advances the discourse by introducing corrections for non-linearities such as ReLU activations. Our approach also encompasses scalar and tensor-valued predictions, facilitating its integration into neural network architectures. Through extensive experiments, we validate our method's effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes.

[89]  arXiv:2405.02476 [pdf, other]
Title: SSI4IoT: Unlocking the Potential of IoT Tailored Self-Sovereign Identity
Subjects: Emerging Technologies (cs.ET); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

The emerging Self-Sovereign Identity (SSI) techniques, such as Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), move control of digital identity from conventional identity providers to individuals and lay down the foundation for people, organizations, and things establishing rich digital relationships. The existing applications of SSI mainly focus on creating person-to-person and person-to-service relationships, whereas person-to-device and device-to-device interactions have been largely overlooked. In this paper, we close this gap by identifying a number of key challenges of applying SSI to the Internet of Things (IoT) and providing a comprehensive taxonomy and usage of VCs in the IoT context with respect to their validity period, trust and interoperability level, and scope of usage. The life-cycle management of VCs as well as various optimization techniques for realizing SSI in IoT environments are also addressed in great detail. This work is a noteworthy step towards massive adoption of SSI for securing existing and future IoT applications in practice.

[90]  arXiv:2405.02478 [pdf, other]
Title: Continuous Learned Primal Dual
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Neural ordinary differential equations (Neural ODEs) propose the idea that a sequence of layers in a neural network is just a discretisation of an ODE, and thus can instead be directly modelled by a parameterised ODE. This idea has had resounding success in the deep learning literature, with direct or indirect influence in many state-of-the-art ideas, such as diffusion models or time-dependent models. Recently, a continuous version of the U-net architecture has been proposed, showing increased performance over its discrete counterpart in many imaging applications and wrapped with theoretical guarantees around its performance and robustness. In this work, we explore the use of Neural ODEs for learned inverse problems, in particular with the well-known Learned Primal Dual algorithm, and apply it to computed tomography (CT) reconstruction.

[91]  arXiv:2405.02479 [pdf, other]
Title: Deterministic Sub-exponential Algorithm for Discounted-sum Games with Unary Weights
Subjects: Computer Science and Game Theory (cs.GT)

Turn-based discounted-sum games are two-player zero-sum games played on finite directed graphs. The vertices of the graph are partitioned between player 1 and player 2. Plays are infinite walks on the graph where the next vertex is decided by the player that owns the current vertex. Each edge is assigned an integer weight and the payoff of a play is the discounted-sum of the weights of the play. The goal of player 1 is to maximize the discounted-sum payoff against the adversarial player 2. These games lie in NP and coNP and are among the rare combinatorial problems that belong to this complexity class; the existence of a polynomial-time algorithm is a major open question. Since breaking the general exponential barrier has been a challenging problem, faster parameterized algorithms have been considered. If the discount factor is expressed in unary, then discounted-sum games can be solved in polynomial time. However, if the discount factor is arbitrary (or expressed in binary), but the weights are in unary, none of the existing approaches yields a sub-exponential bound. Our main result is a new analysis technique for a classical algorithm (namely, the strategy iteration algorithm) that yields a new runtime bound of $n^{O ( W^{1/4} \sqrt{n} )}$, for game graphs with $n$ vertices and maximum absolute weight of at most $W$. In particular, our result yields a deterministic sub-exponential bound for games with weights that are constant or represented in unary.
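
To make the game model in this abstract concrete, here is a minimal sketch that approximates the values of a tiny discounted-sum game by plain value iteration. This is not the strategy iteration algorithm the paper analyses. The graph encoding, the function name `value_iteration`, the example game, and the payoff convention (sum over steps i of discount^i times the i-th edge weight) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: vertex -> list of (successor, integer edge weight).
Graph = Dict[str, List[Tuple[str, int]]]


def value_iteration(graph: Graph, owner: Dict[str, int], discount: float,
                    iterations: int = 200) -> Dict[str, float]:
    """Approximate discounted-sum values; player 1 maximizes, player 2 minimizes."""
    values = {v: 0.0 for v in graph}
    for _ in range(iterations):
        new_values = {}
        for v, edges in graph.items():
            # One-step lookahead: edge weight plus discounted value of the successor.
            candidates = [w + discount * values[u] for u, w in edges]
            new_values[v] = max(candidates) if owner[v] == 1 else min(candidates)
        values = new_values
    return values


# Toy game (assumed for illustration): vertex "a" belongs to player 1, "b" to player 2.
graph = {
    "a": [("a", 1), ("b", 3)],
    "b": [("a", 0), ("b", -1)],
}
owner = {"a": 1, "b": 2}
values = value_iteration(graph, owner, discount=0.5)
print(values)
```

Because the update is a contraction with factor equal to the discount, the iterates converge to the unique fixed point; in this toy game that fixed point is 2 at "a" and -2 at "b".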

[92]  arXiv:2405.02481 [pdf, other]
Title: Proximal Curriculum with Task Correlations for Deep Reinforcement Learning
Comments: IJCAI'24 paper (longer version)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Curriculum design for reinforcement learning (RL) can speed up an agent's learning process and help it learn to perform well on complex tasks. However, existing techniques typically require domain-specific hyperparameter tuning, involve expensive optimization procedures for task selection, or are suitable only for specific learning objectives. In this work, we consider curriculum design in contextual multi-task settings where the agent's final performance is measured w.r.t. a target distribution over complex tasks. We base our curriculum design on the Zone of Proximal Development concept, which has proven to be effective in accelerating the learning process of RL agents for a uniform distribution over all tasks. We propose a novel curriculum, ProCuRL-Target, that effectively balances the need for selecting tasks that are not too difficult for the agent while progressing the agent's learning toward the target distribution via leveraging task correlations. We theoretically justify the task selection strategy of ProCuRL-Target by analyzing a simple learning setting with a REINFORCE learner model. Our experimental results across various domains with challenging target task distributions affirm the effectiveness of our curriculum strategy over state-of-the-art baselines in accelerating the training process of deep RL agents.

[93]  arXiv:2405.02484 [pdf, other]
Title: Hierarchically Decentralized Heterogeneous Multi-Robot Task Allocation System
Subjects: Robotics (cs.RO)

With plans to send humans to the Moon and further, the supply of resources like oxygen, water, fuel, etc., can be met by performing In-Situ Resource Utilization (ISRU), where resources from the extra-terrestrial body are extracted to be utilized. These ISRU missions can be carried out by a Multi-Robot System (MRS). In this research, a high-level auction-based Multi-Robot Task Allocation (MRTA) system is developed for coordinating tasks amongst multiple robots with distinct capabilities. A hierarchical decentralized coordination architecture is implemented in this research to allocate the tasks amongst the robots for achieving intentional cooperation in the MRS. Three different policies are formulated that govern how robots should act in the multiple auction situations of the auction-based task allocation system proposed in this research, and their performance is evaluated in a 2D simulation called pyrobosim using ROS2. The decentralized coordination architecture and the auction-based MRTA make the MRS highly scalable, reliable, flexible, and robust.
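
The auction idea in this abstract can be illustrated with a minimal sketch of sequential single-item auctions among robots with different capabilities. It does not reproduce the paper's three policies or its hierarchical architecture. The `Robot` and `Task` classes, the 1-D positions, the bid rule (distance plus a load penalty), and the tie-break by name are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    name: str
    capability: str   # capability a robot needs to perform the task
    location: float   # hypothetical 1-D location


@dataclass
class Robot:
    name: str
    capabilities: Set[str]
    position: float
    assigned: List[str] = field(default_factory=list)

    def bid(self, task: Task) -> Optional[float]:
        """Return a cost bid, or None if the robot cannot perform the task."""
        if task.capability not in self.capabilities:
            return None
        # Assumed cost model: travel distance plus a penalty per task already held.
        return abs(self.position - task.location) + 5.0 * len(self.assigned)


def run_auctions(robots: List[Robot], tasks: List[Task]) -> Dict[str, Optional[str]]:
    """Auction tasks one at a time; the lowest bid wins (ties broken by name)."""
    allocation: Dict[str, Optional[str]] = {}
    for task in tasks:
        bids = []
        for robot in robots:
            cost = robot.bid(task)
            if cost is not None:
                bids.append((cost, robot.name, robot))
        if not bids:
            allocation[task.name] = None  # no capable robot
            continue
        _, winner_name, winner = min(bids, key=lambda b: (b[0], b[1]))
        winner.assigned.append(task.name)
        allocation[task.name] = winner_name
    return allocation


robots = [
    Robot("excavator", {"excavate"}, position=0.0),
    Robot("hauler", {"haul"}, position=10.0),
    Robot("multi", {"excavate", "haul"}, position=4.0),
]
tasks = [
    Task("dig1", "excavate", 3.0),
    Task("haul1", "haul", 5.0),
    Task("dig2", "excavate", 4.0),
]
allocation = run_auctions(robots, tasks)
print(allocation)
```

In this toy run the load penalty spreads work across robots: "multi" wins the first excavation task but is then outbid on the later tasks by the single-purpose robots.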

[94]  arXiv:2405.02485 [pdf, other]
Title: A Survey of Few-Shot Learning for Biomedical Time Series
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Advancements in wearable sensor technologies and the digitization of medical records have contributed to the unprecedented ubiquity of biomedical time series data. Data-driven models have tremendous potential to assist clinical diagnosis and improve patient care by improving long-term monitoring capabilities, facilitating early disease detection and intervention, as well as promoting personalized healthcare delivery. However, accessing extensively labeled datasets to train data-hungry deep learning models encounters many barriers, such as long-tail distribution of rare diseases, cost of annotation, privacy and security concerns, data-sharing regulations, and ethical considerations. An emerging approach to overcome the scarcity of labeled data is to augment AI methods with human-like capabilities to leverage past experiences to learn new tasks with limited examples, called few-shot learning. This survey provides a comprehensive review and comparison of few-shot learning methods for biomedical time series applications. The clinical benefits and limitations of such methods are discussed in relation to traditional data-driven approaches. This paper aims to provide insights into the current landscape of few-shot learning for biomedical time series and its implications for future research and applications.

[95]  arXiv:2405.02486 [pdf, other]
Title: Concurrent Stochastic Games with Stateful-discounted and Parity Objectives: Complexity and Algorithms
Subjects: Computer Science and Game Theory (cs.GT)

We study two-player zero-sum concurrent stochastic games with finite state and action space played for an infinite number of steps. In every step, the two players simultaneously and independently choose an action. Given the current state and the chosen actions, the next state is obtained according to a stochastic transition function. An objective is a measurable function on plays (or infinite trajectories) of the game, and the value for an objective is the maximal expectation that the player can guarantee against the adversarial player. We consider: (a) stateful-discounted objectives, which are similar to the classical discounted-sum objectives, but states are associated with different discount factors rather than a single discount factor; and (b) parity objectives, which are a canonical representation for $\omega$-regular objectives. For stateful-discounted objectives, given an ordering of the discount factors, the limit value is the limit of the value of the stateful-discounted objectives, as the discount factors approach zero according to the given order.
The computational problem we consider is the approximation of the value within an arbitrary additive error. The above problem is known to be in EXPSPACE for the limit value of stateful-discounted objectives and in PSPACE for parity objectives. The best-known algorithms for both the above problems are at least exponential time, with an exponential dependence on the number of states and actions. Our main results for the value approximation problem for the limit value of stateful-discounted objectives and parity objectives are as follows: (a) we establish TFNP[NP] complexity; and (b) we present algorithms that improve the dependency on the number of actions in the exponent from linear to logarithmic. In particular, if the number of states is constant, our algorithms run in polynomial time.

[96]  arXiv:2405.02487 [pdf, other]
Title: Stable Distributed Online Feedback Optimization for Distribution System Voltage Regulation
Subjects: Systems and Control (eess.SY)

We investigate the distributed voltage regulation problem in distribution systems employing online feedback optimization and short-range communication between physical neighbours. We show that a two-metric approach can be unstable. As a remedy, we propose a nested feedback optimization strategy. Simulation results reveal that while the two-metric approach fails to regulate voltages, the proposed approach achieves even fewer voltage limit violations than its centralized counterpart.

[97]  arXiv:2405.02490 [pdf, ps, other]
Title: Software Fairness Debt
Subjects: Software Engineering (cs.SE)

As software systems continue to play a significant role in modern society, ensuring their fairness has become a critical concern in software engineering. Motivated by this scenario, this paper focused on exploring the multifaceted nature of bias in software systems, aiming to provide a comprehensive understanding of its origins, manifestations, and impacts. Through a scoping study, we identified the primary causes of fairness deficiency in software development and highlighted their adverse effects on individuals and communities, including instances of discrimination and the perpetuation of inequalities. Our investigation culminated in the introduction of the concept of software fairness debt, which complements the notions of technical and social debt, encapsulating the accumulation of biases in software engineering practices while emphasizing the societal ramifications of bias embedded within software systems. Our study contributes to a deeper understanding of fairness in software engineering and paves the way for the development of more equitable and socially responsible software systems.

[98]  arXiv:2405.02492 [pdf, other]
Title: Investigating the Generalizability of Assistive Robots Models over Various Tasks
Comments: Accepted to 2024 21st International Conference on Ubiquitous Robots (UR)
Subjects: Robotics (cs.RO)

In the domain of assistive robotics, the significance of effective modeling is well acknowledged. Prior research has primarily focused on enhancing model accuracy or involved the collection of extensive, often impractical amounts of data. While improving individual model accuracy is beneficial, it necessitates constant remodeling for each new task and user interaction. In this paper, we investigate the generalizability of different modeling methods. We focus on constructing the dynamic model of an assistive exoskeleton using six data-driven regression algorithms. Six tasks are considered in our experiments, including horizontal, vertical, diagonal from left leg to the right eye and the opposite, as well as eating and pushing. We constructed thirty-six unique models applying different regression methods to data gathered from each task. Each trained model's performance was evaluated in a cross-validation scenario, utilizing five folds for each dataset. These trained models are then tested on the other tasks that the model is not trained with. Finally, the models in our study are assessed in terms of generalizability. Results show the superior generalizability of the task model performed along the horizontal plane, and of decision-tree-based algorithms.

[99]  arXiv:2405.02495 [pdf, other]
Title: Design of Fuzzy Logic Parameter Tuners for Upper-Limb Assistive Robots
Comments: 4 pages, 5 figures, Accepted to 2024 21st International Conference on Ubiquitous Robots (UR)
Subjects: Robotics (cs.RO)

Assistive Exoskeleton Robots are helping restore functions to people suffering from underlying medical conditions. These robots require precise tuning of hyper-parameters to feel natural to the user. The device hyper-parameters often need to be re-tuned from task to task, which can be tedious and require expert knowledge. To address this issue, we develop a set of fuzzy logic controllers that can dynamically tune robot gain parameters to adapt its sensitivity to the user's intention determined from muscle activation. The designed fuzzy controllers benefit from a set of expert-defined rules and do not rely on extensive amounts of training data. We evaluate the designed controllers with three different tasks and compare our results against the manually tuned system. Our preliminary results show that our controllers reduce the amount of fighting between the device and the human, measured using a set of pressure sensors.

[100]  arXiv:2405.02499 [pdf, other]
Title: DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands
Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)

The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerabilities. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. Previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.

[101]  arXiv:2405.02501 [pdf, other]
Title: Beyond Helpfulness and Harmlessness: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
Comments: Paper accepted at ICML 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) are trained on massive text corpora, which are encoded with diverse personality traits. This triggers an interesting goal of eliciting a desired personality trait from the LLM, and probing its behavioral preferences. Accordingly, we formalize the persona elicitation task, aiming to customize LLM behaviors to align with a target persona. We present Persona In-Context Learning (PICLe), a novel persona elicitation framework grounded in Bayesian inference. At the core, PICLe introduces a new ICL example selection criterion based on likelihood ratio, which is designed to optimally guide the model in eliciting a specific target persona. We demonstrate the effectiveness of PICLe through extensive comparisons against baseline methods across three contemporary LLMs. Code is available at https://github.com/deeplearning-wisc/picle.

[102]  arXiv:2405.02503 [pdf, other]
Title: Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models
Comments: 10 pages, 10 figures, accepted at SIGIR 2024 as perspective paper
Subjects: Information Retrieval (cs.IR)

Neural models have demonstrated remarkable performance across diverse ranking tasks. However, the processes and internal mechanisms along which they determine relevance are still largely unknown. Existing approaches for analyzing neural ranker behavior with respect to IR properties rely either on assessing overall model behavior or employing probing methods that may offer an incomplete understanding of causal mechanisms. To provide a more granular understanding of internal model decision-making processes, we propose the use of causal interventions to reverse engineer neural rankers, and demonstrate how mechanistic interpretability methods can be used to isolate components satisfying term-frequency axioms within a ranking model. We identify a group of attention heads that detect duplicate tokens in earlier layers of the model, then communicate with downstream heads to compute overall document relevance. More generally, we propose that this style of mechanistic analysis opens up avenues for reverse engineering the processes neural retrieval models use to compute relevance. This work aims to initiate granular interpretability efforts that will not only benefit retrieval model development and training, but ultimately ensure safer deployment of these models.

[103]  arXiv:2405.02506 [pdf, ps, other]
Title: Big Data, Big Decisions Choosing the Right Database
Authors: Fahmy Hassan
Subjects: Databases (cs.DB)

In the burgeoning era of big data, selecting the optimal database solution has become a critical decision for organizations across every industry. Big data demands a powerful database solution. Traditionally, SQL databases ruled, offering a structured approach familiar to many organizations. However, big data's complexity and unstructured nature expose the limitations of SQL databases. Enter NoSQL databases: flexible and scalable, making them ideal for big data's ever-changing nature. We'll explore the key differences between SQL and NoSQL databases. Performance-wise, SQL databases shine for structured queries. Their standardized language (SQL) ensures data consistency and supports complex analysis. But for big data's unstructured formats, this rigidity becomes a hurdle. NoSQL offers a welcome contrast. Its flexible schema allows for diverse data formats and evolving structures, perfect for undefined or frequently changing data models. Additionally, NoSQL boasts superior horizontal scalability, distributing data across multiple servers for cost-effective growth. Understanding these key differentiators empowers organizations to choose the optimal database for their big data needs.

[104]  arXiv:2405.02508 [pdf, other]
Title: Rasterized Edge Gradients: Handling Discontinuities Differentiably
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Computing the gradients of a rendering process is paramount for diverse applications in computer vision and graphics. However, accurate computation of these gradients is challenging due to discontinuities and rendering approximations, particularly for surface-based representations and rasterization-based rendering. We present a novel method for computing gradients at visibility discontinuities for rasterization-based differentiable renderers. Our method elegantly simplifies the traditionally complex problem through a carefully designed approximation strategy, allowing for a straightforward, effective, and performant solution. We introduce a novel concept of micro-edges, which allows us to treat the rasterized images as outcomes of a differentiable, continuous process aligned with the inherently non-differentiable, discrete-pixel rasterization. This technique eliminates the necessity for rendering approximations or other modifications to the forward pass, preserving the integrity of the rendered image, which makes it applicable to rasterized masks, depth, and normals images where filtering is prohibitive. Utilizing micro-edges simplifies gradient interpretation at discontinuities and enables handling of geometry intersections, offering an advantage over the prior art. We showcase our method in dynamic human head scene reconstruction, demonstrating effective handling of camera images and segmentation masks.

[105]  arXiv:2405.02509 [pdf, other]
Title: Implicit Neural Representations for Robust Joint Sparse-View CT Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Computed Tomography (CT) is pivotal in industrial quality control and medical diagnostics. Sparse-view CT, offering reduced ionizing radiation, faces challenges due to its under-sampled nature, leading to ill-posed reconstruction problems. Recent advancements in Implicit Neural Representations (INRs) have shown promise in addressing sparse-view CT reconstruction. Recognizing that CT often involves scanning similar subjects, we propose a novel approach to improve reconstruction quality through joint reconstruction of multiple objects using INRs. This approach can potentially leverage both the strengths of INRs and the statistical regularities across multiple objects. While current INR joint reconstruction techniques primarily focus on accelerating convergence via meta-initialization, they are not specifically tailored to enhance reconstruction quality. To address this gap, we introduce a novel INR-based Bayesian framework integrating latent variables to capture the inter-object relationships. These variables serve as a dynamic reference throughout the optimization, thereby enhancing individual reconstruction fidelity. Our extensive experiments, which assess various key factors such as reconstruction quality, resistance to overfitting, and generalizability, demonstrate significant improvements over baselines in common numerical metrics. This underscores a notable advancement in CT reconstruction methods.

[106]  arXiv:2405.02510 [pdf, other]
Title: Low-cost sensors and circuits for plasma education: characterizing power and illuminance
Subjects: Systems and Control (eess.SY); Plasma Physics (physics.plasm-ph)

Industrial applications of plasma have significantly increased beyond semiconductor manufacturing in recent years. This necessitates training a skilled workforce in plasma science and technology. However, an essential challenge to this end stems from the high cost of plasma devices and diagnostics. The limited access to plasma devices has hindered plasma education, particularly in the least developed countries. To this end, this paper demonstrates how low-cost sensors and circuits can be developed to enable inexpensive plasma experiments in laboratory environments. In particular, we show how to measure high voltage, current, and power from a cold-atmospheric plasma discharge. Additionally, we develop a low-cost illuminance sensor and demonstrate how it can be used to estimate the corresponding plasma power. The low-cost sensors and electronics presented in this paper can aid educators in characterizing plasma power versus plasma illuminance.

[107]  arXiv:2405.02511 [pdf, other]
Title: Incremental Volt/Var Control for Distribution Networks via Chance-Constrained Optimization
Subjects: Systems and Control (eess.SY)

This paper considers an incremental Volt/Var control scheme for distribution systems with high integration of inverter-interfaced distributed generation (such as photovoltaic systems). The incremental Volt/Var controller is implemented with the objective of minimizing reactive power usage while maintaining voltages within safe limits sufficiently often. To this end, the parameters of the incremental Volt/Var controller are obtained by solving a chance-constrained optimization problem, where constraints are designed to ensure that voltage violations do not occur more often than a pre-specified probability. This approach leads to cost savings in a controlled, predictable way, while still avoiding significant over- or under-voltage issues. The proposed chance-constrained problem is solved using a successive convex approximation method. Once the gains are broadcast to the inverters, no additional communication is required since the controller is implemented locally at the inverters. The proposed method is successfully tested on a low-voltage 42-node network.

[108]  arXiv:2405.02512 [pdf, other]
Title: Spatio-Temporal SwinMAE: A Swin Transformer based Multiscale Representation Learner for Temporal Satellite Imagery
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Currently, the foundation models represented by large language models have made dramatic progress and are used in a very wide range of domains including 2D and 3D vision. As one of the important application domains of foundation models, earth observation has attracted attention and various approaches have been developed. When considering earth observation as a single image capture, earth observation imagery can be processed as an image with three or more channels, and when it comes with multiple image captures of different timestamps at one location, the temporal observation can be considered as a set of continuous images resembling video frames or medical scan slices. This paper presents Spatio-Temporal SwinMAE (ST-SwinMAE), an architecture which particularly focuses on representation learning for spatio-temporal image processing. Specifically, it uses a hierarchical Masked Auto-encoder (MAE) with Video Swin Transformer blocks. With the architecture, we present a pretrained model named Degas 100M as a geospatial foundation model. Also, we propose an approach for transfer learning with Degas 100M, in which both the pretrained encoder and decoder of the MAE are utilized, with skip connections added between them to achieve multi-scale information communication, forming an architecture named Spatio-Temporal SwinUNet (ST-SwinUNet). Our approach shows significant performance improvements over existing state-of-the-art foundation models. Specifically, for transfer learning of the land cover downstream task on the PhilEO Bench dataset, it shows 10.4% higher accuracy on average compared with other geospatial foundation models.

[109]  arXiv:2405.02515 [pdf, other]
Title: SR4ZCT: Self-supervised Through-plane Resolution Enhancement for CT Images with Arbitrary Resolution and Overlap
Comments: MLMI2023
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Computed tomography (CT) is a widely used non-invasive medical imaging technique for disease diagnosis. The diagnostic accuracy is often affected by image resolution, which can be insufficient in practice. For medical CT images, the through-plane resolution is often worse than the in-plane resolution and there can be overlap between slices, causing difficulties in diagnoses. Self-supervised methods for through-plane resolution enhancement, which train on in-plane images and infer on through-plane images, have shown promise for both CT and MRI imaging. However, existing self-supervised methods either neglect overlap or can only handle specific cases with fixed combinations of resolution and overlap. To address these limitations, we propose a self-supervised method called SR4ZCT. It employs the same off-axis training approach while being capable of handling arbitrary combinations of resolution and overlap. Our method explicitly models the relationship between resolutions and voxel spacings of different planes to accurately simulate training images that match the original through-plane images. We highlight the significance of accurate modeling in self-supervised off-axis training and demonstrate the effectiveness of SR4ZCT using a real-world dataset.

[110]  arXiv:2405.02517 [pdf, other]
Title: Mothman at SemEval-2024 Task 9: An Iterative System for Chain-of-Thought Prompt Optimization
Comments: 13 pages, 2 figures, to be published in Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Subjects: Computation and Language (cs.CL)

Extensive research exists on the performance of large language models on logic-based tasks, whereas relatively little has been done on their ability to generate creative solutions on lateral thinking tasks. The BrainTeaser shared task tests lateral thinking and uses adversarial datasets to prevent memorization, resulting in poor performance for out-of-the-box models. We propose a system for iterative, chain-of-thought prompt engineering which optimizes prompts using human evaluation. Using this shared task, we demonstrate our system's ability to significantly improve model performance by optimizing prompts and to evaluate the input dataset.

[111]  arXiv:2405.02520 [pdf, other]
Title: TurboFFT: A High-Performance Fast Fourier Transform with Fault Tolerance on GPU
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The Fast Fourier Transform (FFT), as a core computation in a wide range of scientific applications, is increasingly threatened by reliability issues. In this paper, we introduce TurboFFT, a high-performance FFT implementation equipped with a two-sided checksum scheme that detects and corrects silent data corruptions at computing units efficiently. The proposed two-sided checksum addresses the error propagation issue by encoding a batch of input signals with different linear combinations, which not only allows fast batched error detection but also enables error correction on-the-fly instead of recomputing. We explore two-sided checksum designs at the kernel, thread, and threadblock levels, and provide a baseline FFT implementation competitive with the state-of-the-art, closed-source cuFFT. We demonstrate a kernel fusion strategy to mitigate and overlap the computation/memory overhead introduced by fault tolerance with underlying FFT computation. We present a template-based code generation strategy to reduce development costs and support a wide range of input sizes and data types. Experimental results on an NVIDIA A100 server GPU and a Tesla Turing T4 GPU demonstrate TurboFFT offers a competitive or superior performance compared to the closed-source library cuFFT. TurboFFT only incurs a minimal overhead (7% to 15% on average) compared to cuFFT, even under hundreds of error injections per minute for both single and double precision. TurboFFT achieves a 23% improvement compared to existing fault-tolerant FFT schemes.
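
The idea of encoding a batch with different linear combinations can be sketched with a generic linearity-based checksum (in the style of algorithm-based fault tolerance). This is not TurboFFT's actual scheme or GPU implementation. The sketch assumes a naive DFT in place of an FFT, weight vectors (1, 1, ...) and (1, 2, 3, ...), checksums that are themselves error-free, and at most one corrupted output signal; all function names are hypothetical.

```python
import cmath
from typing import List, Optional

Signal = List[complex]


def dft(x: Signal) -> Signal:
    """Naive O(n^2) discrete Fourier transform, standing in for an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def combine(signals: List[Signal], weights: List[float]) -> Signal:
    """Element-wise weighted sum of equally long signals."""
    n = len(signals[0])
    return [sum(w * s[k] for w, s in zip(weights, signals)) for k in range(n)]


def check_and_correct(outputs: List[Signal], checksum1: Signal, checksum2: Signal,
                      tol: float = 1e-9) -> Optional[int]:
    """Detect and repair one corrupted output; return its index, or None if clean."""
    m = len(outputs)
    w1 = [1.0] * m
    w2 = [float(i + 1) for i in range(m)]
    # By linearity, both residuals are zero when no output is corrupted.
    r1 = [a - b for a, b in zip(combine(outputs, w1), checksum1)]
    r2 = [a - b for a, b in zip(combine(outputs, w2), checksum2)]
    k = max(range(len(r1)), key=lambda i: abs(r1[i]))
    if abs(r1[k]) <= tol:
        return None
    # If output j carries error e, then r1 = e and r2 = (j + 1) * e.
    j = round((r2[k] / r1[k]).real) - 1
    outputs[j] = [v - e for v, e in zip(outputs[j], r1)]
    return j


batch = [[1, 2, 3, 4], [0, 1, 0, -1], [2, 0, 2, 0]]
checksum1 = dft(combine(batch, [1.0, 1.0, 1.0]))
checksum2 = dft(combine(batch, [1.0, 2.0, 3.0]))
outputs = [dft(x) for x in batch]
clean = [list(o) for o in outputs]

outputs[1][2] += 5.0  # inject a simulated silent data corruption
fixed = check_and_correct(outputs, checksum1, checksum2)
print(fixed)
```

The two residuals locate the faulty signal from their ratio and repair it by subtraction, so no transform has to be recomputed in this simplified setting.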

[112]  arXiv:2405.02522 [pdf, ps, other]
Title: New contexts, old heuristics: How young people in India and the US trust online content in the age of generative AI
Comments: 14 pages
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

We conducted an in-person ethnography in India and the US to investigate how young people (18-24) trusted online content, with a focus on generative AI (GenAI). We had four key findings about how young people use GenAI and determine what to trust online. First, when online, we found participants fluidly shifted between mindsets and emotional states, which we term "information modes." Second, these information modes shaped how and why participants trust GenAI and how they applied literacy skills. In the modes where they spent most of their time, they eschewed literacy skills. Third, with the advent of GenAI, participants imported existing trust heuristics from familiar online contexts into their interactions with GenAI. Fourth, although study participants had reservations about GenAI, they saw it as a requisite tool to adopt to keep up with the times. Participants valued efficiency above all else, and used GenAI to further their goals quickly at the expense of accuracy. Our findings suggest that young people spend the majority of their time online not concerned with truth because they are seeking only to pass the time. As a result, literacy interventions should be designed to intervene at the right time, to match users' distinct information modes, and to work with their existing fact-checking practices.

[113]  arXiv:2405.02525 [pdf, other]
Title: RLStop: A Reinforcement Learning Stopping Method for TAR
Comments: Accepted at SIGIR 2024
Subjects: Information Retrieval (cs.IR)

We present RLStop, a novel Technology Assisted Review (TAR) stopping rule based on reinforcement learning that helps minimise the number of documents that need to be manually reviewed within TAR applications. RLStop is trained on example rankings using a reward function to identify the optimal point to stop examining documents. Experiments at a range of target recall levels on multiple benchmark datasets (CLEF e-Health, TREC Total Recall, and Reuters RCV1) demonstrated that RLStop substantially reduces the workload required to screen a document collection for relevance. RLStop outperforms a wide range of alternative approaches, achieving performance close to the maximum possible for the task under some circumstances.

[114]  arXiv:2405.02528 [pdf, other]
Title: GigSense: An LLM-Infused Tool for Workers' Collective Intelligence
Journal-ref: CI: ACM Collective Intelligence Conference 2024
Subjects: Human-Computer Interaction (cs.HC)

Collective intelligence among gig workers yields considerable advantages, including improved information exchange, deeper social bonds, and stronger advocacy for better labor conditions, especially as it enables workers to collaboratively pinpoint shared challenges and devise optimal strategies for addressing these issues. However, enabling collective intelligence remains challenging, as existing tools often overestimate gig workers' available time and uniformity in analytical reasoning. To overcome this, we introduce GigSense, a tool that leverages large language models alongside theories of collective intelligence and sensemaking. GigSense enables gig workers to rapidly understand and address shared challenges effectively, irrespective of their diverse backgrounds. Our user study showed that GigSense users outperformed those using a control interface in problem identification, generated higher-quality solutions more quickly, and reported better usability experiences. GigSense not only empowers gig workers but also opens up new possibilities for supporting workers more broadly, demonstrating the potential of large language model interfaces to enhance collective intelligence efforts in the evolving workplace.

[115]  arXiv:2405.02534 [pdf, other]
Title: A Multi-Domain Multi-Task Approach for Feature Selection from Bulk RNA Datasets
Subjects: Machine Learning (cs.LG); Genomics (q-bio.GN)

In this paper, a multi-domain multi-task algorithm for feature selection in bulk RNAseq data is proposed. Two datasets arising from the mouse host immune response to Salmonella infection are investigated. Data is collected from several strains of collaborative cross mice. Samples from the spleen and liver serve as the two domains. Several machine learning experiments are conducted, and a small subset of features that are discriminative across domains is extracted in each case. The algorithm proves viable and underlines the benefits of cross-domain feature selection by extracting a new subset of discriminative features that could not be extracted by a one-domain approach alone.

[116]  arXiv:2405.02537 [pdf, other]
Title: A Robust Data-Driven Iterative Control Method for Linear Systems with Bounded Disturbances
Authors: Kaijian Hu, Tao Liu
Subjects: Systems and Control (eess.SY)

This paper proposes a new robust data-driven control method for linear systems with bounded disturbances, where the system model and disturbances are unknown. Due to disturbances, accurately determining the true system becomes challenging using the collected dataset. Therefore, instead of designing controllers directly for the unknown true system, an available approach is to design controllers for all systems compatible with the dataset. To overcome the limitations of using a single dataset and benefit from collecting more data, multiple datasets are employed in this paper. Furthermore, a new iterative method is developed to address the challenges of using multiple datasets. Based on this method, this paper develops both an offline and an online robust data-driven iterative control method. Compared to the existing robust data-driven controller method, both proposed control methods iteratively utilize multiple datasets in the controller design process. This allows for the incorporation of numerous datasets, potentially reducing the conservativeness of the designed controller. In particular, the online controller is iteratively designed by continuously incorporating online collected data into the historical data to construct new datasets. Lastly, the effectiveness of the proposed methods is demonstrated using a batch reactor.

[117]  arXiv:2405.02538 [pdf, other]
Title: AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) heavily rely on manually annotated detection boxes in training and inference, hindering further practical deployment; or 2) directly employ normal detectors to detect multiple persons with varying size and spatial occlusion in panoramic scenes, blocking the performance gain of PAR. To this end, we consider learning a detector adapting to varying-size occluded persons, which is optimized along with the recognition module in the all-in-one framework. Therefore, we propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework to jointly recognize individual, group, and global activities in panoramic activity scenes by learning an adapt-focused detector and multi-granularity prototypes as the pretext tasks in an end-to-end way. Specifically, to accommodate the varying sizes and spatial occlusion of multiple persons in crowded panoramic scenes, we introduce a panoramic adapt-focuser, achieving the size-adapting detection of individuals by comprehensively selecting and performing fine-grained detections on object-dense sub-regions identified through original detections. In addition, to mitigate information loss due to inaccurate individual localizations, we introduce a bi-propagation prototyper that promotes closed-loop interaction and informative consistency across different granularities by facilitating bidirectional information propagation among the individual, group, and global levels. Extensive experiments demonstrate the significant performance of AdaFPP and emphasize its powerful applicability for PAR.

[118]  arXiv:2405.02543 [pdf, other]
Title: Exploring Extreme Quantization in Spiking Language Models
Subjects: Neural and Evolutionary Computing (cs.NE)

Despite the growing prevalence of large language model (LLM) architectures, a crucial concern persists regarding their energy and power consumption, which still lags far behind the remarkable energy efficiency of the human brain. Recent strides in spiking language models (LM) and transformer architectures aim to address this concern by harnessing the spiking activity of biological neurons to enhance energy/power efficiency. Doubling down on the principles of model quantization and energy efficiency, this paper proposes the development of a novel binary/ternary (1/1.58-bit) spiking LM architecture. Achieving scalability comparable to a deep spiking LM architecture is facilitated by an efficient knowledge distillation technique, wherein knowledge from a non-spiking full-precision "teacher" model is transferred to an extremely weight quantized spiking "student" LM. Our proposed model represents a significant advancement as the first-of-its-kind 1/1.58-bit spiking LM, and its performance is rigorously evaluated on multiple text classification tasks of the GLUE benchmark.

[119]  arXiv:2405.02544 [pdf, ps, other]
Title: A Novel Endorsement Protocol to Secure BFT-Based Consensus in Permissionless Blockchain
Comments: Accepted at IEEE Wireless Communications and Networking Conference (WCNC), 2024
Subjects: Cryptography and Security (cs.CR)

Permissionless blockchain technology offers numerous potential benefits for decentralised applications, such as security, transparency, and openness. BFT-based consensus mechanisms are widely adopted in the permissioned blockchain to meet the high scalability requirements of the network. Sybil attacks are among the most serious potential threats when applying BFT-based consensus mechanisms in permissionless blockchain due to the lack of effective verification mechanisms for participants' identities. This paper presents a novel endorsement-based bootstrapping protocol with a signature algorithm that offers a streamlined, scalable identity endorsement and verification process. This approach effectively safeguards the BFT-based consensus mechanism against Sybil attacks. Using our proposed method, we have conducted thorough security analyses and simulation experiments to assess security, robustness, and scalability advantages in large-scale networks. Our results demonstrate that the scheme can effectively address the identity verification challenges when applying BFT-based consensus in a permissionless blockchain.

[120]  arXiv:2405.02546 [pdf, other]
Title: Scaling SNNs Trained Using Equilibrium Propagation to Convolutional Architectures
Subjects: Neural and Evolutionary Computing (cs.NE)

Equilibrium Propagation (EP) is a biologically plausible local learning algorithm initially developed for convergent recurrent neural networks (RNNs), where weight updates rely solely on the connecting neuron states across two phases. The gradient calculations in EP have been shown to approximate the gradients computed by Backpropagation Through Time (BPTT) when an infinitesimally small nudge factor is used. This property makes EP a powerful candidate for training Spiking Neural Networks (SNNs), which are commonly trained by BPTT. However, in the spiking domain, previous studies on EP have been limited to architectures involving few linear layers. In this work, for the first time we provide a formulation for training convolutional spiking convergent RNNs using EP, bridging the gap between spiking and non-spiking convergent RNNs. We demonstrate that for spiking convergent RNNs, there is a mismatch in the maximum pooling and its inverse operation, leading to inaccurate gradient estimation in EP. Substituting this with average pooling resolves this issue and enables accurate gradient estimation for spiking convergent RNNs. We also highlight the memory efficiency of EP compared to BPTT. In the regime of SNNs trained by EP, our experimental results indicate state-of-the-art performance on the MNIST and FashionMNIST datasets, with test errors of 0.97% and 8.89%, respectively. These results are comparable to those of convergent RNNs and SNNs trained by BPTT. These findings underscore EP as an optimal choice for on-chip training and a biologically-plausible method for computing error gradients.

[121]  arXiv:2405.02548 [pdf, other]
Title: CNN-LSTM and Transfer Learning Models for Malware Classification based on Opcodes and API Calls
Journal-ref: Bensaoud, A., & Kalita, J. (2024). CNN-LSTM and transfer learning models for malware classification based on opcodes and API calls. Knowledge-Based Systems, 111543
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper, we propose a novel model for a malware classification system based on Application Programming Interface (API) calls and opcodes, to improve classification accuracy. This system uses a novel design of combined Convolutional Neural Network and Long Short-Term Memory. We extract opcode sequences and API calls from Windows malware samples for classification. We transform these features into N-gram sequences (N = 2, 3, and 10). Our experiments on a dataset of 9,749,57 samples produce high accuracy of 99.91% using the 8-gram sequences. Our method significantly improves the malware classification performance when using a wide range of recent deep learning architectures, leading to state-of-the-art performance. In particular, we experiment with ConvNeXt-T, ConvNeXt-S, RegNetY-4GF, RegNetY-8GF, RegNetY-12GF, EfficientNetV2, Sequencer2D-L, Swin-T, ViT-G/14, ViT-Ti, ViT-S, ViT-B, ViT-L, and MaxViT-B. Among these architectures, Swin-T and Sequencer2D-L achieved high accuracies of 99.82% and 99.70%, respectively, comparable to our CNN-LSTM architecture although not surpassing it.
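
As a minimal sketch of the N-gram transformation mentioned above, the following turns a token sequence into overlapping N-grams and counts them. The opcode list is made up for illustration, and the helper name `ngrams` is hypothetical; the paper's actual feature extraction pipeline is not described in the abstract.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all overlapping n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical opcode sequence, used only to illustrate the transformation.
opcodes = ["push", "mov", "call", "mov", "call", "ret"]

# Frequency of each bigram; such counts could serve as classifier features.
bigram_counts = Counter(ngrams(opcodes, 2))
```

A sequence shorter than N yields no N-grams, so very short samples would need separate handling in a real pipeline.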

[122]  arXiv:2405.02556 [pdf, other]
Title: Few-Shot Fruit Segmentation via Transfer Learning
Comments: To be published in the 2024 IEEE International Conference on Robotics and Automation (ICRA)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Advancements in machine learning, computer vision, and robotics have paved the way for transformative solutions in various domains, particularly in agriculture. For example, accurate identification and segmentation of fruits from field images plays a crucial role in automating jobs such as harvesting, disease detection, and yield estimation. However, achieving robust and precise infield fruit segmentation remains a challenging task since large amounts of labeled data are required to handle variations in fruit size, shape, color, and occlusion. In this paper, we develop a few-shot semantic segmentation framework for infield fruits using transfer learning. Concretely, our work is aimed at addressing agricultural domains that lack publicly available labeled data. Motivated by similar success in urban scene parsing, we propose specialized pre-training using a public benchmark dataset for fruit transfer learning. By leveraging pre-trained neural networks, accurate semantic segmentation of fruit in the field is achieved with only a few labeled images. Furthermore, we show that models with pre-training learn to distinguish between fruit still on the trees and fruit that have fallen on the ground, and they can effectively transfer the knowledge to the target fruit dataset.

[123]  arXiv:2405.02559 [pdf, ps, other]
Title: A Literature Review and Framework for Human Evaluation of Generative Large Language Models in Healthcare
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

As generative artificial intelligence (AI), particularly Large Language Models (LLMs), continues to permeate healthcare, it remains crucial to supplement traditional automated evaluations with human expert evaluation. Understanding and evaluating the generated texts is vital for ensuring safety, reliability, and effectiveness. However, the cumbersome, time-consuming, and non-standardized nature of human evaluation presents significant obstacles to the widespread adoption of LLMs in practice. This study reviews existing literature on human evaluation methodologies for LLMs within healthcare. We highlight a notable need for a standardized and consistent human evaluation approach. Our extensive literature search, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, spans publications from January 2018 to February 2024. This review provides a comprehensive overview of the human evaluation approaches used in diverse healthcare applications. This analysis examines the human evaluation of LLMs across various medical specialties, addressing factors such as evaluation dimensions, sample types, and sizes, the selection and recruitment of evaluators, frameworks and metrics, the evaluation process, and statistical analysis of the results. Drawing from diverse evaluation strategies highlighted in these studies, we propose a comprehensive and practical framework for human evaluation of generative LLMs, named QUEST: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence. This framework aims to improve the reliability, generalizability, and applicability of human evaluation of generative LLMs in different healthcare applications by defining clear evaluation dimensions and offering detailed guidelines.

[124]  arXiv:2405.02560 [pdf, other]
Title: A Pilot Study on the Comparison of Prefrontal Cortex Activities of Robotic Therapies on Elderly with Mild Cognitive Impairment
Comments: submitted to IEEE on affective computing
Subjects: Robotics (cs.RO)

Demographic shifts have led to an increase in mild cognitive impairment (MCI), and this study investigates the effects of cognitive training (CT) and reminiscence therapy (RT) conducted by humans or socially assistive robots (SARs) on prefrontal cortex activation in elderly individuals with MCI, aiming to determine the most effective therapy-modality combination for promoting cognitive function. This pilot study employs a randomized controlled trial (RCT) design. Additionally, the study explores the efficacy of Reminiscence Therapy (RT) in comparison to Cognitive Training (CT). Eight MCI subjects, with a mean age of 70.125 years, were randomly assigned to "human-led" or "SAR-led" groups. Utilizing Functional Near-infrared Spectroscopy (fNIRS) to measure oxy-hemoglobin concentration changes in the dorsolateral prefrontal cortex (DLPFC), the study found no significant differences in the effects of human-led and SAR-led cognitive training on DLPFC activation. However, distinct patterns emerged in memory encoding and retrieval phases between RT and CT, shedding light on the impacts of these interventions on brain activation in the context of MCI.

[125]  arXiv:2405.02561 [pdf, other]
Title: Understanding the Difficulty of Solving Cauchy Problems with PINNs
Comments: 13 pages and 18 figures
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

Physics-Informed Neural Networks (PINNs) have gained popularity in scientific computing in recent years. However, they often fail to achieve the same level of accuracy as classical methods in solving differential equations. In this paper, we identify two sources of this issue in the case of Cauchy problems: the use of $L^2$ residuals as objective functions and the approximation gap of neural networks. We show that minimizing the sum of $L^2$ residual and initial condition error is not sufficient to guarantee the true solution, as this loss function does not capture the underlying dynamics. Additionally, neural networks are not capable of capturing singularities in the solutions due to the non-compactness of their image sets. This, in turn, influences the existence of global minima and the regularity of the network. We demonstrate that when the global minimum does not exist, machine precision becomes the predominant source of achievable error in practice. We also present numerical experiments in support of our theoretical claims.

[126]  arXiv:2405.02564 [pdf, ps, other]
Title: Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

Human object recognition exhibits remarkable resilience in cluttered and dynamic visual environments. In contrast, despite their unparalleled performance across numerous visual tasks, Deep Neural Networks (DNNs) remain far less robust than humans, showing, for example, a surprising susceptibility to adversarial attacks involving image perturbations that are (almost) imperceptible to humans. Human object recognition likely owes its robustness, in part, to the increasingly resilient representations that emerge along the hierarchy of the ventral visual cortex. Here we show that DNNs, when guided by neural representations from a hierarchical sequence of regions in the human ventral visual stream, display increasing robustness to adversarial attacks. These neural-guided models also exhibit a gradual shift towards more human-like decision-making patterns and develop hierarchically smoother decision surfaces. Importantly, the resulting representational spaces differ in important ways from those produced by conventional smoothing methods, suggesting that such neural-guidance may provide previously unexplored robustness solutions. Our findings support the gradual emergence of human robustness along the ventral visual hierarchy and suggest that the key to DNN robustness may lie in increasing emulation of the human brain.

[127]  arXiv:2405.02568 [pdf, other]
Title: ActiveNeuS: Active 3D Reconstruction using Neural Implicit Surface Uncertainty
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Active learning in 3D scene reconstruction has been widely studied, as selecting informative training views is critical for the reconstruction. Recently, Neural Radiance Fields (NeRF) variants have shown performance increases in active 3D reconstruction using image rendering or geometric uncertainty. However, the simultaneous consideration of both uncertainties in selecting informative views remains unexplored, while utilizing different types of uncertainty can reduce the bias that arises in the early training stage with sparse inputs. In this paper, we propose ActiveNeuS, which evaluates candidate views considering both uncertainties. ActiveNeuS provides a way to accumulate image rendering uncertainty while avoiding the bias that the estimated densities can introduce. ActiveNeuS computes the neural implicit surface uncertainty, providing the color uncertainty along with the surface information. It efficiently handles the bias by using the surface information and a grid, enabling the fast selection of diverse viewpoints. Our method outperforms previous works on popular datasets, Blender and DTU, showing that the views selected by ActiveNeuS significantly improve performance.

[128]  arXiv:2405.02569 [pdf, other]
Title: Decoupling Exploration and Exploitation for Unsupervised Pre-training with Successor Features
Comments: IJCNN 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Unsupervised pre-training has been on the lookout for the virtue of a value function representation referred to as successor features (SFs), which decouples the dynamics of the environment from the rewards. It has a significant impact on the process of task-specific fine-tuning due to the decomposition. However, existing approaches struggle with local optima due to the unified intrinsic reward of exploration and exploitation without considering the linear regression problem and the discriminator supporting a small skill space. We propose a novel unsupervised pre-training model with SFs based on a non-monolithic exploration methodology. Our approach pursues the decomposition of exploitation and exploration of an agent built on SFs, which requires separate agents for the respective purpose. The idea will leverage not only the inherent characteristics of SFs such as a quick adaptation to new tasks but also the exploratory and task-agnostic capabilities. Our suggested model is termed Non-Monolithic unsupervised Pre-training with Successor features (NMPS), which improves the performance of the original monolithic exploration method of pre-training with SFs. NMPS outperforms Active Pre-training with Successor Features (APS) in a comparative experiment.

[129]  arXiv:2405.02571 [pdf, other]
Title: ViTALS: Vision Transformer for Action Localization in Surgical Nephrectomy
Comments: Nephrectomy surgery, Surgical Phase Recognition, Surgical Workflow Segmentation, 11 pages, 2 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Surgical action localization is a challenging computer vision problem. While it has promising applications including automated training of surgery procedures, surgical workflow optimization, etc., appropriate model design is pivotal to accomplishing this task. Moreover, the lack of suitable medical datasets adds an additional layer of complexity. To that effect, we introduce a new complex dataset of nephrectomy surgeries called UroSlice. To perform the action localization from these videos, we propose a novel model termed 'ViTALS' (Vision Transformer for Action Localization in Surgical Nephrectomy). Our model incorporates hierarchical dilated temporal convolution layers and inter-layer residual connections to capture the temporal correlations at finer as well as coarser granularities. The proposed approach achieves state-of-the-art performance on Cholec80 and UroSlice datasets (89.8% and 66.1% accuracy, respectively), validating its effectiveness.

[130]  arXiv:2405.02572 [pdf, other]
Title: Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
Comments: 12 pages, 3 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Policy-based methods have achieved remarkable success in solving challenging reinforcement learning problems. Among these methods, off-policy policy gradient methods are particularly important because they can benefit from off-policy data. However, these methods suffer from the high variance of the off-policy policy gradient (OPPG) estimator, which results in poor sample efficiency during training. In this paper, we propose an off-policy policy gradient method with the optimal action-dependent baseline (Off-OAB) to mitigate this variance issue. Specifically, this baseline maintains the OPPG estimator's unbiasedness while theoretically minimizing its variance. To enhance practical computational efficiency, we design an approximated version of this optimal baseline. Utilizing this approximation, our method (Off-OAB) aims to decrease the OPPG estimator's variance during policy optimization. We evaluate the proposed Off-OAB method on six representative tasks from OpenAI Gym and MuJoCo, where it demonstrably surpasses state-of-the-art methods on the majority of these tasks.

[131]  arXiv:2405.02573 [pdf, other]
Title: A Combination of BERT and Transformer for Vietnamese Spelling Correction
Comments: 13 pages
Journal-ref: ACIIDS 2022, LNCS, vol 13757, Springer, Cham
Subjects: Computation and Language (cs.CL)

Recently, many studies have shown the efficiency of using Bidirectional Encoder Representations from Transformers (BERT) in various Natural Language Processing (NLP) tasks. Specifically, an English spelling correction approach that uses an Encoder-Decoder architecture and takes advantage of BERT has achieved state-of-the-art results. However, to our knowledge, there is no implementation in Vietnamese yet. Therefore, in this study, a combination of the Transformer architecture (state-of-the-art for Encoder-Decoder models) and BERT was proposed to deal with Vietnamese spelling correction. The experimental results have shown that our model outperforms other approaches as well as the Google Docs Spell Checking tool, achieving an 86.24 BLEU score on this task.

[132]  arXiv:2405.02574 [pdf, ps, other]
Title: A Data Mining-Based Dynamical Anomaly Detection Method for Integrating with an Advance Metering System
Authors: Sarit Maitra
Subjects: Machine Learning (cs.LG)

Building operations consume 30% of total power consumption and contribute 26% of global power-related emissions. Therefore, monitoring and early detection of anomalies at the meter level are essential for residential and commercial buildings. This work investigates both supervised and unsupervised approaches and introduces a dynamic anomaly detection system. The system introduces a supervised Light Gradient Boosting machine and an unsupervised autoencoder with a dynamic threshold. This system is designed to provide real-time detection of anomalies at the meter level. The proposed dynamical system comes with a dynamic threshold based on the Mahalanobis distance and moving averages. This approach allows the system to adapt to changes in the data distribution over time. The effectiveness of the proposed system is evaluated using real-life power consumption data collected from smart metering systems. This empirical testing ensures that the system's performance is validated under real-world conditions. By detecting unusual data movements and providing early warnings, the proposed system contributes significantly to visual analytics and decision science. Early detection of anomalies enables timely troubleshooting, preventing financial losses and potential disasters such as fire incidents.
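
A threshold that adapts to the data can be sketched in a simplified, univariate form. In one dimension the Mahalanobis distance reduces to the absolute deviation from the mean divided by the standard deviation, and here both statistics are taken over a moving window of past values. This is an illustrative assumption, not the paper's system: the function name `flag_anomalies`, the window length, and the cutoff `k` are hypothetical, and the input is any sequence of scores (for example, reconstruction errors).

```python
from statistics import fmean, pstdev


def flag_anomalies(values, window=5, k=3.0, eps=1e-9):
    """Flag values whose distance from the moving average exceeds k.

    The distance is |x - mean| / std over the previous `window` values,
    i.e. the one-dimensional Mahalanobis distance. Because mean and std
    move with the window, the effective threshold adapts over time.
    """
    flags = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)  # not enough history to judge yet
            continue
        mu = fmean(history)
        sigma = pstdev(history)
        distance = abs(x - mu) / (sigma + eps)  # eps guards against zero spread
        flags.append(distance > k)
    return flags
```

In this sketch a flagged value still enters the window for later points, which temporarily widens the threshold; a production system might exclude flagged points from the history.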

[133]  arXiv:2405.02576 [pdf, other]
Title: CTD4 - A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Categorical Distributional Reinforcement Learning (CDRL) has demonstrated superior sample efficiency in learning complex tasks compared to conventional Reinforcement Learning (RL) approaches. However, the practical application of CDRL is encumbered by challenging projection steps, detailed parameter tuning, and domain knowledge. This paper addresses these challenges by introducing a pioneering Continuous Distributional Model-Free RL algorithm tailored for continuous action spaces. The proposed algorithm simplifies the implementation of distributional RL, adopting an actor-critic architecture wherein the critic outputs a continuous probability distribution. Additionally, we propose an ensemble of multiple critics fused through a Kalman fusion mechanism to mitigate overestimation bias. Through a series of experiments, we validate that our proposed method is easy to train and serves as a sample-efficient solution for executing complex continuous-control tasks.

[134]  arXiv:2405.02578 [pdf, ps, other]
Title: Mixat: A Data Set of Bilingual Emirati-English Speech
Comments: SIGUL 2024
Subjects: Computation and Language (cs.CL)

This paper introduces Mixat: a dataset of Emirati speech code-mixed with English. Mixat was developed to address the shortcomings of current speech recognition resources when applied to Emirati speech, and in particular, to bilingual Emirati speakers who often mix and switch between their local dialect and English. The data set consists of 15 hours of speech derived from two public podcasts featuring native Emirati speakers, one of which is in the form of conversations between the host and a guest. Therefore, the collection contains examples of Emirati-English code-switching in both formal and natural conversational contexts. In this paper, we describe the process of data collection and annotation, and describe some of the features and statistics of the resulting data set. In addition, we evaluate the performance of pre-trained Arabic and multi-lingual ASR systems on our dataset, demonstrating the shortcomings of existing models on this low-resource dialectal Arabic, and the additional challenge of recognizing code-switching in ASR. The dataset will be made publicly available for research use.

[135]  arXiv:2405.02579 [pdf, other]
Title: Innate Motivation for Robot Swarms by Minimizing Surprise: From Simple Simulations to Real-World Experiments
Comments: Published in IEEE Transactions on Robotics
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE)

Applications of large-scale mobile multi-robot systems can be beneficial over monolithic robots because of higher potential for robustness and scalability. Developing controllers for multi-robot systems is challenging because the multitude of interactions is hard to anticipate and difficult to model. Automatic design using machine learning or evolutionary robotics seems to be an option to avoid that challenge, but brings the challenge of designing reward or fitness functions. Generic reward and fitness functions seem unlikely to exist and task-specific rewards often have undesired side effects. Approaches of so-called innate motivation try to avoid the specific formulation of rewards and work instead with different drivers, such as curiosity. Our approach to innate motivation is to minimize surprise, which we implement by maximizing the accuracy of the swarm robot's sensor predictions using neuroevolution. A unique advantage of the swarm robot case is that swarm members populate the robot's environment and can trigger more active behaviors in a self-referential loop. We summarize our previous simulation-based results concerning behavioral diversity, robustness, scalability, and engineered self-organization, and put them into context. In several new studies, we analyze the influence of the optimizer's hyperparameters, the scalability of evolved behaviors, and the impact of realistic robot simulations. Finally, we present results using real robots that show how the reality gap can be bridged.

[136]  arXiv:2405.02580 [pdf, other]
Title: PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs, such as GPT-4, to transfer existing human-written properties (e.g., those from Certora auditing reports) and automatically generate customized properties for unknown code. To this end, we embed existing properties into a vector database and retrieve a reference property for LLM-based in-context learning to generate a new property for a given code. While this basic process is relatively straightforward, ensuring that the generated properties are (i) compilable, (ii) appropriate, and (iii) runtime-verifiable presents challenges. To address (i), we use the compilation and static analysis feedback as an external oracle to guide LLMs in iteratively revising the generated properties. For (ii), we consider multiple dimensions of similarity to rank the properties and employ a weighted algorithm to identify the top-K properties as the final result. For (iii), we design a dedicated prover to formally verify the correctness of the generated properties. We have implemented these strategies into a novel system called PropertyGPT, with 623 human-written properties collected from 23 Certora projects. Our experiments show that PropertyGPT can generate comprehensive and high-quality properties, achieving an 80% recall compared to the ground truth. It successfully detected 26 CVEs/attack incidents out of 37 tested and also uncovered 12 zero-day vulnerabilities, resulting in $8,256 bug bounty rewards.
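
The retrieval step described in this abstract (embed existing properties, then fetch a reference property for a new piece of code) can be sketched as a cosine-similarity lookup. This is only a toy illustration under stated assumptions: the in-memory dictionary stands in for a vector database, the two-dimensional embeddings and property names are made up, and `retrieve_top_k` is a hypothetical helper rather than part of PropertyGPT.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve_top_k(
    query: Sequence[float], store: Dict[str, Sequence[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query embedding."""
    scored = [(name, cosine(query, vec)) for name, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Made-up embeddings of two existing properties (assumption).
    store = {"balance_invariant": [0.9, 0.1], "access_control": [0.1, 0.9]}
    print(retrieve_top_k([1.0, 0.0], store, k=1))
```

In a pipeline like the one the abstract describes, the retrieved property would then be placed in the LLM prompt as the in-context reference.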

[137]  arXiv:2405.02581 [pdf, other]
Title: Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements
Comments: Accepted at CVPR24 as Poster Highlight
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Learning compatible representations enables the interchangeable use of semantic features as models are updated over time. This is particularly relevant in search and retrieval systems where it is crucial to avoid reprocessing of the gallery images with the updated model. While recent research has shown promising empirical evidence, there is still a lack of comprehensive theoretical understanding about learning compatible representations. In this paper, we demonstrate that the stationary representations learned by the $d$-Simplex fixed classifier optimally approximate compatibility representation according to the two inequality constraints of its formal definition. This not only establishes a solid foundation for future works in this line of research but also presents implications that can be exploited in practical learning scenarios. An exemplary application is the now-standard practice of downloading and fine-tuning new pre-trained models. Specifically, we show the strengths and critical issues of stationary representations in the case in which a model undergoing sequential fine-tuning is asynchronously replaced by downloading a better-performing model pre-trained elsewhere. Such a representation enables seamless delivery of retrieval service (i.e., no reprocessing of gallery images) and offers improved performance without operational disruptions during model replacement. Code available at: https://github.com/miccunifi/iamcl2r.

[138]  arXiv:2405.02583 [pdf, other]
Title: Explainable Interface for Human-Autonomy Teaming: A Survey
Comments: 45 pages, 9 figures
Subjects: Artificial Intelligence (cs.AI)

Nowadays, large-scale foundation models are being increasingly integrated into numerous safety-critical applications, including human-autonomy teaming (HAT) within transportation, medical, and defence domains. Consequently, the inherent 'black-box' nature of these sophisticated deep neural networks heightens the significance of fostering mutual understanding and trust between humans and autonomous systems. To tackle the transparency challenges in HAT, this paper conducts a thoughtful study on the underexplored domain of Explainable Interface (EI) in HAT systems from a human-centric perspective, thereby enriching the existing body of research in Explainable Artificial Intelligence (XAI). We explore the design, development, and evaluation of EI within XAI-enhanced HAT systems. To do so, we first clarify the distinctions between these concepts: EI, explanations and model explainability, aiming to provide researchers and practitioners with a structured understanding. Second, we contribute to a novel framework for EI, addressing the unique challenges in HAT. Last, our summarized evaluation framework for ongoing EI offers a holistic perspective, encompassing model performance, human-centered factors, and group task objectives. Based on extensive surveys across XAI, HAT, psychology, and Human-Computer Interaction (HCI), this review offers multiple novel insights into incorporating XAI into HAT systems and outlines future directions.

[139]  arXiv:2405.02585 [pdf, ps, other]
Title: Maximal Guesswork Leakage
Comments: 6 pages. Extended version of a paper accepted to ISIT 2024
Subjects: Information Theory (cs.IT)

We introduce the study of information leakage through \emph{guesswork}, the minimum expected number of guesses required to guess a random variable. In particular, we define \emph{maximal guesswork leakage} as the multiplicative decrease, upon observing $Y$, of the guesswork of a randomized function of $X$, maximized over all such randomized functions. We also study a pointwise form of the leakage which captures the leakage due to the release of a single realization of $Y$. We also study these two notions of leakage with oblivious (or memoryless) guessing. We obtain closed-form expressions for all these leakage measures, with the exception of one. Specifically, we are able to obtain closed-form expression for maximal guesswork leakage for the binary erasure source only; deriving expressions for arbitrary sources appears challenging. Some of the consequences of our results are -- a connection between guesswork and differential privacy and a new operational interpretation to maximal $\alpha$-leakage in terms of guesswork.
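
To make the notion of guesswork concrete, the following sketch computes the minimum expected number of guesses for a finite distribution (guessing outcomes in decreasing order of probability) and the multiplicative decrease in the guesswork of $X$ itself after observing $Y$. Note the assumption: it does not maximize over randomized functions of $X$, so the ratio it prints is not the maximal guesswork leakage defined in the paper. The function names and the erasure-channel example are illustrative only.

```python
from typing import Dict, Hashable, List, Tuple


def guesswork(pmf: Dict[Hashable, float]) -> float:
    """Minimum expected number of guesses: try outcomes from most to least likely."""
    probs = sorted(pmf.values(), reverse=True)
    return sum(rank * p for rank, p in enumerate(probs, start=1))


def conditional_guesswork(joint: Dict[Tuple[Hashable, Hashable], float]) -> float:
    """Expected guesswork of X given Y, from a joint pmf keyed by (x, y)."""
    by_y: Dict[Hashable, List[float]] = {}
    for (_x, y), p in joint.items():
        by_y.setdefault(y, []).append(p)
    total = 0.0
    for masses in by_y.values():
        # Ranking joint masses P(x, y) within each y and summing rank * P(x, y)
        # equals the average over y of the guesswork of X given Y = y.
        masses.sort(reverse=True)
        total += sum(rank * p for rank, p in enumerate(masses, start=1))
    return total


if __name__ == "__main__":
    # Uniform bit X observed through an erasure channel with erasure probability 0.5.
    joint = {(0, 0): 0.25, (0, "e"): 0.25, (1, 1): 0.25, (1, "e"): 0.25}
    prior = {0: 0.5, 1: 0.5}
    print(guesswork(prior) / conditional_guesswork(joint))
```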

[140]  arXiv:2405.02586 [pdf, other]
Title: Generalizing CLIP to Unseen Domain via Text-Guided Diverse Novel Feature Synthesis
Comments: 24 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision-language foundation models like CLIP have shown impressive zero-shot generalization, but finetuning on downstream datasets can cause overfitting and loss of its generalization ability on unseen domains. Although collecting additional data from new domains of interest is possible, this method is often impractical due to the challenges in obtaining annotated data. To address this, we propose a plug-and-play feature augmentation method called LDFS (Language-Guided Diverse Feature Synthesis) to synthesize new domain features and improve existing CLIP fine-tuning strategies. LDFS has three main contributions: 1) To synthesize novel domain features and promote diversity, we propose an instance-conditional feature augmentation strategy based on a text-guided feature augmentation loss. 2) To maintain feature quality after augmenting, we introduce a pairwise regularizer to preserve augmented feature coherence within the CLIP feature space. 3) We propose to use stochastic text feature augmentation to reduce the modality gap and further facilitate the process of text-guided feature synthesis. Extensive experiments show the superiority of LDFS in improving CLIP generalization ability on unseen domains without collecting data from those domains. The code will be made publicly available.

[141]  arXiv:2405.02590 [pdf, other]
Title: Performance Evaluation of PAC Decoding with Deep Neural Networks
Subjects: Information Theory (cs.IT)

By concatenating a polar transform with a convolutional transform, polarization-adjusted convolutional (PAC) codes can reach the dispersion approximation bound in certain rate cases. However, the sequential decoding nature of traditional PAC decoding algorithms results in high decoding latency. Due to the parallel computing capability, deep neural network (DNN) decoders have emerged as a promising solution. In this paper, we propose three types of DNN decoders for PAC codes: multi-layer perceptron (MLP), convolutional neural network (CNN), and recurrent neural network (RNN). The performance of these DNN decoders is evaluated through extensive simulation. Numerical results show that the MLP decoder has the best error-correction performance under a similar model parameter number.

[142]  arXiv:2405.02591 [pdf, other]
Title: Better YOLO with Attention-Augmented Network and Enhanced Generalization Performance for Safety Helmet Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Safety helmets play a crucial role in protecting workers from head injuries in construction sites, where potential hazards are prevalent. However, currently, there is no approach that can simultaneously achieve both model accuracy and performance in complex environments. In this study, we utilized a YOLO-based model for safety helmet detection, achieving a 2% improvement in mAP (mean Average Precision) while reducing the parameter and FLOPs counts by over 25%. YOLO (You Only Look Once) is a widely used, high-performance, lightweight model architecture that is well suited for complex environments. We present a novel approach by incorporating a lightweight feature extraction network backbone based on GhostNetv2, integrating attention modules such as Spatial Channel-wise Attention Net (SCNet) and Coordination Attention Net (CANet), and adopting the Gradient Norm Aware optimizer (GAM) for improved generalization ability. In safety-critical environments, the accuracy and speed of safety helmet detection play a pivotal role in preventing occupational hazards and ensuring compliance with safety protocols. This work addresses the pressing need for robust and efficient helmet detection methods, offering a comprehensive framework that not only enhances accuracy but also improves the adaptability of detection models to real-world conditions. Our experimental results underscore the synergistic effects of GhostNetv2, attention modules, and the GAM optimizer, presenting a compelling solution for safety helmet detection that achieves superior performance in terms of accuracy, generalization, and efficiency.

[143]  arXiv:2405.02594 [pdf, other]
Title: Leveraging (Biased) Information: Multi-armed Bandits with Offline Data
Comments: 24 pages, 5 figures. Accepted to ICML 2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We leverage offline data to facilitate online learning in stochastic multi-armed bandits. The probability distributions that govern the offline data and the online rewards can be different. Without any non-trivial upper bound on their difference, we show that no non-anticipatory policy can outperform the UCB policy by (Auer et al. 2002), even in the presence of offline data. In complement, we propose an online policy MIN-UCB, which outperforms UCB when a non-trivial upper bound is given. MIN-UCB adaptively chooses to utilize the offline data when they are deemed informative, and to ignore them otherwise. MIN-UCB is shown to be tight in terms of both instance independent and dependent regret bounds. Finally, we corroborate the theoretical results with numerical experiments.
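
For readers unfamiliar with the baseline named in this abstract, here is a minimal sketch of the UCB1 index policy of Auer et al. (2002), assuming rewards in [0, 1]. It covers only the online baseline: MIN-UCB and its use of offline data are not reproduced here, and the `pull` callback and arm means are hypothetical.

```python
import math
from typing import Callable, List


def ucb1(pull: Callable[[int], float], n_arms: int, horizon: int) -> List[int]:
    """Run UCB1 for `horizon` rounds and return how often each arm was pulled."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialize its estimate
        else:
            # Pick the arm with the largest empirical mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Deterministic toy rewards (assumption) so the run is reproducible.
    means = [0.2, 0.8]
    print(ucb1(lambda a: means[a], n_arms=2, horizon=1000))
```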

[144]  arXiv:2405.02595 [pdf, other]
Title: Vision-based 3D occupancy prediction in autonomous driving: a review and outlook
Comments: 20 pages, 20 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, autonomous driving has garnered escalating attention for its potential to relieve drivers' burdens and improve driving safety. Vision-based 3D occupancy prediction, which predicts the spatial occupancy status and semantics of 3D voxel grids around the autonomous vehicle from image inputs, is an emerging perception task suitable for cost-effective perception system of autonomous driving. Although numerous studies have demonstrated the greater advantages of 3D occupancy prediction over object-centric perception tasks, there is still a lack of a dedicated review focusing on this rapidly developing field. In this paper, we first introduce the background of vision-based 3D occupancy prediction and discuss the challenges in this task. Secondly, we conduct a comprehensive survey of the progress in vision-based 3D occupancy prediction from three aspects: feature enhancement, deployment friendliness and label efficiency, and provide an in-depth analysis of the potentials and challenges of each category of methods. Finally, we present a summary of prevailing research trends and propose some inspiring future outlooks. To provide a valuable reference for researchers, a regularly updated collection of related papers, datasets, and codes is organized at https://github.com/zya3d/Awesome-3D-Occupancy-Prediction.

[145]  arXiv:2405.02596 [pdf, other]
Title: Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Fine-tuning large language models (LLMs) can be costly. Parameter-efficient fine-tuning (PEFT) addresses this problem by training a fraction of the parameters, whose success reveals the expressiveness and flexibility of pretrained models. This paper studies the limit of PEFT, by further simplifying its design and reducing the number of trainable parameters beyond standard setups. To this end, we use Random Masking to fine-tune the pretrained model. Despite its simplicity, we show that Random Masking is surprisingly effective: with a larger-than-expected learning rate, Random Masking can match the performance of standard PEFT algorithms such as LoRA on various tasks, using fewer trainable parameters. We provide both empirical and theoretical explorations into the success of Random Masking. We show that masking induces a flatter loss landscape and more distant solutions, which allows for and necessitates large learning rates.
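
One plausible reading of the masking idea in this abstract is to draw a fixed random binary mask over the pretrained parameters and apply gradient updates only to the selected entries. The sketch below shows that reading on a flat parameter list. The helper names, the keep fraction, and the plain SGD update are assumptions for illustration and are not taken from the paper.

```python
import random
from typing import List


def make_mask(n_params: int, keep_fraction: float, seed: int = 0) -> List[bool]:
    """Choose a fixed random subset of parameter indices to train."""
    rng = random.Random(seed)
    n_keep = min(n_params, max(1, round(n_params * keep_fraction)))
    chosen = set(rng.sample(range(n_params), n_keep))
    return [i in chosen for i in range(n_params)]


def masked_sgd_step(
    params: List[float], grads: List[float], mask: List[bool], lr: float
) -> List[float]:
    """Update only the masked-in parameters; all others stay frozen."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]


if __name__ == "__main__":
    params = [1.0] * 10
    grads = [0.5] * 10
    mask = make_mask(len(params), keep_fraction=0.2)
    print(masked_sgd_step(params, grads, mask, lr=1.0))
```

Because the mask is drawn once and reused, the number of trainable parameters is fixed by `keep_fraction`.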

[146]  arXiv:2405.02598 [pdf, other]
Title: UDUC: An Uncertainty-driven Approach for Learning-based Robust Control
Subjects: Machine Learning (cs.LG)

Learning-based techniques have become popular in both model predictive control (MPC) and reinforcement learning (RL). Probabilistic ensemble (PE) models offer a promising approach for modelling system dynamics, showcasing the ability to capture uncertainty and scalability in high-dimensional control scenarios. However, PE models are susceptible to mode collapse, resulting in non-robust control when faced with environments slightly different from the training set. In this paper, we introduce the $\textbf{u}$ncertainty-$\textbf{d}$riven rob$\textbf{u}$st $\textbf{c}$ontrol (UDUC) loss as an alternative objective for training PE models, drawing inspiration from contrastive learning. We analyze the robustness of UDUC loss through the lens of robust optimization and evaluate its performance on the challenging Real-world Reinforcement Learning (RWRL) benchmark, which involves significant environmental mismatches between the training and testing environments.

[147]  arXiv:2405.02599 [pdf, ps, other]
Title: Assembling ensembling: An adventure in approaches across disciplines
Comments: 33 pages, 4 figures
Subjects: Digital Libraries (cs.DL)

When we think of model ensembling or ensemble modeling, there are many possibilities that come to mind in different disciplines. For example, one might think of a set of descriptions of a phenomenon in the world, perhaps a time series or a snapshot of multivariate space, and perhaps that set is comprised of data-independent descriptions, or perhaps it is quite intentionally fit *to* data, or even a suite of data sets with a common theme or intention. The very meaning of 'ensemble' - a collection together - conjures different ideas across and even within disciplines approaching phenomena. In this paper, we present a typology of the scope of these potential perspectives. It is not our goal to present a review of terms and concepts, nor is it to convince all disciplines to adopt a common suite of terms, which we view as futile. Rather, our goal is to disambiguate terms, concepts, and processes associated with 'ensembles' and 'ensembling' in order to facilitate communication, awareness, and possible adoption of tools across disciplines.

[148]  arXiv:2405.02602 [pdf, other]
Title: Astro-NER -- Astronomy Named Entity Recognition: Is GPT a Good Domain Expert Annotator?
Comments: 9 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Theory (cs.IT)

In this study, we address one of the challenges of developing NER models for scholarly domains, namely the scarcity of suitable labeled data. We experiment with an approach using predictions from a fine-tuned LLM model to aid non-domain experts in annotating scientific entities within astronomy literature, with the goal of uncovering whether such a collaborative process can approximate domain expertise. Our results reveal moderate agreement between a domain expert and the LLM-assisted non-experts, as well as fair agreement between the domain expert and the LLM model's predictions. In an additional experiment, we compare the performance of finetuned and default LLMs on this task. We have also introduced a specialized scientific entity annotation scheme for astronomy, validated by a domain expert. Our approach adopts a scholarly research contribution-centric perspective, focusing exclusively on scientific entities relevant to the research theme. The resultant dataset, containing 5,000 annotated astronomy article titles, is made publicly available.

[149]  arXiv:2405.02604 [pdf, ps, other]
Title: Interleave Frequency Division Multiplexing
Comments: Accepted by IEEE Wireless Communications Letters
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this letter, we study interleave frequency division multiplexing (IFDM) for multicarrier modulation in static multipath and mobile time-varying channels, which outperforms orthogonal frequency division multiplexing (OFDM), orthogonal time frequency space (OTFS), and affine frequency division multiplexing (AFDM) by considering practical advanced detectors. The fundamental principle underlying existing modulation techniques is to establish sparse equivalent channel matrices in order to facilitate the design of low-complexity detection algorithms for signal recovery, making a trade-off between performance and implementation complexity. In contrast, the proposed IFDM establishes an equivalent fully dense and right-unitarily invariant channel matrix with the goal of achieving channel capacity, ensuring that the signals undergo sufficient statistical channel fading. Meanwhile, a low-complexity and replica maximum a posteriori (MAP)-optimal cross-domain memory approximate message passing (CD-MAMP) detector is proposed for IFDM by exploiting the sparsity of the time-domain channel and the unitary invariance in interleave-frequency-domain channel. Numerical results show that IFDM with extremely low-complexity CD-MAMP outperforms OFDM, OTFS, and AFDM with state-of-the-art orthogonal approximate message passing detectors, particularly at low velocities.

[150]  arXiv:2405.02605 [pdf, other]
Title: MEXGEN: An Effective and Efficient Information Gain Approximation for Information Gathering Path Planning
Comments: Accepted to IEEE Robotics and Automation Letters (RA-L)(Demo Video: this https URL)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Autonomous robots for gathering information on objects of interest have numerous real-world applications because they improve efficiency, performance and safety. Realizing autonomy demands online planning algorithms to solve sequential decision making problems under uncertainty, because objects of interest are often dynamic, and object state, such as location, is not directly observable and is obtained from noisy measurements. Such planning problems are notoriously difficult due to the combinatorial nature of predicting the future to make optimal decisions. For information theoretic planning algorithms, we develop a computationally efficient and effective approximation for the difficult problem of predicting the likely sensor measurements from uncertain belief states. The approach more accurately predicts information gain from information gathering actions. Our theoretical analysis proves the proposed formulation achieves a lower prediction error than the current efficient method. We demonstrate improved performance gains in radio-source tracking and localization problems using extensive simulated and field experiments with a multirotor aerial robot.

[151]  arXiv:2405.02606 [pdf, ps, other]
Title: Communication Modalities
Authors: Roman Kuznets
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO); Multiagent Systems (cs.MA)

Epistemic analysis of distributed systems is one of the biggest successes among applications of logic in computer science. The reason for that is that agents' actions are necessarily guided by their knowledge. Thus, epistemic modal logic, with its knowledge and belief modalities (and group versions thereof), has played a vital role in establishing both impossibility results and necessary conditions for solvable distributed tasks. In distributed systems, knowledge is largely attained via communication. It has been standard in both distributed systems and dynamic epistemic logic to treat incoming messages as trustworthy, thus, creating difficulties in the epistemic analysis of byzantine distributed systems where faulty agents may lie. In this paper, we argue that handling such communication scenarios calls for additional modalities representing the informational content of messages that should not be taken at face value. We present two such modalities: hope for the case of fully byzantine agents and creed for non-uniform communication protocols in general.

[152]  arXiv:2405.02608 [pdf, other]
Title: UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
Comments: Accepted by CVPR 2024. Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Traditional unsupervised optical flow methods are vulnerable to occlusions and motion boundaries due to lack of object-level information. Therefore, we propose UnSAMFlow, an unsupervised flow network that also leverages object information from the latest foundation model Segment Anything Model (SAM). We first include a self-supervised semantic augmentation module tailored to SAM masks. We also analyze the poor gradient landscapes of traditional smoothness losses and propose a new smoothness definition based on homography instead. A simple yet effective mask feature module has also been added to further aggregate features on the object level. With all these adaptations, our method produces clear optical flow estimation with sharp boundaries around objects, which outperforms state-of-the-art methods on both KITTI and Sintel datasets. Our method also generalizes well across domains and runs very efficiently.

[153]  arXiv:2405.02609 [pdf, other]
Title: Advanced Equalization in 112 Gb/s Upstream PON Using a Novel Fourier Convolution-based Network
Comments: 4 pages, 5 figures
Subjects: Machine Learning (cs.LG)

We experimentally demonstrate a novel, low-complexity Fourier Convolution-based Network (FConvNet) based equalizer for 112 Gb/s upstream PAM4-PON. At a BER of 0.005, FConvNet enhances the receiver sensitivity by 2 and 1 dB compared to a 51-tap Sato equalizer and benchmark machine learning algorithms respectively.

[154]  arXiv:2405.02611 [pdf, ps, other]
Title: Predicting the impact of water transport on carbonation-induced corrosion in variably saturated reinforced concrete
Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Applied Physics (physics.app-ph); Chemical Physics (physics.chem-ph)

A modelling framework for predicting carbonation-induced corrosion in reinforced concrete is presented. The framework constituents include a new model for water transport in cracked concrete, a link between corrosion current density and water saturation, and a theory for characterising concrete carbonation. The theoretical framework is numerically implemented using the finite element method and model predictions are extensively benchmarked against experimental data. The results show that the model is capable of accurately predicting carbonation progress, as well as wetting and drying of cracked and uncracked concrete, revealing a very good agreement with independent experiments from a set of consistent parameters. In addition, insight is gained into the evolution of carbonation penetration and corrosion current density under periodic wetting and drying conditions. Among others, we find that cyclic wetting periods significantly speed up the carbonation progress and that the induced corrosion current density is very sensitive to concrete saturation.

[155]  arXiv:2405.02612 [pdf, other]
Title: Learning Linear Utility Functions From Pairwise Comparison Queries
Comments: Submitted to ECAI for review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)

We study learnability of linear utility functions from pairwise comparison queries. In particular, we consider two learning objectives. The first objective is to predict out-of-sample responses to pairwise comparisons, whereas the second is to approximately recover the true parameters of the utility function. We show that in the passive learning setting, linear utilities are efficiently learnable with respect to the first objective, both when query responses are uncorrupted by noise, and under Tsybakov noise when the distributions are sufficiently "nice". In contrast, we show that utility parameters are not learnable for a large set of data distributions without strong modeling assumptions, even when query responses are noise-free. Next, we proceed to analyze the learning problem in an active learning setting. In this case, we show that even the second objective is efficiently learnable, and present algorithms for both the noise-free and noisy query response settings. Our results thus exhibit a qualitative learnability gap between passive and active learning from pairwise preference queries, demonstrating the value of the ability to select pairwise queries for utility learning.
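
As a toy illustration of the passive, noise-free setting described in this abstract, the sketch below fits a linear utility $u(x) = w \cdot x$ from pairwise comparisons by running a perceptron on the difference vectors. The perceptron is a stand-in chosen for brevity, not the paper's algorithm, and the data points and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Comparison = Tuple[Sequence[float], Sequence[float], bool]  # (a, b, a_preferred)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def fit_linear_utility(
    comparisons: List[Comparison], dim: int, epochs: int = 100
) -> List[float]:
    """Perceptron on difference vectors a - b, labeled by which item was preferred."""
    w = [0.0] * dim
    for _ in range(epochs):
        mistakes = 0
        for a, b, a_preferred in comparisons:
            diff = [x - y for x, y in zip(a, b)]
            label = 1.0 if a_preferred else -1.0
            if label * dot(w, diff) <= 0.0:  # misclassified or on the boundary
                w = [wi + label * di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:
            break  # all training comparisons are reproduced
    return w


def prefers(w: Sequence[float], a: Sequence[float], b: Sequence[float]) -> bool:
    """Predict whether item a is preferred to item b under utility w."""
    return dot(w, a) > dot(w, b)


if __name__ == "__main__":
    # Responses generated by a hidden utility with weights (2, -1) (assumption).
    data: List[Comparison] = [
        ((1, 0), (0, 1), True),
        ((0, 1), (0, 0), False),
        ((1, 1), (0, 0), True),
        ((0, 0), (1, 0), False),
    ]
    w = fit_linear_utility(data, dim=2)
    print(w, prefers(w, (2, 1), (1, 1)))
```

This targets only the first objective in the abstract, predicting comparison outcomes. As the abstract notes, recovering the true parameters is a different and harder goal in the passive setting.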

[156]  arXiv:2405.02615 [pdf, other]
Title: TetraBFT: Reducing Latency of Unauthenticated, Responsive BFT Consensus
Comments: The full version of the PODC 2024 paper
Subjects: Cryptography and Security (cs.CR)

This paper presents TetraBFT, a novel unauthenticated Byzantine fault tolerant protocol for solving consensus in partial synchrony, eliminating the need for public key cryptography and ensuring resilience against computationally unbounded adversaries. TetraBFT has several compelling features: it necessitates only constant local storage, has optimal communication complexity, satisfies optimistic responsiveness -- allowing the protocol to operate at actual network speeds under ideal conditions -- and can achieve consensus in just 5 message delays, which outperforms all known unauthenticated protocols achieving the other properties listed. We validate the correctness of TetraBFT through rigorous security analysis and formal verification. Furthermore, we extend TetraBFT into a multi-shot, chained consensus protocol, making a pioneering effort in applying pipelining techniques to unauthenticated protocols. This positions TetraBFT as a practical and deployable solution for blockchain systems aiming for high efficiency.

[157]  arXiv:2405.02616 [pdf, ps, other]
Title: Convergence analysis of a second order numerical scheme for the Flory-Huggins-Cahn-Hilliard-Navier-Stokes system
Subjects: Numerical Analysis (math.NA)

We present an optimal rate convergence analysis for a second order accurate in time, fully discrete finite difference scheme for the Cahn-Hilliard-Navier-Stokes (CHNS) system, combined with logarithmic Flory-Huggins energy potential. The numerical scheme has been recently proposed, and the positivity-preserving property of the logarithmic arguments, as well as the total energy stability, have been theoretically justified. In this paper, we rigorously prove second order convergence of the proposed numerical scheme, in both time and space. Since the CHNS is a coupled system, the standard $\ell^\infty (0, T; \ell^2) \cap \ell^2 (0, T; H_h^2)$ error estimate could not be easily derived, due to the lack of regularity to control the numerical error associated with the coupled terms. Instead, the $\ell^\infty (0, T; H_h^1) \cap \ell^2 (0, T; H_h^3)$ error analysis for the phase variable and the $\ell^\infty (0, T; \ell^2)$ analysis for the velocity vector, which shares the same regularity as the energy estimate, is more suitable to pass through the nonlinear analysis for the error terms associated with the coupled physical process. Furthermore, the highly nonlinear and singular nature of the logarithmic error terms makes the convergence analysis even more challenging, since a uniform distance between the numerical solution and the singular limit values of is needed for the associated error estimate. Many highly non-standard estimates, such as a higher order asymptotic expansion of the numerical solution (up to the third order accuracy in time and fourth order in space), combined with a rough error estimate (to establish the maximum norm bound for the phase variable), as well as a refined error estimate, have to be carried out to conclude the desired convergence result.

[158]  arXiv:2405.02620 [pdf, other]
Title: Accelerating Autonomy: Insights from Pro Racers in the Era of Autonomous Racing - An Expert Interview Study
Comments: 8 pages, 6 figures
Subjects: Robotics (cs.RO)

This research aims to investigate professional racing drivers' expertise to develop an understanding of their cognitive and adaptive skills to create new autonomy algorithms. An expert interview study was conducted with 11 professional race drivers, data analysts, and racing instructors from across prominent racing leagues. The interviews were conducted using an exploratory, non-standardized expert interview format guided by a set of prepared questions. The study investigates drivers' exploration strategies to reach their vehicle limits and contrasts them with the capabilities of state-of-the-art autonomous racing software stacks. Participants were questioned about the techniques and skills they have developed to quickly approach and maneuver at the vehicle limit, ultimately minimizing lap times. The analysis of the interviews was grounded in Mayring's qualitative content analysis framework, which facilitated the organization of the data into multiple categories and subcategories. Our findings create insights into human behavior regarding reaching a vehicle's limit and minimizing lap times. Based on these findings, we propose the development of new autonomy software modules that allow for more adaptive vehicle behavior. By emphasizing the distinct nuances between manual and autonomous driving techniques, the paper encourages further investigation into human drivers' strategies to maximize their vehicles' capabilities.

[159]  arXiv:2405.02628 [pdf, other]
Title: Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Molecular property prediction is a key component of AI-driven drug discovery and molecular representation learning. Despite recent advances, existing methods still face challenges such as limited ability to generalize and inadequate representation learning from unlabeled data, especially for tasks specific to molecular structures. To address these limitations, we introduce DIG-Mol, a novel self-supervised graph neural network framework for molecular property prediction. This architecture leverages the power of contrastive learning with dual interaction mechanisms and unique molecular graph augmentation strategies. DIG-Mol integrates a momentum distillation network with two interconnected networks to efficiently improve molecular representations. The framework's ability to extract key information about molecular structure and higher-order semantics is supported by minimizing the contrastive loss. We have established DIG-Mol's state-of-the-art performance through extensive experimental evaluation in a variety of molecular property prediction tasks. In addition to demonstrating superior transferability in few-shot learning scenarios, our visualizations highlight DIG-Mol's enhanced interpretability and representation capabilities. These findings confirm the effectiveness of our approach in overcoming challenges faced by traditional methods and mark a significant advance in molecular property prediction.

[160]  arXiv:2405.02629 [pdf, other]
Title: SPARSE: Semantic Tracking and Path Analysis for Attack Investigation in Real-time
Subjects: Cryptography and Security (cs.CR)

As the complexity and destructiveness of Advanced Persistent Threats (APTs) increase, there is a growing tendency to identify the series of actions undertaken to achieve the attacker's target, a task called attack investigation. Currently, analysts construct a provenance graph to perform causality analysis on a Point-Of-Interest (POI) event and capture critical events (those related to the attack). However, due to the vast size of the provenance graph and the rarity of critical events, existing attack investigation methods suffer from high false positives, high overhead, and high latency. To this end, we propose SPARSE, an efficient, real-time system for constructing critical component graphs (i.e., graphs consisting of critical events) from streaming logs. Our key observations are that 1) critical events exist in a suspicious semantic graph (SSG) composed of interaction flows between suspicious entities, and 2) information flows that accomplish the attacker's goal exist in the form of paths. Therefore, SPARSE uses a two-stage framework to implement attack investigation (i.e., constructing the SSG and performing path-level contextual analysis). First, SPARSE operates in a state-based mode where events are consumed as streams, allowing easy access to the SSG related to the POI event through a semantic transfer rule and a storage strategy. Then, SPARSE identifies all suspicious flow paths (SFPs) related to the POI event from the SSG and quantifies the influence of each path to filter irrelevant events. Our evaluation on a real large-scale attack dataset shows that SPARSE can generate a critical component graph (~113 edges) in 1.6 seconds, which is 2014 times smaller than the backtracking graph (~227,589 edges). SPARSE is 25 times more effective than other state-of-the-art techniques in filtering irrelevant edges.
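
As a quick check of the size reduction quoted above, the ratio of the two approximate edge counts can be computed directly. This is only an arithmetic sketch; the variable names are illustrative and do not come from the paper.

```python
# Approximate edge counts quoted in the abstract.
backtracking_edges = 227_589
critical_component_edges = 113

# Ratio between the backtracking graph and the critical component graph.
reduction_factor = backtracking_edges / critical_component_edges
print(f"reduction factor: about {reduction_factor:.0f}x")
```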

[161]  arXiv:2405.02631 [pdf, other]
Title: Unsupervised machine learning for data-driven classification of rock mass using drilling data: How can a data-driven system handle limitations in existing rock mass classification systems?
Comments: 38 pages, 11 figures. Includes ancillary interactive versions of some figures
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Systems and Control (eess.SY)

Rock mass classification systems are crucial for assessing stability and risk in underground construction globally and guiding support and excavation design. However, systems developed primarily in the 1970s lack access to modern high-resolution data and advanced statistical techniques, limiting their effectiveness as decision-support systems. Initially, we outline the limitations observed in this context and later describe how a data-driven system, based on drilling data as detailed in this study, can overcome these limitations. Using extracted statistical information from thousands of MWD-data values in one-meter sections of a full tunnel profile, thus working as a signature of the rock mass, we have demonstrated that it is possible to form well-defined clusters that can act as a foundational basis for various rock mass classification systems. We reduced the dimensionality of 48-value vectors using nonlinear manifold learning techniques (UMAP) and linear principal component analysis (PCA) to enhance clustering. Unsupervised machine learning methods (HDBSCAN, Agglomerative Clustering, K-means) were employed to cluster the data, with hyperparameters optimised through multi-objective Bayesian optimisation for effective clustering. Using domain knowledge, we observed improved clustering and system tuning opportunities when adding extra features to core clusters of MWD-data. We structured and correlated these clusters with physical rock mass properties, including labels of rock type and rock quality, and analysed cumulative distributions of key MWD-parameters for rock mass assessment to determine if clusters meaningfully differentiate rock masses. The ability of MWD data to form distinct rock mass clusters suggests substantial potential for future classification systems grounded in this objective, data-driven methodology, free from human bias.

[162]  arXiv:2405.02633 [pdf, other]
Title: Risk Assessment for Nonlinear Cyber-Physical Systems under Stealth Attacks
Comments: 12 pages and 9 figures
Subjects: Systems and Control (eess.SY)

Stealth attacks pose potential risks to cyber-physical systems because they are difficult to detect. Assessing the risk of systems under stealth attacks remains an open challenge, especially in nonlinear systems. To comprehensively quantify these risks, we propose a framework that considers both the reachability of a system and the risk distribution of a scenario. We propose an algorithm to approximate the reachability of a nonlinear system under stealth attacks with a union of standard sets. Meanwhile, we present a method to construct a risk field to formally describe the risk distribution in a given scenario. The intersection relationships of system reachability and risk regions in the risk field indicate that attackers can cause corresponding risks without being detected. Based on this, we introduce a metric to dynamically quantify the risk. Compared to traditional methods, our framework predicts the risk value in an explainable way and provides early warnings for safety control. We demonstrate the effectiveness of our framework through a case study of an automated warehouse.

[163]  arXiv:2405.02634 [pdf, other]
Title: Onboard Out-of-Calibration Detection of Deep Learning Models using Conformal Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The black box nature of deep learning models complicates their usage in critical applications such as remote sensing. Conformal prediction is a method to ensure trust in such scenarios. Subject to data exchangeability, conformal prediction provides finite sample coverage guarantees in the form of a prediction set that is guaranteed to contain the true class within a user-defined error rate. In this letter, we show that conformal prediction algorithms are related to the uncertainty of the deep learning model and that this relation can be used to detect if the deep learning model is out-of-calibration. Popular classification models like Resnet50, Densenet161, InceptionV3, and MobileNetV2 are applied on remote sensing datasets such as EuroSAT to demonstrate how under noisy scenarios the model outputs become untrustworthy. Furthermore, an out-of-calibration detection procedure relating the model uncertainty and the average size of the conformal prediction set is presented.
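
For readers unfamiliar with the prediction sets mentioned above, the following is a minimal sketch of standard split conformal prediction for classification, assuming the common score "one minus the probability of the true class". It is not the paper's detection procedure; the function names and toy probabilities are hypothetical.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold on held-out data.

    The score of a calibration sample is 1 - probability of its true class.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration samples: keep every class
    return scores[k - 1]

def prediction_set(probs, threshold):
    """Return all classes whose score does not exceed the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]

def average_set_size(all_probs, threshold):
    """Mean prediction-set size, the quantity tracked as an uncertainty signal."""
    return sum(len(prediction_set(p, threshold)) for p in all_probs) / len(all_probs)

# Toy calibration data (hypothetical softmax outputs and true labels).
cal_probs = [[0.5, 0.25, 0.25], [0.125, 0.75, 0.125], [0.25, 0.5, 0.25]]
cal_labels = [0, 1, 2]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)

test_probs = [[0.5, 0.25, 0.25], [0.5, 0.5, 0.0]]
print(threshold, average_set_size(test_probs, threshold))
```

A larger average set size on new data indicates that the model is spreading probability over more classes, which is the kind of signal the abstract relates to model uncertainty.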

[164]  arXiv:2405.02637 [pdf, other]
Title: TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants
Comments: To appear in SIGIR 2024. arXiv admin note: substantial text overlap with arXiv:2401.01330
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Conversational information seeking has evolved rapidly in the last few years with the development of Large Language Models (LLMs), providing the basis for interpreting and responding in a naturalistic manner to user requests. The extended TREC Interactive Knowledge Assistance Track (iKAT) collection aims to enable researchers to test and evaluate their Conversational Search Agents (CSA). The collection contains a set of 36 personalized dialogues over 20 different topics each coupled with a Personal Text Knowledge Base (PTKB) that defines the bespoke user personas. A total of 344 turns with approximately 26,000 passages are provided as assessments on relevance, as well as additional assessments on generated responses over four key dimensions: relevance, completeness, groundedness, and naturalness. The collection challenges CSA to efficiently navigate diverse personal contexts, elicit pertinent persona information, and employ context for relevant conversations. The integration of a PTKB and the emphasis on decisional search tasks contribute to the uniqueness of this test collection, making it an essential benchmark for advancing research in conversational and interactive knowledge assistants.

[165]  arXiv:2405.02638 [pdf, other]
Title: PrivSGP-VR: Differentially Private Variance-Reduced Stochastic Gradient Push with Tight Utility Bounds
Comments: This paper has been accepted by the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)
Subjects: Machine Learning (cs.LG)

In this paper, we propose a differentially private decentralized learning method (termed PrivSGP-VR) which employs stochastic gradient push with variance reduction and guarantees $(\epsilon, \delta)$-differential privacy (DP) for each node. Our theoretical analysis shows that, under DP Gaussian noise with constant variance, PrivSGP-VR achieves a sub-linear convergence rate of $\mathcal{O}(1/\sqrt{nK})$, where $n$ and $K$ are the number of nodes and iterations, respectively, which is independent of stochastic gradient variance, and achieves a linear speedup with respect to $n$. Leveraging the moments accountant method, we further derive an optimal $K$ to maximize the model utility under certain privacy budget in decentralized settings. With this optimized $K$, PrivSGP-VR achieves a tight utility bound of $\mathcal{O}\left( \sqrt{d\log \left( \frac{1}{\delta} \right)}/(\sqrt{n}J\epsilon) \right)$, where $J$ and $d$ are the number of local samples and the dimension of decision variable, respectively, which matches that of the server-client distributed counterparts, and exhibits an extra factor of $1/\sqrt{n}$ improvement compared to that of the existing decentralized counterparts, such as A(DP)$^2$SGD. Extensive experiments corroborate our theoretical findings, especially in terms of the maximized utility with optimized $K$, in fully decentralized settings.
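
To make the scaling in the stated utility bound concrete, the sketch below evaluates the expression $\sqrt{d\log(1/\delta)}/(\sqrt{n}J\epsilon)$ with the hidden constant set to 1, which is an assumption made purely for illustration. The function name and the parameter values are hypothetical.

```python
import math

def utility_bound(d, delta, n, J, epsilon):
    """Order-level bound sqrt(d * log(1/delta)) / (sqrt(n) * J * epsilon).

    The constant hidden by the O(.) notation is assumed to be 1 here.
    """
    return math.sqrt(d * math.log(1.0 / delta)) / (math.sqrt(n) * J * epsilon)

# Quadrupling the number of nodes n halves the bound (the 1/sqrt(n) factor).
one_node = utility_bound(d=1000, delta=1e-5, n=1, J=5000, epsilon=1.0)
four_nodes = utility_bound(d=1000, delta=1e-5, n=4, J=5000, epsilon=1.0)
print(four_nodes / one_node)
```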

[166]  arXiv:2405.02639 [pdf, other]
Title: Wall-Climbing Performance of Gecko-inspired Robot with Soft Feet and Digits enhanced by Gravity Compensation
Subjects: Robotics (cs.RO)

Gravitational forces can induce deviations in body posture from desired configurations in multi-legged arboreal robot locomotion with low leg stiffness, affecting the contact angle between the swing leg's end-effector and the climbing surface during the gait cycle. The relationship between desired and actual foot positions is investigated here in a leg-stiffness-enhanced model under external forces, focusing on the challenge of unreliable end-effector attachment on climbing surfaces in such robots. Inspired by the difference in ceiling attachment postures of dead and living geckos, feedforward compensation of the stance phase legs is the key to solving this problem. A feedforward gravity compensation (FGC) strategy, complemented by leg coordination, is proposed to correct gravity-influenced body posture and improve adhesion stability by reducing body inclination. The efficacy of this strategy is validated using a quadrupedal climbing robot, EF-I, as the experimental platform. Experimental validation on an inverted surface (ceiling walking) highlights the benefits of the FGC strategy, demonstrating its role in enhancing stability and ensuring reliable end-effector attachment without external assistance. In the experiment, robots without FGC completed only 3 out of 10 trials, while robots with FGC achieved a 100% success rate in the same trials. The speed was also substantially greater with FGC, reaching 9.2 mm/s in the trot gait. This underscores the potential of the proposed FGC strategy in overcoming the challenges associated with inconsistent end-effector attachment in robots with low leg stiffness, thereby facilitating stable locomotion even in an inverted body attitude.

[167]  arXiv:2405.02642 [pdf, other]
Title: Machine Learning in Space: Surveying the Robustness of on-board ML models to Radiation
Subjects: Machine Learning (cs.LG)

Modern spacecraft are increasingly relying on machine learning (ML). However, physical equipment in space is subject to various natural hazards, such as radiation, which may inhibit the correct operation of computing devices. Despite plenty of evidence showing the damage that naturally-induced faults can cause to ML-related hardware, we observe that the effects of radiation on ML models for space applications are not well-studied. This is a problem: without understanding how ML models are affected by these natural phenomena, it is uncertain "where to start from" to develop radiation-tolerant ML software. As ML researchers, we attempt to tackle this dilemma. By partnering up with space-industry practitioners specialized in ML, we perform a reflective analysis of the state of the art. We provide factual evidence that prior work did not thoroughly examine the impact of natural hazards on ML models meant for spacecraft. Then, through a "negative result", we show that some existing open-source technologies can hardly be used by researchers to study the effects of radiation for some applications of ML in satellites. As a constructive step forward, we perform simple experiments showcasing how to leverage current frameworks to assess the robustness of practical ML models for cloud detection against radiation-induced faults. Our evaluation reveals that not all faults are as devastating as claimed by some prior work. By publicly releasing our resources, we provide a foothold -- usable by researchers without access to spacecraft -- for spearheading development of space-tolerant ML models.

[168]  arXiv:2405.02644 [pdf, other]
Title: Interpretable Multi-View Clustering
Comments: 12 pages, 6 figures
Subjects: Machine Learning (cs.LG)

Multi-view clustering has become a significant area of research, with numerous methods proposed over the past decades to enhance clustering accuracy. However, in many real-world applications, it is crucial to demonstrate a clear decision-making process, specifically by explaining why samples are assigned to particular clusters. Consequently, there remains a notable gap in developing interpretable methods for clustering multi-view data. To fill this crucial gap, we make the first attempt towards this direction by introducing an interpretable multi-view clustering framework. Our method begins by extracting embedded features from each view and generates pseudo-labels to guide the initial construction of the decision tree. Subsequently, it iteratively optimizes the feature representation for each view along with refining the interpretable decision tree. Experimental results on real datasets demonstrate that our method not only provides a transparent clustering process for multi-view data but also delivers performance comparable to state-of-the-art multi-view clustering methods. To the best of our knowledge, this is the first effort to design an interpretable clustering framework specifically for multi-view data, opening a new avenue in this field.

[169]  arXiv:2405.02646 [pdf, other]
Title: Updating Windows Malware Detectors: Balancing Robustness and Regression against Adversarial EXEmples
Comments: 11 pages, 3 figures, 7 tables
Subjects: Cryptography and Security (cs.CR)

Adversarial EXEmples are carefully-perturbed programs tailored to evade machine learning Windows malware detectors, with an on-going effort in developing robust models able to address detection effectiveness. However, even if robust models can prevent the majority of EXEmples, to maintain predictive power over time, models are fine-tuned to newer threats, leading either to partial updates or time-consuming retraining from scratch. Thus, even if the robustness against attacks is higher, the new models might suffer a regression in performance by misclassifying threats that were previously correctly detected. For these reasons, we study the trade-off between accuracy and regression when updating Windows malware detectors, by proposing EXE-scanner, a plugin that can be chained to existing detectors to promptly stop EXEmples without causing regression. We empirically show that previously-proposed hardening techniques suffer a regression of accuracy when updating non-robust models. On the contrary, we show that EXE-scanner exhibits comparable performance to robust models without regression of accuracy, and we show how to properly chain it after the base classifier to obtain the best performance without the need of costly retraining. To foster reproducibility, we openly release source code, along with the dataset of adversarial EXEmples based on state-of-the-art perturbation algorithms.

[170]  arXiv:2405.02647 [pdf, ps, other]
Title: SubwayMeshDTN: Exploring Opportunistic Delay Tolerant Routing Protocols when Disseminating Emergency Alerts on a Smart City Subway Network
Subjects: Networking and Internet Architecture (cs.NI)

This paper seeks to understand the effectiveness of using multi-dimensional opportunistic delay-tolerant network (DTN) routing protocols, specifically Epidemic and MaxProp, in the context of the New York City (NYC) metropolitan subway network. We examine how efficiently emergency messages spread through mobile, self-configuring, edge-based movement patterns on the train network to understand and propose solutions for improving communication in subterranean environments. Since DTNs are able to store, carry and forward messages through intermediate edges, this paper benchmarks both Wi-Fi and Bluetooth topologies to compare and critically evaluate movement patterns, latency, overheads and delivery rates on pseudo-realistic underground traces. We also show that the accordion effect is predominant in these networks, and therefore, the most effective protocol configurations vary.

[171]  arXiv:2405.02648 [pdf, other]
Title: A Conformal Prediction Score that is Robust to Label Noise
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Conformal Prediction (CP) quantifies network uncertainty by building a small prediction set with a pre-defined probability that the correct class is within this set. In this study we tackle the problem of CP calibration based on a validation set with noisy labels. We introduce a conformal score that is robust to label noise. The noise-free conformal score is estimated using the noisy labeled data and the noise level. In the test phase the noise-free score is used to form the prediction set. We applied the proposed algorithm to several standard medical imaging classification datasets. We show that our method outperforms current methods by a large margin, in terms of the average size of the prediction set, while maintaining the required coverage.

[172]  arXiv:2405.02649 [pdf, other]
Title: Generic Multi-modal Representation Learning for Network Traffic Analysis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Network traffic analysis is fundamental for network management, troubleshooting, and security. Tasks such as traffic classification, anomaly detection, and novelty discovery are fundamental for extracting operational information from network data and measurements. We witness the shift from deep packet inspection and basic machine learning to Deep Learning (DL) approaches where researchers define and test a custom DL architecture designed for each specific problem. We here advocate the need for a general DL architecture flexible enough to solve different traffic analysis tasks. We test this idea by proposing a DL architecture based on generic data adaptation modules, followed by an integration module that summarises the extracted information into a compact and rich intermediate representation (i.e. embeddings). The result is a flexible Multi-modal Autoencoder (MAE) pipeline that can solve different use cases. We demonstrate the architecture with traffic classification (TC) tasks since they allow us to quantitatively compare results with state-of-the-art solutions. However, we argue that the MAE architecture is generic and can be used to learn representations useful in multiple scenarios. On TC, the MAE performs on par with or better than alternatives while avoiding cumbersome feature engineering, thus streamlining the adoption of DL solutions for traffic analysis.

[173]  arXiv:2405.02650 [pdf, other]
Title: Identifying Narrative Patterns and Outliers in Holocaust Testimonies Using Topic Modeling
Comments: 9 pages, 7 figures, LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The vast collection of Holocaust survivor testimonies presents invaluable historical insights but poses challenges for manual analysis. This paper leverages advanced Natural Language Processing (NLP) techniques to explore the USC Shoah Foundation Holocaust testimony corpus. By treating testimonies as structured question-and-answer sections, we apply topic modeling to identify key themes. We experiment with BERTopic, which leverages recent advances in language modeling technology. We align testimony sections into fixed parts, revealing the evolution of topics across the corpus of testimonies. This highlights both a common narrative schema and divergences between subgroups based on age and gender. We introduce a novel method to identify testimonies within groups that exhibit atypical topic distributions resembling those of other groups. This study offers unique insights into the complex narratives of Holocaust survivors, demonstrating the power of NLP to illuminate historical discourse and identify potential deviations in survivor experiences.

[174]  arXiv:2405.02652 [pdf, other]
Title: Deep Pulse-Signal Magnification for remote Heart Rate Estimation in Compressed Videos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recent advancements in remote heart rate measurement (rPPG), motivated by data-driven approaches, have significantly improved accuracy. However, certain challenges, such as video compression, still remain: recovering the rPPG signal from highly compressed videos is particularly complex. Although several studies have highlighted the difficulties and impact of video compression for this, effective solutions remain limited. In this paper, we present a novel approach to address the impact of video compression on rPPG estimation, which leverages a pulse-signal magnification transformation to adapt compressed videos to an uncompressed data domain in which the rPPG signal is magnified. We validate the effectiveness of our model by exhaustive evaluations on two publicly available datasets, UCLA-rPPG and UBFC-rPPG, employing both intra- and cross-database performance at several compression rates. Additionally, we assess the robustness of our approach on two additional highly compressed and widely-used datasets, MAHNOB-HCI and COHFACE, which reveal outstanding heart rate estimation results.

[175]  arXiv:2405.02653 [pdf, other]
Title: Isopignistic Canonical Decomposition via Belief Evolution Network
Subjects: Artificial Intelligence (cs.AI)

Developing a general information processing model in uncertain environments is fundamental for the advancement of explainable artificial intelligence. Dempster-Shafer theory of evidence is a well-known and effective reasoning method for representing epistemic uncertainty, which is closely related to subjective probability theory and possibility theory. Although they can be transformed to each other under some particular belief structures, there remains a lack of a clear and interpretable transformation process, as well as a unified approach for information processing. In this paper, we aim to address these issues from the perspectives of isopignistic belief functions and the hyper-cautious transferable belief model. Firstly, we propose an isopignistic transformation based on the belief evolution network. This transformation allows for the adjustment of the information granule while retaining the potential decision outcome. The isopignistic transformation is integrated with a hyper-cautious transferable belief model to establish a new canonical decomposition. This decomposition offers a reverse path between the possibility distribution and its isopignistic mass functions. The result of the canonical decomposition, called the isopignistic function, is an identical information content distribution that reflects the propensity and relative commitment degree of the basic probability assignment (BPA). Furthermore, this paper introduces a method to reconstruct the basic belief assignment by adjusting the isopignistic function. It explores the advantages of this approach in modeling and handling uncertainty within the hyper-cautious transferable belief model. More generally, this paper establishes a theoretical basis for building general models of artificial intelligence based on probability theory, Dempster-Shafer theory, and possibility theory.

[176]  arXiv:2405.02654 [pdf, ps, other]
Title: Enhancing Cooperation through Selective Interaction and Long-term Experiences in Multi-Agent Reinforcement Learning
Comments: Accepted at IJCAI 2024 (33rd International Joint Conference on Artificial Intelligence - Jeju)
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

The significance of network structures in promoting group cooperation within social dilemmas has been widely recognized. Prior studies attribute this facilitation to the assortment of strategies driven by spatial interactions. Although reinforcement learning has been employed to investigate the impact of dynamic interaction on the evolution of cooperation, there remains a lack of understanding about how agents develop neighbour selection behaviours and the formation of strategic assortment within an explicit interaction structure. To address this, our study introduces a computational framework based on multi-agent reinforcement learning in the spatial Prisoner's Dilemma game. This framework allows agents to select dilemma strategies and interacting neighbours based on their long-term experiences, differing from existing research that relies on preset social norms or external incentives. By modelling each agent using two distinct Q-networks, we disentangle the coevolutionary dynamics between cooperation and interaction. The results indicate that long-term experience enables agents to develop the ability to identify non-cooperative neighbours and exhibit a preference for interaction with cooperative ones. This emergent self-organizing behaviour leads to the clustering of agents with similar strategies, thereby increasing network reciprocity and enhancing group cooperation.

[177]  arXiv:2405.02655 [pdf, other]
Title: Fast Online Movement Optimization of Aerial Base Stations Based on Global Connectivity Map
Comments: 6 pages, 6 figures. Investigate site-specific movement optimization of UAV-mounted aerial base stations to cover a group of moving ground users, based on site-specific Global Connectivity Map. arXiv admin note: text overlap with arXiv:2312.10490
Subjects: Information Theory (cs.IT)

Unmanned aerial vehicles (UAVs) can serve as aerial base stations (ABSs) to provide wireless connectivity for ground users (GUs) in diverse scenarios. However, maximizing the coverage rate (CR) of $M$ GUs by jointly placing $N$ ABSs with limited coverage range is an NP-hard problem with exponential complexity in $M$ and $N$. This problem becomes even more intricate when the coverage range becomes irregular due to site-specific obstructions (e.g., buildings) on the air-ground channel, and/or when the GUs are in motion. To address the above challenges, we study a multi-ABS movement optimization problem to maximize the average coverage rate of mobile GUs within a site-specific environment. We tackle this challenging problem by 1) constructing the global connectivity map (GCM), which contains the connectivity information between given pairs of ABS/GU locations; 2) partitioning the ABS movement problem into ABS placement sub-problems and formulating each sub-problem as a binary integer linear programming (BILP) problem based on the GCM; 3) proposing a fast online algorithm to execute (one-pass) projected stochastic subgradient descent within the dual space to rapidly solve the BILP problem with near-optimal performance. Numerical results demonstrate that our proposed algorithm achieves a high CR performance close to that obtained by the open source solver (SCIP), yet with significantly reduced running time. In addition, the algorithm also notably outperforms one of the state-of-the-art deep reinforcement learning (DRL) methods and the K-means initiated evolutionary algorithm in terms of CR performance and/or time efficiency.
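
The placement sub-problem described above can be illustrated with a brute-force sketch, assuming a binary connectivity map in which entry (i, j) is 1 when an ABS at candidate site i can serve GU j. This exhaustive search is not the paper's dual-space subgradient algorithm; it only shows the objective being maximized and why the search space grows combinatorially. The function name and the toy map are hypothetical.

```python
from itertools import combinations

def best_placement(gcm, num_abs):
    """Exhaustively pick num_abs candidate sites that maximize the coverage rate.

    gcm[i][j] is 1 if an ABS at candidate site i can serve GU j, else 0.
    """
    num_sites = len(gcm)
    num_gus = len(gcm[0])
    best_sites, best_rate = None, -1.0
    # The number of combinations grows combinatorially with the problem size.
    for sites in combinations(range(num_sites), num_abs):
        covered = sum(1 for j in range(num_gus) if any(gcm[i][j] for i in sites))
        rate = covered / num_gus
        if rate > best_rate:
            best_sites, best_rate = sites, rate
    return best_sites, best_rate

# Toy connectivity map: 4 candidate ABS sites (rows), 5 ground users (columns).
toy_gcm = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1],
]
sites, rate = best_placement(toy_gcm, num_abs=2)
print(sites, rate)
```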

[178]  arXiv:2405.02659 [pdf, other]
Title: R4: Reinforced Retriever-Reorder-Responder for Retrieval-Augmented Large Language Models
Subjects: Computation and Language (cs.CL)

Retrieval-augmented large language models (LLMs) leverage relevant content retrieved by information retrieval systems to generate correct responses, aiming to alleviate the hallucination problem. However, existing retriever-responder methods typically append relevant documents to the prompt of LLMs to perform text generation tasks without considering the interaction of fine-grained structural semantics between the retrieved documents and the LLMs. This issue is particularly important for accurate response generation as LLMs tend to ``lose in the middle'' when dealing with input prompts augmented with lengthy documents. In this work, we propose a new pipeline named ``Reinforced Retriever-Reorder-Responder'' (R$^4$) to learn document orderings for retrieval-augmented LLMs, thereby further enhancing their generation abilities while the large numbers of parameters of LLMs remain frozen. The reordering learning process is divided into two steps according to the quality of the generated responses: document order adjustment and document representation enhancement. Specifically, document order adjustment aims to organize retrieved document orderings into beginning, middle, and end positions based on graph attention learning, which maximizes the reinforced reward of response quality. Document representation enhancement further refines the representations of retrieved documents for responses of poor quality via document-level gradient adversarial learning. Extensive experiments demonstrate that our proposed pipeline achieves better factual question-answering performance on knowledge-intensive tasks compared to strong baselines across various public datasets. The source codes and trained models will be released upon paper acceptance.

[179]  arXiv:2405.02660 [pdf, other]
Title: AFDM Channel Estimation in Multi-Scale Multi-Lag Channels
Comments: 6 pages, 6 figures. Investigate AFDM under underwater multi-scale multi-lag channels. Derive the new input-output formula with the impact of Doppler time scaling. Propose two new channel estimation methods to tackle different levels of Doppler factors. Perform diversity analysis based on CFR overlap probability (COP) and mutual incoherent property (MIP)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Affine Frequency Division Multiplexing (AFDM) is a brand new chirp-based multi-carrier (MC) waveform for high mobility communications, with promising advantages over Orthogonal Frequency Division Multiplexing (OFDM) and other MC waveforms. Existing AFDM research focuses on wireless communication at high carrier frequency (CF), which typically considers only Doppler frequency shift (DFS) as a result of mobility, while ignoring the accompanying Doppler time scaling (DTS) on the waveform. However, for underwater acoustic (UWA) communication at much lower CF and propagating at the speed of sound, the DTS effect cannot be ignored and poses significant challenges for channel estimation. This paper analyzes the channel frequency response (CFR) of AFDM under multi-scale multi-lag (MSML) channels, where each propagating path could have different delay and DFS/DTS. Based on the newly derived input-output formula and its characteristics, two new channel estimation methods are proposed, i.e., AFDM with iterative multi-index (AFDM-IMI) estimation under low to moderate DTS, and AFDM with orthogonal matching pursuit (AFDM-OMP) estimation under high DTS. Numerical results confirm the effectiveness of the proposed methods against the original AFDM channel estimation method. Moreover, the resulting AFDM system outperforms OFDM as well as Orthogonal Chirp Division Multiplexing (OCDM) in terms of channel estimation accuracy and bit error rate (BER), which is consistent with our theoretical analysis based on CFR overlap probability (COP), mutual incoherent property (MIP) and channel diversity gain under MSML channels.

[180]  arXiv:2405.02661 [pdf, other]
Title: DDE-Find: Learning Delay Differential Equations from Data
Authors: Robert Stephany
Comments: 42 pages, 19 tables, 8 figures
Subjects: Machine Learning (cs.LG)

Delay Differential Equations (DDEs) are a class of differential equations that can model diverse scientific phenomena. However, identifying the parameters, especially the time delay, that make a DDE's predictions match experimental results can be challenging. We introduce DDE-Find, a data-driven framework for learning a DDE's parameters, time delay, and initial condition function. DDE-Find uses an adjoint-based approach to efficiently compute the gradient of a loss function with respect to the model parameters. We motivate and rigorously prove an expression for the gradients of the loss using the adjoint. DDE-Find builds upon recent developments in learning DDEs from data and delivers the first complete framework for learning DDEs from data. Through a series of numerical experiments, we demonstrate that DDE-Find can learn DDEs from noisy, limited data.

[181]  arXiv:2405.02664 [pdf, other]
Title: MedPromptExtract (Medical Data Extraction Tool): Anonymization and Hi-fidelity Automated data extraction using NLP and prompt engineering
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

A major roadblock in the seamless digitization of medical records remains the lack of interoperability of existing records. Extracting relevant medical information required for further treatment planning or even research is a time-consuming, labour-intensive task that takes up doctors' valuable time. In this demo paper we present MedPromptExtract, an automated tool using a combination of semi-supervised learning, large language models, natural language processing and prompt engineering to convert unstructured medical records to structured data which is amenable to further analysis.

[182]  arXiv:2405.02665 [pdf, ps, other]
Title: Metric Differential Privacy at the User-Level
Subjects: Cryptography and Security (cs.CR)

Metric differential privacy (DP) provides heterogeneous privacy guarantees based on a distance between the pair of inputs. It is a widely popular notion of privacy since it captures the natural privacy semantics for many applications (such as location data) and results in better utility than standard DP. However, prior work in metric DP has primarily focused on the \textit{item-level} setting where every user only reports a single data item. A more realistic setting is that of user-level DP where each user contributes multiple items and privacy is then desired at the granularity of the user's \textit{entire} contribution. In this paper, we initiate the study of metric DP at the user-level. Specifically, we use the earth-mover's distance ($d_\textsf{EM}$) as our metric to obtain a notion of privacy as it captures both the magnitude and spatial aspects of changes in a user's data.
We make three main technical contributions. First, we design two novel mechanisms under $d_\textsf{EM}$-DP to answer linear queries and item-wise queries. Specifically, our analysis for the latter involves a generalization of the privacy amplification by shuffling result which may be of independent interest. Second, we provide a black-box reduction from the general unbounded to bounded $d_\textsf{EM}$-DP (size of the dataset is fixed and public) with a novel sampling based mechanism. Third, we show that our proposed mechanisms can provably provide improved utility over user-level DP, for certain types of linear queries and frequency estimation.
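
To give a feel for why the earth-mover's distance captures both how much of a user's data changes and how far it moves, here is a minimal sketch for the simplest case: two histograms of equal total mass over unit-spaced one-dimensional bins, where the distance equals the sum of absolute cumulative differences. The paper's exact definition of $d_\textsf{EM}$ (domain, normalization) may differ; the function name `emd_1d` and the bin layout are assumptions made for this example.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth-mover's distance between two integer histograms on unit-spaced bins.

    Assumes equal total mass and ground distance |i - j| between bins i and j.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if sum(p) != sum(q):
        raise ValueError("histograms must have the same total mass")
    # Running surplus that has to be carried across each bin boundary.
    carried = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(c) for c in carried)


# Moving one item by two bins costs more than moving one item by one bin.
print(emd_1d([1, 0, 0], [0, 0, 1]))
print(emd_1d([2, 0], [1, 1]))
```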

[183]  arXiv:2405.02670 [pdf, other]
Title: From Generalization Analysis to Optimization Designs for State Space Models
Subjects: Machine Learning (cs.LG)

A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown to be an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a \textit{data-dependent} generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales on SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical experiments are conducted to validate our results.

[184]  arXiv:2405.02672 [pdf, other]
Title: Effects of Realism and Representation on Self-Embodied Avatars in Immersive Virtual Environments
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)

Virtual Reality (VR) has recently gained traction with many new and ever more affordable devices being released. The increase in popularity of this paradigm of interaction has given birth to new applications and has attracted casual consumers to experience VR. Providing a self-embodied representation (avatar) of users' full bodies inside shared virtual spaces can improve the VR experience and make it more engaging to both new and experienced users. This is especially important in fully immersive systems, where the equipment completely occludes the real world, making self-awareness problematic. Indeed, the feeling of presence of the user is highly influenced by their virtual representations, even though small flaws could lead to uncanny valley side-effects. Following previous research, we would like to assess whether using a third-person perspective could also benefit the VR experience, via an improved spatial awareness of the user's virtual surroundings. In this paper we investigate realism and perspective of self-embodied representation in VR setups in natural tasks, such as walking and avoiding obstacles. We compare both First and Third-Person perspectives with three different levels of realism in avatar representation. These range from a stylized abstract avatar, to a "realistic" mesh-based humanoid representation and a point-cloud rendering. The latter uses data captured via depth-sensors and mapped into a virtual self inside the Virtual Environment. We present a thorough evaluation and comparison of these different representations, describing a series of guidelines for self-embodied VR applications. The effects of the uncanny valley are also discussed in the context of navigation and reflex-based tasks.

[185]  arXiv:2405.02673 [pdf, other]
Title: On the Information Redundancy in Non-Autoregressive Translation
Comments: 10 pages, 10 tables
Subjects: Computation and Language (cs.CL)

Token repetition is a typical form of multi-modal problem in fully non-autoregressive translation (NAT). In this work, we revisit the multi-modal problem in recently proposed NAT models. Our study reveals that these advanced models have introduced other types of information redundancy errors, which cannot be measured by the conventional metric - the continuous repetition ratio. By manually annotating the NAT outputs, we identify two types of information redundancy errors that correspond well to lexical and reordering multi-modality problems. Since human annotation is time-consuming and labor-intensive, we propose automatic metrics to evaluate the two types of redundant errors. Our metrics allow future studies to evaluate new methods and gain a more comprehensive understanding of their effectiveness.

[186]  arXiv:2405.02675 [pdf, other]
Title: Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers. We explore the possibility of crowdsourcing a carefully annotated Quranic dataset, on top of which AI models can be built to simplify the learning process. In particular, we use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets. We integrated the API into an existing mobile application called NamazApp to collect audio recitations. We developed a crowdsourcing platform called Quran Voice for annotating the gathered audio assets. As a result, we have collected around 7000 Quranic recitations from a pool of 1287 participants across more than 11 non-Arabic countries, and we have annotated 1166 recitations from the dataset in six categories. We have achieved a crowd accuracy of 0.77, an inter-rater agreement of 0.63 between the annotators, and 0.89 between the labels assigned by the algorithm and the expert judgments.

[187]  arXiv:2405.02676 [pdf, other]
Title: Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics
Comments: SIGGRAPH 2024 Conference Track
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Manipulating objects by hand is an important interaction motion in our daily activities. We faithfully reconstruct this motion with a single RGBD camera by a novel deep reinforcement learning method to leverage physics. Firstly, we propose object compensation control which establishes direct object control to make the network training more stable. Meanwhile, by leveraging the compensation force and torque, we seamlessly upgrade the simple point contact model to a more physically plausible surface contact model, further improving the reconstruction accuracy and physical correctness. Experiments indicate that without involving any heuristic physical rules, this work still successfully involves physics in the reconstruction of hand-object interactions, which are complex motions that are hard to imitate with deep reinforcement learning. Our code and data are available at https://github.com/hu-hy17/HOIC.

[188]  arXiv:2405.02677 [pdf, other]
Title: Evaluating the Ability of Computationally Extracted Narrative Maps to Encode Media Framing
Comments: Text2Story Workshop 2024
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Narratives serve as fundamental frameworks in our understanding of the world and provide a versatile foundation for collaborative sensemaking. Framing is a subtle yet potent mechanism that influences public perception through specific word choices, shaping interpretations of reported news events. Despite the recognized importance of narratives and framing, a significant gap exists in the literature with regard to the explicit consideration of framing within the context of computational extraction and representation. This article explores the capabilities of a specific narrative extraction and representation approach -- narrative maps -- to capture framing information from news data. The research addresses two key questions: (1) Does the narrative extraction method capture the framing distribution of the data set? (2) Does it produce a representation with consistent framing? Our results indicate that while the algorithm captures framing distributions, achieving consistent framing across various starting and ending events poses challenges. Our results highlight the potential of narrative maps to provide users with insights into the intricate framing dynamics within news narratives. However, we note that directly leveraging framing information in the computational narrative extraction process remains an open challenge.

[189]  arXiv:2405.02678 [pdf, other]
Title: Position Paper: Quo Vadis, Unsupervised Time Series Anomaly Detection?
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The current state of machine learning scholarship in Timeseries Anomaly Detection (TAD) is plagued by the persistent use of flawed evaluation metrics, inconsistent benchmarking practices, and a lack of proper justification for the choices made in novel deep learning-based model designs. Our paper presents a critical analysis of the status quo in TAD, revealing the misleading track of current research and highlighting problematic methods and evaluation practices. Our position advocates for a shift in focus from pursuing only the novelty in model design to improving benchmarking practices, creating non-trivial datasets, and placing renewed emphasis on studying the utility of model architectures for specific tasks. Our findings demonstrate the need for rigorous evaluation protocols, the creation of simple baselines, and the revelation that state-of-the-art deep anomaly detection models effectively learn linear mappings. These findings suggest the need for more exploration and development of simple and interpretable TAD methods. The increase in model complexity in the state-of-the-art deep learning-based models unfortunately offers very little improvement. We offer insights and suggestions for the field to move forward.

[190]  arXiv:2405.02681 [pdf, other]
Title: Spider RIS: Mobilizing Intelligent Surfaces for Enhanced Wireless Communications
Comments: Accepted in 2024 EuCNC and 6G Summit, Antwerp, Belgium, 3-6 June 2024
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this study, we introduce Spider RIS technology, which offers an innovative solution to the challenges encountered in movable antennas (MAs) and unmanned aerial vehicle (UAV)-enabled communication systems. By combining the dynamic adaptation capability of MAs and the flexible location advantages of UAVs, this technology offers a dynamic and movable RIS, which can flexibly optimize physical locations within the two-dimensional movement platform. Spider RIS aims to enhance the communication efficiency and reliability of wireless networks, particularly in obstructive environments, by elevating the signal quality and achievable rate. The motivation of Spider RIS is based on the ability to fully exploit the spatial variability of wireless channels and maximize channel capacity even with a limited number of reflecting elements by overcoming the limitations of traditional fixed RIS and energy-intensive UAV systems. Considering the geometry-based millimeter wave channel model, we present the design of a three-stage angular-based hybrid beamforming system empowered by Spider RIS: First, analog beamformers are designed using angular information, followed by the generation of digital precoder/combiner based on the effective channel observed from baseband stage. Subsequently, the joint dynamic positioning with phase shift design of the Spider RIS is optimized using particle swarm optimization, maximizing the achievable rate of the systems.

[191]  arXiv:2405.02682 [pdf, other]
Title: Deduplicator: When Computation Reuse Meets Load Balancing at the Network Edge
Comments: Accepted for publication by IFIP Networking 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

Load balancing has been a fundamental building block of cloud and, more recently, edge computing environments. At the same time, in edge computing environments, prior research has highlighted that applications operate on similar (correlated) data. Based on this observation, prior research has advocated for the direction of "computation reuse", where the results of previously executed computational tasks are stored at the edge and are reused (if possible) to satisfy incoming tasks with similar input data, instead of executing incoming tasks from scratch. Both load balancing and computation reuse are critical to the deployment of scalable edge computing environments, yet they are contradictory in nature. In this paper, we propose the Deduplicator, a middlebox that aims to facilitate both load balancing and computation reuse at the edge. The Deduplicator features mechanisms to identify and deduplicate similar tasks offloaded by user devices, collect information about the usage of edge servers' resources, manage the addition of new edge servers and the failures of existing edge servers, and ultimately balance the load imposed on edge servers. Our evaluation results demonstrate that the Deduplicator achieves up to 20% higher percentages of computation reuse compared to several other load balancing approaches, while also effectively balancing the distribution of tasks among edge servers at line rate.

[192]  arXiv:2405.02683 [pdf, ps, other]
Title: Two-Dimensional Multi-Access Coded Caching with Multiple Transmit Antennas
Comments: A shorter version is accepted for presentation in ISIT 2024. 8 pages, 4 figures
Subjects: Information Theory (cs.IT)

This work introduces a multi-antenna coded caching problem in a two-dimensional multi-access network, where a server with $L$ transmit antennas and $N$ files communicates to $K_1K_2$ users, each with a single receive antenna, through a wireless broadcast link. The network consists of $K_1K_2$ cache nodes and $K_1K_2$ users. The cache nodes, each with capacity $M$, are placed on a rectangular grid with $K_1$ rows and $K_2$ columns, and the users are placed regularly on the square grid such that a user can access $r^2$ neighbouring caches in a cyclic wrap-around fashion. For a given cache memory $M$, the goal of the coded caching problem is to serve the user demands with a minimum delivery time. We propose a solution for the aforementioned coded caching problem by designing two arrays: a caching array and a delivery array. Further, we present two classes of caching and delivery arrays and obtain corresponding multi-access coded caching schemes. The first scheme achieves a normalized delivery time (NDT) $\frac{K_1K_2(1-r^2\frac{M}{N})}{L+K_1K_2\frac{M}{N}}$. The second scheme achieves an NDT $\frac{K_1K_2(1-r^2\frac{M}{N})}{L+K_1K_2r^2\frac{M}{N}}$ when $M/N=1/K_1K_2$ and $L=K_1K_2-r^2$, which is optimal under uncoded placement and one-shot delivery.
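
The two NDT expressions above can be evaluated exactly with rational arithmetic. The sketch below is only a calculator for the formulas as stated in the abstract; the function names and argument order are assumptions made for this example, and the second function enforces the stated conditions $M/N=1/K_1K_2$ and $L=K_1K_2-r^2$. Pass `m_over_n` as a `Fraction` so the results stay exact.

```python
from fractions import Fraction


def ndt_scheme1(k1, k2, r, num_antennas, m_over_n):
    """NDT of the first scheme: K1*K2*(1 - r^2*M/N) / (L + K1*K2*M/N)."""
    k = k1 * k2
    return k * (1 - r**2 * m_over_n) / (num_antennas + k * m_over_n)


def ndt_scheme2(k1, k2, r, num_antennas, m_over_n):
    """NDT of the second scheme, valid only under the abstract's conditions."""
    k = k1 * k2
    if m_over_n != Fraction(1, k) or num_antennas != k - r**2:
        raise ValueError("requires M/N = 1/(K1*K2) and L = K1*K2 - r^2")
    return k * (1 - r**2 * m_over_n) / (num_antennas + k * r**2 * m_over_n)


# Example: a 3x3 grid, r = 2, M/N = 1/9, and L = 9 - 4 = 5 antennas.
print(ndt_scheme1(3, 3, 2, 5, Fraction(1, 9)))  # 5/6
print(ndt_scheme2(3, 3, 2, 5, Fraction(1, 9)))  # 5/9
```

In this example both numerators equal 5, while the denominators are 6 and 9 respectively, so the second scheme gives the smaller delivery time.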

[193]  arXiv:2405.02685 [pdf, other]
Title: FedProK: Trustworthy Federated Class-Incremental Learning via Prototypical Feature Knowledge Transfer
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Federated Class-Incremental Learning (FCIL) focuses on continually transferring the previous knowledge to learn new classes in dynamic Federated Learning (FL). However, existing methods do not consider the trustworthiness of FCIL, i.e., improving continual utility, privacy, and efficiency simultaneously, which is greatly influenced by catastrophic forgetting and data heterogeneity among clients. To address this issue, we propose FedProK (Federated Prototypical Feature Knowledge Transfer), leveraging prototypical feature as a novel representation of knowledge to perform spatial-temporal knowledge transfer. Specifically, FedProK consists of two components: (1) feature translation procedure on the client side by temporal knowledge transfer from the learned classes and (2) prototypical knowledge fusion on the server side by spatial knowledge transfer among clients. Extensive experiments conducted in both synchronous and asynchronous settings demonstrate that our FedProK outperforms the other state-of-the-art methods in three perspectives of trustworthiness, validating its effectiveness in selectively transferring spatial-temporal knowledge.

[194]  arXiv:2405.02686 [pdf, other]
Title: Boosting 3D Neuron Segmentation with 2D Vision Transformer Pre-trained on Natural Images
Comments: 3 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Neuron reconstruction, one of the fundamental tasks in neuroscience, rebuilds neuronal morphology from 3D light microscope imaging data. It plays a critical role in analyzing the structure-function relationship of neurons in the nervous system. However, due to the scarcity of neuron datasets and high-quality SWC annotations, it is still challenging to develop robust segmentation methods for single neuron reconstruction. To address this limitation, we aim to distill the consensus knowledge from massive natural image data to aid the segmentation model in learning the complex neuron structures. Specifically, in this work, we propose a novel training paradigm that leverages a 2D Vision Transformer model pre-trained on large-scale natural images to initialize our Transformer-based 3D neuron segmentation model with a tailored 2D-to-3D weight transferring strategy. Our method builds a knowledge sharing connection between the abundant natural and the scarce neuron image domains to improve the 3D neuron segmentation ability in a data-efficient manner. Evaluated on a popular benchmark, BigNeuron, our method enhances neuron segmentation performance by 8.71% over the model trained from scratch with the same amount of training samples.

[195]  arXiv:2405.02687 [pdf, ps, other]
Title: Placement Delivery Arrays for Coded Caching with Shared and Private Caches
Comments: A shorter version is accepted for presentation in ISIT 2024. 11 pages, 4 figures
Subjects: Information Theory (cs.IT)

We consider a coded caching network consisting of a server with a library of $N$ files connected to $K$ users, where each user is equipped with a dedicated cache of size $M_p$ units. In addition to that, the network consists of $\Lambda\leq K$ helper caches, each with a size $M_h$ units. Each helper cache can serve an arbitrary number of users; however, each user can access only a single helper cache. Also, we assume that the server knows the user-to-helper cache association, defined as the sets of users connected to each helper cache, during the cache placement phase. We propose a solution for the aforementioned coded caching problem by introducing a combinatorial structure called a Shared and Private Placement Delivery Array (SP-PDA). These SP-PDAs describe the helper cache placement, private cache placement, and the server transmissions in a single array. Further, we propose a novel construction of SP-PDAs using two Placement Delivery Arrays (PDAs). Interestingly, we observe that the permutations of the columns of the two chosen PDAs result in SP-PDAs with different performances. Moreover, we characterize the conditions for selecting the best column permutations of the chosen PDAs. Furthermore, the coded caching schemes resulting from SP-PDAs subsume two existing coded caching schemes as special cases. Additionally, SP-PDAs enable the construction of coded caching schemes with much smaller subpacketization numbers (the subpacketization number is defined as the number of subfiles into which a file is divided) compared to the existing schemes, without paying much in terms of rate (the size of the transmission in the delivery phase).

[196]  arXiv:2405.02688 [pdf, other]
Title: Semi-supervised Symmetric Matrix Factorization with Low-Rank Tensor Representation
Subjects: Machine Learning (cs.LG)

Semi-supervised symmetric non-negative matrix factorization (SNMF) utilizes the available supervisory information (usually in the form of pairwise constraints) to improve the clustering ability of SNMF. The previous methods introduce the pairwise constraints from the local perspective, i.e., they either directly refine the similarity matrix element-wise or restrain the distance of the decomposed vectors in pairs according to the pairwise constraints, which overlooks the global perspective, i.e., in the ideal case, the pairwise constraint matrix and the ideal similarity matrix possess the same low-rank structure. To this end, we first propose a novel semi-supervised SNMF model by seeking low-rank representation for the tensor synthesized by the pairwise constraint matrix and a similarity matrix obtained by the product of the embedding matrix and its transpose, which could strengthen those two matrices simultaneously from a global perspective. We then propose an enhanced SNMF model, making the embedding matrix tailored to the above tensor low-rank representation. We finally refine the similarity matrix by the strengthened pairwise constraints. We repeat the above steps to continuously boost the similarity matrix and pairwise constraint matrix, leading to a high-quality embedding matrix. Extensive experiments substantiate the superiority of our method. The code is available at https://github.com/JinaLeejnl/TSNMF.

[197]  arXiv:2405.02692 [pdf, ps, other]
Title: Diffeomorphic Transformer-based Abdomen MRI-CT Deformable Image Registration
Comments: 18 pages and 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

This paper aims to create a deep learning framework that can estimate the deformation vector field (DVF) for directly registering abdominal MRI-CT images. The proposed method assumed a diffeomorphic deformation. By using topology-preserved deformation features extracted from the probabilistic diffeomorphic registration model, abdominal motion can be accurately obtained and utilized for DVF estimation. The model integrated Swin transformers, which have demonstrated superior performance in motion tracking, into the convolutional neural network (CNN) for deformation feature extraction. The model was optimized using a cross-modality image similarity loss and a surface matching loss. To compute the image loss, a modality-independent neighborhood descriptor (MIND) was used between the deformed MRI and CT images. The surface matching loss was determined by measuring the distance between the warped coordinates of the surfaces of contoured structures on the MRI and CT images. The deformed MRI image was assessed against the CT image using the target registration error (TRE), Dice similarity coefficient (DSC), and mean surface distance (MSD) between the deformed contours of the MRI image and manual contours of the CT image. When compared to only rigid registration, DIR with the proposed method resulted in an increase of the mean DSC values of the liver and portal vein from 0.850 and 0.628 to 0.903 and 0.763, a decrease of the mean MSD of the liver from 7.216 mm to 3.232 mm, and a decrease of the TRE from 26.238 mm to 8.492 mm. The proposed deformable image registration method based on a diffeomorphic transformer provides an effective and efficient way to generate an accurate DVF from an MRI-CT image pair of the abdomen. It could be utilized in the current treatment planning workflow for liver radiotherapy.
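
For readers unfamiliar with the Dice similarity coefficient reported above, it is defined as $2|A \cap B| / (|A| + |B|)$ for two segmentations $A$ and $B$. The sketch below computes it for contours represented as sets of voxel coordinates; the set representation, the function name `dice_coefficient`, and the convention of returning 1.0 for two empty sets are assumptions made for this example, not details taken from the paper.

```python
def dice_coefficient(a, b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) of two voxel sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention chosen here: two empty contours are treated as identical.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy example: two 3-voxel contours that share 2 voxels.
deformed_mri = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
manual_ct = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
print(dice_coefficient(deformed_mri, manual_ct))
```

A value of 1 means the two contours coincide exactly and 0 means they do not overlap at all, which is why the reported increase in DSC indicates better alignment.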

[198]  arXiv:2405.02693 [pdf, ps, other]
Title: TV White Space and LTE Network Optimization towards Energy Efficiency in Suburban and Rural Scenarios
Journal-ref: IEEE Transactions on Broadcasting, vol. 64, no. 1, pp. 164-171, 2018
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

The radio spectrum is a limited resource. Demand for wireless communication services is increasing exponentially, stressing the availability of radio spectrum to accommodate new services. TV White Space (TVWS) technologies allow a dynamic usage of the spectrum. These technologies provide wireless connectivity, in the channels of the Very High Frequency (VHF) and Ultra High Frequency (UHF) television broadcasting bands. In this paper, we investigate and compare the coverage range, network capacity, and network energy efficiency for TVWS technologies and LTE. We consider Ghent, Belgium and Boyeros, Havana, Cuba to evaluate a realistic outdoor suburban and rural area, respectively. The comparison shows that TVWS networks have an energy efficiency 9-12 times higher than LTE networks.

[199]  arXiv:2405.02694 [pdf, ps, other]
Title: Multi-objective Optimization of Cognitive Radio Networks
Journal-ref: Computer Networks, Volume 184, 2021
Subjects: Networking and Internet Architecture (cs.NI)

New generation networks, based on Cognitive Radio technology, allow dynamic allocation of the spectrum, alleviating spectrum scarcity. These networks also have a resilient potential for dynamic operation for energy saving. In this paper, we present a novel wireless network optimization algorithm for cognitive radio networks based on a cloud sharing-decision mechanism. Three Key Performance Indicators (KPIs) were optimized: spectrum usage, power consumption, and exposure of human beings. For a realistic suburban scenario in Ghent city, Belgium, we determine the optimality among the KPIs. Compared to a traditional Cognitive Radio network design, our optimization algorithm for the cloud-based architecture reduced the network power consumption by 27.5%, the average global exposure by 34.3%, and spectrum usage by 34.5% at the same time. Even for the worst optimization case, our solution performs better than the traditional architecture by 4.8% in terms of network power consumption, 7.3% in terms of spectrum usage and 4.3% in terms of global exposure.

[200]  arXiv:2405.02695 [pdf, ps, other]
Title: Improved All-Pairs Approximate Shortest Paths in Congested Clique
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper, we present new algorithms for approximating All-Pairs Shortest Paths (APSP) in the Congested Clique model. We present randomized algorithms for weighted undirected graphs.
Our first contribution is an $O(1)$-approximate APSP algorithm taking just $O(\log \log \log n)$ rounds. Prior to our work, the fastest algorithms that give an $O(1)$-approximation for APSP take $\operatorname{poly}(\log{n})$ rounds in weighted undirected graphs, and $\operatorname{poly}(\log \log n)$ rounds in unweighted undirected graphs.
If we terminate the execution of the algorithm early, we obtain an $O(t)$-round algorithm that yields an $O \big( (\log n)^{1/2^t} \big) $ distance approximation for a parameter $t$. The trade-off between $t$ and the approximation quality provides flexibility for different scenarios, allowing the algorithm to adapt to specific requirements. In particular, we can get an $O \big( (\log n)^{1/2^t} \big) $-approximation for any constant $t$ in $O(1)$ rounds. Such a result was previously known only for the special case $t=0$.
A key ingredient in our algorithm is a lemma that improves an $O(a)$-approximation for APSP to an $O(\sqrt{a})$-approximation for APSP in $O(1)$ rounds. To prove the lemma, we develop several new tools, including $O(1)$-round algorithms for computing the $k$ closest nodes, a certain type of hopset, and skeleton graphs.
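
As a rough illustration of how the round and approximation bounds above fit together, the following sketch iterates the $O(a) \to O(\sqrt{a})$ improvement. It assumes the lemma can be read as a map $a \mapsto c\sqrt{a}$ for some constant $c \ge 1$, and that the starting point is the $O(\log n)$-approximation of the $t=0$ case; the constants $c$ and $C$ are illustrative and not taken from the paper.

$$a_0 = C \log n, \qquad a_{t+1} \le c \sqrt{a_t} \quad\Longrightarrow\quad a_t \le c^{\,2(1 - 2^{-t})} \, a_0^{1/2^t} = O\big( (\log n)^{1/2^t} \big).$$

$$\text{For } t = \log_2 \log_2 \log_2 n: \qquad 2^t = \log_2 \log_2 n, \qquad (\log_2 n)^{1/2^t} = 2^{\frac{\log_2 \log_2 n}{\log_2 \log_2 n}} = 2 = O(1).$$

Under these assumptions, $O(\log \log \log n)$ applications of the lemma, each costing $O(1)$ rounds, would bring the approximation factor down to a constant.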

[201]  arXiv:2405.02696 [pdf, other]
Title: DiffuseTrace: A Transparent and Flexible Watermarking Scheme for Latent Diffusion Model
Subjects: Cryptography and Security (cs.CR)

Latent Diffusion Models (LDMs) enable a wide range of applications but raise ethical concerns regarding illegal utilization. Adding watermarks to generative model outputs is a vital technique employed for copyright tracking and mitigating potential risks associated with AI-generated content. However, post-hoc watermarking techniques are susceptible to evasion. Existing watermarking methods for LDMs can only embed fixed messages. Watermark message alteration requires model retraining. The stability of the watermark is influenced by model updates and iterations. Furthermore, the current reconstruction-based watermark removal techniques utilizing variational autoencoders (VAE) and diffusion models have the capability to remove a significant portion of watermarks. Therefore, we propose a novel technique called DiffuseTrace. The goal is to embed invisible watermarks in all generated images for future detection semantically. The method establishes a unified representation of the initial latent variables and the watermark information through training an encoder-decoder model. The watermark information is embedded into the initial latent variables through the encoder and integrated into the sampling process. The watermark information is extracted by reversing the diffusion process and utilizing the decoder. DiffuseTrace does not rely on fine-tuning of the diffusion model components. The watermark is embedded into the image space semantically without compromising image quality. The encoder-decoder can be utilized as a plug-in in arbitrary diffusion models. We validate through experiments the effectiveness and flexibility of DiffuseTrace. DiffuseTrace holds an unprecedented advantage in combating the latest attacks based on variational autoencoders and Diffusion Models.

[202]  arXiv:2405.02698 [pdf, ps, other]
Title: Stable Diffusion Dataset Generation for Downstream Classification Tasks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating synthetic datasets, using Transfer Learning, Fine-Tuning and generation parameter optimisation techniques to improve the utility of the dataset for downstream classification tasks. We present a class-conditional version of the model that exploits a Class-Encoder and optimisation of key generation parameters. Our methodology led to synthetic datasets that, in a third of cases, produced models that outperformed those trained on real datasets.

[203]  arXiv:2405.02699 [pdf, other]
Title: Platform Competition in the Autobidding World
Subjects: Computer Science and Game Theory (cs.GT)

We study the problem of auction design for advertising platforms that face strategic advertisers who are bidding across platforms. Each advertiser's goal is to maximize their total value or conversions while satisfying some constraint(s) across all the platforms they participate in. In this paper, we focus on advertisers with return-over-investment (henceforth, ROI) constraints, i.e. each advertiser is trying to maximize value while making sure that their ROI across all platforms is no less than some target value. An advertiser interacts with the platforms through autobidders -- for each platform, the advertiser strategically chooses a target ROI to report to the platform's autobidder, which in turn uses a uniform bid multiplier to bid on the advertiser's behalf on the queries owned by the given platform.
Our main result is that for a platform trying to maximize revenue, competition with other platforms is a key factor to consider when designing their auction. While first-price auctions are optimal (for both revenue and welfare) in the absence of competition, this no longer holds true in multi-platform settings. We show that there exists a large class of advertiser valuations over queries such that, from the platform's perspective, running a second price auction dominates running a first price auction.
Furthermore, our analysis reveals the key factors influencing platform choice of auction format: (i) intensity of competition among advertisers, (ii) sensitivity of bid landscapes to an auction change (driven by advertiser sensitivity to price changes), and (iii) relative inefficiency of second-price auctions compared to first-price auctions.

[204]  arXiv:2405.02700 [pdf, other]
Title: Towards a Scalable Identification of Novel Modes in Generative Models
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

An interpretable comparison of generative models requires the identification of sample types produced more frequently by each of the involved models. While several quantitative scores have been proposed in the literature to rank different generative models, such score-based evaluations do not reveal the nuanced differences between the generative models in capturing various sample types. In this work, we propose a method called Fourier-based Identification of Novel Clusters (FINC) to identify modes produced by a generative model with a higher frequency in comparison to a reference distribution. FINC provides a scalable stochastic algorithm based on random Fourier features to estimate the eigenspace of kernel covariance matrices of two generative models and utilize the principal eigendirections to detect the sample types present more dominantly in each model. We demonstrate the application of the FINC method to standard computer vision datasets and generative model frameworks. Our numerical results suggest the scalability and efficiency of the developed Fourier-based method in highlighting the sample types captured with different frequencies by widely-used generative models.

[205]  arXiv:2405.02703 [pdf, other]
Title: Machine Learning Data Practices through a Data Curation Lens: An Evaluation Framework
Comments: In ACM Conference on Fairness, Accountability, and Transparency 2024. ACM, Rio de Janeiro, Brazil
Subjects: Computers and Society (cs.CY)

Studies of dataset development in machine learning call for greater attention to the data practices that make model development possible and shape its outcomes. Many argue that the adoption of theory and practices from archives and data curation fields can support greater fairness, accountability, transparency, and more ethical machine learning. In response, this paper examines data practices in machine learning dataset development through the lens of data curation. We evaluate data practices in machine learning as data curation practices. To do so, we develop a framework for evaluating machine learning datasets using data curation concepts and principles through a rubric. Through a mixed-methods analysis of evaluation results for 25 ML datasets, we study the feasibility of data curation principles to be adopted for machine learning data work in practice and explore how data curation is currently performed. We find that researchers in machine learning, which often emphasizes model development, struggle to apply standard data curation principles. Our findings illustrate difficulties at the intersection of these fields, such as evaluating dimensions that have shared terms in both fields but non-shared meanings, a high degree of interpretative flexibility in adapting concepts without prescriptive restrictions, obstacles in limiting the depth of data curation expertise needed to apply the rubric, and challenges in scoping the extent of documentation dataset creators are responsible for. We propose ways to address these challenges and develop an overall framework for evaluation that outlines how data curation concepts and methods can inform machine learning data practices.

[206]  arXiv:2405.02705 [pdf, other]
Title: Peak Age of Information under Tandem of Queues
Comments: Accepted at IEEE ISIT'24
Subjects: Information Theory (cs.IT)

This paper considers a communication system where a source sends time-sensitive information to its destination via queues in tandem. We assume that the arrival process as well as the service process (of each server) are memoryless, and each of the servers has no buffer. For this setup, we develop a recursive framework to characterize the mean peak age of information (PAoI) under preemptive and non-preemptive policies with $N$ servers having different service rates. For the preemptive case, the proposed framework also yields the mean age of information (AoI).

[207]  arXiv:2405.02710 [pdf, other]
Title: Enhancing News Summarization with ELearnFit through Efficient In-Context Learning and Efficient Fine-Tuning
Comments: 9 Pages
Subjects: Computation and Language (cs.CL)

With the deluge of information delivered by the daily news cycle, there is a growing need to effectively and efficiently summarize news feeds for quick consumption. We leverage large language models (LLMs), with their advanced learning and generative abilities as compared to conventional language models, to generate concise and coherent summaries for news articles from the XSum dataset. Our paper focuses on two key aspects of LLMs: Efficient in-context Learning (ELearn) and Parameter Efficient Fine-tuning (EFit). Under ELearn, we find that increasing the number of shots in prompts and utilizing simple templates generally improve the quality of summaries. We also find that utilizing relevant examples in few-shot learning for ELearn does not improve model performance. In addition, we studied EFit using different methods and demonstrate that fine-tuning the first layer of LLMs produces better outcomes as compared to fine-tuning other layers or utilizing LoRA. We also find that leveraging more relevant training samples using selective layers does not result in better performance. By combining ELearn and EFit, we create a new model (ELearnFit) that leverages the benefits of both few-shot learning and fine-tuning and produces superior performance to either model alone. We also use ELearnFit to highlight the trade-offs between prompting and fine-tuning, especially for situations where only a limited number of annotated samples are available. Ultimately, our research provides practical techniques to optimize news summarization during the prompting and fine-tuning stages and enhances the synthesis of news articles.

[208]  arXiv:2405.02711 [pdf, other]
Title: The Role of AI in Peer Support for Young People: A Study of Preferences for Human- and AI-Generated Responses
Journal-ref: Proceedings of the CHI Conference on Human Factors in Computing Systems 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Generative Artificial Intelligence (AI) is integrated into everyday technology, including news, education, and social media. AI has further pervaded private conversations as conversational partners, auto-completion, and response suggestions. As social media becomes young people's main method of peer support exchange, we need to understand when and how AI can facilitate and assist in such exchanges in a beneficial, safe, and socially appropriate way. We asked 622 young people to complete an online survey and evaluate blinded human- and AI-generated responses to help-seeking messages. We found that participants preferred the AI-generated response to situations about relationships, self-expression, and physical health. However, when addressing a sensitive topic, like suicidal thoughts, young people preferred the human response. We also discuss the role of training in online peer support exchange and its implications for supporting young people's well-being. Disclaimer: This paper includes sensitive topics, including suicide ideation. Reader discretion is advised.

[209]  arXiv:2405.02712 [pdf, other]
Title: CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions
Subjects: Computation and Language (cs.CL)

Recently, Large Language Models (LLMs) have been demonstrated to possess impressive capabilities in a variety of domains and tasks. We investigate the issue of prompt design in the multi-turn text-to-SQL task and attempt to enhance the LLMs' reasoning capacity when generating SQL queries. In the conversational context, the current SQL query can be modified from the preceding SQL query with only a few operations due to the context dependency. We introduce our method called CoE-SQL which can prompt LLMs to generate the SQL query based on the previously generated SQL query with an edition chain. We also conduct extensive ablation studies to determine the optimal configuration of our approach. Our approach consistently outperforms different in-context learning baselines and achieves state-of-the-art performance on two benchmarks, SParC and CoSQL, using LLMs, which is also competitive with the SOTA fine-tuned models.

[210]  arXiv:2405.02713 [pdf, other]
Title: Set Transformation: Trade-off Between Repair Bandwidth and Sub-packetization
Subjects: Information Theory (cs.IT)

Maximum distance separable (MDS) codes facilitate the achievement of elevated levels of fault tolerance in storage systems while incurring minimal redundancy overhead. Reed-Solomon (RS) codes are typical MDS codes with a sub-packetization level of one; however, they require a large repair bandwidth, defined as the total number of symbols downloaded from other surviving nodes during a single-node failure/repair. In this paper, we present the {\em set transformation}, which can transform any MDS code into a set transformed code such that (i) the sub-packetization level is flexible and ranges from 2 to $(n-k)^{\lfloor\frac{n}{n-k}\rfloor}$, in which $n$ is the number of nodes and $k$ is the number of data nodes, (ii) the new code is an MDS code, and (iii) the new code has lower repair bandwidth for any single-node failure. We show that our set transformed codes have both lower repair bandwidth and lower field size than the existing related MDS array codes, such as elastic transformed codes \cite{10228984}. Specifically, our set transformed codes have $2\%-6.6\%$ repair bandwidth reduction compared with elastic transformed codes \cite{10228984} for the evaluated typical parameters.
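
To make the stated sub-packetization range concrete, here is a small worked instance. The parameters $n = 14$ and $k = 10$ are chosen purely for illustration and are not claimed to be among the parameters evaluated in the paper.

$$n = 14,\; k = 10: \qquad (n-k)^{\lfloor \frac{n}{n-k} \rfloor} = 4^{\lfloor 14/4 \rfloor} = 4^{3} = 64,$$

so for these illustrative parameters the sub-packetization level could be chosen anywhere from 2 up to 64.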

[211]  arXiv:2405.02714 [pdf, other]
Title: Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

The task of Information Retrieval (IR) requires a system to identify relevant documents based on users' information needs. In real-world scenarios, retrievers are expected to not only rely on the semantic relevance between the documents and the queries but also recognize the nuanced intents or perspectives behind a user query. For example, when asked to verify a claim, a retrieval system is expected to identify evidence from both supporting vs. contradicting perspectives, for the downstream system to make a fair judgment call. In this work, we study whether retrievers can recognize and respond to different perspectives of the queries -- beyond finding relevant documents for a claim, can retrievers distinguish supporting vs. opposing documents? We reform and extend six existing tasks to create a benchmark for retrieval, where we have diverse perspectives described in free-form text, besides root, neutral queries. We show that current retrievers covered in our experiments have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives. Motivated by the observation, we further explore the potential to leverage geometric features of retriever representation space to improve the perspective awareness of retrievers in a zero-shot manner. We demonstrate the efficiency and effectiveness of our projection-based methods on the same set of tasks. Further analysis also shows how perspective awareness improves performance on various downstream tasks, with 4.2% higher accuracy on AmbigQA and 29.9% more correlation with designated viewpoints on essay writing, compared to non-perspective-aware baselines.

[212]  arXiv:2405.02716 [pdf, other]
Title: Sign-Guided Bipartite Graph Hashing for Hamming Space Search
Authors: Xueyi Wu
Subjects: Information Retrieval (cs.IR)

Bipartite graph hashing (BGH) is extensively used for Top-K search in Hamming space at low storage and inference costs. Recent research adopts graph convolutional hashing for BGH and has achieved state-of-the-art performance. However, the contributions of its various influencing factors to hashing performance have not been explored in depth, including the same/different sign count between two binary embeddings during Hamming space search (sign property), the contribution of sub-embeddings at each layer (model property), the contribution of different node types in the bipartite graph (node property), and the combination of augmentation methods. In this work, we build a lightweight graph convolutional hashing model named LightGCH by mainly removing the augmentation methods of the state-of-the-art model BGCH. By analyzing the contributions of each layer and node type to performance, as well as analyzing the Hamming similarity statistics at each layer, we find that the actual neighbors in the bipartite graph tend to have low Hamming similarity at the shallow layer, and all nodes tend to have high Hamming similarity at the deep layers in LightGCH. To tackle these problems, we propose a novel sign-guided framework, SGBGH, which uses sign-guided negative sampling to improve the Hamming similarity of neighbors, and uses sign-aware contrastive learning to help nodes learn more uniform representations. Experimental results show that SGBGH outperforms BGCH and LightGCH significantly in embedding quality.

[213]  arXiv:2405.02717 [pdf, other]
Title: AFter: Attention-based Fusion Router for RGBT Tracking
Comments: Peer review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-modal feature fusion, a core component of RGBT tracking, has been the subject of numerous studies in recent years. However, existing RGBT tracking methods widely adopt fixed fusion structures to integrate multi-modal features, and these structures struggle to handle the various challenges of dynamic scenarios. To address this problem, this work presents a novel \emph{A}ttention-based \emph{F}usion rou\emph{ter} called AFter, which optimizes the fusion structure to adapt to the dynamic challenging scenarios, for robust RGBT tracking. In particular, we design a fusion structure space based on the hierarchical attention network, each attention-based fusion unit corresponding to a fusion operation and a combination of these attention units corresponding to a fusion structure. Through optimizing the combination of attention-based fusion units, we can dynamically select the fusion structure to adapt to various challenging scenarios. Unlike complex search of different structures in neural architecture search algorithms, we develop a dynamic routing algorithm, which equips each attention-based fusion unit with a router, to predict the combination weights for efficient optimization of the fusion structure. Extensive experiments on five mainstream RGBT tracking datasets demonstrate the superior performance of the proposed AFter against state-of-the-art RGBT trackers. We release the code at https://github.com/Alexadlu/AFter.

[214]  arXiv:2405.02719 [pdf, other]
Title: Active Signal Emitter Placement In Complex Environments
Comments: Submitted to RA-L
Subjects: Robotics (cs.RO)

Placement of electromagnetic signal emitting devices, such as light sources, is important for signal coverage tasks. Automatic placement of these devices is challenging because of the complex interaction of the signal and environment due to reflection, refraction and scattering. In this work, we iteratively improve the placement of these devices by interleaving device placement and sensing actions, correcting errors in the model of the signal propagation. To this end, we propose a novel factor-graph based belief model which combines the measurements taken by the robot and an analytical light propagation model. This model allows accurate modelling of the uncertainty of the light propagation with respect to the obstacles, which greatly improves the informative path planning routine. Additionally, we propose a method for determining when to re-plan the emitter placements to balance a trade-off between information about a specific configuration and frequent updating of the configuration. This method incorporates the uncertainty from the belief model to adaptively determine when re-configuration is needed. We find that our system has a 9.8% median error reduction compared to a baseline system in simulations in the most difficult environment. We also run on-robot tests and determine that our system performs favorably compared to the baseline.

[215]  arXiv:2405.02724 [pdf, ps, other]
Title: Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning
Authors: Yingjie Fei, Ruitu Xu
Comments: 29 pages
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)

We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games, where agents optimize the entropic risk measure of rewards with possibly diverse risk preferences. We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias that favor the most risk-sensitive agents and overlook the other agents. To address such deficiency of the naive regret, we propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias. Furthermore, we develop a self-play algorithm for learning Nash, correlated, and coarse correlated equilibria in risk-sensitive Markov games. We prove that the proposed algorithm attains near-optimal regret guarantees with respect to the risk-balanced regret.
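
For readers unfamiliar with the entropic risk measure mentioned above, the following minimal sketch computes its standard empirical form, $\frac{1}{\beta} \log \mathbb{E}[e^{\beta R}]$, from a list of sampled rewards. The function name `entropic_risk` and the sign convention (negative $\beta$ meaning risk-averse) are assumptions made for illustration; the paper may use a different parameterization.

```python
import math


def entropic_risk(rewards, beta):
    """Empirical entropic risk (1/beta) * log(mean(exp(beta * r))).

    Illustrative convention (an assumption, not taken from the paper):
    beta < 0 is risk-averse, beta > 0 is risk-seeking, and beta == 0
    falls back to the plain mean, which is the limit as beta -> 0.
    """
    if beta == 0:
        return sum(rewards) / len(rewards)
    scaled = [beta * r for r in rewards]
    # Subtract the maximum before exponentiating to avoid overflow.
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta


rewards = [0.0, 10.0]
mean_reward = sum(rewards) / len(rewards)
risk_averse = entropic_risk(rewards, -1.0)
risk_seeking = entropic_risk(rewards, 1.0)
```

With these sample rewards, the risk-averse value falls below the mean and the risk-seeking value above it, which is one way to see how agents with different risk preferences can value the same outcomes differently.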

[216]  arXiv:2405.02726 [pdf, other]
Title: A Mathematical Model of the Hidden Feedback Loop Effect in Machine Learning Systems
Comments: 21 pages, 15 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Widespread deployment of societal-scale machine learning systems necessitates a thorough understanding of the resulting long-term effects these systems have on their environment, including loss of trustworthiness, bias amplification, and violation of AI safety requirements. We introduce a repeated learning process to jointly describe several phenomena attributed to unintended hidden feedback loops, such as error amplification, induced concept drift, echo chambers and others. The process comprises the entire cycle of obtaining the data, training the predictive model, and delivering predictions to end-users within a single mathematical model. A distinctive feature of such repeated learning setting is that the state of the environment becomes causally dependent on the learner itself over time, thus violating the usual assumptions about the data distribution. We present a novel dynamical systems model of the repeated learning process and prove the limiting set of probability distributions for positive and negative feedback loop modes of the system operation. We conduct a series of computational experiments using an exemplary supervised learning problem on two synthetic data sets. The results of the experiments correspond to the theoretical predictions derived from the dynamical model. Our results demonstrate the feasibility of the proposed approach for studying the repeated learning processes in machine learning systems and open a range of opportunities for further research in the area.

[217]  arXiv:2405.02727 [pdf, other]
Title: Using finite automata to compute the base-$b$ representation of the golden ratio and other quadratic irrationals
Subjects: Formal Languages and Automata Theory (cs.FL); Discrete Mathematics (cs.DM); Number Theory (math.NT)

We show that the $n$'th digit of the base-$b$ representation of the golden ratio is a finite-state function of the Zeckendorf representation of $b^n$, and hence can be computed by a finite automaton. Similar results can be proven for any quadratic irrational. We use a satisfiability (SAT) solver to prove, in some cases, that the automata we construct are minimal.
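
The two objects related by this result can be computed directly, as in the sketch below. It does not reproduce the paper's automata. It assumes the $n$'th digit means $\lfloor b^n \varphi \rfloor \bmod b$ (so $n = 0$ gives the integer part), and it uses the usual greedy construction of the Zeckendorf representation; the helper names are hypothetical.

```python
from math import isqrt


def golden_ratio_digit(n, b):
    """n'th base-b digit of phi = (1 + sqrt(5)) / 2, taken as floor(b**n * phi) mod b.

    Uses exact integer arithmetic: floor(m * phi) = (m + isqrt(5 * m * m)) // 2.
    """
    m = b ** n
    return ((m + isqrt(5 * m * m)) // 2) % b


def zeckendorf(m):
    """Zeckendorf representation of m >= 0 as a bit string, most significant first.

    Positions correspond to the Fibonacci numbers ..., 8, 5, 3, 2, 1.
    The greedy choice guarantees no two adjacent 1s.
    """
    fibs = [1, 2]
    while fibs[-1] <= m:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    for f in reversed(fibs):
        if f <= m:
            bits.append("1")
            m -= f
        else:
            bits.append("0")
    return "".join(bits).lstrip("0") or "0"


# Decimal digits of phi = 1.6180339887..., paired with the Zeckendorf
# representation of 10**n that the automaton would read as input.
pairs = [(zeckendorf(10 ** n), golden_ratio_digit(n, 10)) for n in range(4)]
```

Here `pairs` lists, for each $n$, the input string (the Zeckendorf representation of $10^n$) next to the output digit that a finite-state function of that string would have to produce.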

[218]  arXiv:2405.02730 [pdf, other]
Title: U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers
Comments: 11 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; but meanwhile, the abandonment of U-Net by DiTs and their following improvements is worth rethinking. To this end, we conduct a simple toy experiment by comparing a U-Net architectured DiT with an isotropic one. It turns out that the U-Net architecture gains only a slight advantage amid the U-Net inductive bias, indicating potential redundancies within the U-Net-style DiT. Inspired by the discovery that U-Net backbone features are low-frequency-dominated, we perform token downsampling on the query-key-value tuple for self-attention and bring further improvements despite a considerable amount of reduction in computation. Based on self-attention with downsampled tokens, we propose a series of U-shaped DiTs (U-DiTs) in the paper and conduct extensive experiments to demonstrate the extraordinary performance of U-DiT models. The proposed U-DiT could outperform DiT-XL/2 with only 1/6 of its computation cost. Codes are available at https://github.com/YuchuanTian/U-DiT.

[219]  arXiv:2405.02731 [pdf, other]
Title: Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles
Comments: 17 pages, 2 tables, 5 figures
Subjects: Machine Learning (cs.LG)

This systematic review focuses on anomaly detection for connected and autonomous vehicles. The initial database search identified 2160 articles, of which 203 were included in this review after rigorous screening and assessment. This study revealed that the most commonly used Artificial Intelligence (AI) algorithms employed in anomaly detection are neural networks like LSTM, CNN, and autoencoders, alongside one-class SVM. Most anomaly-based models were trained using real-world operational vehicle data, although anomalies, such as attacks and faults, were often injected artificially into the datasets. These models were evaluated mostly using five key evaluation metrics: recall, accuracy, precision, F1-score, and false positive rate. The most frequently used combination of evaluation metrics for anomaly detection models was accuracy, precision, recall, and F1-score. This systematic review presents several recommendations. First, there is a need to incorporate multiple evaluation metrics to provide a comprehensive assessment of the anomaly detection models. Second, only a small proportion of the studies have made their models open source, indicating a need to share models publicly to facilitate collaboration within the research community, and to validate and compare findings effectively. Third, there is a need for benchmarking datasets with predefined anomalies or cyberattacks to test and improve the effectiveness of the proposed anomaly-based detection models. Furthermore, there is a need for future research to investigate the deployment of anomaly detection to a vehicle to assess its performance on the road. There is a notable lack of research on intrusion detection systems for protocols other than CAN, such as Ethernet and FlexRay.
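
The five evaluation metrics named above all derive from the same confusion-matrix counts, which the following sketch makes explicit. It uses the standard textbook definitions with anomalies as the positive class; the function name and the example counts are hypothetical and not drawn from any reviewed study.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute common anomaly-detection metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with "positive" meaning "flagged as anomalous".
    Ratios with an empty denominator are reported as 0.0 by convention.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Hypothetical counts: 10 real anomalies among 100 samples.
metrics = detection_metrics(tp=8, fp=2, fn=2, tn=88)
```

Because anomalies are usually rare, accuracy alone can look high even when recall is poor, which is one reason reporting several of these metrics together gives a fuller picture.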

[220]  arXiv:2405.02732 [pdf, other]
Title: Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generations by a substantial margin.

[221]  arXiv:2405.02734 [pdf, ps, other]
Title: A Diagramming Technique for Teaching Students to Read Software Engineering Research Papers: an experience report
Authors: Mary Shaw
Comments: 6 pages, 6 figures, working paper
Subjects: Software Engineering (cs.SE)

Reading scientific research papers is a skill that many students do not learn before entering PhD programs, but it is critical to their success. This paper describes our diagramming technique for teaching this skill, which helps them identify the structure and the scientific argument of the paper. This has made our students more effective readers.

[222]  arXiv:2405.02738 [pdf, other]
Title: Relations Prediction for Knowledge Graph Completion using Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Knowledge Graphs have been widely used to represent facts in a structured format. Due to their large-scale applications, knowledge graphs suffer from being incomplete. The relation prediction task obtains knowledge graph completion by assigning one or more possible relations to each pair of nodes. In this work, we make use of the knowledge graph node names to fine-tune a large language model for the relation prediction task. By utilizing the node names only, we enable our model to operate sufficiently in the inductive settings. Our experiments show that we accomplish new scores on a widely used knowledge graph benchmark.

[223]  arXiv:2405.02743 [pdf, other]
Title: Beyond Performance: Quantifying and Mitigating Label Bias in LLMs
Comments: NAACL 2024
Subjects: Computation and Language (cs.CL)

Large language models (LLMs) have shown remarkable adaptability to diverse tasks, by leveraging context prompts containing instructions, or minimal input-output examples. However, recent work revealed they also exhibit label bias -- an undesirable preference toward predicting certain answers over others. Still, detecting and measuring this bias reliably and at scale has remained relatively unexplored. In this study, we evaluate different approaches to quantifying label bias in a model's predictions, conducting a comprehensive investigation across 279 classification tasks and ten LLMs. Our investigation reveals substantial label bias in models both before and after debiasing attempts, as well as highlights the importance of outcomes-based evaluation metrics, which were not previously used in this regard. We further propose a novel label bias calibration method tailored for few-shot prompting, which outperforms recent calibration approaches for both improving performance and mitigating label bias. Our results emphasize that label bias in the predictions of LLMs remains a barrier to their reliability.

[224]  arXiv:2405.02745 [pdf, other]
Title: Understanding Server-Assisted Federated Learning in the Presence of Incomplete Client Participation
Comments: Accepted in ICML2024
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Existing works in federated learning (FL) often assume an ideal system with either full client or uniformly distributed client participation. However, in practice, it has been observed that some clients may never participate in FL training (aka incomplete client participation) due to a myriad of system heterogeneity factors. A popular approach to mitigate impacts of incomplete client participation is the server-assisted federated learning (SA-FL) framework, where the server is equipped with an auxiliary dataset. However, although SA-FL has been empirically shown to be effective in addressing the incomplete client participation problem, there remains a lack of theoretical understanding for SA-FL. Meanwhile, the ramifications of incomplete client participation in conventional FL are also poorly understood. These theoretical gaps motivate us to rigorously investigate SA-FL. Toward this end, we first show that conventional FL is not PAC-learnable under incomplete client participation in the worst case. Then, we show that the PAC-learnability of FL with incomplete client participation can indeed be revived by SA-FL, which theoretically justifies the use of SA-FL for the first time. Lastly, to provide practical guidance for SA-FL training under incomplete client participation, we propose the $\mathsf{SAFARI}$ (server-assisted federated averaging) algorithm that enjoys the same linear convergence speedup guarantees as classic FL with ideal client participation assumptions, offering the first SA-FL algorithm with convergence guarantee. Extensive experiments on different datasets show $\mathsf{SAFARI}$ significantly improves the performance under incomplete client participation.

[225]  arXiv:2405.02749 [pdf, other]
Title: Sub-goal Distillation: A Method to Improve Small Language Agents
Subjects: Machine Learning (cs.LG)

While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. In detail, we leverage an LLM to annotate an oracle path with a sequence of sub-goals towards completing a goal. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Our code and annotated data for distillation can be found on GitHub.

[226]  arXiv:2405.02750 [pdf, other]
Title: Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding
Comments: Accepted to NAACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) tend to inadequately integrate input context during text generation, relying excessively on encoded prior knowledge in model parameters, potentially resulting in generated text with factual inconsistencies or contextually unfaithful content. LLMs utilize two primary knowledge sources: 1) prior (parametric) knowledge from pretraining, and 2) contextual (non-parametric) knowledge from input prompts. The study addresses the open question of how LLMs effectively balance these knowledge sources during the generation process, specifically in the context of open-domain question answering. To address this issue, we introduce a novel approach integrating contrastive decoding with adversarial irrelevant passages as negative samples to enhance robust context grounding during generation. Notably, our method operates at inference time without requiring further training. We conduct comprehensive experiments to demonstrate its applicability and effectiveness, providing empirical evidence showcasing its superiority over existing methodologies. Our code is publicly available at: https://github.com/amazon-science/ContextualUnderstanding-ContrastiveDecoding.
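
As a rough illustration of the idea of contrasting a relevant context against an irrelevant one at inference time, the following minimal sketch rescores next-token candidates. The scoring rule `(1 + alpha) * log p(relevant) - alpha * log p(irrelevant)`, the parameter `alpha`, and the function names are assumptions made for illustration; the abstract does not state the paper's exact formulation.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(logits_relevant, logits_irrelevant, alpha=0.5):
    """Score each token by amplifying what the relevant context supports
    and penalizing what the model predicts even with an irrelevant passage.

    Assumed rule (not taken from the paper):
        score = (1 + alpha) * log p(token | relevant context)
                - alpha * log p(token | irrelevant context)
    """
    lp_pos = log_softmax(logits_relevant)
    lp_neg = log_softmax(logits_irrelevant)
    return [(1 + alpha) * p - alpha * n for p, n in zip(lp_pos, lp_neg)]


def pick_token(logits_relevant, logits_irrelevant, alpha=0.5):
    """Greedy choice of the highest-scoring token index."""
    scores = contrastive_scores(logits_relevant, logits_irrelevant, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy logits over a 3-token vocabulary (hypothetical values).
# Token 0 is favored by prior knowledge under both prompts;
# token 1 is supported only when the relevant passage is present.
with_relevant = [2.0, 1.9, 0.0]
with_irrelevant = [2.0, 0.0, 0.0]
```

With `alpha=0.0` the rule reduces to ordinary greedy decoding on the relevant-context logits; larger values of `alpha` shift the choice toward tokens whose support comes from the context rather than from the prior.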

[227]  arXiv:2405.02751 [pdf, other]
Title: Deep Image Restoration For Image Anti-Forensics
Authors: Eren Tahir, Mert Bal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While image forensics is concerned with whether an image has been tampered with, image anti-forensics attempts to prevent image forensics methods from detecting tampered images. The competition between these two fields started long before the advancement of deep learning. JPEG compression, blurring and noising, which are simple methods by today's standards, have long been used for anti-forensics and have been the subject of much research in both forensics and anti-forensics. Although these traditional methods are old, they make it difficult to detect fake images and are used for data augmentation in training deep image forgery detection models. In addition to making the image difficult to detect, these methods leave traces on the image and consequently degrade the image quality. Separate image forensics methods have also been developed to detect these traces. In this study, we go one step further and improve the image quality after these methods with deep image restoration models and make it harder to detect the forged image. We evaluate the impact of these methods on image quality. We then test both our proposed methods with deep learning and methods without deep learning on the two best existing image manipulation detection models. In the obtained results, we show how existing image forgery detection models fail against the proposed methods. Code implementation will be publicly available at https://github.com/99eren99/DIRFIAF .

[228]  arXiv:2405.02754 [pdf, other]
Title: Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning
Comments: submissions to Journal of Artificial Intelligence Research. arXiv admin note: text overlap with arXiv:2308.13140
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Deep reinforcement learning (DRL) has demonstrated remarkable performance in many continuous control tasks. However, a significant obstacle to the real-world application of DRL is the lack of safety guarantees. Although DRL agents can satisfy system safety in expectation through reward shaping, designing agents to consistently meet hard constraints (e.g., safety specifications) at every time step remains a formidable challenge. In contrast, existing work in the field of safe control provides guarantees on persistent satisfaction of hard safety constraints. However, these methods require explicit analytical system dynamics models to synthesize safe control, which are typically inaccessible in DRL settings. In this paper, we present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents that ensure provable safety throughout training. The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamic function (e.g., a digital twin simulator). Moreover, we theoretically prove that the implicit safe set algorithm guarantees finite time convergence to the safe set and forward invariance for both continuous-time and discrete-time systems. We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining $95\% \pm 9\%$ cumulative reward compared to state-of-the-art safe DRL methods. Furthermore, the resulting algorithm scales well to high-dimensional systems with parallel computing.

[229]  arXiv:2405.02756 [pdf, other]
Title: Efficient Open Modification Spectral Library Searching in High-Dimensional Space with Multi-Level-Cell Memory
Comments: Accepted by DAC'24
Subjects: Hardware Architecture (cs.AR)

Open Modification Search (OMS) is a promising algorithm for mass spectrometry analysis that enables the discovery of modified peptides. However, OMS encounters challenges as it exponentially extends the search scope. Existing OMS accelerators either have limited parallelism or struggle to scale effectively with growing data volumes. In this work, we introduce an OMS accelerator utilizing multi-level-cell (MLC) RRAM memory to enhance storage capacity by 3x. Through in-memory computing, we achieve up to 77x faster data processing with two to three orders of magnitude better energy efficiency. Testing was done on a fabricated MLC RRAM chip. We leverage hyperdimensional computing to tolerate up to 10% memory errors while delivering massive parallelism in hardware.

[230]  arXiv:2405.02759 [pdf, other]
Title: Region-Aware Color Smudging
Subjects: Graphics (cs.GR)

Color smudge operations from digital painting software enable users to create natural shading effects in high-fidelity paintings by interactively mixing colors. To precisely control results in traditional painting software, users tend to organize flat-filled color regions in multiple layers and smudge them to generate different color gradients. However, the requirement to carefully deal with regions makes the smudging process time-consuming and laborious, especially for non-professional users. This motivates us to investigate how to infer user-desired smudging effects when users smudge over regions in a single layer. To investigate improving color smudge performance, we first conduct a formative study. Following the findings of this study, we design SmartSmudge, a novel smudge tool that offers users dynamical smudge brushes and real-time region selection for easily generating natural and efficient shading effects. We demonstrate the efficiency and effectiveness of the proposed tool via a user study and quantitative analysis.

[231]  arXiv:2405.02760 [pdf, other]
Title: GTFS2STN: Analyzing GTFS Transit Data by Generating Spatiotemporal Transit Network
Comments: 8 pages, 8 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Social and Information Networks (cs.SI)

GTFS, the General Transit Feed Specification, is an open standard format to record transit information used by thousands of transit agencies across the world. By converting a static GTFS transit network to a spatiotemporal network connecting bus stops over space and time, a preliminary tool named GTFS2STN is implemented to analyze the accessibility of the transit system. Furthermore, a simple application is built for users to generate the spatiotemporal network online. The online tool also supports some basic analyses, including generating isochrone maps for a given origin and generating travel time variability over time for a given origin-destination pair. Results show that the tool produces results similar to those of Mapnificent, another open source endeavour to generate isochrone maps given GTFS inputs. Compared with Mapnificent, the proposed GTFS2STN tool is suited for research and evaluation purposes because users can upload any historical GTFS dataset from any transit agency to evaluate the accessibility and travel time variability of transit networks over time.
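
To make the notion of a spatiotemporal transit network and an isochrone concrete, here is a minimal sketch. It is not the GTFS2STN implementation: it assumes timetable data has already been flattened into connections `(from_stop, departure, to_stop, arrival)` with times in minutes since midnight, assumes zero transfer time, and uses a simple connection scan for earliest arrivals. All stop names and times are hypothetical.

```python
import math


def earliest_arrivals(connections, origin, start_time):
    """Earliest arrival time at each reachable stop.

    connections: iterable of (from_stop, departure, to_stop, arrival),
    times in minutes since midnight. Scanning connections in order of
    departure is sufficient because a connection can only be boarded
    if the rider is already at its departure stop in time.
    """
    arrival = {origin: start_time}
    for from_stop, dep, to_stop, arr in sorted(connections, key=lambda c: c[1]):
        if arrival.get(from_stop, math.inf) <= dep and arr < arrival.get(to_stop, math.inf):
            arrival[to_stop] = arr
    return arrival


def isochrone(connections, origin, start_time, budget_minutes):
    """Set of stops reachable from origin within the travel-time budget."""
    arrival = earliest_arrivals(connections, origin, start_time)
    return {stop for stop, t in arrival.items() if t - start_time <= budget_minutes}


# Hypothetical timetable: each tuple is one vehicle hop between two stops.
toy_connections = [
    ("A", 480, "B", 490),  # 08:00 -> 08:10
    ("B", 495, "C", 510),  # 08:15 -> 08:30
    ("A", 485, "C", 530),  # 08:05 -> 08:50 (slow direct service)
    ("C", 505, "D", 520),  # 08:25 -> 08:40 (leaves before C can be reached)
]
```

Calling `isochrone` repeatedly with different `start_time` values is one simple way to observe how reachability varies over the day.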

[232]  arXiv:2405.02762 [pdf, other]
Title: TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes
Comments: 8 pages, submitted to IROS2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)

In this paper, we present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception. Our formulation is designed for dynamic scenes, consisting of moving objects or human actions, where the goal is to recognize the pose or actions. We propose an extension of K-Planes Neural Radiance Field (NeRF), wherein our algorithm stores a set of tiered feature vectors. The tiered feature vectors are generated to effectively model conceptual information about a scene as well as an image decoder that transforms output feature maps into RGB images. Our technique leverages the information amongst both static and dynamic objects within a scene and is able to capture salient scene attributes of high altitude videos. We evaluate its performance on challenging datasets, including Okutama Action and UG2, and observe considerable improvement in accuracy over state-of-the-art aerial perception algorithms.

[233]  arXiv:2405.02763 [pdf, other]
Title: Can Nuanced Language Lead to More Actionable Insights? Exploring the Role of Generative AI in Analytical Narrative Structure
Comments: 5 pages, 1 figure
Subjects: Human-Computer Interaction (cs.HC)

Relevant language describing trends in data can be useful for generating summaries to help with readers' takeaways. However, the language employed in these often template-generated summaries tends to be simple, ranging from describing simple statistical information (e.g., extrema and trends) without additional context and richer language to provide actionable insights. Recent advances in Large Language Models (LLMs) have shown promising capabilities in capturing subtle nuances in language when describing information. This workshop paper specifically explores how LLMs can provide more actionable insights when describing trends by focusing on three dimensions of analytical narrative structure: semantic, rhetorical, and pragmatic. Building on prior research that examines visual and linguistic signatures for univariate line charts, we examine how LLMs can further leverage the semantic dimension of analytical narratives using quantified semantics to describe shapes in trends as people intuitively view them. These semantic descriptions help convey insights in a way that leads to a pragmatic outcome, i.e., a call to action, persuasion, warning vs. alert, and situational awareness. Finally, we identify rhetorical implications for how well these generated narratives align with the perceived shape of the data, thereby empowering users to make informed decisions and take meaningful actions based on these data insights.

[234]  arXiv:2405.02764 [pdf, other]
Title: Assessing Adversarial Robustness of Large Language Models: An Empirical Study
Comments: 16 pages, 9 figures, 10 tables
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Large Language Models (LLMs) have revolutionized natural language processing, but their robustness against adversarial attacks remains a critical concern. We present a novel white-box style attack approach that exposes vulnerabilities in leading open-source LLMs, including Llama, OPT, and T5. We assess the impact of model size, structure, and fine-tuning strategies on their resistance to adversarial perturbations. Our comprehensive evaluation across five diverse text classification tasks establishes a new benchmark for LLM robustness. The findings of this study have far-reaching implications for the reliable deployment of LLMs in real-world applications and contribute to the advancement of trustworthy AI systems.

[235]  arXiv:2405.02765 [pdf, other]
Title: Detecting Edited Knowledge in Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Knowledge editing techniques (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training. However, KE also faces potential malicious applications, e.g. inserting misinformation and toxic content. Moreover, in the context of responsible AI, it is instructive for end-users to know whether a generated output is driven by edited knowledge or first-hand knowledge from pre-training. To this end, we study detecting edited knowledge in language models by introducing a novel task: given an edited model and a specific piece of knowledge the model generates, our objective is to classify the knowledge as either "non-edited" (based on the pre-training), or "edited" (based on subsequent editing). We initiate the task with two state-of-the-art KEs, two language models, and two datasets. We further propose a simple classifier, RepReg, a logistic regression model that takes hidden state representations as input features. Our results reveal that RepReg establishes a strong baseline, achieving a peak accuracy of 99.81%, and 97.79% in out-of-domain settings. Second, RepReg achieves near-optimal performance with a limited training set (200 training samples), and it maintains its performance even in out-of-domain settings. Last, we find it more challenging to separate edited and non-edited knowledge when they contain the same subject or object.

[236]  arXiv:2405.02766 [pdf, other]
Title: Beyond Unimodal Learning: The Importance of Integrating Multiple Modalities for Lifelong Learning
Comments: Accepted at 3rd Conference on Lifelong Learning Agents (CoLLAs), 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

While humans excel at continual learning (CL), deep neural networks (DNNs) exhibit catastrophic forgetting. A salient feature of the brain that allows effective CL is that it utilizes multiple modalities for learning and inference, which is underexplored in DNNs. Therefore, we study the role and interactions of multiple modalities in mitigating forgetting and introduce a benchmark for multimodal continual learning. Our findings demonstrate that leveraging multiple views and complementary information from multiple modalities enables the model to learn more accurate and robust representations. This makes the model less vulnerable to modality-specific regularities and considerably mitigates forgetting. Furthermore, we observe that individual modalities exhibit varying degrees of robustness to distribution shift. Finally, we propose a method for integrating and aligning the information from different modalities by utilizing the relational structural similarities between the data points in each modality. Our method sets a strong baseline that enables both single- and multimodal inference. Our study provides a promising case for further exploring the role of multiple modalities in enabling CL and provides a standard benchmark for future research.

[237]  arXiv:2405.02769 [pdf, other]
Title: Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Optimization and Control (math.OC)

This work focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. In this work, agents are assumed to have access to an oracle with exact policy evaluation and seek to maximize their respective independent rewards. Each individual's reward is assumed to depend on the actions of all the agents in the multi-agent system, leading to a game between agents. We assume all agents make decisions under a policy with bounded rationality, which is enforced by the introduction of entropy regularization. In practice, a smaller regularization implies the agents are more rational and behave closer to Nash policies. On the other hand, agents with larger regularization act more randomly, which ensures more exploration. We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although regularization assumptions prevent the QRE from approximating a Nash equilibrium, our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) as a verification of our theoretical analysis.
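
The convergence behavior described above can be illustrated on a small two-player matrix game. The sketch below assumes one common form of the entropy-regularized NPG update for softmax policies with exact evaluation, `pi'(a) ∝ pi(a)^(1 - eta*tau) * exp(eta * r(a))`; the paper's exact update and step-size conditions may differ. The payoff matrices, the regularization strength `tau`, the step size `eta`, and the initial policies are hypothetical values chosen so that the iteration contracts.

```python
import math


def softmax(values):
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def expected_rewards(payoff, other_policy):
    """payoff[a][b] is the agent's reward for its own action a against
    the other agent's action b; returns the expected reward per own action."""
    return [sum(p * r for p, r in zip(other_policy, row)) for row in payoff]


def npg_step(policy, rewards, eta, tau):
    """Assumed update, applied in log space:
    pi'(a) ∝ pi(a)^(1 - eta*tau) * exp(eta * r(a))."""
    logits = [(1 - eta * tau) * math.log(p) + eta * r
              for p, r in zip(policy, rewards)]
    return softmax(logits)


def run_independent_npg(payoff_1, payoff_2, init_1, init_2,
                        tau=2.0, eta=0.25, steps=200):
    """Both agents update simultaneously, each seeing only its own rewards."""
    pi_1, pi_2 = list(init_1), list(init_2)
    for _ in range(steps):
        r_1 = expected_rewards(payoff_1, pi_2)
        r_2 = expected_rewards(payoff_2, pi_1)
        pi_1, pi_2 = npg_step(pi_1, r_1, eta, tau), npg_step(pi_2, r_2, eta, tau)
    return pi_1, pi_2


def qre_residual(payoff_1, payoff_2, pi_1, pi_2, tau):
    """Largest gap between each policy and its softmax (quantal) response;
    zero exactly at a quantal response equilibrium."""
    qr_1 = softmax([r / tau for r in expected_rewards(payoff_1, pi_2)])
    qr_2 = softmax([r / tau for r in expected_rewards(payoff_2, pi_1)])
    return max(abs(a - b) for a, b in zip(pi_1 + pi_2, qr_1 + qr_2))


# Hypothetical general-sum 2x2 game; each matrix is indexed from its
# owner's point of view: [own action][other agent's action].
payoff_1 = [[1.0, -1.0], [0.5, 0.0]]
payoff_2 = [[0.0, 1.0], [1.0, -0.5]]
pi_1, pi_2 = run_independent_npg(payoff_1, payoff_2, [0.9, 0.1], [0.2, 0.8])
```

A fixed point of this update satisfies `pi(a) ∝ exp(r(a) / tau)`, which is the quantal response condition checked by `qre_residual`.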

[238]  arXiv:2405.02770 [pdf, other]
Title: PhilHumans: Benchmarking Machine Learning for Personal Health
Subjects: Machine Learning (cs.LG)

The use of machine learning in Healthcare has the potential to improve patient outcomes as well as broaden the reach and affordability of Healthcare. The history of other application areas indicates that strong benchmarks are essential for the development of intelligent systems. We present Personal Health Interfaces Leveraging HUman-MAchine Natural interactions (PhilHumans), a holistic suite of benchmarks for machine learning across different Healthcare settings - talk therapy, diet coaching, emergency care, intensive care, obstetric sonography - as well as different learning settings, such as action anticipation, timeseries modeling, insight mining, language modeling, computer vision, reinforcement learning and program synthesis.

[239]  arXiv:2405.02771 [pdf, other]
Title: MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
Comments: Data and code is available on the project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks including image classification and semantic segmentation. We find that multi-modal pretraining notably improves the linear probing performance, e.g. 4pp on BigEarthNet and 16pp on So2Sat, compared to pretraining on optical satellite images only. We show that this also leads to better label and parameter efficiency which are crucial aspects in global scale applications.

[240]  arXiv:2405.02772 [pdf, other]
Title: SkinGrip: An Adaptive Soft Robotic Manipulator with Capacitive Sensing for Whole-Limb Bathing Assistance
Subjects: Robotics (cs.RO)

Robotics presents a promising opportunity for enhancing bathing assistance, potentially to alleviate labor shortages and reduce care costs, while offering consistent and gentle care for individuals with physical disabilities. However, ensuring flexible and efficient cleaning of the human body poses challenges as it involves direct physical contact between the human and the robot, and necessitates simple, safe, and effective control. In this paper, we introduce a soft, expandable robotic manipulator with embedded capacitive proximity sensing arrays, designed for safe and efficient bathing assistance. We conduct a thorough evaluation of our soft manipulator, comparing it with a baseline rigid end effector in a human study involving 12 participants across 96 bathing trials. Our soft manipulator achieves an average cleaning effectiveness of 88.8% on arms and 81.4% on legs, far exceeding the performance of the baseline. Participant feedback further validates the manipulator's ability to maintain safety, comfort, and thorough cleaning.

[241]  arXiv:2405.02774 [pdf, other]
Title: Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs
Comments: Published as a conference paper at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model. The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired performance levels. While many data selection algorithms have been designed for small-scale applications, rendering them unsuitable for our context, some emerging methods do cater to language data scales. However, they often prioritize data that aligns with the target distribution. While this strategy may be effective when training a model from scratch, it can yield limited results when the model has already been pre-trained on a different distribution. Differing from prior work, our key idea is to select data that nudges the pre-training distribution closer to the target distribution. We show the optimality of this approach for fine-tuning tasks under certain conditions. We demonstrate the efficacy of our methodology across a diverse array of tasks (NLU, NLG, zero-shot) with models up to 2.7B, showing that it consistently surpasses other selection methods. Moreover, our proposed method is significantly faster than existing techniques, scaling to millions of samples within a single GPU hour. Our code is open-sourced (Code repository: https://anonymous.4open.science/r/DV4LLM-D761/ ). While fine-tuning offers significant potential for enhancing performance across diverse tasks, its associated costs often limit its widespread adoption; with this work, we hope to lay the groundwork for cost-effective fine-tuning, making its benefits more accessible.

[242]  arXiv:2405.02778 [pdf, other]
Title: Improve Temporal Awareness of LLMs for Sequential Recommendation
Comments: 10 pages
Subjects: Information Retrieval (cs.IR)

Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks. However, it is empirically found that LLMs fall short in recognizing and utilizing temporal information, rendering poor performance in tasks that require an understanding of sequential data, such as sequential recommendation. In this paper, we aim to improve temporal awareness of LLMs by designing a principled prompting framework inspired by human cognitive processes. Specifically, we propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation. Besides, we emulate divergent thinking by aggregating LLM ranking results derived from these strategies. Evaluations on MovieLens-1M and Amazon Review datasets indicate that our proposed method significantly enhances the zero-shot capabilities of LLMs in sequential recommendation tasks.

[243]  arXiv:2405.02781 [pdf, other]
Title: Instantaneous Perception of Moving Objects in 3D
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The perception of 3D motion of surrounding traffic participants is crucial for driving safety. While existing works primarily focus on general large motions, we contend that the instantaneous detection and quantification of subtle motions is equally important as they indicate the nuances in driving behavior that may be safety critical, such as behaviors near a stop sign or parking positions. We delve into this under-explored task, examining its unique challenges and developing our solution, accompanied by a carefully designed benchmark. Specifically, due to the lack of correspondences between consecutive frames of sparse Lidar point clouds, static objects might appear to be moving - the so-called swimming effect. This intertwines with the true object motion, thereby posing ambiguity in accurate estimation, especially for subtle motions. To address this, we propose to leverage local occupancy completion of object point clouds to densify the shape cue, and mitigate the impact of swimming artifacts. The occupancy completion is learned in an end-to-end fashion together with the detection of moving objects and the estimation of their motion, instantaneously as soon as objects start to move. Extensive experiments demonstrate superior performance compared to standard 3D motion estimation approaches, particularly highlighting our method's specialized treatment of subtle motions.

[244]  arXiv:2405.02782 [pdf, ps, other]
Title: A self-supervised text-vision framework for automated brain abnormality detection
Comments: Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Artificial neural networks trained on large, expert-labelled datasets are considered state-of-the-art for a range of medical image recognition tasks. However, categorically labelled datasets are time-consuming to generate and constrain classification to a pre-defined, fixed set of classes. For neuroradiological applications in particular, this represents a barrier to clinical adoption. To address these challenges, we present a self-supervised text-vision framework that learns to detect clinically relevant abnormalities in brain MRI scans by directly leveraging the rich information contained in accompanying free-text neuroradiology reports. Our training approach consisted of two steps. First, a dedicated neuroradiological language model - NeuroBERT - was trained to generate fixed-dimensional vector representations of neuroradiology reports (N = 50,523) via domain-specific self-supervised learning tasks. Next, convolutional neural networks (one per MRI sequence) learnt to map individual brain scans to their corresponding text vector representations by optimising a mean square error loss. Once trained, our text-vision framework can be used to detect abnormalities in unreported brain MRI examinations by scoring scans against suitable query sentences (e.g., 'there is an acute stroke', 'there is hydrocephalus' etc.), enabling a range of classification-based applications including automated triage. Potentially, our framework could also serve as a clinical decision support tool, not only by suggesting findings to radiologists and detecting errors in provisional reports, but also by retrieving and displaying examples of pathologies from historical examinations that could be relevant to the current case based on textual descriptors.

[245]  arXiv:2405.02785 [pdf, other]
Title: Fused attention mechanism-based ore sorting network
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning has had a significant impact on the identification and classification of mineral resources, playing a key role in efficiently and accurately identifying different minerals, which is important for improving the efficiency and accuracy of mining. However, traditional ore sorting methods often suffer from inefficiency and lack of accuracy, especially in complex mineral environments. To address these challenges, this study proposes a method called OreYOLO, which incorporates an attention mechanism and a multi-scale feature fusion strategy, based on ore data from gold and sulfide ores. By introducing the progressive feature pyramid structure into YOLOv5 and embedding the attention mechanism in the feature extraction module, the detection performance and accuracy of the model are greatly improved. In order to adapt to the diverse ore sorting scenarios and the deployment requirements of edge devices, the network structure is designed to be lightweight, which achieves a low number of parameters (3.458M) and computational complexity (6.3GFLOPs) while maintaining high accuracy (99.3% and 99.2%, respectively). In the experimental part, a target detection dataset containing 6000 images of gold and sulfuric iron ore is constructed for gold and sulfuric iron ore classification training, and several sets of comparison experiments are set up against the YOLO series, EfficientDet, Faster-RCNN, and CenterNet. The experiments show that OreYOLO outperforms these commonly used high-performance object detection architectures.

[246]  arXiv:2405.02787 [pdf, ps, other]
Title: Light Field Spatial Resolution Enhancement Framework
Comments: 5 pages, 6 figures, accepted in IEEE Conference on Signal Processing and Communications Applications
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Light field (LF) imaging captures both angular and spatial light distributions, enabling advanced photographic techniques. However, micro-lens array (MLA)-based cameras face a spatial-angular resolution tradeoff due to a single shared sensor. We propose a novel light field framework for resolution enhancement, employing a modular approach. The first module generates a high-resolution, all-in-focus image. The second module, a texture transformer network, enhances the resolution of each light field perspective independently using the output of the first module as a reference image. The final module leverages light field regularity to jointly improve resolution across all LF image perspectives. Our approach demonstrates superior performance to existing methods in both qualitative and quantitative evaluations.

[247]  arXiv:2405.02790 [pdf, other]
Title: Confidential and Protected Disease Classifier using Fully Homomorphic Encryption
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

With the rapid surge in the prevalence of Large Language Models (LLMs), individuals are increasingly turning to conversational AI for initial insights across various domains, including health-related inquiries such as disease diagnosis. Many users seek potential causes on platforms like ChatGPT or Bard before consulting a medical professional for their ailment. These platforms offer valuable benefits by streamlining the diagnosis process, alleviating the significant workload of healthcare practitioners, and saving users both time and money by avoiding unnecessary doctor visits. However, despite the convenience of such platforms, sharing personal medical data online poses risks, including the presence of malicious platforms or potential eavesdropping by attackers. To address privacy concerns, we propose a novel framework combining FHE and Deep Learning for a secure and private diagnosis system. Operating on a question-and-answer-based model akin to an interaction with a medical practitioner, this end-to-end secure system employs Fully Homomorphic Encryption (FHE) to handle encrypted input data. Given FHE's computational constraints, we adapt deep neural networks and activation functions to the encrypted domain. Further, we also propose a faster algorithm to compute summation of ciphertext elements. Through rigorous experiments, we demonstrate the efficacy of our approach. The proposed framework achieves strict security and privacy with minimal loss in performance.

[248]  arXiv:2405.02791 [pdf, other]
Title: Efficient Text-driven Motion Generation via Latent Consistency Training
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Motion diffusion models have recently proven successful for text-driven human motion generation. Despite their excellent generation performance, they are difficult to run in real time because their multi-step sampling mechanism involves tens or hundreds of repeated function evaluations. To this end, we investigate Motion Latent Consistency Training (MLCT) for motion generation to reduce the computation and time consumed during iterative inference. It applies diffusion pipelines to low-dimensional motion latent spaces to mitigate the computational burden of each function evaluation. Explaining the diffusion process with probabilistic flow ordinary differential equation (PF-ODE) theory, MLCT allows inference in extremely few steps from the prior distribution to the motion latent representation distribution by maintaining consistency of the outputs over the trajectory of the PF-ODE. In particular, we introduce a quantization constraint to optimize motion latent representations that are bounded, regular, and well-reconstructed compared to traditional variational constraints. Furthermore, we propose a conditional PF-ODE trajectory simulation method, which improves the conditional generation performance with minimal additional training costs. Extensive experiments on two human motion generation benchmarks show that the proposed model achieves state-of-the-art performance with less than 10% time cost.

[249]  arXiv:2405.02792 [pdf, ps, other]
Title: Jointly Learning Spatial, Angular, and Temporal Information for Enhanced Lane Detection
Comments: 5 pages, 3 figures, accepted in IEEE Conference on Signal Processing and Communications Applications
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a novel approach for enhanced lane detection by integrating spatial, angular, and temporal information through light field imaging and novel deep learning models. Utilizing lenslet-inspired 2D light field representations and LSTM networks, our method significantly improves lane detection in challenging conditions. We demonstrate the efficacy of this approach with modified CNN architectures, showing superior performance over traditional methods. Our findings suggest this integrated data approach could advance lane detection technologies and inspire new models that leverage these multidimensional insights for autonomous vehicle perception.

[250]  arXiv:2405.02793 [pdf, other]
Title: ImageInWords: Unlocking Hyper-Detailed Image Descriptions
Comments: Webpage (this https URL), GitHub (this https URL), HuggingFace (this https URL)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Despite the longstanding adage "an image is worth a thousand words," creating accurate and hyper-detailed image descriptions for training Vision-Language models remains challenging. Current datasets typically have web-scraped descriptions that are short, low-granularity, and often contain details unrelated to the visual content. As a result, models trained on such data generate descriptions replete with missing information, visual inconsistencies, and hallucinations. To address these issues, we introduce ImageInWords (IIW), a carefully designed human-in-the-loop annotation framework for curating hyper-detailed image descriptions and a new dataset resulting from this process. We validate the framework through evaluations focused on the quality of the dataset and its utility for fine-tuning with considerations for readability, comprehensiveness, specificity, hallucinations, and human-likeness. Our dataset significantly improves across these dimensions compared to recently released datasets (+66%) and GPT-4V outputs (+48%). Furthermore, models fine-tuned with IIW data excel by +31% against prior work along the same human evaluation dimensions. Given our fine-tuned models, we also evaluate text-to-image generation and vision-language reasoning. Our model's descriptions can generate images closest to the original, as judged by both automated and human metrics. We also find our model produces more compositionally rich descriptions, outperforming the best baseline by up to 6% on ARO, SVO-Probes, and Winoground datasets.

[251]  arXiv:2405.02794 [pdf, other]
Title: Octopi: Object Property Reasoning with Large Tactile-Language Models
Comments: 17 pages
Subjects: Robotics (cs.RO)

Physical reasoning is important for effective robot manipulation. Recent work has investigated both vision and language modalities for physical reasoning; vision can reveal information about objects in the environment and language serves as an abstraction and communication medium for additional context. Although these works have demonstrated success on a variety of physical reasoning tasks, they are limited to physical properties that can be inferred from visual or language inputs. In this work, we investigate combining tactile perception with language, which enables embodied systems to obtain physical properties through interaction and apply common-sense reasoning. We contribute a new dataset PhysiCleAR, which comprises both physical/property reasoning tasks and annotated tactile videos obtained using a GelSight tactile sensor. We then introduce Octopi, a system that leverages both tactile representation learning and large vision-language models to predict and reason about tactile inputs with minimal language fine-tuning. Our evaluations on PhysiCleAR show that Octopi is able to effectively use intermediate physical property predictions to improve physical reasoning in both trained tasks and for zero-shot reasoning. PhysiCleAR and Octopi are available on https://github.com/clear-nus/octopi.

[252]  arXiv:2405.02795 [pdf, other]
Title: Graph as Point Set
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)

Graph is a fundamental data structure to model interconnections between entities. Set, on the contrary, stores independent elements. To learn graph representations, current Graph Neural Networks (GNNs) primarily use message passing to encode the interconnections. In contrast, this paper introduces a novel graph-to-set conversion method that bijectively transforms interconnected nodes into a set of independent points and then uses a set encoder to learn the graph representation. This conversion method holds dual significance. Firstly, it enables using set encoders to learn from graphs, thereby significantly expanding the design space of GNNs. Secondly, for Transformer, a specific set encoder, we provide a novel and principled approach to inject graph information losslessly, different from all the heuristic structural/positional encoding methods adopted in previous graph transformers. To demonstrate the effectiveness of our approach, we introduce Point Set Transformer (PST), a transformer architecture that accepts a point set converted from a graph as input. Theoretically, PST exhibits superior expressivity for both short-range substructure counting and long-range shortest path distance tasks compared to existing GNNs. Extensive experiments further validate PST's outstanding real-world performance. Besides Transformer, we also devise a Deepset-based set encoder, which achieves performance comparable to representative GNNs, affirming the versatility of our graph-to-set method.

[253]  arXiv:2405.02797 [pdf, other]
Title: Adapting to Distribution Shift by Visual Domain Prompt Generation
Comments: ICLR 2024, code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this paper, we aim to adapt a model at test-time using a small amount of unlabeled data to address distribution shifts. To tackle the challenges of extracting domain knowledge from a limited amount of data, it is crucial to utilize correlated information from pre-trained backbones and source domains. Previous studies fail to utilize recent foundation models with strong out-of-distribution generalization. Additionally, domain-centric designs are not favored in their works. Furthermore, they separate the process of modelling source domains and the process of learning to adapt into disjoint training stages. In this work, we propose an approach on top of the pre-computed features of the foundation model. Specifically, we build a knowledge bank to learn the transferable knowledge from source domains. Conditioned on few-shot target data, we introduce a domain prompt generator to condense the knowledge bank into a domain-specific prompt. The domain prompt then directs the visual features towards a particular domain via a guidance module. Moreover, we propose a domain-aware contrastive loss and employ meta-learning to facilitate domain knowledge extraction. Extensive experiments are conducted to validate the domain knowledge extraction. The proposed method outperforms previous work on 5 large-scale benchmarks including WILDS and DomainNet.

[254]  arXiv:2405.02798 [pdf, other]
Title: Structural Balance in Real-World Social Networks: Incorporating Direction and Transitivity in Measuring Partial Balance
Comments: arXiv admin note: text overlap with arXiv:2006.02565
Subjects: Social and Information Networks (cs.SI)

Structural balance theory predicts that triads in networks gravitate towards stable configurations. The theory has been verified for undirected graphs. Since real-world networks are often directed, we introduce a novel method for considering both transitivity and sign consistency for evaluating partial balance in signed digraphs. We test our approach on graphs constructed by using different methods for identifying edge signs: natural language processing to infer signs from underlying text data, and self-reported survey data. Our results show that for various social contexts and edge sign detection methods, the partial balance of these digraphs is moderately high, ranging from 61% to 96%. Our approach not only enhances the theoretical framework of structural balance but also provides practical insights into the stability of social networks, enabling a deeper understanding of interpersonal and group dynamics across different communication platforms.
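For orientation, the classical undirected notion of triad balance that this abstract builds on can be sketched as follows. This is not the paper's directed, transitivity-aware measure; the function name and the edge encoding are assumptions made for illustration. A triad counts as balanced when the product of its three edge signs is positive.

```python
from itertools import combinations


def triad_balance_ratio(signs):
    """Fraction of closed triads whose edge-sign product is positive.

    `signs` maps frozenset({u, v}) -> +1 or -1 for each undirected edge
    (a hypothetical encoding chosen for this sketch).
    Returns None when the graph contains no closed triad.
    """
    nodes = sorted({n for edge in signs for n in edge})
    balanced = 0
    total = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset(pair) for pair in ((a, b), (b, c), (a, c))]
        if all(e in signs for e in edges):
            total += 1
            product = signs[edges[0]] * signs[edges[1]] * signs[edges[2]]
            if product > 0:
                balanced += 1
    return balanced / total if total else None


# Toy signed graph on four nodes (made-up data, not from the paper).
toy = {
    frozenset(("A", "B")): 1,
    frozenset(("B", "C")): 1,
    frozenset(("A", "C")): 1,
    frozenset(("B", "D")): -1,
    frozenset(("A", "D")): -1,
    frozenset(("C", "D")): 1,
}
ratio = triad_balance_ratio(toy)
```

In the toy graph, triads ABC and ABD are balanced while ACD and BCD are not, so the ratio is one half.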

[255]  arXiv:2405.02800 [pdf, other]
Title: Designing Distinguishable Mid-Air Ultrasound Tactons with Temporal Parameters
Subjects: Human-Computer Interaction (cs.HC)

Mid-air ultrasound technology offers new design opportunities for contactless tactile patterns (i.e., Tactons) in user applications. Yet, few guidelines exist for making ultrasound Tactons easy to distinguish for users. In this paper, we investigated the distinguishability of temporal parameters of ultrasound Tactons in five studies (n=72 participants). Study 1 established the discrimination thresholds for amplitude-modulated (AM) frequencies. In Studies 2-5, we investigated distinguishable ultrasound Tactons by creating four Tacton sets based on mechanical vibrations in the literature and collected similarity ratings for the ultrasound Tactons. We identified a subset of temporal parameters, such as rhythm and low envelope frequency, that could create distinguishable ultrasound Tactons. Also, a strong correlation (mean Spearman's $\rho$=0.75) existed between similarity ratings for ultrasound Tactons and similarities of mechanical Tactons from the literature, suggesting vibrotactile designers can transfer their knowledge to ultrasound design. We present design guidelines and future directions for creating distinguishable mid-air ultrasound Tactons.

[256]  arXiv:2405.02801 [pdf, other]
Title: Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
Comments: 7 pages, 2 figures, submitted to ACM MM 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements, facilitating the generation of music, images, and other forms of artistic expression across various industries. However, research on general multi-modal music generation models remains scarce. To fill this gap, we propose a multi-modal music generation framework, Mozart's Touch. It can generate music aligned with cross-modality inputs, such as images, videos and text. Mozart's Touch is composed of three main components: Multi-modal Captioning Module, Large Language Model (LLM) Understanding & Bridging Module, and Music Generation Module. Unlike traditional approaches, Mozart's Touch requires no training or fine-tuning of pre-trained models, offering efficiency and transparency through clear, interpretable prompts. We also introduce the "LLM-Bridge" method to resolve the heterogeneous representation problems between descriptive texts of different modalities. We conduct a series of objective and subjective evaluations on the proposed model, and results indicate that our model surpasses the performance of current state-of-the-art models. Our code and examples are available at: https://github.com/WangTooNaive/MozartsTouch

[257]  arXiv:2405.02803 [pdf, other]
Title: Is Flash Attention Stable?
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Training large-scale machine learning models poses distinct system challenges, given both the size and complexity of today's workloads. Recently, many organizations training state-of-the-art Generative AI models have reported cases of instability during training, often taking the form of loss spikes. Numeric deviation has emerged as a potential cause of this training instability, although quantifying this is especially challenging given the costly nature of training runs. In this work, we develop a principled approach to understanding the effects of numeric deviation, and construct proxies to put observations into context when downstream effects are difficult to quantify. As a case study, we apply this framework to analyze the widely-adopted Flash Attention optimization. We find that Flash Attention sees roughly an order of magnitude more numeric deviation as compared to Baseline Attention at BF16 when measured during an isolated forward pass. We then use a data-driven analysis based on the Wasserstein Distance to provide upper bounds on how this numeric deviation impacts model weights during training, finding that the numerical deviation present in Flash Attention is 2-5 times less significant than low-precision training.

[258]  arXiv:2405.02805 [pdf, other]
Title: Verlet Flows: Exact-Likelihood Integrators for Flow-Based Generative Models
Comments: ICLR AI4DifferentialEquations In Science workshop 2024
Subjects: Machine Learning (cs.LG)

Approximations in computing model likelihoods with continuous normalizing flows (CNFs) hinder the use of these models for importance sampling of Boltzmann distributions, where exact likelihoods are required. In this work, we present Verlet flows, a class of CNFs on an augmented state-space inspired by symplectic integrators from Hamiltonian dynamics. When used with carefully constructed Taylor-Verlet integrators, Verlet flows provide exact-likelihood generative models which generalize coupled flow architectures from a non-continuous setting while imposing minimal expressivity constraints. On experiments over toy densities, we demonstrate that the variance of the commonly used Hutchinson trace estimator is unsuitable for importance sampling, whereas Verlet flows perform comparably to full autograd trace computations while being significantly faster.
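As background for the comparison above, the Hutchinson trace estimator approximates tr(A) by averaging v^T A v over random probe vectors v with identity covariance. The sketch below is a minimal pure-Python illustration using Rademacher probes; the helper names are assumptions, and it is not the paper's implementation.

```python
import random


def make_matvec(matrix):
    """Wrap a dense list-of-lists matrix as a function v -> A v."""
    def matvec(v):
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return matvec


def hutchinson_trace(matvec, dim, num_samples=1000, seed=0):
    """Estimate tr(A) as the mean of v^T (A v) over Rademacher vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Each probe entry is +1 or -1 with equal probability.
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        av = matvec(v)
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / num_samples


# Off-diagonal entries are the only source of variance for Rademacher probes.
diag_estimate = hutchinson_trace(make_matvec([[2.0, 0.0], [0.0, 3.0]]), dim=2)
full_estimate = hutchinson_trace(make_matvec([[2.0, 1.0], [1.0, 3.0]]), dim=2)
```

For a diagonal matrix the estimate is exact; with off-diagonal entries the estimate fluctuates around the true trace, which is the variance the abstract refers to.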

[259]  arXiv:2405.02807 [pdf, ps, other]
Title: Kinematic analysis of structural mechanics based on convolutional neural network
Comments: 9 pages, 13 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

We attempt to use a convolutional neural network to perform kinematic analysis of plane bar structures. Using the 3dsMax animation software and the OpenCV module, we build our own image dataset of geometrically stable and geometrically unstable systems. We construct and train a convolutional neural network model based on the TensorFlow and Keras deep learning frameworks. The model achieves 100% accuracy on the training set, validation set, and test set. The accuracy on the additional test set is 93.7%, indicating that a convolutional neural network can learn and master the relevant knowledge of kinematic analysis of structural mechanics. In the future, the generalization ability of the model can be improved by increasing the diversity of the dataset, and the model has the potential to surpass human experts for complex structures. Convolutional neural networks have certain practical value in the field of kinematic analysis of structural mechanics. Using visualization technology, we reveal how the convolutional neural network learns and recognizes structural features. Using a pre-trained VGG16 model for feature extraction and fine-tuning, we found that its generalization ability is inferior to that of the self-built model.

[260]  arXiv:2405.02808 [pdf, other]
Title: An Interactive Tool for Simulating Mid-Air Ultrasound Tactons on the Skin
Subjects: Human-Computer Interaction (cs.HC)

Mid-air ultrasound haptic technology offers a myriad of temporal and spatial parameters for contactless haptic design. Yet, predicting how these parameters interact to render an ultrasound signal is difficult before testing them on a mid-air ultrasound haptic device. Thus, haptic designers often use a trial-and-error process with different parameter combinations to obtain desired tactile patterns (i.e., Tactons) for user applications. We propose an interactive tool with five temporal and three spatiotemporal design parameters that can simulate the temporal and spectral properties of stimulation at specific skin points. As a preliminary verification, we measured vibrations induced from the ultrasound Tactons varying on one temporal and two spatiotemporal parameters. The measurements and simulation showed similar results for three different ultrasound rendering techniques, suggesting the efficacy of the simulation tool. We present key insights from the simulation and discuss future directions for enhancing the capabilities of simulations.

[261]  arXiv:2405.02809 [pdf, other]
Title: Does Optimal Control Always Benefit from Better Prediction? An Analysis Framework for Predictive Optimal Control
Subjects: Systems and Control (eess.SY)

The "prediction + optimal control" scheme has shown good performance in many applications of automotive, traffic, robot, and building control. In practice, the prediction results are simply considered correct in the optimal control design process. However, in reality, these predictions may never be perfect. Under a conventional stochastic optimal control formulation, it is difficult to answer questions like "what if the predictions are wrong". This paper presents an analysis framework for predictive optimal control where the subjective belief about the future is no longer considered perfect. A novel concept called the hidden prediction state is proposed to establish connections among the predictors, the subjective beliefs, the control policies and the objective control performance. Based on this framework, the predictor evaluation problem is analyzed. Three commonly-used predictor evaluation measures, including the mean squared error, the regret and the log-likelihood, are considered. It is shown that neither using the mean squared error nor using the likelihood can guarantee a monotonic relationship between the predictor error and the optimal control cost. To guarantee control cost improvement, it is suggested that the predictor should be evaluated with the control performance, e.g., using the optimal control cost or the regret to evaluate predictors. Numerical examples and examples from automotive applications with real-world driving data are provided to illustrate the ideas and the results.

[262]  arXiv:2405.02810 [pdf, other]
Title: Adaptive deep density approximation for stochastic dynamical systems
Comments: 24 pages, 13 figures
Subjects: Numerical Analysis (math.NA); Machine Learning (stat.ML)

In this paper we consider adaptive deep neural network approximation for stochastic dynamical systems. Based on the Liouville equation associated with the stochastic dynamical systems, a new temporal KRnet (tKRnet) is proposed to approximate the probability density functions (PDFs) of the state variables. The tKRnet gives an explicit density model for the solution of the Liouville equation, which alleviates the curse of dimensionality issue that limits the application of traditional grid based numerical methods. To efficiently train the tKRnet, an adaptive procedure is developed to generate collocation points for the corresponding residual loss function, where samples are generated iteratively using the approximate density function at each iteration. A temporal decomposition technique is also employed to improve the long-time integration. Theoretical analysis of our proposed method is provided, and numerical examples are presented to demonstrate its performance.

[263]  arXiv:2405.02811 [pdf, other]
Title: PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars. In this paper, we identify that the common PointNet design introduces an information bottleneck that limits 3D object detection accuracy and scalability. To address this limitation, we propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection. Our key idea is to replace the PointNet pooling operation with an attention module, leading to a better point-to-voxel aggregation function. Our design respects the permutation invariance of sparse 3D points while being more expressive than the pooling-based PointNet. Experimental results show our PVTransformer achieves much better performance compared to the latest 3D object detectors. On the widely used Waymo Open Dataset, our PVTransformer achieves state-of-the-art 76.5 mAPH L2, outperforming the prior art of SWFormer by +1.7 mAPH L2.
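The key idea, swapping a pooling operation for attention when aggregating the points of a voxel, can be illustrated with a toy comparison. The sketch below assumes a single fixed query vector and raw point features; the actual PVTransformer module is learned and more elaborate, and all names here are hypothetical.

```python
import math


def max_pool(points):
    """PointNet-style aggregation: per-feature maximum over the points."""
    return [max(column) for column in zip(*points)]


def attention_pool(points, query):
    """Softmax-weighted average of points, scored by dot product with `query`."""
    scores = [sum(q * x for q, x in zip(query, p)) for p in points]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    dim = len(points[0])
    return [
        sum(w * p[d] for w, p in zip(weights, points)) / norm
        for d in range(dim)
    ]


# Three toy 2D point features falling in one voxel (made-up values).
voxel_points = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
pooled_max = max_pool(voxel_points)
pooled_attn = attention_pool(voxel_points, query=[0.5, -0.5])
pooled_attn_reversed = attention_pool(voxel_points[::-1], query=[0.5, -0.5])
pooled_uniform = attention_pool(voxel_points, query=[0.0, 0.0])
```

Both aggregators are invariant to the order of the points, but the attention version lets every point contribute with a data-dependent weight instead of keeping only the per-feature maximum.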

[264]  arXiv:2405.02813 [pdf, ps, other]
Title: Model Predictive Control for Joint Ramping and Regulation-Type Service from Distributed Energy Resource Aggregations
Comments: 10 pages, 3 figures, to be presented at IEEE PES GM 2024
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Distributed energy resources (DERs) such as grid-responsive loads and batteries can be harnessed to provide ramping and regulation services across the grid. This paper concerns the problem of optimal allocation of different classes of DERs, where each class is an aggregation of similar DERs, to balance net-demand forecasts. The resulting resource allocation problem is solved using model-predictive control (MPC) that utilizes a rolling sequence of finite time-horizon constrained optimizations. This is based on the concept that we have more accurate estimates of the load forecast in the short term, so each optimization in the rolling sequence of optimization problems uses more accurate short term load forecasts while ensuring satisfaction of capacity and dynamical constraints. Simulations demonstrate that the MPC solution can indeed reduce the ramping required from bulk generation, while mitigating near-real time grid disturbances.

[265]  arXiv:2405.02814 [pdf, other]
Title: NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli
Comments: This paper has been accepted by IJCAI 2024
Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) have become integral to a wide spectrum of applications, ranging from traditional computing tasks to advanced artificial intelligence (AI) applications. This widespread adoption has spurred extensive research into LLMs across various disciplines, including the social sciences. Notably, studies have revealed that LLMs possess emotional intelligence, which can be further developed through positive emotional stimuli. This discovery raises an intriguing question: can negative emotions similarly influence LLMs, potentially enhancing their performance? In response to this question, we introduce NegativePrompt, a novel approach underpinned by psychological principles, involving ten specifically designed negative emotional stimuli. We embark on rigorous experimental evaluations of five LLMs including Flan-T5-Large, Vicuna, Llama 2, ChatGPT, and GPT-4, across a set of 45 tasks. The results are revealing: NegativePrompt markedly enhances the performance of LLMs, evidenced by relative improvements of 12.89% in Instruction Induction tasks and 46.25% in BIG-Bench tasks. Moreover, we conduct attention visualization experiments to decipher the underlying mechanisms of NegativePrompt's influence. Our research contributes significantly to the understanding of LLMs and emotion interaction, demonstrating the practical efficacy of NegativePrompt as an emotion-driven method and offering novel insights for the enhancement of LLMs in real-world applications. The code is available at https://github.com/wangxu0820/NegativePrompt.

[266]  arXiv:2405.02815 [pdf, other]
Title: Region-specific Risk Quantification for Interpretable Prognosis of COVID-19
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The COVID-19 pandemic has strained global public health, necessitating accurate diagnosis and intervention to control disease spread and reduce mortality rates. This paper introduces an interpretable deep survival prediction model designed specifically for improved understanding and trust in COVID-19 prognosis using chest X-ray (CXR) images. By integrating a large-scale pretrained image encoder, Risk-specific Grad-CAM, and anatomical region detection techniques, our approach produces regional interpretable outcomes that effectively capture essential disease features while focusing on rare but critical abnormal regions. Our model's predictive results provide enhanced clarity and transparency through risk area localization, enabling clinicians to make informed decisions regarding COVID-19 diagnosis with better understanding of prognostic insights. We evaluate the proposed method on a multi-center survival dataset and demonstrate its effectiveness via quantitative and qualitative assessments, achieving superior C-indexes (0.764 and 0.727) and time-dependent AUCs (0.799 and 0.691). These results suggest that our explainable deep survival prediction model surpasses traditional survival analysis methods in risk prediction, improving interpretability for clinical decision making and enhancing AI system trustworthiness.

[267]  arXiv:2405.02816 [pdf, other]
Title: Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization
Comments: To appear in the proceedings of SIGIR 2024
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

This paper introduces Stochastic RAG--a novel approach for end-to-end optimization of retrieval-augmented generation (RAG) models that relaxes the simplifying assumptions of marginalization and document independence, made in most prior work. Stochastic RAG casts the retrieval process in RAG as a stochastic sampling without replacement process. Through this formulation, we employ straight-through Gumbel-top-k that provides a differentiable approximation for sampling without replacement and enables effective end-to-end optimization for RAG. We conduct extensive experiments on seven diverse datasets on a wide range of tasks, from open-domain question answering to fact verification to slot-filling for relation extraction and to dialogue systems. By applying this optimization method to a recent and effective RAG model, we advance state-of-the-art results on six out of seven datasets.
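The sampling step mentioned above can be sketched with the Gumbel-top-k trick: adding independent Gumbel noise to log-scores and keeping the k largest perturbed values yields a sample of k items without replacement. The sketch below covers only this sampling step, not the straight-through gradient estimator or any retrieval model; the function name and inputs are assumptions.

```python
import math
import random


def gumbel_top_k(log_scores, k, rng):
    """Sample k distinct indices without replacement via Gumbel perturbation."""
    perturbed = []
    for index, score in enumerate(log_scores):
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((score + gumbel_noise, index))
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


rng = random.Random(0)
# Hypothetical unnormalized log-scores for four retrieved documents.
sampled = gumbel_top_k([0.0, 1.0, 2.0, -1.0], k=2, rng=rng)
# A score that dominates all others is selected first almost surely.
dominant = gumbel_top_k([100.0, 0.0, 0.0], k=1, rng=rng)
```

Because the noise is added once per item and the top k are taken jointly, no index can appear twice in the sample.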

[268]  arXiv:2405.02817 [pdf, other]
Title: HuixiangDou-CR: Coreference Resolution in Group Chats
Authors: Huanjun Kong
Comments: 5 pages, 3 tables, 3 figures
Subjects: Computation and Language (cs.CL)

How can pronominal references in group chats be eliminated? In this work, we preprocessed 58k entries of authentic chat data and manually annotated 2.3k questions. The reliability of this annotation was confirmed by the scaling law. We then fine-tuned Qwen models ranging from 0.5B to 32B parameters. The best version improved the F1 score by 29.07. This confirms the viability of fine-tuning Large Language Models (LLMs) for downstream Natural Language Processing (NLP) tasks. Our contributions are: 1) creating Supervised Fine-Tuning (SFT) training data in alpaca format, along with a set of Low-Rank Adaptation (LoRA) weights, and 2) developing a method for acquiring high-quality data that leverages the scaling law principle. The script, raw data in alpaca format, and experiment tracking are open-sourced on GitHub https://github.com/InternLM/HuixiangDou/tree/main/web/tools, HuggingFace https://huggingface.co/tpoisonooo and WandB https://wandb.ai/tpoisonooo/huixiangdou-cr/table?nw=nwusertpoisonooo . The privacy of the data involved has been authorized by users.

[269]  arXiv:2405.02818 [pdf, other]
Title: Site-Specific Deployment Optimization of Intelligent Reflecting Surface for Coverage Enhancement
Comments: 7 pages, 7 figures. To appear in VTC2024-Spring
Subjects: Information Theory (cs.IT)

Intelligent Reflecting Surface (IRS) is a promising technology for next-generation wireless networks. Despite substantial research in IRS-aided communications, the assumed antenna and channel models are typically simplified without considering site-specific characteristics, which in turn critically affect the IRS deployment and performance in a given environment. In this paper, we first investigate the link-level performance of active or passive IRS taking into account the IRS element radiation pattern (ERP) as well as the antenna radiation pattern of the access point (AP). Then the network-level coverage performance is evaluated/optimized in site-specific multi-building scenarios, by properly deploying multiple IRSs on candidate building facets to serve a given set of users or Points of Interest (PoIs). The problem is reduced to an integer linear programming (ILP) based on given link-level metrics, which is then solved efficiently under moderate network sizes. Numerical results confirm the impact of AP antenna/IRS element pattern on the link-level performance. In addition, it is found that active IRSs, though associated with higher hardware complexity and cost, significantly improve the site-specific network coverage performance in terms of average ergodic rate and fairness among the PoIs as well as the range of serving area, compared with passive IRSs that have a much larger number of elements.

[270]  arXiv:2405.02821 [pdf, other]
Title: Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)

Sim2real transfer has received increasing attention lately due to the success of learning robotic tasks in simulation end-to-end. While there has been a lot of progress in transferring vision-based navigation policies, the existing sim2real strategy for audio-visual navigation performs data augmentation empirically without measuring the acoustic gap. The sound differs from light in that it spans across much wider frequencies and thus requires a different solution for sim2real. We propose the first treatment of sim2real for audio-visual navigation by disentangling it into acoustic field prediction (AFP) and waypoint navigation. We first validate our design choice in the SoundSpaces simulator and show improvement on the Continuous AudioGoal navigation benchmark. We then collect real-world data to measure the spectral difference between the simulation and the real world by training AFP models that only take a specific frequency subband as input. We further propose a frequency-adaptive strategy that intelligently selects the best frequency band for prediction based on both the measured spectral difference and the energy distribution of the received audio, which improves the performance on the real data. Lastly, we build a real robot platform and show that the transferred policy can successfully navigate to sounding objects. This work demonstrates the potential of building intelligent agents that can see, hear, and act entirely from simulation, and transferring them to the real world.

[271]  arXiv:2405.02823 [pdf, other]
Title: Reconfigurable Massive MIMO: Precoding Design and Channel Estimation in the Electromagnetic Domain
Comments: This work is being submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Reconfigurable massive multiple-input multiple-output (RmMIMO) technology offers increased flexibility for future communication systems by exploiting previously untapped degrees of freedom in the electromagnetic (EM) domain. The representation of the traditional spatial domain channel state information (sCSI) limits the insights into the potential of EM domain channel properties, constraining the base station's (BS) utmost capability for precoding design. This paper leverages the EM domain channel state information (eCSI) for radiation pattern design at the BS. We develop an orthogonal decomposition method based on spherical harmonic functions to decompose the radiation pattern into a linear combination of orthogonal bases. By formulating the radiation pattern design as an optimization problem for the projection coefficients over these bases, we develop a manifold optimization-based method for iterative radiation pattern and digital precoder design. To address the eCSI estimation problem, we capitalize on the inherent structure of the channel. Specifically, we propose a subspace-based scheme to reduce the pilot overhead for wideband sCSI estimation. Given the estimated full-band sCSI, we further employ parameterized methods for angle of arrival estimation. Subsequently, the complete eCSI can be reconstructed after estimating the equivalent channel gain via the least squares method. Simulation results demonstrate that, in comparison to traditional mMIMO systems with fixed antenna radiation patterns, the proposed RmMIMO architecture offers significant throughput gains for multi-user transmission at a low channel estimation overhead.

[272]  arXiv:2405.02824 [pdf, other]
Title: Adaptive Guidance Learning for Camouflaged Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Camouflaged object detection (COD) aims to segment objects visually embedded in their surroundings, which is a very challenging task due to the high similarity between the objects and the background. To address it, most methods incorporate additional information (e.g., boundary, texture, and frequency clues) to guide feature learning for better detecting camouflaged objects from the background. Although progress has been made, these methods are basically individually tailored to specific auxiliary cues, thus lacking adaptability and not consistently achieving high segmentation performance. To this end, this paper proposes an adaptive guidance learning network, dubbed AGLNet, which is a unified end-to-end learnable model for exploring and adapting different additional cues in CNN models to guide accurate camouflaged feature learning. Specifically, we first design a straightforward additional information generation (AIG) module to learn additional camouflaged object cues, which can be adapted for the exploration of effective camouflaged features. Then we present a hierarchical feature combination (HFC) module to deeply integrate additional cues and image features to guide camouflaged feature learning in a multi-level fusion manner. Followed by a recalibration decoder (RD), different features are further aggregated and refined for accurate object prediction. Extensive experiments on three widely used COD benchmark datasets demonstrate that the proposed method achieves significant performance improvements under different additional cues, and outperforms 20 recent state-of-the-art methods by a large margin. Our code will be made publicly available at: https://github.com/ZNan-Chen/AGLNet.

[273]  arXiv:2405.02826 [pdf, other]
Title: Nip in the Bud: Forecasting and Interpreting Post-exploitation Attacks in Real-time through Cyber Threat Intelligence Reports
Subjects: Cryptography and Security (cs.CR)

Advanced Persistent Threat (APT) attacks have caused significant damage worldwide. Various Endpoint Detection and Response (EDR) systems are deployed by enterprises to fight against potential threats. However, EDR suffers from high false positives. In order not to affect normal operations, analysts need to investigate and filter detection results before taking countermeasures; the heavy manual labor and alarm fatigue involved cause analysts to miss the optimal response time, thereby leading to information leakage and destruction. Therefore, we propose Endpoint Forecasting and Interpreting (EFI), a real-time attack forecast and interpretation system, which can automatically predict the next move during post-exploitation and explain it at the technique level, then dispatch strategies to EDR for advance reinforcement. First, we use Cyber Threat Intelligence (CTI) reports to extract the attack scene graph (ASG) that can be mapped to low-level system logs to strengthen attack samples. Second, we build a serialized graph forecast model, which is combined with the attack provenance graph (APG) provided by EDR to generate an attack forecast graph (AFG) to predict the next move. Finally, we utilize the attack template graph (ATG) and graph alignment plus algorithm for technique-level interpretation to automatically dispatch strategies for EDR to reinforce the system in advance. EFI can avoid the impact of existing EDR false positives, and can reduce the attack surface of the system without affecting normal operations. We collect a total of 3,484 CTI reports, generate 1,429 ASGs, label 8,000 sentences, tag 10,451 entities, and construct 256 ATGs. Experimental results on both DARPA Engagement and a large-scale CTI dataset show that the alignment score between the AFG predicted by EFI and the real attack graph can exceed 0.8, and that the forecast and interpretation precision of EFI can reach 91.8%.

[274]  arXiv:2405.02827 [pdf, other]
Title: Probabilistic tube-based control synthesis of stochastic multi-agent systems under signal temporal logic
Comments: Submitted to CDC24
Subjects: Systems and Control (eess.SY)

We consider the control design of stochastic discrete-time linear multi-agent systems (MASs) under a global signal temporal logic (STL) specification to be satisfied at a predefined probability. By decomposing the dynamics into deterministic and error components, we construct a probabilistic reachable tube (PRT) as the Cartesian product of reachable sets of the individual error systems driven by disturbances lying in confidence regions (CRs) with a fixed probability. By bounding the PRT probability with the specification probability, we tighten all state constraints induced by the STL specification by solving tractable optimization problems over segments of the PRT, and convert the underlying stochastic problem into a deterministic one. This approach reduces conservatism compared to tightening guided by the STL structure. Additionally, we propose a recursively feasible algorithm to attack the resulting problem by decomposing it into agent-level subproblems, which are solved iteratively according to a scheduling policy. We demonstrate our method on a ten-agent system, where existing approaches are impractical.

[275]  arXiv:2405.02828 [pdf, other]
Title: Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy
Comments: arXiv admin note: substantial text overlap with arXiv:2305.03803
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)

Large language models (LLMs) have provided a lot of exciting new capabilities in software development. However, the opaque nature of these models makes them difficult to reason about and inspect. Their opacity gives rise to potential security risks, as adversaries can train and deploy compromised models to disrupt the software development process in the victims' organization.
This work presents an overview of the current state-of-the-art trojan attacks on large language models of code, with a focus on triggers -- the main design point of trojans -- with the aid of a novel unifying trigger taxonomy framework. We also aim to provide a uniform definition of the fundamental concepts in the area of trojans in Code LLMs. Finally, we draw implications of findings on how code models learn on trigger design.

[276]  arXiv:2405.02829 [pdf, ps, other]
Title: An FPT Algorithm for the Exact Matching Problem and NP-hardness of Related Problems
Comments: 13 pages
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)

The exact matching problem is a constrained variant of the maximum matching problem: given a graph with each edge having a weight $0$ or $1$ and an integer $k$, the goal is to find a perfect matching of weight exactly $k$. Mulmuley, Vazirani, and Vazirani (1987) proposed a randomized polynomial-time algorithm for this problem, and it is still open whether it can be derandomized. Very recently, El Maalouly, Steiner, and Wulf (2023) showed that for bipartite graphs there exists a deterministic FPT algorithm parameterized by the (bipartite) independence number. In this paper, by extending a part of their work, we propose a deterministic FPT algorithm in general parameterized by the minimum size of an odd cycle transversal in addition to the (bipartite) independence number. We also consider a relaxed problem called the correct parity matching problem, and show that a slight generalization of an equivalent problem is NP-hard.
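
To make the problem statement concrete, the following is a brute-force sketch that searches for a perfect matching of weight exactly k in a small graph with 0/1 edge weights. It runs in exponential time and is only an illustration of the problem definition, not the FPT algorithm of the paper; the function name and the edge-dictionary input format are assumptions.

```python
def exact_perfect_matching(num_vertices, edges, k):
    """Return a perfect matching of total weight exactly k, or None.

    edges: dict mapping (u, v) with u < v to a weight in {0, 1}.
    Vertices are numbered 0 .. num_vertices - 1.
    """
    adjacency = {}
    for (u, v), weight in edges.items():
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))

    def solve(unmatched, remaining):
        if not unmatched:
            # Every vertex is matched; accept only if the weight budget is used up.
            return [] if remaining == 0 else None
        u = min(unmatched)  # the smallest unmatched vertex must be matched next
        for v, weight in adjacency.get(u, []):
            if v in unmatched and weight <= remaining:
                rest = solve(unmatched - {u, v}, remaining - weight)
                if rest is not None:
                    return [(u, v)] + rest
        return None

    return solve(frozenset(range(num_vertices)), k)

# A 4-cycle whose two perfect matchings have weights 2 and 0.
cycle = {(0, 1): 1, (1, 2): 0, (2, 3): 1, (0, 3): 0}
weight_two = exact_perfect_matching(4, cycle, 2)
weight_one = exact_perfect_matching(4, cycle, 1)
```

In this example a matching of weight 2 or 0 exists, but none of weight 1, which shows why the exactness constraint makes the problem harder than ordinary perfect matching.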

[277]  arXiv:2405.02830 [pdf, other]
Title: You Only Need Half: Boosting Data Augmentation by Using Partial Content
Authors: Juntao Hu, Yuan Wu
Comments: Technical report,16 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a novel data augmentation method termed You Only Need hAlf (YONA), which simplifies the augmentation process. YONA bisects an image, substitutes one half with noise, and applies data augmentation techniques to the remaining half. This method reduces the redundant information in the original image, encourages neural networks to recognize objects from incomplete views, and significantly enhances neural networks' robustness. YONA is parameter-free and straightforward to apply, enhances various existing data augmentation strategies, and thereby bolsters neural networks' robustness without additional computational cost. To demonstrate YONA's efficacy, extensive experiments were carried out. These experiments confirm YONA's compatibility with diverse data augmentation methods and neural network architectures, yielding substantial improvements in CIFAR classification tasks, sometimes outperforming conventional image-level data augmentation methods. Furthermore, YONA markedly increases the resilience of neural networks to adversarial attacks. Additional experiments exploring YONA's variants conclusively show that masking half of an image optimizes performance. The code is available at https://github.com/HansMoe/YONA.
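
The bisect, mask, and augment procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released code: the image is a 2D list of floats, the cut is vertical through the middle column, the noise is uniform in [0, 1), the kept half is chosen at random, and a horizontal flip stands in for the existing augmentation. The names `yona` and `hflip` are hypothetical.

```python
import random

def hflip(image):
    """Example shape-preserving augmentation: mirror each row."""
    return [row[::-1] for row in image]

def yona(image, rng, augment=hflip):
    """Replace one half of the image with noise and augment the other half.

    image:   list of rows, each a list of pixel values.
    rng:     a random.Random instance.
    augment: any function mapping a 2D list to a 2D list of the same shape.
    """
    middle = len(image[0]) // 2
    left = [row[:middle] for row in image]
    right = [row[middle:] for row in image]
    if rng.random() < 0.5:
        # Keep and augment the left half; fill the right half with noise.
        left = augment(left)
        right = [[rng.random() for _ in row] for row in right]
    else:
        # Keep and augment the right half; fill the left half with noise.
        right = augment(right)
        left = [[rng.random() for _ in row] for row in left]
    return [l_row + r_row for l_row, r_row in zip(left, right)]

image = [[2.0] * 6 for _ in range(4)]
augmented = yona(image, random.Random(1))
```

Because the operation only slices and refills existing pixels, it introduces no tunable parameter beyond the choice of the wrapped augmentation.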

[278]  arXiv:2405.02832 [pdf, other]
Title: Fast One-Stage Unsupervised Domain Adaptive Person Search
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Unsupervised person search aims to localize a particular target person from a gallery set of scene images without annotations, which is extremely challenging due to the unexpected variations of the unlabeled domains. However, most existing methods are dedicated to developing multi-stage models that adapt to domain variations while using clustering for iterative model training, which inevitably increases model complexity. To address this issue, we propose a Fast One-stage Unsupervised person Search (FOUS) which complementarily integrates domain adaptation with label adaptation in an end-to-end manner without iterative clustering. To minimize the domain discrepancy, FOUS introduces an Attention-based Domain Alignment Module (ADAM) which can not only align various domains for both detection and ReID tasks but also construct an attention mechanism to reduce the adverse impacts of low-quality candidates resulting from unsupervised detection. Moreover, to avoid the redundant iterative clustering mode, FOUS adopts a prototype-guided labeling method which minimizes redundant correlation computations for partial samples and assigns noisy coarse label groups efficiently. The coarse label groups are continuously refined via a label-flexible training network with an adaptive selection strategy. With the adapted domains and labels, FOUS achieves state-of-the-art (SOTA) performance on two benchmark datasets, CUHK-SYSU and PRW. The code is available at https://github.com/whbdmu/FOUS.

[279]  arXiv:2405.02834 [pdf, other]
Title: Scene-Adaptive Person Search via Bilateral Modulations
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Person search aims to localize a specific target person from a gallery set of images with various scenes. As the scene around a moving pedestrian changes, the captured person image inevitably brings considerable background noise and foreground noise into the person feature; this noise is completely unrelated to the person identity and leads to severe performance degradation. To address this issue, we present a Scene-Adaptive Person Search (SEAS) model by introducing bilateral modulations to simultaneously eliminate scene noise and maintain a consistent person representation to adapt to various scenes. In SEAS, a Background Modulation Network (BMN) is designed to encode the feature extracted from the detected bounding box into a multi-granularity embedding, which reduces the input of background noise from multiple levels with norm-aware. Additionally, to mitigate the effect of foreground noise on the person feature, SEAS introduces a Foreground Modulation Network (FMN) to compute the clutter reduction offset for the person embedding based on the feature map of the scene image. By bilateral modulations on both background and foreground in an end-to-end manner, SEAS obtains consistent feature representations without scene noise. SEAS achieves state-of-the-art (SOTA) performance on two benchmark datasets, CUHK-SYSU with 97.1% mAP and PRW with 60.5% mAP. The code is available at https://github.com/whbdmu/SEAS.

[280]  arXiv:2405.02835 [pdf, other]
Title: Algorithmic collusion in a two-sided market: A rideshare example
Comments: 8 pages, 6 figures
Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)

With dynamic pricing on the rise, firms are using sophisticated algorithms for price determination. These algorithms are often non-interpretable and there has been a recent interest in their seemingly emergent ability to tacitly collude with each other without any prior communication whatsoever. Most of the previous works investigate algorithmic collusion on simple reinforcement learning (RL) based algorithms operating on a basic market model. Instead, we explore the collusive tendencies of Proximal Policy Optimization (PPO), a state-of-the-art continuous state/action space RL algorithm, on a complex double-sided hierarchical market model of rideshare. For this purpose, we extend a mathematical program network (MPN) based rideshare model to a temporal multi origin-destination setting and use PPO to solve for a repeated duopoly game. Our results indicate that PPO can either converge to a competitive or a collusive equilibrium depending upon the underlying market characteristics, even when the hyper-parameters are held constant.

[281]  arXiv:2405.02841 [pdf, ps, other]
Title: Two-weight rank-metric codes
Comments: Accepted for publication in ISIT 2024
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

Two-weight linear codes are linear codes in which any nonzero codeword can have only two possible distinct weights. Those in the Hamming metric have proven to be very interesting for their connections with authentication codes, association schemes, strongly regular graphs, and secret sharing schemes. In this paper, we characterize two-weight codes in the rank metric, answering a recent question posed by Pratihar and Randrianarisoa.

[282]  arXiv:2405.02842 [pdf, other]
Title: IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs
Subjects: Machine Learning (cs.LG)

One limitation of existing Transformer-based models is that they cannot handle very long sequences as input since their self-attention operations exhibit quadratic time and space complexity. This problem becomes especially acute when Transformers are deployed on hardware platforms equipped only with CPUs. To address this issue, we propose a novel method for accelerating self-attention at inference time that works with pretrained Transformer models out-of-the-box without requiring retraining. We experiment using our method to accelerate various long-sequence Transformers, including a leading LLaMA 2-based LLM, on various benchmarks and demonstrate a greater speedup of 2.73x - 7.63x while retaining 98.6% - 99.6% of the accuracy of the original pretrained models. The code is available on our project website at https://yuzhenmao.github.io/IceFormer/.

[283]  arXiv:2405.02843 [pdf, other]
Title: Residual-Conditioned Optimal Transport: Towards Structure-preserving Unpaired and Paired Image Restoration
Comments: ICML 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning-based image restoration methods have achieved promising performance. However, how to faithfully preserve the structure of the original image remains challenging. To address this challenge, we propose a novel Residual-Conditioned Optimal Transport (RCOT) approach, which models the image restoration as an optimal transport (OT) problem for both unpaired and paired settings, integrating the transport residual as a unique degradation-specific cue for both the transport cost and the transport map. Specifically, we first formalize a Fourier residual-guided OT objective by incorporating the degradation-specific information of the residual into the transport cost. Based on the dual form of the OT formulation, we design the transport map as a two-pass RCOT map that comprises a base model and a refinement process, in which the transport residual is computed by the base model in the first pass and then encoded as a degradation-specific embedding to condition the second-pass restoration. By duality, the RCOT problem is transformed into a minimax optimization problem, which can be solved by adversarially training neural networks. Extensive experiments on multiple restoration tasks show the effectiveness of our approach in terms of both distortion measures and perceptual quality. Particularly, RCOT restores images with more faithful structural details compared to state-of-the-art methods.

[284]  arXiv:2405.02844 [pdf, other]
Title: SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Motion style transfer is a significant research direction in multimedia applications. It enables the rapid switching of different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most of the current work in this field adopts GANs, which may lead to instability and convergence issues, making the final generated motion sequence somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we consider style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion. Moreover, we apply the Mamba model for the first time in the motion style transfer field, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. Thirdly, for the SMCD framework, we propose Diffusion-based Content Consistency Loss and Content Consistency Loss to assist the overall framework's training. Finally, we conduct extensive experiments. The results reveal that our method surpasses state-of-the-art methods in both qualitative and quantitative comparisons and is capable of generating more realistic motion sequences.

[285]  arXiv:2405.02845 [pdf, other]
Title: Data-Efficient Molecular Generation with Hierarchical Textual Inversion
Subjects: Machine Learning (cs.LG); Molecular Networks (q-bio.MN)

Developing an effective molecular generation framework even with a limited number of molecules is often important for its practical deployment, e.g., drug discovery, since acquiring task-related molecular data requires expensive and time-consuming experimental costs. To tackle this issue, we introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution. We propose to use multi-level embeddings to reflect such hierarchical features based on the adoption of the recent textual inversion technique in the visual domain, which achieves data-efficient image generation. Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution. We then generate molecules based on the interpolation of the multi-level token embeddings. Extensive experiments demonstrate the superiority of HI-Mol with notable data-efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50x less training data. We also show the effectiveness of molecules generated by HI-Mol in low-shot molecular property prediction.

[286]  arXiv:2405.02846 [pdf, ps, other]
Title: Responsible AI: Portraits with Intelligent Bibliometrics
Comments: 14 pages, 9 figures
Subjects: Artificial Intelligence (cs.AI)

Shifting the focus from principles to practical implementation, responsible artificial intelligence (AI) has garnered considerable attention across academia, industry, and society at large. Despite being in its nascent stages, this emerging field grapples with nebulous concepts and intricate knowledge frameworks. By analyzing three prevailing concepts - explainable AI, trustworthy AI, and ethical AI, this study defined responsible AI and identified its core principles. Methodologically, this study successfully demonstrated the implementation of leveraging AI's capabilities into bibliometrics for enhanced knowledge discovery and the cross-validation of experimentally examined models with domain insights. Empirically, this study investigated 17,799 research articles contributed by the AI community since 2015. This involves recognizing key technological players and their relationships, unveiling the topical landscape and hierarchy of responsible AI, charting its evolution, and elucidating the interplay between the responsibility principles and primary AI techniques. An analysis of a core cohort comprising 380 articles from multiple disciplines captures the most recent advancements in responsible AI. As one of the pioneering bibliometric studies dedicated to exploring responsible AI, this study will provide comprehensive macro-level insights, enhancing the understanding of responsible AI while furnishing valuable knowledge support for AI regulation and governance initiatives.

[287]  arXiv:2405.02850 [pdf, other]
Title: Halfway Escape Optimization: A Quantum-Inspired Solution for Complex Optimization Problems
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

This paper first proposes the Halfway Escape Optimization (HEO) algorithm, a novel quantum-inspired metaheuristic designed to address complex optimization problems characterized by rugged landscapes and high dimensionality with an efficient convergence rate. The study presents a comprehensive comparative evaluation of HEO's performance against established optimization algorithms, including Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Artificial Fish Swarm Algorithm (AFSA), Grey Wolf Optimizer (GWO), and Quantum-behaved Particle Swarm Optimization (QPSO). The primary analysis encompasses 14 benchmark functions with dimension 30, demonstrating HEO's effectiveness and adaptability in navigating complex optimization landscapes and providing valuable insights into its performance. A simple test of HEO on the Traveling Salesman Problem (TSP) also suggests its feasibility in real-time applications.

[288]  arXiv:2405.02858 [pdf, ps, other]
Title: Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation
Comments: Accepted by IEEE WCCI 2024
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)

Social media platforms such as Twitter, Reddit, and Sina Weibo play a crucial role in global communication but often encounter strict regulations in geopolitically sensitive regions. This situation has prompted users to ingeniously modify their way of communicating, frequently resorting to coded language in these regulated social media environments. This shift in communication is not merely a strategy to counteract regulation, but a vivid manifestation of language evolution, demonstrating how language naturally evolves under societal and technological pressures. Studying the evolution of language in regulated social media contexts is of significant importance for ensuring freedom of speech, optimizing content moderation, and advancing linguistic research. This paper proposes a multi-agent simulation framework using Large Language Models (LLMs) to explore the evolution of user language in regulated social media environments. The framework employs LLM-driven agents: a supervisory agent that enforces dialogue supervision and participant agents that evolve their language strategies while engaging in conversation, simulating the evolution of communication styles under strict regulations aimed at evading social media regulation. The study evaluates the framework's effectiveness through a range of scenarios, from abstract scenarios to real-world situations. Key findings indicate that LLMs are capable of simulating nuanced language dynamics and interactions in constrained settings, showing improvement in both evading supervision and information accuracy as evolution progresses. Furthermore, it was found that LLM agents adopt different strategies for different scenarios.

[289]  arXiv:2405.02859 [pdf, other]
Title: MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
Comments: 14 pages, 10 figures, conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Despite the emergence of successful NeRF inpainting methods built upon explicit RGB and depth 2D inpainting supervisions, these methods are inherently constrained by the capabilities of their underlying 2D inpainters. This is due to two key reasons: (i) independently inpainting constituent images results in view-inconsistent imagery, and (ii) 2D inpainters struggle to ensure high-quality geometry completion and alignment with inpainted RGB images.
To overcome these limitations, we propose a novel approach called MVIP-NeRF that harnesses the potential of diffusion priors for NeRF inpainting, addressing both appearance and geometry aspects. MVIP-NeRF performs joint inpainting across multiple views to reach a consistent solution, which is achieved via an iterative optimization process based on Score Distillation Sampling (SDS). Apart from recovering the rendered RGB images, we also extract normal maps as a geometric representation and define a normal SDS loss that motivates accurate geometry inpainting and alignment with the appearance. Additionally, we formulate a multi-view SDS score function to distill generative priors simultaneously from different view images, ensuring consistent visual completion when dealing with large view variations. Our experimental results show better appearance and geometry recovery than previous NeRF inpainting methods.

[290]  arXiv:2405.02861 [pdf, other]
Title: Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models
Comments: 24 pages, 17 figures, 10 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We introduce LexBench, a comprehensive evaluation suite designed to test language models (LMs) on ten semantic phrase processing tasks. Unlike prior studies, it is the first work to propose a framework from the comparative perspective to model the general semantic phrase (i.e., lexical collocation) and three fine-grained semantic phrases, including idiomatic expression, noun compound, and verbal construction. Using LexBench, we assess the performance of 15 LMs across model architectures and parameter scales in classification, extraction, and interpretation tasks. Through the experiments, we first validate the scaling law and find that, as expected, larger models outperform smaller ones in most tasks. Second, we investigate further through the scaling semantic relation categorization and find that few-shot LMs still lag behind vanilla fine-tuned models in the task. Third, through human evaluation, we find that the performance of strong models is comparable to the human level regarding semantic phrase processing. Our benchmarking findings can serve future research aiming to improve the generic capability of LMs on semantic phrase comprehension. Our source code and data are available at https://github.com/jacklanda/LexBench

[291]  arXiv:2405.02865 [pdf, ps, other]
Title: Non cooperative Liquidity Games and their application to bond market trading
Comments: 9 pages
Subjects: Computer Science and Game Theory (cs.GT); Trading and Market Microstructure (q-fin.TR)

We present a new type of game, the Liquidity Game. We draw inspiration from the UK government bond market and apply game theoretic approaches to its analysis. In Liquidity Games, market participants (agents) use non-cooperative games where the players' utility is directly defined by the liquidity of the game itself, offering a paradigm shift in our understanding of market dynamics. Each player's utility is intricately linked to the liquidity generated within the game, making the utility endogenous and dynamic. Players are not just passive recipients of utility based on external factors but active participants whose strategies and actions collectively shape and are shaped by the liquidity of the market. This reflexivity introduces a level of complexity and realism previously unattainable in conventional models.
We apply Liquidity Game theoretic approaches to a simple UK bond market interaction and present results for market design and strategic behavior of participants. We tackle one of the largest issues within this mechanism, namely what strategy should market makers utilize when uncertain about the type of market maker they are interacting with, and what structure might regulators wish to see.

[292]  arXiv:2405.02867 [pdf, other]
Title: Practices, Challenges, and Opportunities When Inferring Requirements From Regulations in the FinTech Sector - An Industrial Study
Subjects: Software Engineering (cs.SE)

[Context and motivation]: Understanding and interpreting regulatory norms and inferring software requirements from them is a critical step towards regulatory compliance, a matter of significant importance in various industrial sectors. [Question/ problem]: However, interpreting regulations still largely depends on individual legal expertise and experience within the respective domain, with little to no systematic methodologies and supportive tools to guide this practice. In fact, research in this area is too often detached from practitioners' experiences, rendering the proposed solutions not transferable to industrial practice. As we argue, one reason is that we still lack a profound understanding of industry- and domain-specific practices and challenges. [Principal ideas/ results]: We aim to close this gap and provide such an investigation using the example of the banking and insurance domain. We conduct an industrial multi-case study as part of a long-term academia-industry collaboration with a medium-sized software development and renovation company. We explore contemporary industrial practices and challenges when inferring requirements from regulations to support more problem-driven research. Our study investigates the complexities of requirements engineering in regulatory contexts, pinpointing various issues and discussing them in detail. We highlight the gathered insights and the practical challenges encountered and suggest avenues for future research. [Contribution]: Our contribution is a comprehensive case study focused on the FinTech domain, offering a detailed understanding of the specific needs within this sector. We have identified key practices for managing regulatory requirements in software development, and have pinpointed several challenges. We conclude by offering a set of recommendations for future problem-driven research directions.

[293]  arXiv:2405.02868 [pdf, ps, other]
Title: Continuous Monitoring for Road Flooding With Satellite Onboard Computing For Navigation for OrbitalAI Φsat-2 challenge
Subjects: Robotics (cs.RO)

Continuous monitoring for road flooding could be achieved through onboard computing of satellite imagery to generate near real-time insights made available to generate dynamic information for maps used for navigation. Given the existing computing hardware like the one considered for the PhiSat-2 mission, the paper describes the feasibility of running the road flooding detection. The simulated onboard imagery dataset development and its annotation process for the OrbitalAI Φsat-2 challenge is described. The flooding events in the city of Bengaluru, India were considered for this challenge. This is followed by the model architecture selection, training, optimization and accuracy results for the model. The results indicate that it is possible to build low size, high accuracy models for the road flooding use case.

[294]  arXiv:2405.02872 [pdf, ps, other]
Title: The weighted and shifted seven-step BDF method for parabolic equations
Comments: 23 pages
Subjects: Numerical Analysis (math.NA)

Stability of the BDF methods of order up to five for parabolic equations can be established by the energy technique via Nevanlinna--Odeh multipliers. The nonexistence of Nevanlinna--Odeh multipliers makes the six-step BDF method special; however, the energy technique was recently extended by the authors in [Akrivis et al., SIAM J. Numer. Anal. \textbf{59} (2021) 2449--2472] and covers all six stable BDF methods. The seven-step BDF method is unstable for parabolic equations, since it is not even zero-stable. In this work, we construct and analyze a stable linear combination of two non zero-stable schemes, the seven-step BDF method and its shifted counterpart, referred to as the WSBDF7 method. The stability regions of the WSBDF$q, q\leqslant 7$, with a weight $\vartheta\geqslant1$, increase as $\vartheta$ increases and are larger than the stability regions of the classical BDF$q,$ corresponding to $\vartheta=1$. We determine novel and suitable multipliers for the WSBDF7 method and establish stability for parabolic equations by the energy technique. The proposed approach is applicable to mean curvature flow, gradient flows, fractional equations and nonlinear equations.

[295]  arXiv:2405.02875 [pdf, ps, other]
Title: Insights Gained after a Decade of Cellular Automata-based Cryptography
Authors: Luca Mariot
Comments: 20 pages, 2 figures. Invited paper at AUTOMATA 2024
Subjects: Cryptography and Security (cs.CR)

Cellular Automata (CA) have been extensively used to implement symmetric cryptographic primitives, such as pseudorandom number generators and S-boxes. However, most of the research in this field, except the very early works, seems to be published in non-cryptographic venues. This phenomenon poses a problem of relevance: are CA of any use to cryptographers nowadays? This paper provides insights into this question by briefly outlining the history of CA-based cryptography. In doing so, the paper identifies some shortcomings in the research addressing the design of symmetric primitives exclusively from a CA standpoint, alongside some recommendations for future research. Notably, the paper remarks that researchers working in CA and cryptography often tackle similar problems, albeit under different perspectives and terminologies. This observation indicates that there is still ample room for fruitful collaborations between the CA and cryptography communities in the future.

[296]  arXiv:2405.02876 [pdf, ps, other]
Title: Exploring the Improvement of Evolutionary Computation via Large Language Models
Comments: accepted by GECCO 2024
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Evolutionary computation (EC), as a powerful optimization algorithm, has been applied across various domains. However, as the complexity of problems increases, the limitations of EC have become more apparent. The advent of large language models (LLMs) has not only transformed natural language processing but also extended their capabilities to diverse fields. By harnessing LLMs' vast knowledge and adaptive capabilities, we provide a forward-looking overview of potential improvements LLMs can bring to EC, focusing on the algorithms themselves, population design, and additional enhancements. This presents a promising direction for future research at the intersection of LLMs and EC.

[297]  arXiv:2405.02880 [pdf, other]
Title: Blending Distributed NeRFs with Tri-stage Robust Pose Optimization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Due to the limited model capacity, leveraging distributed Neural Radiance Fields (NeRFs) for modeling extensive urban environments has become a necessity. However, current distributed NeRF registration approaches encounter aliasing artifacts, arising from discrepancies in rendering resolutions and suboptimal pose precision. These factors collectively deteriorate the fidelity of pose estimation within NeRF frameworks, resulting in occlusion artifacts during the NeRF blending stage. In this paper, we present a distributed NeRF system with tri-stage pose optimization. In the first stage, precise poses of images are achieved by bundle adjusting Mip-NeRF 360 with a coarse-to-fine strategy. In the second stage, we incorporate the inverting Mip-NeRF 360, coupled with the truncated dynamic low-pass filter, to enable the achievement of robust and precise poses, termed Frame2Model optimization. On top of this, we obtain a coarse transformation between NeRFs in different coordinate systems. In the third stage, we fine-tune the transformation between NeRFs by Model2Model pose optimization. After obtaining precise transformation parameters, we proceed to implement NeRF blending, showcasing superior performance metrics in both real-world and simulation scenarios. Codes and data will be publicly available at https://github.com/boilcy/Distributed-NeRF.

[298]  arXiv:2405.02881 [pdf, other]
Title: FedConPE: Efficient Federated Conversational Bandits with Heterogeneous Clients
Comments: Accepted in the 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Conversational recommender systems have emerged as a potent solution for efficiently eliciting user preferences. These systems interactively present queries associated with "key terms" to users and leverage user feedback to estimate user preferences more efficiently. Nonetheless, most existing algorithms adopt a centralized approach. In this paper, we introduce FedConPE, a phase elimination-based federated conversational bandit algorithm, where $M$ agents collaboratively solve a global contextual linear bandit problem with the help of a central server while ensuring secure data management. To effectively coordinate all the clients and aggregate their collected data, FedConPE uses an adaptive approach to construct key terms that minimize uncertainty across all dimensions in the feature space. Furthermore, compared with existing federated linear bandit algorithms, FedConPE offers improved computational and communication efficiency as well as enhanced privacy protections. Our theoretical analysis shows that FedConPE is minimax near-optimal in terms of cumulative regret. We also establish upper bounds for communication costs and conversation frequency. Comprehensive evaluations demonstrate that FedConPE outperforms existing conversational bandit algorithms while using fewer conversations.

[299]  arXiv:2405.02882 [pdf, other]
Title: A drone detector with modified backbone and multiple pyramid featuremaps enhancement structure (MDDPE)
Authors: Chenhao Wu
Comments: 20 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work presents a drone detector with modified backbone and multiple pyramid feature maps enhancement structure (MDDPE). Novel feature map improvement modules that use different levels of information to produce more robust and discriminative features are proposed. These modules include the feature maps supplement function and the feature maps recombination enhancement function. To effectively handle the drone characteristics, auxiliary supervisions with tailored anchors are applied in the early stages. To further improve the modeling of real drone detection scenarios and the initialization of the regressor, an updated anchor matching technique is introduced to match anchors and ground-truth drones as closely as feasible. To show the proposed MDDPE's superiority over the most advanced detectors, extensive experiments are carried out using well-known drone detection benchmarks.

[300]  arXiv:2405.02887 [pdf, other]
Title: Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English
Comments: 6 pages, 3 Figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

People communicate in more than 7,000 languages around the world, with around 780 languages spoken in India alone. Despite this linguistic diversity, research on Sentiment Analysis has predominantly focused on English text data, resulting in a disproportionate availability of sentiment resources for English. This paper examines the performance of transformer models in Sentiment Analysis tasks across multilingual datasets and text that has undergone machine translation. By comparing the effectiveness of these models in different linguistic contexts, we gain insights into their performance variations and potential implications for sentiment analysis across diverse languages. We also discuss the shortcomings and potential for future work towards the end.

[301]  arXiv:2405.02893 [pdf, ps, other]
Title: Exploring the ethical sensitivity of Ph.D. students in robotics
Comments: 15 pages
Subjects: Computers and Society (cs.CY)

Ethical sensitivity, generally defined as a person's ability to recognize ethical issues and attribute importance to them, is considered to be a crucial competency in the life of professionals and academics and an essential prerequisite to successfully meeting ethical challenges. A concept that first emerged in moral psychology almost 40 years ago, ethical sensitivity has been widely studied in healthcare, business, and other domains. Conversely, it appears to have received little to no attention within the robotics community, even though choices in the design and deployment of robots are likely to have wide-ranging, profound ethical impacts on society. Due to the negative repercussions that a lack of ethical sensitivity can have in these contexts, promoting the development of ethical sensitivity among roboticists is imperative, and endeavoring to train this competency becomes a critical undertaking. Therefore, as a first step in this direction and within the context of a broader effort aimed at developing an online interactive ethics training module for roboticists, we conducted a qualitative exploration of the ethical sensitivity of a sample of Ph.D. students in robotics using case vignettes that exemplified ethical tensions in disaster robotics.

[302]  arXiv:2405.02897 [pdf, other]
Title: DexiTac: Soft Dexterous Tactile Gripping
Comments: 11 pages, 12 figures
Subjects: Robotics (cs.RO)

Grasping objects, whether they are flat, round, or narrow and whether they have regular or irregular shapes, introduces difficulties in determining the ideal grasping posture, even for the most state-of-the-art grippers. In this article, we present a reconfigurable pneumatic gripper with fingers that can be set in various configurations, such as hooking, supporting, closuring, and pinching. Each finger incorporates a dexterous joint, a rotating joint, and a customized plug-and-play visuotactile sensor, the DigiTac-v1.5, to control manipulation in real time. We propose a tactile kernel density manipulation strategy for simple and versatile control, including detecting grasp stability, responding to disturbances and guiding dexterous manipulations. We develop a double closed-loop control system that separately focuses on secure grasping and task management, demonstrated with tasks that highlight the capabilities above. The gripper is relatively easy to fabricate and customize, offering a promising and extensible way to combine soft dexterity and tactile sensing for diverse applications in robotic manipulation.

[303]  arXiv:2405.02903 [pdf, other]
Title: Predicting Open-Hole Laminates Failure Using Support Vector Machines With Classical and Quantum Kernels
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Numerical Analysis (math.NA)

Modeling open hole failure of composites is a complex task, involving a highly nonlinear response with interacting failure modes. Numerical modeling of this phenomenon has traditionally been based on the finite element method, but requires a trade-off between high fidelity and computational cost. To mitigate this shortcoming, recent work has leveraged machine learning to predict the strength of open hole composite specimens. Here, we also propose using data-based models but to tackle open hole composite failure from a classification point of view. More specifically, we show how to train surrogate models to learn the ultimate failure envelope of an open hole composite plate under in-plane loading. To achieve this, we solve the classification problem via support vector machine (SVM) and test different classifiers by changing the SVM kernel function. The flexibility of kernel-based SVM also allows us to integrate the recently developed quantum kernels in our algorithm and compare them with the standard radial basis function (RBF) kernel. Finally, thanks to kernel-target alignment optimization, we tune the free parameters of all kernels to best separate safe and failure-inducing loading states. The results show classification accuracies higher than 90% for RBF, especially after alignment, followed closely by the quantum kernel classifiers.
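As an illustration of the kernel-target alignment step mentioned in this abstract, the minimal sketch below scores an RBF Gram matrix against the ideal kernel $yy^T$ for labels in {-1, +1} and picks the best width from a small grid. It uses the standard textbook definition of alignment, which may differ from the authors' exact procedure; the function names, the toy points, and the candidate `gamma` values are all hypothetical.

```python
import math

def rbf_kernel_matrix(points, gamma):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K

def kernel_target_alignment(K, labels):
    """A(K, yy^T) = <K, yy^T>_F / (||K||_F * ||yy^T||_F), labels in {-1, +1}."""
    n = len(labels)
    inner = sum(K[i][j] * labels[i] * labels[j]
                for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    norm_y = float(n)  # ||yy^T||_F equals n when every label is +1 or -1
    return inner / (norm_k * norm_y)

# Hypothetical toy data: two "safe" (+1) and two "failure" (-1) loading states.
points = [(0.0, 0.0), (0.1, 0.2), (2.0, 2.0), (2.1, 1.9)]
labels = [1, 1, -1, -1]

# Grid search over hypothetical RBF widths; keep the best-aligned one.
alignments = {
    gamma: kernel_target_alignment(rbf_kernel_matrix(points, gamma), labels)
    for gamma in (0.01, 1.0, 100.0)
}
best_gamma = max(alignments, key=alignments.get)
```

A very small `gamma` makes every pair look similar and a very large one makes every point look unique, so the alignment peaks at an intermediate width. In practice the selected kernel would then be passed to an SVM solver.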

[304]  arXiv:2405.02904 [pdf, ps, other]
Title: Distributed Structured Matrix Multiplication
Authors: Derya Malak
Comments: Proc., IEEE ISIT 2024
Subjects: Information Theory (cs.IT)

We devise achievable encoding schemes for distributed source compression for computing inner products, symmetric matrix products, and more generally, square matrix products, which are a class of nonlinear transformations. To that end, our approach relies on devising nonlinear mappings of distributed sources, which are then followed by the structured linear encoding scheme, introduced by Körner and Marton. For different computation scenarios, we contrast our findings on the achievable sum rate with the state of the art to demonstrate the possible savings in compression rate. When the sources have special correlation structures, it is possible to achieve unbounded gains, as demonstrated by the analysis and numerical simulations.

[305]  arXiv:2405.02906 [pdf, other]
Title: SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection
Comments: 9 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Salient object detection (SOD) remains an important task in computer vision, with applications ranging from image segmentation to autonomous driving. Fully convolutional network (FCN)-based methods have made remarkable progress in visual saliency detection over the last few decades. However, these methods have limitations in accurately detecting salient objects, particularly in challenging scenes with multiple objects, small objects, or objects with low resolutions. To address this issue, we propose a Saliency Fusion Attention U-Net (SalFAU-Net) model, which incorporates a saliency fusion module into each decoder block of the attention U-net model to generate saliency probability maps from each decoder block. SalFAU-Net employs an attention mechanism to selectively focus on the most informative regions of an image and suppress non-salient regions. We train SalFAU-Net on the DUTS dataset using a binary cross-entropy loss function. We conduct experiments on six popular SOD evaluation datasets to evaluate the effectiveness of the proposed method. The experimental results demonstrate that our method, SalFAU-Net, achieves competitive performance compared to other methods in terms of mean absolute error (MAE), F-measure, s-measure, and e-measure.

[306]  arXiv:2405.02911 [pdf, other]
Title: Multimodal Sense-Informed Prediction of 3D Human Motions
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Predicting future human pose is a fundamental application for machine intelligence, which drives robots to plan their behavior and paths ahead of time to seamlessly accomplish human-robot collaboration in real-world 3D scenarios. Despite encouraging results, existing approaches rarely consider the effects of the external scene on the motion sequence, leading to pronounced artifacts and physical implausibilities in the predictions. To address this limitation, this work introduces a novel multi-modal sense-informed motion prediction approach, which conditions high-fidelity generation on two modalities, the external 3D scene and the internal human gaze, and is able to recognize their salience for future human activity. Furthermore, the gaze information is regarded as the human intention, and by combining it with both motion and scene features, we construct a ternary intention-aware attention to supervise the generation to match where the human wants to reach. Meanwhile, we introduce semantic coherence-aware attention to explicitly distinguish the salient point clouds and the underlying ones, to ensure a reasonable interaction of the generated sequence with the 3D scene. On two real-world benchmarks, the proposed method achieves state-of-the-art performance both in 3D human pose and trajectory prediction.

[307]  arXiv:2405.02913 [pdf, ps, other]
Title: Fast TILs estimation in lung cancer WSIs based on semi-stochastic patch sampling
Comments: 18 pages, 7 figures, 6 appendix pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Addressing the critical need for accurate prognostic biomarkers in cancer treatment, quantifying tumor-infiltrating lymphocytes (TILs) in non-small cell lung cancer (NSCLC) presents considerable challenges. Manual TIL quantification in whole slide images (WSIs) is laborious and subject to variability, potentially undermining patient outcomes. Our study introduces an automated pipeline that utilizes semi-stochastic patch sampling, patch classification to retain prognostically relevant patches, and cell quantification using the HoVer-Net model to streamline the TIL evaluation process. This pipeline efficiently excludes approximately 70% of areas not relevant for prognosis and requires only 5% of the remaining patches to maintain prognostic accuracy (c-index 0.65 ± 0.01). The computational efficiency achieved does not sacrifice prognostic accuracy, as demonstrated by the TILs score's strong correlation with patient survival, which surpasses traditional CD8 IHC scoring methods. While the pipeline demonstrates potential for enhancing NSCLC prognostication and personalization of treatment, comprehensive clinical validation is still required. Future research should focus on verifying its broader clinical utility and investigating additional biomarkers to improve NSCLC prognosis.
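For readers unfamiliar with the c-index quoted in this abstract, the sketch below shows one common way to compute Harrell's concordance index from survival times, event indicators, and a per-patient risk score. It illustrates the metric only and is not the authors' evaluation code; the function name and the toy cohort are hypothetical. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk score; higher means earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk, earlier event
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable

# Hypothetical toy cohort: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

If a score is protective rather than harmful (higher score, longer survival), its negation can be passed as `risks`.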

[308]  arXiv:2405.02914 [pdf, other]
Title: Simulation of Optical Tactile Sensors Supporting Slip and Rotation using Path Tracing and IMPM
Subjects: Robotics (cs.RO)

Optical tactile sensors are extensively utilized in intelligent robot manipulation due to their ability to acquire high-resolution tactile information at a lower cost. However, achieving adequate realism and versatility in simulating optical tactile sensors is challenging. In this paper, we propose a simulation method and validate its effectiveness through experiments. We utilize path tracing for image rendering, achieving higher similarity to real data than the baseline method in simulating pressing scenarios. Additionally, we apply the improved Material Point Method (IMPM) algorithm to simulate the relative rest between the object and the elastomer surface when the object is in motion, enabling more accurate simulation of complex manipulations such as slip and rotation.

[309]  arXiv:2405.02917 [pdf, other]
Title: Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models
Comments: 8 pages, with appendix. To appear in TrustNLP workshop @ NAACL 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI through their ability to generate human-like text and understand images, but ensuring their reliability is crucial. This paper aims to evaluate the ability of LLMs (GPT4, GPT-3.5, LLaMA2, and PaLM 2) and VLMs (GPT4V and Gemini Pro Vision) to estimate their verbalized uncertainty via prompting. We propose the new Japanese Uncertain Scenes (JUS) dataset, aimed at testing VLM capabilities via difficult queries and object counting, and the Net Calibration Error (NCE) to measure the direction of miscalibration. Results show that both LLMs and VLMs have a high calibration error and are overconfident most of the time, indicating a poor capability for uncertainty estimation. Additionally, we develop prompts for regression tasks, and we show that VLMs have poor calibration when producing mean/standard deviation and 95% confidence intervals.
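To make the notion of calibration error concrete, the sketch below computes the standard expected calibration error (ECE) over equal-width confidence bins, together with a signed gap whose sign indicates over- or underconfidence. The signed quantity is only one plausible way to capture the direction of miscalibration; the abstract does not define NCE, so this should not be read as the paper's metric. The function name, the bin count, and the toy inputs are hypothetical.

```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ece, signed_gap) using equal-width confidence bins.

    confidences -- verbalized confidence per answer, in [0, 1]
    correct     -- True/False per answer
    signed_gap  -- positive when confidence exceeds accuracy (overconfident)
    """
    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[index].append((conf, ok))

    ece = 0.0
    signed_gap = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        weight = len(members) / total
        ece += weight * abs(mean_conf - accuracy)
        signed_gap += weight * (mean_conf - accuracy)
    return ece, signed_gap

# Hypothetical example: a model claims 90% confidence but is right 1 time in 4.
ece, gap = calibration_errors([0.9, 0.9, 0.9, 0.9], [True, False, False, False])
```

The unsigned ECE cannot distinguish an overconfident model from an underconfident one, which is why a signed measure is useful alongside it.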

[310]  arXiv:2405.02918 [pdf, other]
Title: MERIT: Multi-view Evidential learning for Reliable and Interpretable liver fibrosis sTaging
Comments: Submitted to Medical Image Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate staging of liver fibrosis from magnetic resonance imaging (MRI) is crucial in clinical practice. While conventional methods often focus on a specific sub-region, multi-view learning captures more information by analyzing multiple patches simultaneously. However, previous multi-view approaches could not typically calculate uncertainty by nature, and they generally integrate features from different views in a black-box fashion, hence compromising reliability as well as interpretability of the resulting models. In this work, we propose a new multi-view method based on evidential learning, referred to as MERIT, which tackles the two challenges in a unified framework. MERIT enables uncertainty quantification of the predictions to enhance reliability, and employs a logic-based combination rule to improve interpretability. Specifically, MERIT models the prediction from each sub-view as an opinion with quantified uncertainty under the guidance of the subjective logic theory. Furthermore, a distribution-aware base rate is introduced to enhance performance, particularly in scenarios involving class distribution shifts. Finally, MERIT adopts a feature-specific combination rule to explicitly fuse multi-view predictions, thereby enhancing interpretability. Results have showcased the effectiveness of the proposed MERIT, highlighting the reliability and offering both ad-hoc and post-hoc interpretability. They also illustrate that MERIT can elucidate the significance of each view in the decision-making process for liver fibrosis staging.

[311]  arXiv:2405.02922 [pdf, other]
Title: Easy over Hard: A Simple Baseline for Test Failures Causes Prediction
Subjects: Software Engineering (cs.SE)

Analyzing the causes of test failures is critical since it determines how different types of bugs are subsequently handled, which is the prerequisite for getting the bugs properly analyzed and fixed. After a test case fails, software testers have to inspect the test execution logs line by line to identify its root cause. However, manual root cause determination is often tedious and time-consuming, and can cost 30-40% of the time needed to fix a problem. Therefore, there is a need for automatically predicting the test failure causes to lighten the burden of software testers. In this paper, we present a simple but hard-to-beat approach, named NCChecker, to automatically identify the failure causes for failed test logs. Our approach can help developers efficiently identify the test failure causes and flag the log lines most likely to indicate the root causes for investigation. Our approach has three main stages: log abstraction, lookup table construction, and failure cause prediction. We first perform log abstraction to parse the unstructured log messages into structured log events. NCChecker then automatically maintains and updates a lookup table by employing our heuristic rules, which record the matching score between different log events and test failure causes. In the failure cause prediction stage, for a newly generated failed test log, NCChecker can easily infer its failure cause by looking up the associated log events' scores in the lookup table. We have developed a prototype and evaluated our tool on a real-world industrial dataset with more than 10K test logs. The extensive experiments show the promising performance of our model over a set of benchmarks. Moreover, our approach is highly efficient and memory-saving, and can successfully handle the data imbalance problem.

[312]  arXiv:2405.02923 [pdf, other]
Title: Constructing $(h,d)$ cooperative MSR codes with sub-packetization $(d-k+h)(d-k+1)^{\lceil n/2 \rceil}$
Subjects: Information Theory (cs.IT)

We address the multi-node failure repair challenges for MDS array codes. Presently, two primary models are employed for multi-node repairs: the centralized model where all failed nodes are restored in a singular data center, and the cooperative model where failed nodes acquire data from auxiliary nodes and collaborate amongst themselves for the repair process. This paper focuses on the cooperative model, and we provide explicit constructions of optimal MDS array codes with $d$ helper nodes under this model. The sub-packetization level of our new codes is $(d-k+h)(d-k+1)^{\lceil n/2 \rceil}$ where $h$ is the number of failed nodes, $k$ the number of information nodes and $n$ the code length. This improves upon recent constructions given by Liu \emph{et al.} (IEEE Transactions on Information Theory, Vol. 69, 2023).
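To give a feel for the size of the sub-packetization level stated above, the following small sketch evaluates $(d-k+h)(d-k+1)^{\lceil n/2 \rceil}$ for concrete parameters. The function name and the example parameters are hypothetical, and the range check $k \le d \le n-h$ is an assumption about the usual cooperative-repair setting rather than something stated in the abstract.

```python
def sub_packetization(n, k, d, h):
    """Evaluate (d - k + h) * (d - k + 1) ** ceil(n / 2).

    n -- code length
    k -- number of information nodes
    d -- number of helper nodes
    h -- number of failed nodes
    """
    # Assumed parameter range: helpers are drawn from the n - h surviving nodes.
    if h < 1 or not (k <= d <= n - h):
        raise ValueError("expected h >= 1 and k <= d <= n - h")
    half_length = (n + 1) // 2  # integer form of ceil(n / 2)
    return (d - k + h) * (d - k + 1) ** half_length

# Hypothetical example: n = 8, k = 4, d = 6 helpers, h = 2 failed nodes.
level = sub_packetization(n=8, k=4, d=6, h=2)  # (6-4+2) * (6-4+1)**4 = 4 * 81
```

The exponent depends only on $n$, so for fixed $d-k$ the level grows exponentially in the code length while the number of failed nodes $h$ enters only through the linear factor.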

[313]  arXiv:2405.02924 [pdf, other]
Title: Optimal Sampling for Uncertainty-of-Information Minimization in a Remote Monitoring System
Subjects: Information Theory (cs.IT)

In this paper, we study a remote monitoring system where a receiver observes a remote binary Markov source and decides whether to sample and fetch the source's state over a randomly delayed channel. Due to transmission delay, the observation of the source is imperfect, resulting in uncertainty about the source's state at the receiver. We thus use uncertainty of information as the metric to characterize the performance of the system. Measured by Shannon's entropy, uncertainty of information reflects how much we do not know about the source's latest state in the absence of new information. Current research on uncertainty of information idealizes the transmission delay as one time slot and does not consider random delay. Moreover, uncertainty of information varies with the latest observation of the source's state, making it different from other age-of-information-related functions. Motivated by the above reasons, we formulate an uncertainty of information minimization problem under random delay. Typically, such a problem, which takes actions based on imperfect observations, can be modeled as a partially observed Markov decision process. By introducing a belief state, we transform this process into a semi-Markov decision process. To solve this problem, we first provide an optimal sampling policy employing a two-layered bisection relative value iteration algorithm. Furthermore, we propose a sub-optimal index policy with low complexity based on the special properties of the belief state. Numerical simulations illustrate that both of the proposed sampling policies outperform two other benchmarks. Moreover, the performance of the sub-optimal policy approaches that of the optimal policy, particularly under large delay.
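
The metric in this abstract can be made concrete with a small sketch: propagate the receiver's belief about a binary Markov source forward from the last observed state, then take the Shannon entropy of that belief. The transition probabilities `p` (0 to 1) and `q` (1 to 0), the function names, and the deterministic elapsed time are illustrative assumptions; the random channel delay and the sampling policies studied in the paper are not modeled here.

```python
from math import log2

def binary_entropy(b):
    """Shannon entropy (in bits) of a Bernoulli(b) belief."""
    if b <= 0.0 or b >= 1.0:
        return 0.0
    return -b * log2(b) - (1 - b) * log2(1 - b)

def belief_after(last_state, elapsed, p, q):
    """Probability that the source is in state 1 after `elapsed` slots.

    p = P(0 -> 1), q = P(1 -> 0); last_state is the observed state (0 or 1).
    """
    b = float(last_state)
    for _ in range(elapsed):
        b = b * (1 - q) + (1 - b) * p
    return b

def uncertainty_of_information(last_state, elapsed, p, q):
    """Entropy of the belief about the current state, given the last observation."""
    return binary_entropy(belief_after(last_state, elapsed, p, q))

# With an asymmetric chain, the uncertainty depends on what was last observed,
# not only on how long ago it was observed.
print(uncertainty_of_information(0, 1, p=0.2, q=0.3))
print(uncertainty_of_information(1, 1, p=0.2, q=0.3))
```

The two printed values differ even though the elapsed time is the same, which illustrates why the abstract distinguishes this metric from functions that depend on the age of the observation alone.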

[314]  arXiv:2405.02925 [pdf, other]
Title: A Two-Stage Prediction-Aware Contrastive Learning Framework for Multi-Intent NLU
Comments: LREC-COLING 2024
Subjects: Computation and Language (cs.CL)

Multi-intent natural language understanding (NLU) presents a formidable challenge due to the model confusion arising from multiple intents within a single utterance. While previous works train the model contrastively to increase the margin between different multi-intent labels, they are less suited to the nuances of multi-intent NLU. They ignore the rich information between the shared intents, which is beneficial to constructing a better embedding space, especially in low-data scenarios. We introduce a two-stage Prediction-Aware Contrastive Learning (PACL) framework for multi-intent NLU to harness this valuable knowledge. Our approach capitalizes on shared intent information by integrating word-level pre-training and prediction-aware contrastive fine-tuning. We construct a pre-training dataset using a word-level data augmentation strategy. Subsequently, our framework dynamically assigns roles to instances during contrastive fine-tuning while introducing a prediction-aware contrastive loss to maximize the impact of contrastive learning. We present experimental results and empirical analysis conducted on three widely used datasets, demonstrating that our method surpasses the performance of three prominent baselines on both low-data and full-data scenarios.

[315]  arXiv:2405.02929 [pdf, other]
Title: Unified Dynamic Scanpath Predictors Outperform Individually Trained Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Previous research on scanpath prediction has mainly focused on group models, disregarding the fact that the scanpaths and attentional behaviors of individuals are diverse. The disregard of these differences is especially detrimental to social human-robot interaction, whereby robots commonly emulate human gaze based on heuristics or predefined patterns. However, human gaze patterns are heterogeneous, and varying behaviors can significantly affect the outcomes of such human-robot interactions. To fill this gap, we developed a deep learning-based social cue integration model for saliency prediction to instead predict scanpaths in videos. Our model learned scanpaths by recursively integrating fixation history and social cues through a gating mechanism and sequential attention. We evaluated our approach on gaze datasets of dynamic social scenes, observed under the free-viewing condition. The introduction of fixation history into our models makes it possible to train a single unified model rather than the resource-intensive approach of training individual models for each set of scanpaths. We observed that the late neural integration approach surpasses early fusion when training models on a large dataset, in comparison to a smaller dataset with a similar distribution. Results also indicate that a single unified model, trained on all the observers' scanpaths, performs on par with or better than individually trained models. We hypothesize that this outcome is a result of the group saliency representations instilling universal attention in the model, while the supervisory signal guides it to learn personalized attentional behaviors, giving the unified model a benefit over individual models due to its implicit representation of universal attention.

[316]  arXiv:2405.02931 [pdf, other]
Title: Optimal Signals and Detectors Based on Correlation and Energy
Comments: 27 pages, 5 figures; submitted for publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In continuation of an earlier study, we explore a Neyman-Pearson hypothesis testing scenario where, under the null hypothesis ($\cal{H}_0$), the received signal is a white noise process $N_t$, which is not Gaussian in general, and under the alternative hypothesis ($\cal{H}_1$), the received signal comprises a deterministic transmitted signal $s_t$ corrupted by additive white noise, the sum of $N_t$ and another noise process originating from the transmitter, denoted as $Z_t$, which is not necessarily Gaussian either. Our approach focuses on detectors that are based on the correlation and energy of the received signal, which are motivated by implementation simplicity. We optimize the detector parameters to achieve the best trade-off between missed-detection and false-alarm error exponents. First, we optimize the detectors for a given signal, resulting in a non-linear relation between the signal and correlator weights to be optimized. Subsequently, we optimize the transmitted signal and the detector parameters jointly, revealing that the optimal signal is a balanced ternary signal and the correlator has at most three different coefficients, thus facilitating a computationally feasible solution.

[317]  arXiv:2405.02933 [pdf, other]
Title: Relay Decoding: Concatenating Large Language Models for Machine Translation
Comments: Work in progress
Subjects: Computation and Language (cs.CL)

Leveraging large language models for machine translation has demonstrated promising results. However, it does require the large language models to possess the capability of handling both the source and target languages in machine translation. When it is challenging to find large models that support the desired languages, resorting to continuous learning methods becomes a costly endeavor. To mitigate these expenses, we propose an innovative approach called RD (Relay Decoding), which entails concatenating two distinct large models that individually support the source and target languages. By incorporating a simple mapping layer to facilitate the connection between these two models and utilizing a limited amount of parallel data for training, we successfully achieve superior results in the machine translation task. Experimental results conducted on the Multi30k and WikiMatrix datasets validate the effectiveness of our proposed method.

[318]  arXiv:2405.02935 [pdf, other]
Title: Enabling Patient-side Disease Prediction via the Integration of Patient Narratives
Subjects: Computation and Language (cs.CL)

Disease prediction holds considerable significance in modern healthcare because of its crucial role in facilitating early intervention and implementing effective prevention measures. However, most recent disease prediction approaches rely heavily on laboratory test outcomes (e.g., blood tests and medical imaging from X-rays). Gaining access to such data for precise disease prediction is often a complex task from the standpoint of a patient, and the data are available only after a patient consultation. To make disease prediction available on the patient side, we propose Personalized Medical Disease Prediction (PoMP), which predicts diseases using patient health narratives including textual descriptions and demographic information. By applying PoMP, patients can gain a clearer comprehension of their conditions, empowering them to directly seek appropriate medical specialists and thereby reducing the time spent navigating healthcare communication to locate suitable doctors. We conducted extensive experiments using real-world data from Haodf to showcase the effectiveness of PoMP.

[319]  arXiv:2405.02936 [pdf, ps, other]
Title: On the tractability of SHAP explanations under Markovian distributions
Comments: Accepted at ICML'24 (This version is a pre-print)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Thanks to its solid theoretical foundation, the SHAP framework is arguably one of the most widely utilized frameworks for local explainability of ML models. Despite its popularity, its exact computation is known to be very challenging, proven to be NP-Hard in various configurations. Recent works have unveiled positive complexity results regarding the computation of the SHAP score for specific model families, encompassing decision trees, random forests, and some classes of boolean circuits. Yet, all these positive results hinge on the assumption of feature independence, which is often simplistic in real-world scenarios. In this article, we investigate the computational complexity of the SHAP score by relaxing this assumption and introducing a Markovian perspective. We show that, under the Markovian assumption, computing the SHAP score for the class of Weighted automata, Disjoint DNFs and Decision Trees can be performed in polynomial time, offering a first positive complexity result for the problem of SHAP score computation that transcends the limitations of the feature independence assumption.

[320]  arXiv:2405.02937 [pdf, other]
Title: Unraveling the Dominance of Large Language Models Over Transformer Models for Bangla Natural Language Inference: A Comprehensive Study
Comments: Accepted in 4th International Conference on Computing and Communication Networks (ICCCNet-2024)
Subjects: Computation and Language (cs.CL)

Natural Language Inference (NLI) is a cornerstone of Natural Language Processing (NLP), providing insights into the entailment relationships between text pairings. It is a critical component of Natural Language Understanding (NLU), demonstrating the ability to extract information from spoken or written interactions. NLI is mainly concerned with determining the entailment relationship between two statements, known as the premise and hypothesis. When the premise logically implies the hypothesis, the pair is labeled ``entailment''. If the hypothesis contradicts the premise, the pair receives the ``contradiction'' label. When there is insufficient evidence to establish a connection, the pair is described as ``neutral''. Despite the success of Large Language Models (LLMs) in various tasks, their effectiveness in NLI remains constrained by issues like low-resource domain accuracy, model overconfidence, and difficulty in capturing human judgment disagreements. This study addresses the underexplored area of evaluating LLMs in low-resourced languages such as Bengali. Through a comprehensive evaluation, we assess the performance of prominent LLMs and state-of-the-art (SOTA) models in Bengali NLP tasks, focusing on natural language inference. Utilizing the XNLI dataset, we conduct zero-shot and few-shot evaluations, comparing LLMs like GPT-3.5 Turbo and Gemini 1.5 Pro with models such as BanglaBERT, Bangla BERT Base, DistilBERT, mBERT, and sahajBERT. Our findings reveal that while LLMs can achieve comparable or superior performance to fine-tuned SOTA models in few-shot scenarios, further research is necessary to enhance our understanding of LLMs in languages with modest resources like Bengali. This study underscores the importance of continued efforts in exploring LLM capabilities across diverse linguistic contexts.

[321]  arXiv:2405.02941 [pdf, other]
Title: Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling. However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details, which hinders their real applications. To address this issue, we propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results. Unlike previous methods that directly model high-frequency information as a standard Gaussian distribution, our BDFlow first decouples the high-frequency information into a \textit{semantic high-frequency} part that adheres to a Boundary distribution and a \textit{non-semantic high-frequency} counterpart that adheres to a Gaussian distribution. Specifically, to capture semantic high-frequency parts accurately, we use a Boundary-aware Mask (BAM) to constrain the model to produce rich textures, while the non-semantic high-frequency part is randomly sampled from a Gaussian distribution. Comprehensive experiments demonstrate that our BDFlow significantly outperforms other state-of-the-art methods while maintaining lower complexity. Notably, our BDFlow improves the PSNR by $4.4$ dB and the SSIM by $0.1$ on average over GRAIN, utilizing only 74\% of the parameters and 20\% of the computation. The code will be available at https://github.com/THU-Kingmin/BAFlow.

[322]  arXiv:2405.02944 [pdf, other]
Title: Imaging Signal Recovery Using Neural Network Priors Under Uncertain Forward Model Parameters
Comments: Accepted by PBDL-CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Inverse imaging problems (IIPs) arise in various applications, with the main objective of reconstructing an image from its compressed measurements. This problem is often ill-posed because it is under-determined, with multiple interchangeably consistent solutions. The best solution inherently depends on prior knowledge or assumptions, such as the sparsity of the image. Furthermore, the reconstruction process for most IIPs relies significantly on the imaging (i.e. forward model) parameters, which might not be fully known, or the measurement device may undergo calibration drifts. These uncertainties in the forward model create substantial challenges, as inaccurate reconstructions usually happen when the postulated parameters of the forward model do not fully match the actual ones. In this work, we are devoted to tackling accurate reconstruction given a set of possible forward model parameters. Here, we propose a novel Moment-Aggregation (MA) framework that is compatible with the popular IIP solution of using a neural network prior. Specifically, our method can reconstruct the signal by considering all candidate parameters of the forward model simultaneously during the update of the neural network. We theoretically demonstrate the convergence of the MA framework, which has a complexity similar to that of reconstruction under known forward model parameters. Proof-of-concept experiments demonstrate that the proposed MA achieves performance comparable to the forward model with the known precise parameter in reconstruction across both compressive sensing and phase retrieval applications, with a PSNR gap of 0.17 to 1.94 over various datasets, including MNIST, X-ray, Glas, and MoNuseg. This highlights our method's significant potential in reconstruction under an uncertain forward model.

[323]  arXiv:2405.02945 [pdf, other]
Title: Invertible Residual Rescaling Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Invertible Rescaling Networks (IRNs) and their variants have achieved remarkable results in various image processing tasks like image rescaling. However, we observe that IRNs with deeper networks are difficult to train, which hinders the representational ability of IRNs. To address this issue, we propose Invertible Residual Rescaling Models (IRRM) for image rescaling by learning a bijection between a high-resolution image and its low-resolution counterpart with a specific distribution. Specifically, we propose IRRM to build a deep network, which contains several Residual Downscaling Modules (RDMs) with long skip connections. Each RDM consists of several Invertible Residual Blocks (IRBs) with short connections. In this way, RDM allows rich low-frequency information to be bypassed by skip connections and forces models to focus on extracting high-frequency information from the image. Extensive experiments show that our IRRM performs significantly better than other state-of-the-art methods with far fewer parameters and lower complexity. In particular, our IRRM achieves PSNR gains of at least 0.3 dB over both HCFlow and IRN in $\times 4$ rescaling while using only 60\% of the parameters and 50\% of the FLOPs. The code will be available at https://github.com/THU-Kingmin/IRRM.

[324]  arXiv:2405.02951 [pdf, other]
Title: iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval
Comments: Extended version of the ICCV2023 paper arXiv:2303.15247
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Given a query consisting of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption. The reliance of supervised methods on labor-intensive manually labeled datasets hinders their broad applicability. In this work, we introduce a new task, Zero-Shot CIR (ZS-CIR), that addresses CIR without the need for a labeled training dataset. We propose an approach named iSEARLE (improved zero-Shot composEd imAge Retrieval with textuaL invErsion) that involves mapping the visual information of the reference image into a pseudo-word token in CLIP token embedding space and combining it with the relative caption. To foster research on ZS-CIR, we present an open-domain benchmarking dataset named CIRCO (Composed Image Retrieval on Common Objects in context), the first CIR dataset where each query is labeled with multiple ground truths and a semantic categorization. The experimental results illustrate that iSEARLE obtains state-of-the-art performance on three different CIR datasets -- FashionIQ, CIRR, and the proposed CIRCO -- and two additional evaluation settings, namely domain conversion and object composition. The dataset, the code, and the model are publicly available at https://github.com/miccunifi/SEARLE.

[325]  arXiv:2405.02952 [pdf, other]
Title: Accelerating Legacy Numerical Solvers by Non-intrusive Gradient-based Meta-solving
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

Scientific computing is an essential tool for scientific discovery and engineering design, and its computational cost is always a main concern in practice. To accelerate scientific computing, a promising approach is to use machine learning (especially meta-learning) techniques for selecting hyperparameters of traditional numerical methods. There have been numerous proposals in this direction, but many of them require automatically differentiable numerical methods. In reality, however, many practical applications still depend on well-established legacy codes that are not automatically differentiable, which prevents practitioners from applying state-of-the-art research to their own problems. To resolve this problem, we propose a non-intrusive methodology with a novel gradient estimation technique to combine machine learning and legacy numerical codes without any modification. We theoretically and numerically show the advantage of the proposed method over other baselines and present applications of accelerating established numerical solvers that are not automatically differentiable, implemented in PETSc, a widely used open-source numerical software library.

[326]  arXiv:2405.02953 [pdf, other]
Title: Analysis of the Identifying Regulation with Adversarial Surrogates Algorithm
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Given a time-series of noisy measured outputs of a dynamical system z[k], k=1...N, the Identifying Regulation with Adversarial Surrogates (IRAS) algorithm aims to find a non-trivial first integral of the system, namely, a scalar function g() such that g(z[i]) = g(z[j]), for all i,j. IRAS has been suggested recently and was used successfully in several learning tasks in models from biology and physics. Here, we give the first rigorous analysis of this algorithm in a specific setting. We assume that the observations admit a linear first integral and that they are contaminated by Gaussian noise. We show that in this case the IRAS iterations are closely related to the self-consistent-field (SCF) iterations for solving a generalized Rayleigh quotient minimization problem. Using this approach, we derive several sufficient conditions guaranteeing local convergence of IRAS to the correct first integral.

[327]  arXiv:2405.02954 [pdf, other]
Title: Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training
Comments: Extension of ICCV paper arXiv:2212.07585, submitted to IJCV
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain. While the source model is a key avenue for acquiring target pseudolabels, the generated pseudolabels may exhibit source bias. In the conventional SFDA pipeline, a feature extractor pre-trained on large-scale data (e.g. ImageNet) is used to initialize the source model at the start of source training, and is subsequently discarded. Despite having diverse features important for generalization, the pre-trained feature extractor can overfit to the source data distribution during source training and forget relevant target domain knowledge. Rather than discarding this valuable knowledge, we introduce an integrated framework to incorporate pre-trained networks into the target adaptation process. The proposed framework is flexible and allows us to plug modern pre-trained networks into the adaptation process to leverage their stronger representation learning capabilities. For adaptation, we propose the Co-learn algorithm to improve target pseudolabel quality collaboratively through the source model and a pre-trained feature extractor. Building on the recent success of the vision-language model CLIP in zero-shot image recognition, we present an extension, Co-learn++, to further incorporate CLIP's zero-shot classification decisions. We evaluate on 3 benchmark datasets and include more challenging scenarios such as open-set, partial-set and open-partial SFDA. Experimental results demonstrate that our proposed strategy improves adaptation performance and can be successfully integrated with existing SFDA methods.

[328]  arXiv:2405.02957 [pdf, other]
Title: Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents
Subjects: Artificial Intelligence (cs.AI)

In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep accumulating experience from both successful and unsuccessful cases. Simulation experiments show that the treatment performance of doctor agents consistently improves on various tasks. More interestingly, the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicare benchmarks. After treating around ten thousand patients (real-world doctors may take over two years), the evolved doctor agent achieves a state-of-the-art accuracy of 93.06% on a subset of the MedQA dataset that covers major respiratory diseases. This work paves the way for advancing the applications of LLM-powered agent techniques in medical scenarios.

[329]  arXiv:2405.02958 [pdf, ps, other]
Title: Score-based Generative Priors Guided Model-driven Network for MRI Reconstruction
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Score matching with Langevin dynamics (SMLD) method has been successfully applied to accelerated MRI. However, the hyperparameters in the sampling process require subtle tuning, otherwise the results can be severely corrupted by hallucination artifacts, particularly with out-of-distribution test data. In this study, we propose a novel workflow in which SMLD results are regarded as additional priors to guide model-driven network training. First, we adopted a pretrained score network to obtain samples as preliminary guidance images (PGI) without the need for network retraining, parameter tuning and in-distribution test data. Although PGIs are corrupted by hallucination artifacts, we believe that they can provide extra information through effective denoising steps to facilitate reconstruction. Therefore, we designed a denoising module (DM) in the second step to improve the quality of PGIs. The features are extracted from the components of Langevin dynamics and the same score network with fine-tuning; hence, we can directly learn the artifact patterns. Third, we designed a model-driven network whose training is guided by denoised PGIs (DGIs). DGIs are densely connected with intermediate reconstructions in each cascade to enrich the features and are periodically updated to provide more accurate guidance. Our experiments on different sequences revealed that despite the low average quality of PGIs, the proposed workflow can effectively extract valuable information to guide the network training, even with severely reduced training data and sampling steps. Our method outperforms other cutting-edge techniques by effectively mitigating hallucination artifacts, yielding robust and high-quality reconstruction results.

[330]  arXiv:2405.02961 [pdf, other]
Title: JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos
Comments: Submitted to the International Journal of Computer Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Due to the ever-increasing availability of video surveillance cameras and the growing need for crime prevention, the violence detection task is attracting greater attention from the research community. Compared with other action recognition tasks, violence detection in surveillance videos poses additional issues, such as the presence of a significant variety of real fight scenes. Unfortunately, available datasets seem to be very small compared with other action recognition datasets. Moreover, in surveillance applications, people in the scenes always differ for each video and the background of the footage differs for each camera. Also, violent actions in real-life surveillance videos must be detected quickly to prevent unwanted consequences, so models would benefit from a reduction in memory usage and computational costs. Such problems make classical action recognition methods difficult to adopt. To tackle all these issues, we introduce JOSENet, a novel self-supervised framework that provides outstanding performance for violence detection in surveillance videos. The proposed model receives two spatiotemporal video streams, i.e., RGB frames and optical flows, and involves a new regularized self-supervised learning approach for videos. JOSENet provides improved performance compared to self-supervised state-of-the-art methods, while requiring one-fourth of the number of frames per video segment and a reduced frame rate. The source code and the instructions to reproduce our experiments are available at https://github.com/ispamm/JOSENet.

[331]  arXiv:2405.02962 [pdf, other]
Title: VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a novel method, VectorPainter, for the task of stylized vector graphics synthesis. Given a text prompt and a reference style image, VectorPainter generates a vector graphic that aligns in content with the text prompt and remains faithful in style to the reference image. We recognize that the key to this task lies in fully leveraging the intrinsic properties of vector graphics. Innovatively, we conceptualize the stylization process as the rearrangement of vectorized strokes extracted from the reference image. VectorPainter employs an optimization-based pipeline. It begins by extracting vectorized strokes from the reference image, which are then used to initialize the synthesis process. To ensure fidelity to the reference style, a novel style preservation loss is introduced. Extensive experiments have been conducted to demonstrate that our method is capable of aligning with the text description while remaining faithful to the reference image.

[332]  arXiv:2405.02963 [pdf, ps, other]
Title: Preventive Audits for Data Applications Before Data Sharing in the Power IoT
Comments: 19 pages, 18 figures
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

With the increase in data volume, more types of data are being used and shared, especially in the power Internet of Things (IoT). However, the processes of data sharing may lead to unexpected information leakage because of the ubiquitous relevance among the different data. It is therefore necessary for data owners to conduct preventive audits for data applications before data sharing to avoid the risk of key information leakage. Considering that the same data may play completely different roles in different application scenarios, data owners should know the expected data applications of the data buyers in advance and provide modified data that are less relevant to the private information of the data owners and more relevant to the nonprivate information that the data buyers need. In this paper, data sharing in the power IoT is regarded as the background, and the mutual information of the data and their implicit information is selected as the data feature parameter to indicate the relevance between the data and their implicit information, or the ability to infer the implicit information from the data. Therefore, preventive audits should be conducted based on changes in the data feature parameters before and after data sharing. The probability exchange adjustment method is proposed as the theoretical basis of preventive audits under simplified consumption, and the corresponding optimization models are constructed and extended to more practical scenarios with multivariate characteristics. Finally, case studies are used to validate the effectiveness of the proposed preventive audits.
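
The data feature parameter named in this abstract, the mutual information between the data and their implicit information, can be computed from a joint distribution as in the sketch below. The joint tables, variable names, and the "before/after modification" comparison are hypothetical illustrations; the probability exchange adjustment method itself is not reproduced here.

```python
from math import log2

def mutual_information(joint):
    """Mutual information I(X; Y) in bits.

    joint[i][j] = P(X = i, Y = j), where X is the shared data and Y is the
    implicit information; entries must sum to 1.
    """
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * log2(pxy / (px[i] * py[j]))
    return mi

# Hypothetical audit: private information is fully inferable from the raw
# data, and much less so from the modified data.
raw = [[0.5, 0.0], [0.0, 0.5]]
modified = [[0.3, 0.2], [0.2, 0.3]]
print(mutual_information(raw), mutual_information(modified))
```

A lower value after modification indicates that the private information is harder to infer from the shared data, which is the kind of change a preventive audit would look for.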

[333]  arXiv:2405.02965 [pdf, other]
Title: Robust Collaborative Perception without External Localization and Clock Devices
Comments: 6 pages, accepted to ICRA 2024
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

A consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception, which seeks to improve perception abilities through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals could be vulnerable to noise and potentially malicious attacks, jeopardizing the precision of spatial-temporal alignment. Rather than relying on external hardware, this work proposes a novel approach: aligning by recognizing the inherent geometric patterns within the perceptual data of various agents. Following this spirit, we propose a robust collaborative perception system that operates independently of external localization and clock devices. The key module of our system, FreeAlign, constructs a salient object graph for each agent based on its detected boxes and uses a graph neural network to identify common subgraphs between agents, leading to accurate relative pose and time. We validate FreeAlign on both real-world and simulated datasets. The results show that the FreeAlign-empowered robust collaborative perception system performs comparably to systems relying on precise localization and clock devices.

[334]  arXiv:2405.02967 [pdf, other]
Title: Exploring Text-based Realistic Building Facades Editing Application
Authors: Jing Wang, Xin Zhang
Subjects: Human-Computer Interaction (cs.HC)

This paper explores the utilization of diffusion models and textual guidance for achieving localized editing of building facades, addressing the escalating demand for sophisticated editing methodologies in architectural design and urban planning. Leveraging the robust generative capabilities of diffusion models, this study presents a promising avenue for realistically synthesizing and modifying architectural facades. Through iterative diffusion and text descriptions, these models adeptly capture both the intricate global and local structures inherent in architectural facades, thus effectively navigating the complexity of such designs. Additionally, the paper examines the expansive potential of diffusion models in various facets, including the generation of novel facade designs, the enhancement of existing facades, and the realization of personalized customization. Despite their promise, diffusion models encounter obstacles such as computational resource constraints and data imbalances. To address these challenges, the study introduces the innovative Blended Latent Diffusion method for architectural facade editing, accompanied by a comprehensive visual analysis of its viability and efficacy. Through these endeavors, we aim to propel forward the field of architectural facade editing, contributing to its advancement and practical application.

[335]  arXiv:2405.02968 [pdf, other]
Title: CoverLib: Classifiers-equipped Experience Library by Iterative Problem Distribution Coverage Maximization for Domain-tuned Motion Planning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Library-based methods are known to be very effective for fast motion planning by adapting an experience retrieved from a precomputed library. This article presents CoverLib, a principled approach for constructing and utilizing such a library. CoverLib iteratively adds an experience-classifier-pair to the library, where each classifier corresponds to an adaptable region of the experience within the problem space. This iterative process is an active procedure, as it selects the next experience based on its ability to effectively cover the uncovered region. During the query phase, these classifiers are utilized to select an experience that is expected to be adaptable for a given problem. Experimental results demonstrate that CoverLib effectively mitigates the trade-off between plannability and speed observed in global (e.g. sampling-based) and local (e.g. optimization-based) methods. As a result, it achieves both fast planning and high success rates over the problem domain. Moreover, due to its adaptation-algorithm-agnostic nature, CoverLib seamlessly integrates with various adaptation methods, including nonlinear programming-based and sampling-based algorithms.

[336]  arXiv:2405.02969 [pdf, other]
Title: Towards a Flexible and High-Fidelity Approach to Distributed DNN Training Emulation
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

We propose NeuronaBox, a flexible, user-friendly, and high-fidelity approach to emulate DNN training workloads. We argue that to accurately observe performance, it is possible to execute the training workload on a subset of real nodes and emulate the networked execution environment along with the collective communication operations. Initial results from a proof-of-concept implementation show that NeuronaBox replicates the behavior of actual systems with high accuracy, with an error margin of less than 1% between the emulated measurements and the real system.

[337]  arXiv:2405.02971 [pdf, ps, other]
Title: Achieving Narrative Change Through AR: Displacing the Single Story to Create Spatial Justice
Comments: Presented at CHI 2024 (arXiv:2404.05889)
Subjects: Human-Computer Interaction (cs.HC)

The ability of Augmented Reality to overcome the bias of single stories through multidimensionality is explored in the artifacts of a youth gun violence prevention project and its goal of narrative change.

[338]  arXiv:2405.02972 [pdf, other]
Title: Multi-Agent RL-Based Industrial AIGC Service Offloading over Wireless Edge Networks
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

Currently, the generative model has garnered considerable attention due to its application in addressing the challenge of scarcity of abnormal samples in the industrial Internet of Things (IoT). However, challenges persist regarding the edge deployment of generative models and the optimization of joint edge AI-generated content (AIGC) tasks. In this paper, we focus on the edge optimization of AIGC task execution and propose GMEL, a generative model-driven industrial AIGC collaborative edge learning framework. This framework aims to facilitate efficient few-shot learning by leveraging realistic sample synthesis and edge-based optimization capabilities. First, a multi-task AIGC computational offloading model is presented to ensure the efficient execution of heterogeneous AIGC tasks on edge servers. Then, we propose an attention-enhanced multi-agent reinforcement learning (AMARL) algorithm aimed at refining offloading policies within the IoT system, thereby supporting generative model-driven edge learning. Finally, our experimental results demonstrate the effectiveness of the proposed algorithm in optimizing the total system latency of the edge-based AIGC task completion.

[339]  arXiv:2405.02973 [pdf, other]
Title: FairRelay: Fair and Cost-Efficient Peer-to-Peer Content Delivery through Payment Channel Networks
Comments: 27 pages, 21 figures
Subjects: Cryptography and Security (cs.CR)

Peer-to-Peer (P2P) content delivery, known for scalability and resilience, offers a decentralized alternative to traditional centralized Content Delivery Networks (CDNs). A significant challenge in P2P content delivery remains: the fair compensation of relayers for their bandwidth contributions. Existing solutions employ blockchains for payment settlements; however, they are not practical due to high on-chain costs and over-simplified network assumptions. In this paper, we introduce FairRelay, a fair and cost-efficient protocol that ensures all participants receive a fair payoff in complex content delivery network settings. We introduce a novel primitive, Enforceable Accumulative Hashed TimeLock Contract (Enforceable A-HTLC), designed to guarantee payment atomicity, ensuring all participants receive their payments upon successful content delivery. The fairness of FairRelay is proved using the Universal Composability (UC) framework. Our evaluation demonstrates that, in optimistic scenarios, FairRelay incurs zero on-chain costs. In pessimistic scenarios, the on-chain dispute costs for relayers and customers are constant, irrespective of the network complexity. Specifically, empirical results indicate that the on-chain dispute costs for relayers and customers are 24,902 gas (equivalent to 0.01 USD on Optimism L2) and 290,797 gas (0.07 USD), respectively. In a 10-hop relay path, FairRelay introduces less than 1.5% additional overhead compared to pure data transmission, showcasing the efficiency of FairRelay.

[340]  arXiv:2405.02977 [pdf, other]
Title: SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences
Comments: 8 pages, 5 figures, 7 tables, submitted to IEEE conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Numerous sign language datasets exist, yet they typically cover only a limited selection of the thousands of signs used globally. Moreover, creating diverse sign language datasets is an expensive and challenging task due to the costs associated with gathering a varied group of signers. Motivated by these challenges, we aimed to develop a solution that addresses these limitations. In this context, we focused on textually describing body movements from skeleton keypoint sequences, leading to the creation of a new dataset. We structured this dataset around AUTSL, a comprehensive isolated Turkish sign language dataset. We also developed a baseline model, SkelCap, which can generate textual descriptions of body movements. This model processes the skeleton keypoints data as a vector, applies a fully connected layer for embedding, and utilizes a transformer neural network for sequence-to-sequence modeling. We conducted extensive evaluations of our model, including signer-agnostic and sign-agnostic assessments. The model achieved promising results, with a ROUGE-L score of 0.98 and a BLEU-4 score of 0.94 in the signer-agnostic evaluation. The dataset we have prepared, namely the AUTSL-SkelCap, will be made publicly available soon.

[341]  arXiv:2405.02979 [pdf, other]
Title: A Long-Short-Term Mixed-Integer Formulation for Highway Lane Change Planning
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This work considers the problem of optimal lane changing in a structured multi-agent road environment. A novel motion planning algorithm that can capture long-horizon dependencies as well as short-horizon dynamics is presented. Pivotal to our approach is a geometric approximation of the long-horizon combinatorial transition problem which we formulate in the continuous time-space domain. Moreover, a discrete-time formulation of a short-horizon optimal motion planning problem is formulated and combined with the long-horizon planner. Both individual problems, as well as their combination, are formulated as MIQP and solved in real-time by using state-of-the-art solvers. We show how the presented algorithm outperforms two other state-of-the-art motion planning algorithms in closed-loop performance and computation time in lane changing problems. Evaluations are performed using the traffic simulator SUMO, a custom low-level tracking model predictive controller, and high-fidelity vehicle models and scenarios, provided by the CommonRoad environment.

[342]  arXiv:2405.02980 [pdf, other]
Title: Self-Organized Construction by Minimal Surprise
Comments: Published in 2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems (FAS*W)
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE)

For the robots to achieve a desired behavior, we can program them directly, train them, or give them an innate driver that makes the robots themselves desire the targeted behavior. With the minimal surprise approach, we implant in our robots the desire to make their world predictable. Here, we apply minimal surprise to collective construction. Simulated robots push blocks in a 2D torus grid world. In two variants of our experiment we either allow for emergent behaviors or predefine the expected environment of the robots. In either way, we evolve robot behaviors that move blocks to structure their environment and make it more predictable. The resulting controllers can be applied in collective construction by robots.

[343]  arXiv:2405.02981 [pdf, other]
Title: Over-the-Air Majority Vote Computation with Modulation on Conjugate-Reciprocal Zeros
Authors: Alphan Sahin
Comments: This work is being submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this study, we propose a new approach to compute the majority vote (MV) function based on modulation on conjugate-reciprocal zeros (MOCZ) and introduce three different methods. The proposed methods rely on the fact that when a linear combination of polynomials is evaluated at one of the roots of a polynomial in the combination, that polynomial does not contribute to the evaluation. To utilize this property, each transmitter maps the votes to the zeros of a Huffman polynomial, and the corresponding polynomial coefficients are transmitted. The receiver evaluates the polynomial constructed by the elements of the superposed sequence at conjugate-reciprocal zero pairs and detects the MV with a direct zero-testing (DiZeT) decoder. With differential and index-based encoders, we eliminate the need for power-delay information at the receiver while improving the computation error rate (CER) performance. The proposed methods do not use instantaneous channel state information at the transmitters and receiver. Thus, they provide robustness against phase and time synchronization errors. We theoretically analyze the CERs of the proposed methods. Finally, we demonstrate their efficacy in a distributed median computation scenario in a fading channel.
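
The root-evaluation property this abstract relies on can be checked numerically: at a root of one polynomial, that polynomial's term vanishes from a linear combination, so only the other terms remain. The following is a minimal sketch, not the paper's encoder or DiZeT decoder. The roots, the weights `a1` and `a2`, and the helper names are hypothetical, and the polynomials here are plain monic polynomials rather than Huffman polynomials with conjugate-reciprocal zeros.

```python
def poly_from_roots(roots):
    """Coefficients (lowest degree first) of the monic polynomial with the given roots."""
    coeffs = [1 + 0j]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        nxt = [0j] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] += -r * c
            nxt[i + 1] += c
        coeffs = nxt
    return coeffs


def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest degree first) at x with Horner's rule."""
    acc = 0j
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


# Two hypothetical transmitters, each holding one polynomial of the same degree.
p1 = poly_from_roots([2 + 0j, 1j])
p2 = poly_from_roots([-1 + 0j, 0.5 + 0j])

# Superposition modelled as a coefficient-wise weighted sum (weights are assumptions).
a1, a2 = 0.7, 1.3
superposed = [a1 * c1 + a2 * c2 for c1, c2 in zip(p1, p2)]

# Evaluating at a root of p1 leaves only the p2 term.
root_of_p1 = 2 + 0j
lhs = evaluate(superposed, root_of_p1)
rhs = a2 * evaluate(p2, root_of_p1)
print(abs(lhs - rhs) < 1e-9)
```

The same check works for any number of summed polynomials, which is what makes zero-testing at known root locations informative about which polynomials were transmitted.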

[344]  arXiv:2405.02982 [pdf, other]
Title: Paintings and Drawings Aesthetics Assessment with Rich Attributes for Various Artistic Categories
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image aesthetic evaluation is a highly prominent research domain in the field of computer vision. In recent years, there has been a proliferation of datasets and corresponding evaluation methodologies for assessing the aesthetic quality of photographic works, leading to the establishment of a relatively mature research environment. However, in contrast to the extensive research in photographic aesthetics, the field of aesthetic evaluation for paintings and drawings has seen limited attention until the introduction of the BAID dataset in March 2023. This dataset solely comprises overall scores for high-quality artistic images. Our research marks the pioneering introduction of a multi-attribute, multi-category dataset specifically tailored to the field of painting: Aesthetics of Paintings and Drawings Dataset (APDD). The construction of APDD received active participation from 28 professional artists worldwide, along with dozens of students specializing in the field of art. This dataset encompasses 24 distinct artistic categories and 10 different aesthetic attributes. Each image in APDD has been evaluated by six professionally trained experts in the field of art, including assessments for both total aesthetic scores and aesthetic attribute scores. The final APDD dataset comprises a total of 4985 images, with an annotation count exceeding 31100 entries. Concurrently, we propose an innovative approach: Art Assessment Network for Specific Painting Styles (AANSPS), designed for the assessment of aesthetic attributes in mixed-attribute art datasets. Through this research, our goal is to catalyze advancements in the field of aesthetic evaluation for paintings and drawings, while enriching the available resources and methodologies for its further development and application.

[345]  arXiv:2405.02984 [pdf, other]
Title: E-TSL: A Continuous Educational Turkish Sign Language Dataset with Baseline Methods
Comments: 7 pages, 3 figures, 4 tables, submitted to IEEE conference
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This study introduces the continuous Educational Turkish Sign Language (E-TSL) dataset, collected from online Turkish language lessons for 5th, 6th, and 8th grades. The dataset comprises 1,410 videos totaling nearly 24 hours and includes performances from 11 signers. Turkish, an agglutinative language, poses unique challenges for sign language translation, particularly with a vocabulary where 64% are singleton words and 85% are rare words, appearing less than five times. We developed two baseline models to address these challenges: the Pose to Text Transformer (P2T-T) and the Graph Neural Network based Transformer (GNN-T) models. The GNN-T model achieved 19.13% BLEU-1 score and 3.28% BLEU-4 score, presenting a significant challenge compared to existing benchmarks. The P2T-T model, while demonstrating slightly lower performance in BLEU scores, achieved a higher ROUGE-L score of 22.09%. Additionally, we benchmarked our model using the well-known PHOENIX-Weather 2014T dataset to validate our approach.

[346]  arXiv:2405.02985 [pdf, ps, other]
Title: Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper reports on a series of experiments with a novel dataset evaluating how well Large Language Models (LLMs) can mark (i.e. grade) open text responses to short answer questions. Specifically, we explore how well different combinations of GPT version and prompt engineering strategies performed at marking real student answers to short answer questions across different domain areas (Science and History) and grade-levels (spanning ages 5-16) using a new, never-used-before dataset from Carousel, a quizzing platform. We found that GPT-4 with basic few-shot prompting performed well (Kappa, 0.70) and, importantly, very close to human-level performance (0.75). This research builds on prior findings that GPT-4 could reliably score short answer reading comprehension questions at a performance-level very close to that of expert human raters. The proximity to human-level performance across a variety of subjects and grade levels suggests that LLMs could be a valuable tool for supporting low-stakes formative assessment tasks in K-12 education and has important implications for real-world education delivery.
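
The abstract reports agreement as "Kappa" without naming the variant. As an illustration only, the sketch below assumes unweighted Cohen's kappa between a human marker and a model marker. The function name and the toy marks are hypothetical and are not taken from the Carousel dataset.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must supply the same non-zero number of labels")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Toy marks (1 = correct, 0 = incorrect); purely illustrative values.
human = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
model = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
kappa = cohens_kappa(human, model)
print(round(kappa, 3))
```

Kappa corrects raw agreement for agreement expected by chance, which is why it is preferred over plain accuracy when one mark dominates.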

[347]  arXiv:2405.02986 [pdf, other]
Title: Harvesting Energy from Soil-Air Temperature Differences for Batteryless IoT Devices: A Case Study
Comments: This paper is under review at IEEE Access. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Systems and Control (eess.SY)

The temperature difference between soil and air holds the potential to generate energy to power many low-power IoT devices. However, there is a lack of studies in the literature that explore the nuances of soil-air thermal energy harvesting. This paper offers a comprehensive discussion on soil-air thermal energy harvesting. We engineer a custom Soil-air Thermoelectric Generator (SoTEG) that incorporates an off-the-shelf TEG and an efficient heat transfer network. A detailed discussion of the design and analysis of SoTEG is presented along with a versatile simulation model which can be used to simulate the performance of the harvester under different ambient conditions. Investigations using the model and results gathered from experiments demonstrate that the SoTEG has a heat transfer efficiency of 34.5% with room for improvement and can power a load from temperature differences as low as 3 °C between soil and air, or 1 °C across the TEG. Power generated by SoTEG at 3 °C difference amounts to 110 µW, or a power density of 11.58 mW/m². When connected to a Power Management Unit (PMU), the combined system generates around 30 µW at 3 °C. During a 14-day outdoor deployment in a winter month, the maximum power generated by the combined system is 337 µW when the temperature difference across the TEG is 2.75 °C. Additionally, the model analysis reveals that the weather conditions have an impact on the harvester. While solar radiation enhances power generation, wind can either improve or diminish the harvested energy depending on whether it is day or night.

[348]  arXiv:2405.02989 [pdf, other]
Title: Defense against Joint Poison and Evasion Attacks: A Case Study of DERMS
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

There is an upward trend of deploying distributed energy resource management systems (DERMS) to control modern power grids. However, DERMS controller communication lines are vulnerable to cyberattacks that could potentially impact operational reliability. While a data-driven intrusion detection system (IDS) can potentially thwart attacks during deployment, also known as the evasion attack, the training of the detection algorithm may be corrupted by adversarial data injected into the database, also known as the poisoning attack. In this paper, we propose the first framework of IDS that is robust against joint poisoning and evasion attacks. We formulate the defense mechanism as a bilevel optimization, where the inner and outer levels deal with attacks that occur during training time and testing time, respectively. We verify the robustness of our method on the IEEE-13 bus feeder model against a diverse set of poisoning and evasion attack scenarios. The results indicate that our proposed method outperforms the baseline technique in terms of accuracy, precision, and recall for intrusion detection.

[349]  arXiv:2405.02991 [pdf, other]
Title: Steered Response Power for Sound Source Localization: A Tutorial Review
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

In the last three decades, the Steered Response Power (SRP) method has been widely used for the task of Sound Source Localization (SSL), due to its satisfactory localization performance on moderately reverberant and noisy scenarios. Many works have analyzed and extended the original SRP method to reduce its computational cost, to allow it to locate multiple sources, or to improve its performance in adverse environments. In this work, we review over 200 papers on the SRP method and its variants, with emphasis on the SRP-PHAT method. We also present eXtensible-SRP, or X-SRP, a generalized and modularized version of the SRP algorithm which allows the reviewed extensions to be implemented. We provide a Python implementation of the algorithm which includes selected extensions from the literature.

[350]  arXiv:2405.02994 [pdf, other]
Title: Extended State Observer for Mismatch Disturbances Using Taylor Approximation of the Integral
Authors: Cuong Duc Nguyen
Subjects: Systems and Control (eess.SY)

The development of disturbance estimators using extended state observers (ESOs) typically assumes that the system is observable. This paper introduces an improved method for systems that are initially unobservable, leveraging Taylor expansion to approximate the integral of disturbance dynamics. A new extended system is formulated based on this approximation, enabling the design of an observer that achieves exponential stability of the error dynamics. The proposed method's efficacy is demonstrated through a practical example, highlighting its potential for robust disturbance estimation in dynamic systems.

[351]  arXiv:2405.02995 [pdf, other]
Title: Analysis about Theoretical Foundations for Method to Enhancing ASR Performance using OCR Word Frequency Differences
Subjects: Numerical Analysis (math.NA)

As interest in large language models (LLMs) grows, the importance of accuracy in automatic speech recognition (ASR) has become more pronounced. This is particularly true for lectures that include specialized terminology, where the success rate of traditional ASR models tends to be low, posing a challenging problem. A method to improve ASR performance for specialized terminology using the word frequency difference approach has been proposed. Through experiments and data analysis, we investigate whether this proposal effectively addresses the issue. Additionally, we introduce the power law as the theoretical foundation for the relative frequency

[352]  arXiv:2405.02996 [pdf, other]
Title: RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
Comments: Accepted EMBC 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, as human-originated sounds, intuitively would share closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and to bridge this gap, data augmentation is essential. However, the most widely used augmentation technique for audio and speech, SpecAugment, requires a 2-dimensional spectrogram format and cannot be applied to models pretrained on speech waveforms. To address this, we propose RepAugment, an input-agnostic representation-level augmentation technique that not only outperforms SpecAugment but is also suitable for respiratory sound classification with waveform pretrained models. Experimental results show that our approach outperforms SpecAugment, demonstrating a substantial improvement in the accuracy of minority disease classes, reaching up to 7.14%.

[353]  arXiv:2405.03000 [pdf, other]
Title: MedAdapter: Efficient Test-Time Adaptation of Large Language Models towards Medical Reasoning
Comments: Work in Progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Despite their improved capabilities in generation and reasoning, adapting large language models (LLMs) to the biomedical domain remains challenging due to their immense size and corporate privacy. In this work, we propose MedAdapter, a unified post-hoc adapter for test-time adaptation of LLMs towards biomedical applications. Instead of fine-tuning the entire LLM, MedAdapter effectively adapts the original model by fine-tuning only a small BERT-sized adapter to rank candidate solutions generated by LLMs. Experiments demonstrate that MedAdapter effectively adapts both white-box and black-box LLMs in biomedical reasoning, achieving average performance improvements of 25.48% and 11.31%, respectively, without requiring extensive computational resources or sharing data with third parties. MedAdapter also yields superior performance when combined with train-time adaptation, highlighting a flexible and complementary solution to existing adaptation methods. Faced with the challenges of balancing model performance, computational resources, and data privacy, MedAdapter provides an efficient, privacy-preserving, cost-effective, and transparent solution for adapting LLMs to the biomedical domain.

[354]  arXiv:2405.03003 [pdf, other]
Title: Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
Comments: Accepted by ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices $A$ and $B$ to represent the weight change, i.e., $\Delta W=BA$. Despite LoRA's progress, it faces storage challenges when handling extensive customization adaptations or larger base models. In this work, we aim to further compress trainable parameters by exploiting the powerful expressiveness of the Fourier transform. Specifically, we introduce FourierFT, which treats $\Delta W$ as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients. With the trained spectral coefficients, we implement the inverse discrete Fourier transform to recover $\Delta W$. Empirically, our FourierFT method shows comparable or better performance with fewer parameters than LoRA on various tasks, including natural language understanding, natural language generation, instruction tuning, and image classification. For example, when performing instruction tuning on the LLaMA2-7B model, FourierFT surpasses LoRA with only 0.064M trainable parameters, compared to LoRA's 33.5M. Our code is released at https://github.com/Chaos96/fourierft.
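
The recovery step described in this abstract (a few spectral coefficients in, a dense weight change out) can be sketched in plain Python. This is an illustrative toy, not the released FourierFT code: the matrix size, the chosen frequency locations, the coefficient values, the function names, and the choice to keep only the real part of the inverse transform are all assumptions of the sketch.

```python
import cmath


def idft2_real(spectrum):
    """Real part of the 2-D inverse DFT of a complex matrix (naive, for tiny sizes)."""
    rows, cols = len(spectrum), len(spectrum[0])
    out = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            acc = 0j
            for u in range(rows):
                for v in range(cols):
                    if spectrum[u][v] != 0:
                        angle = 2 * cmath.pi * (u * m / rows + v * n / cols)
                        acc += spectrum[u][v] * cmath.exp(1j * angle)
            out[m][n] = (acc / (rows * cols)).real
    return out


def delta_w_from_coefficients(shape, entries, coeffs):
    """Scatter the trainable coefficients into an all-zero spectrum, then invert it."""
    rows, cols = shape
    spectrum = [[0j] * cols for _ in range(rows)]
    for (u, v), c in zip(entries, coeffs):
        spectrum[u][v] = c
    return idft2_real(spectrum)


shape = (4, 4)
entries = [(0, 0), (1, 2)]  # hypothetical fixed frequency locations
coeffs = [8.0, 4.0]         # hypothetical values standing in for trained coefficients
delta_w = delta_w_from_coefficients(shape, entries, coeffs)

# Two stored numbers produce a dense 4x4 update.
print(len(coeffs), "coefficients ->", len(delta_w) * len(delta_w[0]), "weights")
```

In this toy, only the entries of `coeffs` would be trained and stored, while a rank-r LoRA update of the same matrix would store r * (4 + 4) numbers.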

[355]  arXiv:2405.03004 [pdf, other]
Title: Exploring prompts to elicit memorization in masked language model-based named entity recognition
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Training data memorization in language models impacts model capability (generalization) and safety (privacy risk). This paper focuses on analyzing prompts' impact on detecting the memorization of 6 masked language model-based named entity recognition models. Specifically, we employ a diverse set of 400 automatically generated prompts, and a pairwise dataset where each pair consists of one person's name from the training set and another name out of the set. A prompt completed with a person's name serves as input for getting the model's confidence in predicting this name. Finally, the prompt performance of detecting model memorization is quantified by the percentage of name pairs for which the model has higher confidence for the name from the training set. We show that the performance of different prompts varies by as much as 16 percentage points on the same model, and prompt engineering further increases the gap. Moreover, our experiments demonstrate that prompt performance is model-dependent but does generalize across different name sets. A comprehensive analysis indicates how prompt performance is influenced by prompt properties, contained tokens, and the model's self-attention weights on the prompt.
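
The metric described in this abstract, the percentage of name pairs for which the model is more confident on the in-training name, is simple to state in code. The sketch below is an assumption-laden illustration: `confidence` stands in for whatever score the model assigns to a name inside a completed prompt, and the names and scores are made up.

```python
def prompt_performance(pairs, confidence):
    """Percentage of (in_train, out_of_train) pairs where the in-train name scores higher."""
    if not pairs:
        raise ValueError("need at least one name pair")
    wins = sum(1 for seen, unseen in pairs if confidence(seen) > confidence(unseen))
    return 100.0 * wins / len(pairs)


# Hypothetical confidences standing in for a real NER model queried with one prompt.
toy_scores = {
    "Alice Smith": 0.91,
    "Bob Jones": 0.62,
    "Carol White": 0.40,
    "Dan Brown": 0.55,
}
pairs = [("Alice Smith", "Bob Jones"), ("Carol White", "Dan Brown")]
score = prompt_performance(pairs, toy_scores.get)
print(score)
```

A score near 50 means the prompt cannot separate seen from unseen names, so comparing this number across prompts is what exposes the spread the abstract reports.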

[356]  arXiv:2405.03005 [pdf, other]
Title: Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In safe Reinforcement Learning (RL), safety cost is typically defined as a function dependent on the immediate state and actions. In practice, safety constraints can often be non-Markovian due to the insufficient fidelity of state representation, and safety cost may not be known. We therefore address a general setting where safety labels (e.g., safe or unsafe) are associated with state-action trajectories. Our key contributions are: first, we design a safety model that specifically performs credit assignment to assess contributions of partial state-action trajectories on safety. This safety model is trained using a labeled safety dataset. Second, using the RL-as-inference strategy, we derive an effective algorithm for optimizing a safe policy using the learned safety model. Finally, we devise a method to dynamically adapt the tradeoff coefficient between reward maximization and safety compliance. We rewrite the constrained optimization problem into its dual problem and derive a gradient-based method to dynamically adjust the tradeoff coefficient during training. Our empirical results demonstrate that this approach is highly scalable and able to satisfy sophisticated non-Markovian safety constraints.
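
The abstract does not give the update rule for the tradeoff coefficient. For orientation only, the sketch below shows the generic projected dual-ascent step commonly used for Lagrangian-constrained RL, which is a stand-in and not necessarily the paper's derivation. The function name, the budget, the learning rate, and the cost values are all hypothetical.

```python
def dual_ascent_step(lmbda, avg_cost, budget, lr):
    """One projected gradient-ascent step on the Lagrange multiplier.

    The multiplier grows while the measured safety cost exceeds the budget
    and shrinks (never below zero) once the constraint is satisfied.
    """
    return max(0.0, lmbda + lr * (avg_cost - budget))


# Hypothetical per-iteration average safety costs observed during training.
observed_costs = [0.3, 0.0, 0.0, 0.0]
lmbda = 0.0
history = []
for cost in observed_costs:
    lmbda = dual_ascent_step(lmbda, cost, budget=0.1, lr=0.5)
    history.append(lmbda)
print(history)
```

In a full training loop the policy would be updated against reward minus `lmbda` times cost between these steps, so the coefficient tightens only while the constraint is being violated.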

[357]  arXiv:2405.03007 [pdf, other]
Title: On the performativity of SDG classifications in large bibliometric databases
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large bibliometric databases, such as Web of Science, Scopus, and OpenAlex, facilitate bibliometric analyses, but are performative, affecting the visibility of scientific outputs and the impact measurement of participating entities. Recently, these databases have taken up the UN's Sustainable Development Goals (SDGs) in their respective classifications, which have been criticised for their diverging nature. This work proposes using the feature of large language models (LLMs) to learn about the "data bias" injected by diverse SDG classifications into bibliometric data by exploring five SDGs. We build an LLM that is fine-tuned in parallel by the diverse SDG classifications inscribed into the databases' SDG classifications. Our results show high sensitivity in model architecture, classified publications, fine-tuning process, and natural language generation. The wide arbitrariness at different levels raises concerns about using LLMs in research practice.

[358]  arXiv:2405.03009 [pdf, other]
Title: Explainable Malware Detection with Tailored Logic Explained Networks
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Malware detection is a constant challenge in cybersecurity due to the rapid development of new attack techniques. Traditional signature-based approaches struggle to keep pace with the sheer volume of malware samples. Machine learning offers a promising solution, but faces issues of generalization to unseen samples and a lack of explanation for the instances identified as malware. However, human-understandable explanations are especially important in security-critical fields, where understanding model decisions is crucial for trust and legal compliance. While deep learning models excel at malware detection, their black-box nature hinders explainability. Conversely, interpretable models often fall short in performance. To bridge this gap in this application domain, we propose the use of Logic Explained Networks (LENs), which are a recently proposed class of interpretable neural networks providing explanations in the form of First-Order Logic (FOL) rules. This paper extends the application of LENs to the complex domain of malware detection, specifically using the large-scale EMBER dataset. Our experimental results show that LENs achieve robustness that exceeds that of traditional interpretable methods and rivals that of black-box models. Moreover, we introduce a tailored version of LENs that is shown to generate logic explanations with higher fidelity with respect to the model's predictions.

[359]  arXiv:2405.03010 [pdf, other]
Title: High Order Reasoning for Time Critical Recommendation in Evidence-based Medicine
Authors: Manjiang Yu, Xue Li
Comments: 13 pages, 15 figures
Subjects: Artificial Intelligence (cs.AI)

In time-critical decisions, human decision-makers can interact with AI-enabled situation-aware software to evaluate many imminent and possible scenarios, retrieve billions of facts, and estimate different outcomes based on trillions of parameters in a fraction of a second. In high-order reasoning, "what-if" questions can be used to challenge the assumptions or pre-conditions of the reasoning, "why-not" questions can be used to challenge the method applied in the reasoning, "so-what" questions can be used to challenge the purpose of the decision, and "how-about" questions can be used to challenge the applicability of the method. When the above high-order reasoning questions are applied to assist human decision-making, they can help humans make time-critical decisions and avoid false-negative or false-positive types of errors. In this paper, we present a model of high-order reasoning to offer recommendations in evidence-based medicine in a time-critical fashion for applications in the ICU. A Large Language Model (LLM) is used in our system. The experiments demonstrated that the LLM exhibited optimal performance in the "What-if" scenario, achieving a similarity of 88.52% with the treatment plans of human doctors. In the "Why-not" scenario, the best-performing model tended to opt for alternative treatment plans in 70% of cases for patients who died after being discharged from the ICU. In the "So-what" scenario, the optimal model provided a detailed analysis of the motivation and significance of treatment plans for ICU patients, with its reasoning achieving a similarity of 55.6% with actual diagnostic information. In the "How-about" scenario, the top-performing LLM demonstrated a content similarity of 66.5% in designing transferred treatment plans for similar diseases. Meanwhile, LLMs managed to predict the life status of patients after their discharge from the ICU with an accuracy of 70%.

[360]  arXiv:2405.03011 [pdf, ps, other]
Title: AC-MAMBASEG: An adaptive convolution and Mamba-based architecture for enhanced skin lesion segmentation
Comments: 15 pages, 7 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Skin lesion segmentation is a critical task in computer-aided diagnosis systems for dermatological diseases. Accurate segmentation of skin lesions from medical images is essential for early detection, diagnosis, and treatment planning. In this paper, we propose a new model for skin lesion segmentation, namely AC-MambaSeg, an enhanced model that has a hybrid CNN-Mamba backbone and integrates advanced components such as Convolutional Block Attention Module (CBAM), Attention Gate, and Selective Kernel Bottleneck. AC-MambaSeg leverages the Vision Mamba framework for efficient feature extraction, while CBAM and Selective Kernel Bottleneck enhance its ability to focus on informative regions and suppress background noise. We evaluate the performance of AC-MambaSeg on diverse datasets of skin lesion images, including ISIC-2018 and PH2, and then compare it against existing segmentation methods. Our model shows promising potential for improving computer-aided diagnosis systems and facilitating early detection and treatment of dermatological diseases. Our source code will be made available at: https://github.com/vietthanh2710/AC-MambaSeg.

[361]  arXiv:2405.03016 [pdf, ps, other]
Title: Pathwise uniform convergence of a full discretization for a three-dimensional stochastic Allen-Cahn equation with multiplicative noise
Authors: Binjie Li, Qin Zhou
Subjects: Numerical Analysis (math.NA)

This paper analyzes a full discretization of a three-dimensional stochastic Allen-Cahn equation with multiplicative noise. The discretization uses the Euler scheme for temporal discretization and the finite element method for spatial discretization. By deriving a stability estimate of a discrete stochastic convolution and utilizing this stability estimate along with the discrete stochastic maximal $L^p$-regularity estimate, a pathwise uniform convergence rate with the general spatial $ L^q $-norms is derived.

[362]  arXiv:2405.03017 [pdf, ps, other]
Title: Fault-tolerant Consensus in Anonymous Dynamic Network
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

This paper studies the feasibility of reaching consensus in an anonymous dynamic network. In our model, $n$ anonymous nodes proceed in synchronous rounds. We adopt a hybrid fault model in which up to $f$ nodes may suffer crash or Byzantine faults, and the dynamic message adversary chooses a communication graph for each round.
We introduce a stability property of the dynamic network -- $(T,D)$-dynaDegree for $T \geq 1$ and $n-1 \geq D \geq 1$ -- which requires that for every $T$ consecutive rounds, any fault-free node must have incoming directed links from at least $D$ distinct neighbors. These links might occur in different rounds during a $T$-round interval. $(1,n-1)$-dynaDegree means that the graph is a complete graph in every round. $(1,1)$-dynaDegree means that each node has at least one incoming neighbor in every round, but the set of incoming neighbor(s) at each node may change arbitrarily between rounds.
We show that exact consensus is impossible even with $(1,n-2)$-dynaDegree. For an arbitrary $T$, we show that for crash-tolerant approximate consensus, $(T,\lfloor n/2 \rfloor)$-dynaDegree and $n > 2f$ are together necessary and sufficient, whereas for Byzantine approximate consensus, $(T,\lfloor (n+3f)/2 \rfloor)$-dynaDegree and $n > 5f$ are together necessary and sufficient.
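The $(T,D)$-dynaDegree property can be checked on a finite prefix of an execution as sketched below. The graph encoding (one set of `(sender, receiver)` edges per round), the function name, and the choice to ignore self-loops are assumptions made for illustration; the property itself quantifies over every window of the whole execution.

```python
from typing import Iterable, Sequence, Set, Tuple

Edge = Tuple[int, int]  # (sender, receiver), a directed link in one round


def satisfies_dyna_degree(
    rounds: Sequence[Set[Edge]],
    fault_free: Iterable[int],
    T: int,
    D: int,
) -> bool:
    """Check every window of T consecutive rounds in a finite prefix.

    Each fault-free node must receive from at least D distinct neighbors
    within each window; the links may occur in different rounds.
    A prefix shorter than T has no window, so the check passes vacuously.
    """
    if T < 1 or D < 1:
        raise ValueError("T and D must be at least 1")
    nodes = list(fault_free)
    for start in range(len(rounds) - T + 1):
        window = rounds[start:start + T]
        for node in nodes:
            senders = {
                u for edges in window for (u, v) in edges
                if v == node and u != node
            }
            if len(senders) < D:
                return False
    return True


# Three nodes; each round is a directed cycle, reversed in the second round.
rounds = [{(0, 1), (1, 2), (2, 0)}, {(2, 1), (0, 2), (1, 0)}]
```

In this toy prefix each node has a single incoming neighbor per round, but two distinct incoming neighbors across any two consecutive rounds, so it meets $(1,1)$- and $(2,2)$-dynaDegree but not $(1,2)$-dynaDegree.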

[363]  arXiv:2405.03018 [pdf, other]
Title: TSP Escapes the $O(2^n n^2)$ Curse
Authors: Mihail Stoian
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

The dynamic programming solution to the traveling salesman problem due to Bellman, and independently Held and Karp, runs in time $O(2^n n^2)$, with no improvement in the last sixty years. We break this barrier for the first time by designing an algorithm that runs in deterministic time $2^n n^2 / 2^{\Omega(\sqrt{\log n})}$. We achieve this by strategically remodeling the dynamic programming recursion as a min-plus matrix product, for which faster-than-naïve algorithms exist.
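For reference, the classical $O(2^n n^2)$ dynamic program that the abstract takes as its baseline can be sketched as follows. This is the standard Bellman-Held-Karp recursion, not the faster algorithm proposed in the paper; the function name and the distance-matrix input format are assumptions.

```python
from typing import List, Sequence


def held_karp(dist: Sequence[Sequence[float]]) -> float:
    """Length of a shortest tour that starts and ends at city 0."""
    n = len(dist)
    if n <= 1:
        return 0
    inf = float("inf")
    full = 1 << (n - 1)  # bitmasks over cities 1..n-1
    # dp[mask][j]: cheapest path from city 0 that visits exactly the
    # cities in mask and ends at city j + 1 (which must be in mask).
    dp: List[List[float]] = [[inf] * (n - 1) for _ in range(full)]
    for j in range(n - 1):
        dp[1 << j][j] = dist[0][j + 1]
    for mask in range(1, full):
        for j in range(n - 1):
            cur = dp[mask][j]
            if not mask & (1 << j) or cur == inf:
                continue
            for k in range(n - 1):
                if mask & (1 << k):
                    continue
                # Min-plus style relaxation: take a minimum over sums.
                cand = cur + dist[j + 1][k + 1]
                if cand < dp[mask | (1 << k)][k]:
                    dp[mask | (1 << k)][k] = cand
    return min(dp[full - 1][j] + dist[j + 1][0] for j in range(n - 1))


example = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
tour_length = held_karp(example)
```

The inner relaxation takes a minimum over sums, which is the min-plus pattern that the abstract says it recasts as a matrix product.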

[364]  arXiv:2405.03025 [pdf, other]
Title: Matten: Video Generation with Mamba-Attention
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation. With minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten is competitive with current Transformer-based and GAN-based models on benchmarks, achieving superior FVD scores and efficiency. Additionally, we observe a direct positive correlation between the complexity of our designed model and the improvement in video quality, indicating the excellent scalability of Matten.

[365]  arXiv:2405.03026 [pdf, other]
Title: Enhanced Detection Classification via Clustering SVM for Various Robot Collaboration Task
Comments: This paper has been received by CISCE 2024 Conference
Subjects: Robotics (cs.RO)

We introduce an advanced, swift pattern recognition strategy for multiple robots of various types during curve negotiation. This method, leveraging a sophisticated k-means clustering-enhanced Support Vector Machine algorithm, distinctly categorizes robots as flying or mobile robots. Initially, the paradigm considers robot locations and features as quintessential parameters indicative of divergent robot patterns. Subsequently, employing the k-means clustering technique facilitates the efficient segregation and consolidation of robotic data, significantly optimizing the support vector delineation process and expediting the recognition phase. Following this preparatory phase, the SVM methodology is adeptly applied to construct a discriminative hyperplane, enabling precise classification and prognostication of the robot category. To substantiate the efficacy and superiority of the k-means framework over traditional SVM approaches, a rigorous cross-validation experiment was orchestrated, evidencing the former's enhanced performance in robot group classification.

[366]  arXiv:2405.03029 [pdf, other]
Title: Optimal Box Contraction for Solving Linear Systems via Simulated and Quantum Annealing
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Solving linear systems of equations is an important problem in science and engineering. Many quantum algorithms, such as the Harrow-Hassidim-Lloyd (HHL) algorithm (for quantum-gate computers) and the box algorithm (for quantum-annealing machines), have been proposed for solving such systems.
The focus of this paper is on improving the efficiency of the box algorithm. The basic principle behind this algorithm is to transform the linear system into a series of quadratic unconstrained binary optimization (QUBO) problems, which are then solved on annealing machines.
The computational efficiency of the box algorithm is entirely determined by the number of iterations, which, in turn, depends on the box contraction ratio, typically set to 0.5. Here, we show through theory that a contraction ratio of 0.5 is sub-optimal and that we can achieve a speed-up with a contraction ratio of 0.2. This is confirmed through numerical experiments where a speed-up of between $20\%$ and $60\%$ is observed when the optimal contraction ratio is used.

[367]  arXiv:2405.03031 [pdf, other]
Title: Distributed Learning for Dynamic Congestion Games
Comments: This paper has been accepted by IEEE ISIT 2024. arXiv admin note: substantial text overlap with arXiv:2404.15599
Subjects: Computer Science and Game Theory (cs.GT)

Today, mobile users learn and share their traffic observations via crowdsourcing platforms (e.g., Google Maps and Waze). Yet such platforms myopically recommend the currently shortest path to users, and selfish users are unwilling to explore longer paths with varying traffic conditions. Prior studies focus on one-shot congestion games without information learning, while our work studies how users learn and alter traffic conditions on stochastic paths in a distributed manner. Our analysis shows that, as compared to the social optimum in minimizing the long-term social cost via optimal exploration-exploitation tradeoff, the myopic routing policy leads to severe under-exploration of stochastic paths with the price of anarchy (PoA) greater than \(2\). Besides, it fails to ensure the correct learning convergence about users' traffic hazard beliefs. To mitigate the efficiency loss, we first show that existing information-hiding mechanisms and deterministic path-recommendation mechanisms in Bayesian persuasion literature do not work, even yielding \(\text{PoA}=\infty\). Accordingly, we propose a new combined hiding and probabilistic recommendation (CHAR) mechanism to hide all information from a selected user group and provide state-dependent probabilistic recommendations to the other user group. Our CHAR successfully ensures PoA less than \(\frac{5}{4}\), which cannot be further reduced by any other informational mechanism. Additionally, we experiment with real-world data to verify our CHAR's good average performance.

[368]  arXiv:2405.03034 [pdf, other]
Title: FlexKalmanNet: A Modular AI-Enhanced Kalman Filter Framework Applied to Spacecraft Motion Estimation
Subjects: Robotics (cs.RO); Earth and Planetary Astrophysics (astro-ph.EP)

The estimation of relative motion between spacecraft increasingly relies on feature-matching computer vision, which feeds data into a recursive filtering algorithm. Kalman filters, although efficient in noise compensation, demand extensive tuning of system and noise models. This paper introduces FlexKalmanNet, a novel modular framework that bridges this gap by integrating a deep fully connected neural network with Kalman filter-based motion estimation algorithms. FlexKalmanNet's core innovation is its ability to learn any Kalman filter parameter directly from measurement data, coupled with the flexibility to utilize various Kalman filter variants. This is achieved through a notable design decision to outsource the sequential computation from the neural network to the Kalman filter variant, enabling a purely feedforward neural network architecture. This architecture, proficient at handling complex, nonlinear features without the dependency on recurrent network modules, captures global data patterns more effectively. Empirical evaluation using data from NASA's Astrobee simulation environment focuses on learning unknown parameters of an Extended Kalman filter for spacecraft pose and twist estimation. The results demonstrate FlexKalmanNet's rapid training convergence, high accuracy, and superior performance against manually tuned Extended Kalman filters.

[369]  arXiv:2405.03035 [pdf, other]
Title: Probabilistic Finite Automaton Emptiness is undecidable
Authors: Günter Rote
Comments: 45 pages, 10 figures, 2 tables, 30 footnotes, 10 sections plus 1 appendix
Subjects: Formal Languages and Automata Theory (cs.FL)

It is undecidable whether the language recognized by a probabilistic finite automaton is empty. Several other undecidability results, in particular regarding problems about matrix products, are based on this important theorem. We present two proofs of this theorem from the literature in a self-contained way, and we derive some strengthenings. For example, we show that the problem remains undecidable for a fixed probabilistic finite automaton with 11 states, where only the starting distribution is given as input.

[370]  arXiv:2405.03038 [pdf, other]
Title: On the use of dynamical systems in cryptography
Authors: Samuel Everett
Comments: 16 pages, 1 figure, to appear in Chaos, Solitons and Fractals
Subjects: Cryptography and Security (cs.CR)

Ever since the link between nonlinear science and cryptography became apparent, the problem of applying chaotic dynamics to the construction of cryptographic systems has gained a broad audience and has been the subject of thousands of papers. Yet, the field has not found its place in mainstream cryptography, largely due to persistent weaknesses in the presented systems. The goal of this paper is to help remedy this problem in two ways. The first is by providing a new algorithm that can be used to attack -- and hence test the security of -- stream ciphers based on the iteration of a chaotic map of the interval. The second is to cast discrete dynamical systems problems in a modern cryptographic and complexity theoretic language, so that researchers working in chaos-based cryptography can begin designing cryptographic protocols that have a better chance of meeting the extreme standards of modern cryptography.

[371]  arXiv:2405.03039 [pdf, ps, other]
Title: Performance Evaluation of Real-Time Object Detection for Electric Scooters
Comments: 10 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)

Electric scooters (e-scooters) have rapidly emerged as a popular mode of transportation in urban areas, yet they pose significant safety challenges. In the United States, the rise of e-scooters has been marked by a concerning increase in related injuries and fatalities. Recently, while deep-learning object detection holds paramount significance in autonomous vehicles to avoid potential collisions, its application in the context of e-scooters remains relatively unexplored. This paper addresses this gap by assessing the effectiveness and efficiency of cutting-edge object detectors designed for e-scooters. To achieve this, the first comprehensive benchmark involving 22 state-of-the-art YOLO object detectors, including five versions (YOLOv3, YOLOv5, YOLOv6, YOLOv7, and YOLOv8), has been established for real-time traffic object detection using a self-collected dataset featuring e-scooters. The detection accuracy, measured in terms of mAP@0.5, ranges from 27.4% (YOLOv7-E6E) to 86.8% (YOLOv5s). All YOLO models, particularly YOLOv3-tiny, have displayed promising potential for real-time object detection in the context of e-scooters. Both the traffic scene dataset (https://zenodo.org/records/10578641) and software program codes (https://github.com/DongChen06/ScooterDet) for model benchmarking in this study are publicly available, which will not only improve e-scooter safety with advanced object detection but also lay the groundwork for tailored solutions, promising a safer and more sustainable urban micromobility landscape.

[372]  arXiv:2405.03045 [pdf, other]
Title: Swipe2Pair: Secure and Fast In-Band Wireless Device Pairing
Authors: Yaqi He (1), Kai Zeng (1), Long Jiao (2), Brian L. Mark (1), Khaled N. Khasawneh (1) ((1) George Mason University, (2) University of Massachusetts Dartmouth)
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)

Wireless device pairing is a critical security mechanism to bootstrap the secure communication between two devices without a pre-shared secret. It has been widely used in many Internet of Things (IoT) applications, such as smart-home and smart-health. Most existing device pairing mechanisms are based on out-of-band channels, e.g., extra sensors or hardware, to validate the proximity of pairing devices. However, out-of-band channels are not universal across all wireless devices, so such a scheme is limited to certain application scenarios or conditions. On the other hand, in-band channel-based device pairing seeks universal applicability by only relying on wireless interfaces. Existing in-band channel-based pairing schemes either require multiple antennas separated by a good distance on one pairing device, which is not feasible in certain scenarios, or require users to repeat multiple sweeps, which is not optimal in terms of usability.
Therefore, an in-band wireless device pairing scheme providing high security while maintaining high usability (simple pairing process and minimal user intervention) is highly desired. In this work, we propose an easy-to-use mutual authentication device pairing scheme, named Swipe2Pair, based on the proximity of pairing devices and randomization of wireless transmission power. We conduct extensive security analysis and collect considerable experimental data under various settings across different environments. Experimental results show that Swipe2Pair achieves high security and usability. It takes less than one second to complete the pairing process with a simple swipe of one device in front of the other.

[373]  arXiv:2405.03052 [pdf, other]
Title: A View on Out-of-Distribution Identification from a Statistical Testing Theory Perspective
Subjects: Machine Learning (cs.LG)

We study the problem of efficiently detecting Out-of-Distribution (OOD) samples at test time in supervised and unsupervised learning contexts. While ML models are typically trained under the assumption that training and test data stem from the same distribution, this is often not the case in realistic settings; thus, reliably detecting distribution shifts is crucial at deployment. We re-formulate the OOD problem through the lens of statistical testing and then discuss conditions that render the OOD problem identifiable in statistical terms. Building on this framework, we study convergence guarantees of an OOD test based on the Wasserstein distance, and provide a simple empirical evaluation.
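A Wasserstein-based test can be illustrated in one dimension, where the empirical Wasserstein-1 distance between two equal-size samples reduces to the mean absolute difference of their sorted values. The sketch below is an assumption-laden toy: the function names and the fixed `threshold` are hypothetical, and the abstract does not specify the test statistic or calibration at this level of detail.

```python
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples."""
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def flag_ood(
    train: Sequence[float], test: Sequence[float], threshold: float
) -> bool:
    """Reject the in-distribution hypothesis when the distance is large."""
    return wasserstein_1d(train, test) > threshold


shifted = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])  # shift by one
same = wasserstein_1d([0.0, 1.0], [1.0, 0.0])  # same values, reordered
```

In practice the threshold would be calibrated, for example from distances between resampled in-distribution batches, so that the false-alarm rate is controlled.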

[374]  arXiv:2405.03055 [pdf, other]
Title: Multi-hop graph transformer network for 3D human pose estimation
Journal-ref: Journal of Visual Communication and Image Representation, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate 3D human pose estimation is a challenging task due to occlusion and depth ambiguity. In this paper, we introduce a multi-hop graph transformer network designed for 2D-to-3D human pose estimation in videos by leveraging the strengths of multi-head self-attention and multi-hop graph convolutional networks with disentangled neighborhoods to capture spatio-temporal dependencies and handle long-range interactions. The proposed network architecture consists of a graph attention block composed of stacked layers of multi-head self-attention and graph convolution with learnable adjacency matrix, and a multi-hop graph convolutional block comprised of multi-hop convolutional and dilated convolutional layers. The combination of multi-head self-attention and multi-hop graph convolutional layers enables the model to capture both local and global dependencies, while the integration of dilated convolutional layers enhances the model's ability to handle spatial details required for accurate localization of the human body joints. Extensive experiments demonstrate the effectiveness and generalization ability of our model, achieving competitive performance on benchmark datasets.

[375]  arXiv:2405.03056 [pdf, other]
Title: Convolutional Learning on Directed Acyclic Graphs
Subjects: Machine Learning (cs.LG)

We develop a novel convolutional architecture tailored for learning from data defined over directed acyclic graphs (DAGs). DAGs can be used to model causal relationships among variables, but their nilpotent adjacency matrices pose unique challenges towards developing DAG signal processing and machine learning tools. To address this limitation, we harness recent advances offering alternative definitions of causal shifts and convolutions for signals on DAGs. We develop a novel convolutional graph neural network that integrates learnable DAG filters to account for the partial ordering induced by the graph topology, thus providing valuable inductive bias to learn effective representations of DAG-supported data. We discuss the salient advantages and potential limitations of the proposed DAG convolutional network (DCN) and evaluate its performance on two learning tasks using synthetic data: network diffusion estimation and source identification. DCN compares favorably relative to several baselines, showcasing its promising potential.
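The nilpotency mentioned above is easy to see on a small example: powers of a DAG's adjacency matrix count directed paths of a given length, and a DAG on $n$ nodes has no path with $n$ or more edges. The helper names below are assumptions, and the sketch only illustrates the obstacle, not the DCN architecture.

```python
from typing import List, Optional

Matrix = List[List[int]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square-matrix product."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def nilpotency_index(adj: Matrix) -> Optional[int]:
    """Smallest k <= n with adj**k == 0, or None if no such k exists."""
    n = len(adj)
    power = [row[:] for row in adj]
    for k in range(1, n + 1):
        if all(v == 0 for row in power for v in row):
            return k
        power = mat_mul(power, adj)
    return None


dag = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]  # edges 0->1, 0->2, 1->2
cycle = [[0, 1], [1, 0]]  # edges 0->1, 1->0, so not a DAG
```

Because repeated application of such a shift eventually annihilates every signal, polynomial filters in the adjacency matrix have only finitely many nonzero terms.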

[376]  arXiv:2405.03057 [pdf, ps, other]
Title: Verifying SQL Queries using Theories of Tables and Relations
Subjects: Logic in Computer Science (cs.LO)

We present a number of first- and second-order extensions to SMT theories specifically aimed at representing and analyzing SQL queries with join, projection, and selection operations. We support reasoning about SQL queries with either bag or set semantics for database tables. We provide the former via an extension of a theory of finite bags and the latter via an extension of the theory of finite relations. Furthermore, we add the ability to reason about tables with null values by introducing a theory of nullable sorts based on an extension of the theory of algebraic datatypes. We implemented solvers for these theories in the SMT solver cvc5 and evaluated them on a set of benchmarks derived from public sets of SQL equivalence problems.
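The difference between bag and set semantics can be shown with a projection on a toy table. This is only an illustration of the two semantics using Python collections; it does not use cvc5 or the theories described above, and the table and function names are assumptions.

```python
from collections import Counter
from typing import Counter as CounterType, Hashable, Sequence, Set, Tuple

Row = Tuple[Hashable, ...]

rows: Sequence[Row] = [(1, "a"), (1, "b"), (2, "c")]


def project_bag(table: Sequence[Row], column: int) -> CounterType[Hashable]:
    """Projection under bag semantics: duplicates keep their multiplicity."""
    return Counter(row[column] for row in table)


def project_set(table: Sequence[Row], column: int) -> Set[Hashable]:
    """Projection under set semantics: duplicates collapse."""
    return {row[column] for row in table}


bag_result = project_bag(rows, 0)
set_result = project_set(rows, 0)
```

Two queries can therefore agree under set semantics while differing under bag semantics, which is why an equivalence checker needs to handle both.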

[377]  arXiv:2405.03058 [pdf, other]
Title: Enhancing High-Level Synthesis with Automated Pragma Insertion and Code Transformation Framework
Subjects: Software Engineering (cs.SE)

High-level synthesis, source-to-source compilers, and various Design Space Exploration techniques for pragma insertion have significantly improved the Quality of Results of generated designs. These tools offer benefits such as reduced development time and enhanced performance. However, achieving high-quality results often requires additional manual code transformations and tiling selections, which are typically performed separately or as pre-processing steps. Although DSE techniques enable code transformation upfront, the vastness of the search space often limits the exploration of all possible code transformations, making it challenging to determine which transformations are necessary. Additionally, ensuring correctness remains challenging, especially for complex transformations and optimizations.
To tackle this obstacle, we first propose a comprehensive framework leveraging HLS compilers. Our system streamlines code transformation, pragma insertion, and tile size selection for on-chip data caching through a unified optimization problem, aiming to enhance parallelization, particularly beneficial for computation-bound kernels. Then, employing a novel Non-Linear Programming (NLP) approach, we simultaneously ascertain transformations, pragmas, and tile sizes, focusing on regular loop-based kernels. Our evaluation demonstrates that our framework adeptly identifies the appropriate transformations, including scenarios where no transformation is necessary, and inserts pragmas to achieve a favorable Quality of Results.

[378]  arXiv:2405.03059 [pdf, other]
Title: Active Preference Learning for Ordering Items In- and Out-of-sample
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Learning an ordering of items based on noisy pairwise comparisons is useful when item-specific labels are difficult to assign, for example, when annotators have to make subjective assessments. Algorithms have been proposed for actively sampling comparisons of items to minimize the number of annotations necessary for learning an accurate ordering. However, many ignore shared structure between items, treating them as unrelated, limiting sample efficiency and precluding generalization to new items. In this work, we study active learning with pairwise preference feedback for ordering items with contextual attributes, both in- and out-of-sample. We give an upper bound on the expected ordering error incurred by active learning strategies under a logistic preference model, in terms of the aleatoric and epistemic uncertainty in comparisons, and propose two algorithms designed to greedily minimize this bound. We evaluate these algorithms in two realistic image ordering tasks, including one with comparisons made by human annotators, and demonstrate superior sample efficiency compared to non-contextual ranking approaches and active preference learning baselines.

[379]  arXiv:2405.03060 [pdf, other]
Title: Tree-based Ensemble Learning for Out-of-distribution Detection
Subjects: Machine Learning (cs.LG)

Being able to successfully determine whether the testing samples have a distribution similar to that of the training samples is a fundamental question to address before we can safely deploy most machine learning models into practice. In this paper, we propose TOOD detection, a simple yet effective tree-based out-of-distribution (TOOD) detection mechanism to determine if a set of unseen samples will have a distribution similar to that of the training samples. The TOOD detection mechanism is based on computing the pairwise Hamming distance of testing samples' tree embeddings, which are obtained by fitting a tree-based ensemble model through in-distribution training samples. Our approach is interpretable and robust owing to its tree-based nature. Furthermore, our approach is efficient, flexible to various machine learning tasks, and can be easily generalized to the unsupervised setting. Extensive experiments are conducted to show that the proposed method outperforms other state-of-the-art out-of-distribution detection methods in distinguishing in-distribution from out-of-distribution samples on various tabular, image, and text data.
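The pairwise Hamming distance computation can be sketched as below. The sketch assumes each sample's tree embedding is a tuple holding the index of the leaf it reaches in each tree of the ensemble; that encoding, the function names, and the use of a plain mean are assumptions, and the abstract does not state how distances are turned into a decision.

```python
from itertools import combinations
from typing import Sequence, Tuple

Embedding = Tuple[int, ...]  # assumed: one leaf index per tree


def hamming(a: Embedding, b: Embedding) -> int:
    """Number of trees in which the two samples land in different leaves."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(embeddings: Sequence[Embedding]) -> float:
    """Average Hamming distance over all unordered pairs of samples."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Hypothetical leaf indices for three test samples in a three-tree ensemble.
test_embeddings = [(1, 2, 3), (1, 2, 4), (0, 2, 4)]
avg_distance = mean_pairwise_hamming(test_embeddings)
```

Comparing such a statistic between a held-out in-distribution batch and an unseen batch is one plausible way to use it, though the paper's actual procedure may differ.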

[380]  arXiv:2405.03064 [pdf, other]
Title: RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
Comments: Accepted by ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially with sparse rewards, remains a significant challenge. The training of a DRL agent can often be trapped in a bottleneck without further progress. In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through the training bottlenecks. The high-level idea of RICE is to construct a new initial state distribution that combines both the default initial states and critical states identified through explanation methods, thereby encouraging the agent to explore from the mixed initial states. Through careful design, we can theoretically guarantee that our refining scheme has a tighter sub-optimality bound. We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.
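The mixed initial state distribution can be pictured as a two-component mixture, as in the sketch below. The function name, the `mix_prob` parameter, and the uniform choice within each pool are assumptions; how RICE identifies critical states and sets the mixture is not specified in the abstract.

```python
import random
from typing import Sequence, TypeVar

State = TypeVar("State")


def sample_initial_state(
    default_states: Sequence[State],
    critical_states: Sequence[State],
    mix_prob: float,
    rng: random.Random,
) -> State:
    """Draw a reset state from a mixture of two pools.

    With probability mix_prob the episode starts from a critical state
    (if any are available); otherwise it starts from a default state.
    """
    if not 0.0 <= mix_prob <= 1.0:
        raise ValueError("mix_prob must lie in [0, 1]")
    use_critical = bool(critical_states) and rng.random() < mix_prob
    pool = critical_states if use_critical else default_states
    return rng.choice(pool)


rng = random.Random(0)
defaults = ["start_a", "start_b"]
criticals = ["near_goal", "before_trap"]
only_default = [sample_initial_state(defaults, criticals, 0.0, rng) for _ in range(50)]
only_critical = [sample_initial_state(defaults, criticals, 1.0, rng) for _ in range(50)]
```

Intermediate values of `mix_prob` would let an agent keep practicing from ordinary starts while spending part of its experience on the states flagged as critical.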

[381]  arXiv:2405.03065 [pdf, ps, other]
Title: Powering the Future of IoT: Federated Learning for Optimized Power Consumption and Enhanced Privacy
Subjects: Cryptography and Security (cs.CR)

The widespread use of the Internet of Things has led to the development of large amounts of perception data, making it necessary to develop effective and scalable data analysis tools. Federated Learning emerges as a promising paradigm to address the inherent challenges of power consumption and data privacy in IoT environments. This paper explores the transformative potential of FL in enhancing the longevity of IoT devices by mitigating power consumption and enhancing privacy and security measures. We delve into the intricacies of FL, elucidating its components and applications within IoT ecosystems. Additionally, we discuss the critical characteristics and challenges of IoT, highlighting the need for such machine learning solutions in processing perception data. While FL introduces many benefits for IoT sustainability, it also has limitations. Through a comprehensive discussion and analysis, this paper elucidates the opportunities and constraints of FL in shaping the future of sustainable and secure IoT systems. Our findings highlight the importance of developing new approaches and conducting additional research to maximise the benefits of FL in creating a secure and privacy-focused IoT environment.

[382]  arXiv:2405.03066 [pdf, ps, other]
Title: A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs)
Subjects: Emerging Technologies (cs.ET)

Electronic Health Records (EHRs) play an important role in the healthcare system. However, their complexity and vast volume pose significant challenges to data interpretation and analysis. Recent advancements in Artificial Intelligence (AI), particularly the development of Large Language Models (LLMs), open up new opportunities for researchers in this domain. Although prior studies have demonstrated their potential in language understanding and processing in the context of EHRs, a comprehensive scoping review is lacking. This study aims to bridge this research gap by conducting a scoping review based on 329 related papers collected from OpenAlex. We first performed a bibliometric analysis to examine paper trends, model applications, and collaboration networks. Next, we manually reviewed and categorized each paper into one of the seven identified topics: named entity recognition, information extraction, text similarity, text summarization, text classification, dialogue system, and diagnosis and prediction. For each topic, we discussed the unique capabilities of LLMs, such as their ability to understand context, capture semantic relations, and generate human-like text. Finally, we highlighted several implications for researchers from the perspectives of data resources, prompt engineering, fine-tuning, performance measures, and ethical concerns. In conclusion, this study provides valuable insights into the potential of LLMs to transform EHR research and discusses their applications and ethical considerations.

[383]  arXiv:2405.03067 [pdf, other]
Title: Automated Deep Learning Optimization via DSL-Based Source Code Transformation
Comments: 12 pages, 6 figures
Journal-ref: In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024)
Subjects: Software Engineering (cs.SE)

As deep learning models become increasingly large and complex, it is critical to improve model training and inference efficiency. Though a variety of highly optimized libraries and packages (known as DL kernels) have been developed, it is tedious and time-consuming to figure out which kernel to use, where to use it, and how to use it correctly. To address this challenge, we propose an Automated Deep learning OPTimization approach called Adopter. We design a Domain-Specific Language (DSL) to represent DL model architectures and leverage this DSL to specify model transformation rules required to integrate a DL kernel into a model. Given the source code of a DL model and the transformation rules for a set of kernels, Adopter first performs inter-procedural analysis to identify and express the model architecture in our DSL. Then, Adopter performs scope analysis and sub-sequence matching to identify locations in the model architecture where the transformation rules can be applied. Finally, Adopter proposes a synthesis-based code transformation method to apply the transformation rule. We curated a benchmark with 199 models from Hugging Face and a diverse set of DL kernels. We found that, compared to a state-of-the-art automated code transformation technique, Adopter helps improve the precision and recall by 3% and 56%, respectively. An in-depth analysis of 9 models revealed that on average, Adopter improved the training speed by 22.7% while decreasing the GPU memory usage by 10.5%.

[384]  arXiv:2405.03070 [pdf, other]
Title: Layered Graph Security Games
Comments: In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. AAAI Press, 2024
Subjects: Computer Science and Game Theory (cs.GT)

Security games model strategic interactions in adversarial real-world applications. Such applications often involve extremely large but highly structured strategy sets (e.g., selecting a distribution over all patrol routes in a given graph). In this paper, we represent each player's strategy space using a layered graph whose paths represent an exponentially large strategy space. Our formulation entails not only classic pursuit-evasion games, but also other security games, such as those modeling anti-terrorism and logistical interdiction. We study two-player zero-sum games under two distinct utility models: linear and binary utilities. We show that under linear utilities, Nash equilibrium can be computed in polynomial time, while binary utilities may lead to situations where even computing a best-response is computationally intractable. To this end, we propose a practical algorithm based on incremental strategy generation and mixed integer linear programs. We show through extensive experiments that our algorithm efficiently computes $\epsilon$-equilibrium for many games of interest. We find that target values and graph structure often have a larger influence on running times as compared to the size of the graph per se.

[385]  arXiv:2405.03075 [pdf, other]
Title: AnoGAN for Tabular Data: A Novel Approach to Anomaly Detection
Comments: 12 pages, 6 figures, accepted as short paper at HCII 2024
Subjects: Machine Learning (cs.LG)

Anomaly detection, a critical facet in data analysis, involves identifying patterns that deviate from expected behavior. This research addresses the complexities inherent in anomaly detection, exploring challenges and adapting to sophisticated malicious activities. With applications spanning cybersecurity, healthcare, finance, and surveillance, anomalies often signify critical information or potential threats. Inspired by the success of Anomaly Generative Adversarial Network (AnoGAN) in image domains, our research extends its principles to tabular data. Our contributions include adapting AnoGAN's principles to a new domain and promising advancements in detecting previously undetectable anomalies. This paper delves into the multifaceted nature of anomaly detection, considering the dynamic evolution of normal behavior, context-dependent anomaly definitions, and data-related challenges like noise and imbalances.

[386]  arXiv:2405.03076 [pdf, other]
Title: Traffic Performance GPT (TP-GPT): Real-Time Data Informed Intelligent ChatBot for Transportation Surveillance and Management
Comments: 8 pages, 5 figures, submitted to 27th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2024)
Subjects: Multiagent Systems (cs.MA)

The digitization of traffic sensing infrastructure has produced an extensive traffic data warehouse, which presents unprecedented challenges for transportation analytics. The complexities associated with querying large-scale multi-table databases require specialized programming expertise and labor-intensive development. Additionally, traditional analysis methods have focused mainly on numerical data, often neglecting the semantic aspects that could enhance interpretability and understanding. Furthermore, real-time traffic data access is typically limited due to privacy concerns. To bridge this gap, the integration of Large Language Models (LLMs) into the domain of traffic management presents a transformative approach to addressing the complexities and challenges inherent in modern transportation systems. This paper proposes an intelligent online chatbot, TP-GPT, for efficient customized transportation surveillance and management empowered by a large real-time traffic database. The innovative framework leverages contextual and generative intelligence of language models to generate accurate SQL queries and natural language interpretations by employing transportation-specialized prompts, Chain-of-Thought prompting, few-shot learning, multi-agent collaboration strategy, and chat memory. An experimental study demonstrates that our approach outperforms state-of-the-art baselines such as GPT-4 and PaLM 2 on a challenging traffic-analysis benchmark, TransQuery. TP-GPT would aid researchers and practitioners in real-time transportation surveillance and management in a privacy-preserving, equitable, and customizable manner.

[387]  arXiv:2405.03080 [pdf, other]
Title: Homophilic organization of egocentric communities in ICT services
Comments: 8 pages, 7 figures, 1 table
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

Members of a society can be characterized by a large number of features, such as gender, age, ethnicity, religion, social status, and shared activities. One of the main tie-forming factors between individuals in human societies is homophily, the tendency of being attracted to similar others. Homophily has been mainly studied with a focus on one feature at a time, and little is known about the roles of similarities of different origins in the formation of communities. To close this gap, we analyze three datasets from Information and Communications Technology (ICT) services, namely, two online social networks and a network deduced from mobile phone calls, in all of which metadata about individual features are available. We identify communities within egocentric networks and surprisingly find that the larger the community is, the more overlap is found between features of its members and the ego. We interpret this finding in terms of the effort needed to manage the communities; larger diversity requires more effort, such that maintaining a large diverse group may exceed the capacity of the members. As the ego reaches out to her alters on an ICT service, we observe that the first alter in each community tends to have a higher feature overlap with the ego than the rest. Moreover, the feature overlap of the ego with all her alters displays non-monotonic behavior as a function of the ego's degree. We propose a simple mechanism of how people add links in their egocentric networks of alters that reproduces all the empirical observations and shows the reason behind the non-monotonic tendency of the egocentric feature overlap as a function of the ego's degree.

[388]  arXiv:2405.03081 [pdf, other]
Title: Design optimization in unilateral contact using pressure constraints and Bayesian optimization
Subjects: Numerical Analysis (math.NA)

Design optimization problems, e.g., shape optimization, that involve deformable bodies in unilateral contact are challenging as they require robust contact solvers, complex optimization methods that are typically gradient-based, and sensitivity derivations. Notably, the problems are nonsmooth, adding significant difficulty to the optimization process. We study design optimization problems in frictionless unilateral contact subject to pressure constraints, using both gradient-based and gradient-free optimization methods, namely Bayesian optimization. The contact simulation problem is solved via the mortar contact and finite element methods. For the gradient-based method, we use the direct differentiation method to compute the sensitivities of the cost and constraint function with respect to the design variables. Then, we use Ipopt to solve the optimization problems. For the gradient-free approach, we use a constrained Bayesian optimization algorithm based on the standard Gaussian Process surrogate model. We present numerical examples that control the contact pressure, inspired by real-life engineering applications, to demonstrate the effectiveness, strengths and shortcomings of both methods. Our results suggest that both optimization methods perform reasonably well for these nonsmooth problems.

[389]  arXiv:2405.03082 [pdf, other]
Title: Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning
Comments: Accepted in ICML 2024
Subjects: Machine Learning (cs.LG)

Reinforcement learning with multiple, potentially conflicting objectives is pervasive in real-world applications, while this problem remains theoretically under-explored. This paper tackles the multi-objective reinforcement learning (MORL) problem and introduces an innovative actor-critic algorithm named MOAC which finds a policy by iteratively making trade-offs among conflicting reward signals. Notably, we provide the first analysis of finite-time Pareto-stationary convergence and corresponding sample complexity in both discounted and average reward settings. Our approach has two salient features: (a) MOAC mitigates the cumulative estimation bias resulting from finding an optimal common gradient descent direction out of stochastic samples. This enables provable convergence rate and sample complexity guarantees independent of the number of objectives; (b) With proper momentum coefficient, MOAC initializes the weights of individual policy gradients using samples from the environment, instead of manual initialization. This enhances the practicality and robustness of our algorithm. Finally, experiments conducted on a real-world dataset validate the effectiveness of our proposed method.
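
To make the notion of a common descent direction concrete, the sketch below uses the classic closed-form min-norm combination of two gradients (the two-objective case of multiple-gradient descent). This well-known construction is swapped in purely for illustration; it is not MOAC's momentum-based weighting, and the function name is hypothetical.

```python
def min_norm_direction(g1, g2):
    """Min-norm point in the convex hull of two gradient vectors.

    Returns (alpha, d) with d = alpha * g1 + (1 - alpha) * g2.
    A zero d means no direction improves both objectives at once.
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(x * x for x in diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every weighting gives the same d
    else:
        # Unconstrained minimizer of ||alpha*g1 + (1-alpha)*g2||^2, clipped to [0, 1].
        alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, d

# Orthogonal gradients get equal weight.
alpha_orth, d_orth = min_norm_direction([1.0, 0.0], [0.0, 1.0])
# Aligned gradients: the shorter one is already the min-norm point.
alpha_aligned, d_aligned = min_norm_direction([1.0, 0.0], [2.0, 0.0])
```

With stochastic gradient estimates, solving for the weights from noisy samples is a source of the estimation bias that the abstract mentions.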

[390]  arXiv:2405.03084 [pdf, ps, other]
Title: Analyzing Emotional Trends from X platform using SenticNet: A Comparative Analysis with Cryptocurrency Price
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

This study delves into the relationship between emotional trends from X platform data and the market dynamics of well-known cryptocurrencies Cardano, Binance, Fantom, Matic, and Ripple over the period from October 2022 to March 2023. Leveraging SenticNet, we identified emotions like Fear and Anxiety, Rage and Anger, Grief and Sadness, Delight and Pleasantness, Enthusiasm and Eagerness, and Delight and Joy. Following data extraction, we segmented each month into bi-weekly intervals, replicating this process for price data obtained from Finance-Yahoo. Consequently, a comparative analysis was conducted, establishing connections between emotional trends observed across bi-weekly intervals and cryptocurrency prices, uncovering significant correlations between emotional sentiments and coin valuations.

[391]  arXiv:2405.03085 [pdf, other]
Title: Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation
Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) have made significant strides in information acquisition. However, their overreliance on potentially flawed parametric knowledge leads to hallucinations and inaccuracies, particularly when handling long-tail, domain-specific queries. Retrieval Augmented Generation (RAG) addresses this limitation by incorporating external, non-parametric knowledge. Nevertheless, the retrieved long-context documents often contain noisy, irrelevant information alongside vital knowledge, diluting LLMs' attention. Inspired by the supportive role of essential concepts in individuals' reading comprehension, we propose a novel concept-based RAG framework with the Abstract Meaning Representation (AMR)-based concept distillation algorithm. The proposed algorithm compresses the cluttered raw retrieved documents into a compact set of crucial concepts distilled from the informative nodes of AMR by referring to reliable linguistic features. The concepts explicitly constrain LLMs to focus solely on vital information in the inference process. We conduct extensive experiments on open-domain question-answering datasets to empirically evaluate the proposed method's effectiveness. The results indicate that the concept-based RAG framework outperforms other baseline methods, particularly as the number of supporting documents increases, while also exhibiting robustness across various backbone LLMs. This emphasizes that the distilled concepts are informative for augmenting the RAG process by filtering out interference information. To the best of our knowledge, this is the first work introducing AMR to enhance RAG, presenting a potential solution to augment inference performance with semantic-based context compression.

[392]  arXiv:2405.03089 [pdf, other]
Title: Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition
Subjects: Machine Learning (cs.LG)

Deep Neural Networks (DNNs) have achieved remarkable success in addressing many previously unsolvable tasks. However, the storage and computational requirements associated with DNNs pose a challenge for deploying these trained models on resource-limited devices. Therefore, a plethora of compression and pruning techniques have been proposed in recent years. Low-rank decomposition techniques are among the approaches most utilized to address this problem. Compared to post-training compression, compression-promoted training is still under-explored. In this paper, we present a theoretically-justified novel approach, termed Low-Rank Induced Training (LoRITa), that promotes low-rankness through the composition of linear layers and compresses by using singular value truncation. This is achieved without the need to change the structure at inference time or require constrained and/or additional optimization, other than the standard weight decay regularization. Moreover, LoRITa eliminates the need to (i) initialize with pre-trained models and (ii) specify rank selection prior to training. Our experimental results (i) demonstrate the effectiveness of our approach using MNIST on Fully Connected Networks, CIFAR10 on Vision Transformers, and CIFAR10/100 on Convolutional Neural Networks, and (ii) illustrate that we achieve either competitive or SOTA results when compared to leading structured pruning methods in terms of FLOPs and parameters drop.

[393]  arXiv:2405.03090 [pdf, other]
Title: A continuum and computational framework for viscoelastodynamics: III. A nonlinear theory
Subjects: Numerical Analysis (math.NA)

We continue our investigation of viscoelasticity by extending the Holzapfel-Simo approach discussed in Part I to the fully nonlinear regime. By scrutinizing the relaxation property for the non-equilibrium stresses, it is revealed that a kinematic assumption akin to the Green-Naghdi type is necessary in the design of the potential. This insight underscores a link between the so-called additive plasticity and the viscoelasticity model under consideration, further inspiring our development of a nonlinear viscoelasticity theory. Our strategy is based on Hill's hyperelasticity framework and leverages the concept of generalized strains. Notably, the adopted kinematic assumption makes the proposed theory fundamentally different from the existing models rooted in the notion of the intermediate configuration. The computation aspects, including the consistent linearization, constitutive integration, and modular implementation, are addressed in detail. A suite of numerical examples is provided to demonstrate the capability of the proposed model in characterizing viscoelastic material behaviors at large strains.

[394]  arXiv:2405.03091 [pdf, ps, other]
Title: Research on Image Recognition Technology Based on Multimodal Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This project investigates the human multi-modal behavior identification algorithm utilizing deep neural networks. According to the characteristics of different modal information, different deep neural networks are used to adapt to different modal video information. Through the integration of various deep neural networks, the algorithm successfully identifies behaviors across multiple modalities. In this project, multiple cameras developed by Microsoft Kinect were used to collect corresponding bone point data based on acquiring conventional images. In this way, the motion features in the image can be extracted. Ultimately, the behavioral characteristics discerned through both approaches are synthesized to facilitate the precise identification and categorization of behaviors. The performance of the suggested algorithm was evaluated using the MSR3D data set. The findings from these experiments indicate that the accuracy in recognizing behaviors remains consistently high, suggesting that the algorithm is reliable in various scenarios. Additionally, the tests demonstrate that the algorithm substantially enhances the accuracy of detecting pedestrian behaviors in video footage.

[395]  arXiv:2405.03095 [pdf, other]
Title: Loss Jump During Loss Switch in Solving PDEs with Neural Networks
Subjects: Machine Learning (cs.LG); Mathematical Physics (math-ph)

Using neural networks to solve partial differential equations (PDEs) is gaining popularity as an alternative approach in the scientific computing community. Neural networks can integrate different types of information into the loss function, including observation data, governing equations, and variational forms. These loss functions can be broadly categorized into two types: observation data loss directly constrains and measures the model output, while other loss functions indirectly model the performance of the network, which can be classified as model loss. However, this alternative approach lacks a thorough understanding of its underlying mechanisms, including theoretical foundations and rigorous characterization of various phenomena. This work focuses on investigating how different loss functions impact the training of neural networks for solving PDEs. We discover a stable loss-jump phenomenon: when switching the loss function from the data loss to the model loss, which includes different orders of derivative information, the neural network solution significantly deviates from the exact solution immediately. Further experiments reveal that this phenomenon arises from the different frequency preferences of neural networks under different loss functions. We theoretically analyze the frequency preference of neural networks under model loss. This loss-jump phenomenon provides a valuable perspective for examining the underlying mechanisms of neural networks in solving PDEs.

[396]  arXiv:2405.03097 [pdf, other]
Title: To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
Comments: Published as a conference paper at ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

LLMs have been found to memorize training textual sequences and regurgitate said sequences verbatim during text generation. This fact is known to be the cause of privacy and related (e.g., copyright) problems. Unlearning in LLMs then takes the form of devising new algorithms that will properly deal with these side-effects of memorized data, while not hurting the model's utility. We offer a fresh perspective towards this goal, namely, that each textual sequence to be forgotten should be treated differently when being unlearned based on its degree of memorization within the LLM. We contribute a new metric for measuring unlearning quality, an adversarial attack showing that SOTA algorithms lacking this perspective fail for privacy, and two new unlearning methods based on Gradient Ascent and Task Arithmetic, respectively. A comprehensive performance evaluation across an extensive suite of NLP tasks then mapped the solution space, identifying the best solutions under different scales of model capacity and forget set size, and quantified the gains of the new approaches.

[397]  arXiv:2405.03098 [pdf, other]
Title: FairMonitor: A Dual-framework for Detecting Stereotypes and Biases in Large Language Models
Subjects: Computation and Language (cs.CL)

Detecting stereotypes and biases in Large Language Models (LLMs) is crucial for enhancing fairness and reducing adverse impacts on individuals or groups when these models are applied. Traditional methods, which rely on embedding spaces or are based on probability metrics, fall short in revealing the nuanced and implicit biases present in various contexts. To address this challenge, we propose the FairMonitor framework and adopt a static-dynamic detection method for a comprehensive evaluation of stereotypes and biases in LLMs. The static component consists of a direct inquiry test, an implicit association test, and an unknown situation test, including 10,262 open-ended questions with 9 sensitive factors and 26 educational scenarios. It is effective for evaluating both explicit and implicit biases. Moreover, we utilize a multi-agent system to construct dynamic scenarios for detecting subtle biases in more complex and realistic settings. This component detects biases based on the interaction behaviors of LLMs across 600 varied educational scenarios. The experimental results show that combining the static and dynamic methods can detect more stereotypes and biases in LLMs.

[398]  arXiv:2405.03099 [pdf, other]
Title: SketchGPT: Autoregressive Modeling for Sketch Generation and Recognition
Comments: Accepted in ICDAR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present SketchGPT, a flexible framework that employs a sequence-to-sequence autoregressive model for sketch generation and completion, along with an interpretation case study for sketch recognition. By mapping complex sketches into simplified sequences of abstract primitives, our approach significantly streamlines the input for autoregressive modeling. SketchGPT leverages the next-token prediction objective to understand sketch patterns, facilitating the creation and completion of drawings and also categorizing them accurately. This proposed sketch representation strategy aids in overcoming existing challenges of autoregressive modeling for continuous stroke data, enabling smoother model training and competitive performance. Our findings demonstrate SketchGPT's capability to generate a diverse variety of drawings through both qualitative and quantitative comparisons with the existing state of the art, along with a comprehensive human evaluation study. The code and pretrained models will be released on our official GitHub.

[399]  arXiv:2405.03101 [pdf, ps, other]
Title: Double Self-Sustainable Reconfigurable Intelligent Surfaces Aided Wireless Communications
Subjects: Information Theory (cs.IT)

A double self-sustainable reconfigurable intelligent surfaces (RISs) assisted multi-user multiple input multiple output (MIMO) system is investigated. Two RISs are equipped with energy harvesting circuits to achieve self-sustainable transmission. The aim is to minimize the transmission power at the base station (BS), while guaranteeing the quality of service (QoS) requirements of the users and meeting the power consumption requirements of the RISs. A block coordinate descent (BCD) algorithm based on the penalty-based method and successive convex approximation (SCA) is employed to alternately optimize the active beamforming at the BS and the phase shifts, as well as the amplitude coefficients, of the two RISs. Simulation results show that the required power consumption at the BS for the proposed double self-sustainable RISs system is significantly reduced compared to conventional RIS systems.

[400]  arXiv:2405.03103 [pdf, other]
Title: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
Comments: Accepted to ICML 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Large language models (LLMs) have recently achieved state-of-the-art performance across various tasks, yet due to their large computational requirements, they struggle with strict latency and power demands. Deep neural network (DNN) quantization has traditionally addressed these limitations by converting models to low-precision integer formats. Yet recently alternative formats, such as Normal Float (NF4), have been shown to consistently increase model accuracy, albeit at the cost of increased chip area. In this work, we first conduct a large-scale analysis of LLM weights and activations across 30 networks to conclude most distributions follow a Student's t-distribution. We then derive a new theoretically optimal format, Student Float (SF4), with respect to this distribution, that improves over NF4 across modern LLMs, for example increasing the average accuracy on LLaMA2-7B by 0.76% across tasks. Using this format as a high-accuracy reference, we then propose augmenting E2M1 with two variants of supernormal support for higher model accuracy. Finally, we explore the quality and performance frontier across 11 datatypes, including non-traditional formats like Additive-Powers-of-Two (APoT), by evaluating their model accuracy and hardware complexity. We discover a Pareto curve composed of INT4, E2M1, and E2M1 with supernormal support, which offers a continuous tradeoff between model accuracy and chip area. For example, E2M1 with supernormal support increases the accuracy of Phi-2 by up to 2.19% with 1.22% area overhead, enabling more LLM-based applications to be run at four bits.
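
For readers unfamiliar with four-bit lookup formats, the sketch below rounds a weight vector to the E2M1 value grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6 with a sign bit). The per-tensor absmax scaling and the function name are assumptions made for illustration; the SF4 code values derived in the paper are not reproduced here, although any other 16-entry format could be passed in as `grid`.

```python
# Representable E2M1 values: 2 exponent bits, 1 mantissa bit, plus sign.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_GRID = sorted({s * m for m in E2M1_MAGNITUDES for s in (-1.0, 1.0)})

def quantize_to_grid(weights, grid=E2M1_GRID):
    """Scale weights so the largest magnitude maps to the grid maximum,
    round each one to the nearest grid value, and rescale.

    Ties resolve to the smaller grid value. Returns (quantized, scale).
    """
    absmax = max(abs(w) for w in weights)
    if absmax == 0.0:
        return list(weights), 1.0
    scale = absmax / max(grid)
    quantized = [
        min(grid, key=lambda q: abs(w / scale - q)) * scale for w in weights
    ]
    return quantized, scale

weights = [1.0, -12.0, 5.4, 7.5]
quantized, scale = quantize_to_grid(weights)
```

Because positive and negative zero coincide, the grid holds 15 distinct values, one fewer than the 16 available four-bit codes.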

[401]  arXiv:2405.03104 [pdf, other]
Title: GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding
Comments: Accepted in ICDAR 2024 (Athens, Greece)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents GeoContrastNet, a language-agnostic framework for structured document understanding (DU) that integrates a contrastive learning objective with graph attention networks (GATs), emphasizing the significant role of geometric features. We propose a novel methodology that combines geometric edge features with visual features within an overall two-staged GAT-based framework, demonstrating promising results in both link prediction and semantic entity recognition performance. Our findings reveal that combining both geometric and visual features could match the capabilities of large DU models that rely heavily on Optical Character Recognition (OCR) features in terms of performance accuracy and efficiency. This approach underscores the critical importance of relational layout information between the named text entities in a semi-structured layout of a page. Specifically, our results highlight the model's proficiency in identifying key-value relationships within the FUNSD dataset for forms and also discovering the spatial relationships in table-structured layouts for RVLCDIP business invoices. Our code and pretrained models will be accessible on our official GitHub.

[402]  arXiv:2405.03106 [pdf, other]
Title: Compression-based Privacy Preservation for Distributed Nash Equilibrium Seeking in Aggregative Games
Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)

This paper explores distributed aggregative games in multi-agent systems. Current methods for finding distributed Nash equilibrium require players to send original messages to their neighbors, leading to communication burden and privacy issues. To jointly address these issues, we propose an algorithm that uses stochastic compression to save communication resources and conceal information through random errors induced by compression. Our theoretical analysis shows that the algorithm guarantees convergence accuracy, even with aggressive compression errors used to protect privacy. We prove that the algorithm achieves differential privacy through a stochastic quantization scheme. Simulation results for energy consumption games support the effectiveness of our approach.
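
One common form of stochastic compression is unbiased stochastic rounding to a uniform grid, sketched below. It is a generic textbook quantizer offered as an assumption about what such a scheme can look like; the paper's actual compressor, its parameters, and its differential-privacy analysis are not reproduced, and the code makes no privacy claim.

```python
import math
import random

def stochastic_quantize(x, step, rng):
    """Round x to a multiple of step, up or down at random, so that the
    expected output equals x (the rounding error has zero mean)."""
    if step <= 0:
        raise ValueError("step must be positive")
    lower = math.floor(x / step)
    frac = x / step - lower  # position inside the grid cell, in [0, 1)
    level = lower + (1 if rng.random() < frac else 0)
    return level * step

rng = random.Random(1)
samples = [stochastic_quantize(0.3, 1.0, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
```

A coarser `step` means fewer bits per message and a larger random error, which is the kind of trade-off between communication cost, concealment, and accuracy that the abstract describes.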

[403]  arXiv:2405.03109 [pdf, other]
Title: Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples. Such ability stems from their capacity to identify common features shared between new and previously seen images while disregarding distractions such as background variations. However, for artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge. In this paper, we propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches and encoding them using the pre-trained Vision Transformer (ViT) architecture. Specifically, we swap the class (CLS) token and patch tokens between the support and query sets to obtain mutual attention, which enables each set to focus on the most useful information. This facilitates the strengthening of intra-class representations and promotes closer proximity between instances of the same class. For implementation, we adopt the ViT-based network architecture and utilize pre-trained model parameters obtained through self-supervision. By leveraging Masked Image Modeling as a self-supervised training task for pre-training, the pre-trained model yields semantically meaningful representations while successfully avoiding supervision collapse. We then employ a meta-learning method to fine-tune the last several layers and CLS token modules. Our strategy significantly reduces the number of parameters that require fine-tuning while effectively utilizing the capability of the pre-trained model. Extensive experiments show that our framework is simple, effective and computationally efficient, achieving superior performance as compared to the state-of-the-art baselines on five popular few-shot classification benchmarks under the 5-shot and 1-shot scenarios.

[404]  arXiv:2405.03110 [pdf, other]
Title: Vector Quantization for Recommender Systems: A Review and Outlook
Subjects: Information Retrieval (cs.IR)

Vector quantization, renowned for its unparalleled feature compression capabilities, has been a prominent topic in signal processing and machine learning research for several decades and remains widely utilized today. With the emergence of large models and generative AI, vector quantization has gained popularity in recommender systems, establishing itself as a preferred solution. This paper starts with a comprehensive review of vector quantization techniques. It then explores systematic taxonomies of vector quantization methods for recommender systems (VQ4Rec), examining their applications from multiple perspectives. Further, it provides a thorough introduction to research efforts in diverse recommendation scenarios, including efficiency-oriented approaches and quality-oriented approaches. Finally, the survey analyzes the remaining challenges and anticipates future trends in VQ4Rec, including the challenges associated with the training of vector quantization, the opportunities presented by large language models, and emerging trends in multimodal recommender systems. We hope this survey can pave the way for future researchers in the recommendation community and accelerate their exploration in this promising field.

[405]  arXiv:2405.03111 [pdf, ps, other]
Title: An Active Inference Agent for Simulating Human Translation Processes in a Hierarchical Architecture: Integrating the Task Segment Framework and the HOF taxonomy
Authors: Michael Carl
Subjects: Computation and Language (cs.CL)

In this paper, we propose modelling human translation production as a hierarchy of three embedded translation processes. The proposed architecture replicates the temporal dynamics of keystroke production across sensorimotor, cognitive, and phenomenal layers. Utilizing data from the CRITT TPR-DB, the Task Segment Framework, and the HOF taxonomy, we demonstrate the temporal breakdown of the typing flow on distinct timelines within these three layers.

[406]  arXiv:2405.03113 [pdf, other]
Title: Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail. To effectively reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. By augmenting air hockey with a large family of tasks ranging from easy tasks like reaching, to challenging ones like pushing a block by hitting it with a puck, as well as goal-based and human-interactive tasks, our testbed allows a varied assessment of RL capabilities. The robot air hockey testbed also supports sim-to-real transfer with three domains: two simulators of increasing fidelity and a real robot system. Using a dataset of demonstration data gathered through two teleoperation systems: a virtualized control environment, and human shadowing, we assess the testbed with behavior cloning, offline RL, and RL from scratch.

[407]  arXiv:2405.03118 [pdf, other]
Title: Determined Multichannel Blind Source Separation with Clustered Source Model
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The independent low-rank matrix analysis (ILRMA) method stands out as a prominent technique for multichannel blind audio source separation. It leverages nonnegative matrix factorization (NMF) and nonnegative canonical polyadic decomposition (NCPD) to model source parameters. While it effectively captures the low-rank structure of sources, the NMF model overlooks inter-channel dependencies. On the other hand, NCPD preserves intrinsic structure but lacks interpretable latent factors, making it challenging to incorporate prior information as constraints. To address these limitations, we introduce a clustered source model based on nonnegative block-term decomposition (NBTD). This model defines blocks as outer products of vectors (clusters) and matrices (for spectral structure modeling), offering interpretable latent vectors. Moreover, it enables straightforward integration of orthogonality constraints to ensure independence among source images. Experimental results demonstrate that our proposed method outperforms ILRMA and its extensions in anechoic conditions and surpasses the original ILRMA in simulated reverberant environments.

[408]  arXiv:2405.03119 [pdf, ps, other]
Title: DAFT-Spread Affine Frequency Division Multiple Access for Downlink Transmission
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Affine frequency division multiplexing (AFDM) and orthogonal AFDM access (O-AFDMA) are promising techniques based on chirp signals, which are able to suppress the performance deterioration caused by Doppler shifts in high-mobility scenarios. However, the high peak-to-average power ratio (PAPR) in AFDM or O-AFDMA is still a crucial problem, which severely limits their practical applications. In this paper, we propose a discrete affine Fourier transform (DAFT)-spread AFDMA scheme based on the properties of the AFDM systems, named DAFT-s-AFDMA to significantly reduce the PAPR by resorting to the DAFT. We formulate the transmitted time-domain signals of the proposed DAFT-s-AFDMA schemes with localized and interleaved chirp subcarrier allocation strategies. Accordingly, we derive the guidelines for setting the DAFT parameters, revealing the insights of PAPR reduction. Finally, simulation results of PAPR comparison in terms of the complementary cumulative distribution function (CCDF) show that the proposed DAFT-s-AFDMA schemes with localized and interleaved strategies can both attain better PAPR performances than the conventional O-AFDMA scheme.

[409]  arXiv:2405.03121 [pdf, other]
Title: AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
Comments: 14 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This innovative representation effectively captures a wide range of facial dynamics, including subtle expressions and head movements. AniTalker enhances motion depiction through two self-supervised learning strategies: the first involves reconstructing target video frames from source frames within the same identity to learn subtle motion representations, and the second develops an identity encoder using metric learning while actively minimizing mutual information between the identity and motion encoders. This approach ensures that the motion representation is dynamic and devoid of identity-specific details, significantly reducing the need for labeled data. Additionally, the integration of a diffusion model with a variance adapter allows for the generation of diverse and controllable facial animations. This method not only demonstrates AniTalker's capability to create detailed and realistic facial movements but also underscores its potential in crafting dynamic avatars for real-world applications. Synthetic results can be viewed at https://github.com/X-LANCE/AniTalker.

[410]  arXiv:2405.03122 [pdf, other]
Title: Automatic Retrieval-augmented Generation of 6G Network Specifications for Use Cases
Authors: Yun Tang, Weisi Guo
Comments: 7 pages, 6 figures, Submitted
Subjects: Networking and Internet Architecture (cs.NI)

6G Open Radio Access Networks (ORAN) promises to open data interfaces to enable plug-and-play service Apps, many of which are consumer and business-facing. Opening up 6G access lowers the barrier to innovation but raises the challenge that the required communication specifications are not fully known to all service designers. As such, business innovators must either be familiar with 6G standards or consult with experts. Enabling consistent, unbiased, rapid, and low-cost requirement assessment and specification generation is crucial to the ORAN innovation ecosystem.
Here, we discuss our initiative to bridge service specification generation gaps between network service providers and business innovators. We first review the state-of-the-art and motivation in 6G plug-and-play services and capabilities, potential use cases, and relevant advances in Large Language Models (LLMs). We identify an ample innovation space for hybrid use cases that may require diverse and variational wireless functionalities across its operating time. We show that the network specification can be automated and present the first automatic retrieval-augmented specification generation (RAG) framework for 6G use cases. To enable public acceptance and feedback, a website interface is also published for the research and industrial community to experiment with the RAG framework. We hope this review highlights the need and the emerging foundation models that advance this area and motivate researchers to engage with the framework.

[411]  arXiv:2405.03125 [pdf, other]
Title: MambaJSCC: Deep Joint Source-Channel Coding with Visual State Space Model
Comments: submitted to IEEE conference
Subjects: Information Theory (cs.IT)

Lightweight and efficient deep joint source-channel coding (JSCC) is a key technology for semantic communications. In this paper, we design a novel JSCC scheme named MambaJSCC, which utilizes a visual state space model with channel adaptation (VSSM-CA) block as its backbone for transmitting images over wireless channels. The VSSM-CA block utilizes VSSM to integrate two-dimensional images with the state space, enabling feature extraction and encoding processes to operate with linear complexity. It also incorporates channel state information (CSI) via a newly proposed CSI embedding method. This method deploys a shared CSI encoding module within both the encoder and decoder to encode and inject the CSI into each VSSM-CA block, improving the adaptability of a single model to varying channel conditions. Experimental results show that MambaJSCC not only outperforms Swin Transformer based JSCC (SwinJSCC) but also significantly reduces parameter size, computational overhead, and inference delay (ID). For example, when employing an equal number of VSSM-CA blocks and Swin Transformer blocks, MambaJSCC achieves a 0.48 dB gain in peak signal-to-noise ratio (PSNR) over SwinJSCC while requiring only 53.3% of the multiply-accumulate operations, 53.8% of the parameters, and 44.9% of the ID.

[412]  arXiv:2405.03131 [pdf, other]
Title: WDMoE: Wireless Distributed Large Language Models with Mixture of Experts
Comments: submitted to IEEE conference
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large Language Models (LLMs) have achieved significant success in various natural language processing tasks, but how wireless communications can support LLMs has not been extensively studied. In this paper, we propose a wireless distributed LLMs paradigm based on Mixture of Experts (MoE), named WDMoE, deploying LLMs collaboratively across edge servers of base station (BS) and mobile devices in the wireless communications system. Specifically, we decompose the MoE layer in LLMs by deploying the gating network and the preceding neural network layer at BS, while distributing the expert networks across the devices. This arrangement leverages the parallel capabilities of expert networks on distributed devices. Moreover, to overcome the instability of wireless communications, we design an expert selection policy by taking into account both the performance of the model and the end-to-end latency, which includes both transmission delay and inference delay. Evaluations conducted across various LLMs and multiple datasets demonstrate that WDMoE not only outperforms existing models, such as Llama 2 with 70 billion parameters, but also significantly reduces end-to-end latency.

[413]  arXiv:2405.03132 [pdf, other]
Title: A Multi-Agent Rollout Approach for Highway Bottleneck Decongestion in Mixed Autonomy
Subjects: Multiagent Systems (cs.MA)

The integration of autonomous vehicles (AVs) into the existing transportation infrastructure offers a promising solution to alleviate congestion and enhance mobility. This research explores a novel approach to traffic optimization by employing a multi-agent rollout approach within a mixed autonomy environment. The study concentrates on coordinating the speed of human-driven vehicles by longitudinally controlling AVs, aiming to dynamically optimize traffic flow and alleviate congestion at highway bottlenecks in real-time. We model the problem as a decentralized partially observable Markov decision process (Dec-POMDP) and propose an improved multi-agent rollout algorithm. By employing agent-by-agent policy iterations, our approach implicitly considers cooperation among multiple agents and seamlessly adapts to complex scenarios where the number of agents dynamically varies. Validated in a real-world network with varying AV penetration rates and traffic flow, the simulations demonstrate that the multi-agent rollout algorithm significantly enhances performance, reducing average travel time on bottleneck segments by 9.42% with a 10% AV penetration rate.

[414]  arXiv:2405.03133 [pdf, other]
Title: Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Comments: 21 pages, 12 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router network introduces the challenge of optimizing a non-differentiable, discrete objective. Recently, a fully-differentiable MoE architecture, SMEAR, was proposed (Muqeeth et al., 2023), which softly merges experts in the parameter space; nevertheless, its effectiveness was only demonstrated in downstream fine-tuning on classification tasks. In this paper, we present Lory, the first approach that scales such architectures to autoregressive language model pre-training. Lory introduces two key techniques: (1) a causal segment routing strategy that achieves high efficiency for expert merging operations while preserving the autoregressive nature of language models; (2) a similarity-based data batching method that encourages expert specialization by grouping similar documents in training instances. We pre-train a series of Lory models on 150B tokens from scratch, with up to 32 experts and 30B (1.5B active) parameters. Experimental results show significant performance gains over parameter-matched dense models on both perplexity (+13.9%) and a variety of downstream tasks (+1.5%-11.1%). Despite segment-level routing, Lory models achieve competitive performance compared to state-of-the-art MoE models with token-level routing. We further demonstrate that the trained experts in Lory capture domain-level specialization without supervision. Our work highlights the potential of fully-differentiable MoE architectures for language model pre-training and advocates future research in this area.
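
The phrase "softly merges experts in the parameter space" can be made concrete with a small sketch: rather than picking one expert, the router's probabilities weight the experts' parameter matrices into a single merged matrix. This is an illustration only, under stated assumptions: the helper names `softmax` and `merge_experts`, the 2x2 matrices, and the router logits are hypothetical, and the sketch omits the causal segment routing and data batching described in the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merge_experts(expert_weights, routing_probs):
    """Return sum_i p_i * W_i for same-shaped matrices given as nested lists."""
    rows = len(expert_weights[0])
    cols = len(expert_weights[0][0])
    merged = [[0.0] * cols for _ in range(rows)]
    for weight, p in zip(expert_weights, routing_probs):
        for r in range(rows):
            for c in range(cols):
                merged[r][c] += p * weight[r][c]
    return merged


# Hypothetical example: two experts with equal router logits.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 2.0], [2.0, 3.0]],
]
probs = softmax([0.0, 0.0])             # equal logits give [0.5, 0.5]
merged = merge_experts(experts, probs)  # element-wise average of the two experts
```

Because the merged matrix is a smooth function of the routing probabilities, gradients can flow through the router without a discrete selection step, which is the property the abstract refers to as full differentiability.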

[415]  arXiv:2405.03134 [pdf, other]
Title: Transhuman Ansambl - Voice Beyond Language
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In this paper we present the design and development of the Transhuman Ansambl, a novel interactive singing-voice interface which senses its environment and responds to vocal input with vocalisations using human voice. Designed for live performance with a human performer and as a standalone sound installation, the ansambl consists of sixteen bespoke virtual singers arranged in a circle. When performing live, the virtual singers listen to the human performer and respond to their singing by reading pitch, intonation and volume cues. In a standalone sound installation mode, singers use ultrasonic distance sensors to sense audience presence. Developed as part of the 1st author's practice-based PhD and artistic practice as a live performer, this work employs the singing-voice to explore voice interactions in HCI beyond language, and innovative ways of live performing. How is technology supporting the effect of intimacy produced through voice? Does the act of surrounding the audience with responsive virtual singers challenge the traditional roles of performer-listener? To answer these questions, we draw upon the 1st author's experience with the system, and the interdisciplinary field of voice studies that consider the voice as the sound medium independent of language, capable of enacting a reciprocal connection between bodies.

[416]  arXiv:2405.03136 [pdf, other]
Title: FOBNN: Fast Oblivious Binarized Neural Network Inference
Subjects: Cryptography and Security (cs.CR)

The superior performance of deep learning has propelled the rise of Deep Learning as a Service, enabling users to transmit their private data to service providers for model execution and inference retrieval. Nevertheless, the primary concern remains safeguarding the confidentiality of sensitive user data while optimizing the efficiency of secure protocols. To address this, we develop a fast oblivious binarized neural network inference framework, FOBNN. Specifically, we customize binarized convolutional neural networks to enhance oblivious inference, design two fast algorithms for binarized convolutions, and optimize network structures experimentally under constrained costs. Initially, we meticulously analyze the range of intermediate values in binarized convolutions to minimize bit representation, resulting in the Bit Length Bounding (BLB) algorithm. Subsequently, leveraging the efficiency of bitwise operations in BLB, we further enhance performance by employing pure bitwise operations for each binary digit position, yielding the Layer-wise Bit Accumulation (LBA) algorithm. Theoretical analysis validates FOBNN's security and indicates up to $2 \times$ improvement in computational and communication costs compared to the state-of-the-art method. We demonstrate our framework's effectiveness in RNA function prediction within bioinformatics. Rigorous experimental assessments confirm that our oblivious inference solutions not only maintain but often exceed the original accuracy, surpassing prior efforts.

[417]  arXiv:2405.03138 [pdf, other]
Title: CRAFT: Extracting and Tuning Cultural Instructions from the Wild
Comments: 6 pages
Subjects: Computation and Language (cs.CL)

Large language models (LLMs) have rapidly evolved as the foundation of various natural language processing (NLP) applications. Despite their wide use cases, their understanding of culturally-related concepts and reasoning remains limited. Meanwhile, there is a significant need to enhance these models' cultural reasoning capabilities, especially concerning underrepresented regions. This paper introduces a novel pipeline for extracting high-quality, culturally-related instruction tuning datasets from vast unstructured corpora. We utilize a self-instruction generation pipeline to identify cultural concepts and trigger instruction. By integrating with a general-purpose instruction tuning dataset, our model demonstrates enhanced capabilities in recognizing and understanding regional cultural nuances, thereby enhancing its reasoning capabilities. We conduct experiments across three regions: Singapore, the Philippines, and the United States, achieving performance improvement of up to 6%. Our research opens new avenues for extracting cultural instruction tuning sets directly from unstructured data, setting a precedent for future innovations in the field.

[418]  arXiv:2405.03140 [pdf, other]
Title: TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning
Comments: Accepted by ICML2024
Subjects: Machine Learning (cs.LG)

Deep neural networks, including transformers and convolutional neural networks, have significantly improved multivariate time series classification (MTSC). However, these methods often rely on supervised learning, which does not fully account for the sparsity and locality of patterns in time series data (e.g., disease-related anomalous points in ECG). To address this challenge, we formally reformulate MTSC as a weakly supervised problem, introducing a novel multiple-instance learning (MIL) framework for better localization of patterns of interest and modeling time dependencies within time series. Our novel approach, TimeMIL, formulates the temporal correlation and ordering within a time-aware MIL pooling, leveraging a tokenized transformer with a specialized learnable wavelet positional token. The proposed method surpassed 26 recent state-of-the-art methods, underscoring the effectiveness of the weakly supervised TimeMIL in MTSC.

[419]  arXiv:2405.03143 [pdf, ps, other]
Title: A novel fourth-order scheme for two-dimensional Riesz space fractional nonlinear reaction-diffusion equations and its optimal preconditioned solver
Subjects: Numerical Analysis (math.NA)

A novel fourth-order finite difference formula coupling the Crank-Nicolson explicit linearized method is proposed to solve Riesz space fractional nonlinear reaction-diffusion equations in two dimensions. Theoretically, under the Lipschitz assumption on the nonlinear term, the proposed high-order scheme is proved to be unconditionally stable and convergent in the discrete $L_2$-norm. Moreover, a $\tau$-matrix based preconditioner is developed to speed up the convergence of the conjugate gradient method with an optimal convergence rate (a convergence rate independent of mesh sizes) for solving the symmetric discrete linear system. Theoretical analysis shows that the spectra of the preconditioned matrices are uniformly bounded in the open interval $(3/8,2)$. To the best of our knowledge, this is the first attempt to develop a preconditioned iterative solver with a mesh-independent convergence rate for the linearized high-order scheme. Numerical examples are given to validate the accuracy of the scheme and the effectiveness of the proposed preconditioned solver.

[420]  arXiv:2405.03144 [pdf, other]
Title: PTQ4SAM: Post-Training Quantization for Segment Anything
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, the immense memory and computation costs hinder its practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which utilizes a mathematically equivalent sign operation to transform the bimodal distribution into a relatively easy-to-quantize normal distribution offline. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax through searching the optimal power-of-two base, which is hardware-friendly. Extensive experimental results across various vision tasks (instance segmentation, semantic segmentation and object detection), datasets and model variants show the superiority of PTQ4SAM. For example, when quantizing SAM-L to 6-bit, we achieve lossless accuracy for instance segmentation, about 0.5% drop with theoretical 3.9$\times$ acceleration. The code is available at https://github.com/chengtao-lv/PTQ4SAM.

[421]  arXiv:2405.03145 [pdf, other]
Title: Projection-Free Method for the Full Frank-Oseen Model of Liquid Crystals
Comments: 32 pages, 8 figures, 4 tables
Subjects: Numerical Analysis (math.NA)

Liquid crystals are materials that experience an intermediate phase where the material can flow like a liquid, but the molecules maintain an orientation order. The Frank-Oseen model is a continuum model of a liquid crystal. The model represents the liquid crystal orientation as a vector field and posits that the vector field minimizes some elastic energy subject to a pointwise unit length constraint, which is a nonconvex constraint. Previous numerical methods in the literature assumed restrictions on the physical constants or had regularity assumptions that ruled out point defects, which are important physical phenomena to model. We present a finite element discretization of the full Frank-Oseen model and a projection free gradient flow algorithm for the discrete problem in the spirit of Bartels (2016). We prove Gamma-convergence of the discrete to the continuous problem: weak convergence of subsequences of discrete minimizers and convergence of energies. We also prove that the gradient flow algorithm has a desirable energy decrease property. Our analysis only requires that the physical constants are positive, which presents challenges due to the additional nonlinearities from the elastic energy.

[422]  arXiv:2405.03146 [pdf, other]
Title: Quantifying the Capabilities of LLMs across Scale and Precision
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Scale is often cited as one of the factors that increase the performance of LLMs, resulting in models with billions and trillions of parameters. One of the limitations of such large models is the high computational requirements that limit their usage, deployment, and debugging in resource-constrained scenarios. Two commonly used alternatives to bypass these limitations are to use the smaller versions of LLMs (e.g. Llama 7B instead of Llama 70B) and lower the memory requirements by using quantization. While these approaches effectively address the limitation of resources, their impact on model performance needs thorough examination. In this study, we perform a comprehensive evaluation to investigate the effect of model scale and quantization on the performance. We experiment with two major families of open-source instruct models ranging from 7 billion to 70 billion parameters. Our extensive zero-shot experiments across various tasks including natural language understanding, reasoning, misinformation detection, and hallucination reveal that larger models generally outperform their smaller counterparts, suggesting that scale remains an important factor in enhancing performance. We found that larger models show exceptional resilience to precision reduction and can maintain high accuracy even at 4-bit quantization for numerous tasks, and that they serve as a better solution than using smaller models at high precision under similar memory requirements.
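
The trade-off between model scale and precision can be made concrete with back-of-the-envelope arithmetic. This is a rough sketch under stated assumptions: the helper `weight_memory_gb` is hypothetical, it counts weights only (no activations, KV cache, or quantization metadata), and it uses decimal gigabytes.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in decimal GB: params * bits / 8 bytes."""
    return num_params * bits_per_param / 8 / 1e9


# Illustrative sizes taken from the 7B-to-70B range named in the abstract.
large_4bit = weight_memory_gb(70_000_000_000, 4)    # 70B parameters at 4-bit
large_16bit = weight_memory_gb(70_000_000_000, 16)  # 70B parameters at 16-bit
small_16bit = weight_memory_gb(7_000_000_000, 16)   # 7B parameters at 16-bit
```

Under these assumptions a 70B model needs about 140 GB of weights at 16-bit and about 35 GB at 4-bit, while a 7B model at 16-bit needs about 14 GB, so 4-bit quantization brings the large model much closer to the footprint of smaller high-precision ones.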

[423]  arXiv:2405.03150 [pdf, other]
Title: Video Diffusion Models: A Survey
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Diffusion generative models have recently become a robust technique for producing and modifying coherent, high-quality video. This survey offers a systematic overview of critical elements of diffusion models for video generation, covering applications, architectural choices, and the modeling of temporal dynamics. Recent advancements in the field are summarized and grouped into development trends. The survey concludes with an overview of remaining challenges and an outlook on the future of the field. Website: https://github.com/ndrwmlnk/Awesome-Video-Diffusion-Models

[424]  arXiv:2405.03151 [pdf, ps, other]
Title: Time Series Stock Price Forecasting Based on Genetic Algorithm (GA)-Long Short-Term Memory Network (LSTM) Optimization
Authors: Xinye Sha
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)

In this paper, a time series algorithm based on Genetic Algorithm (GA) and Long Short-Term Memory Network (LSTM) optimization is used to forecast stock prices effectively, taking into account the trend of the big data era. The data are first analyzed by descriptive statistics, and then the model is built and trained and tested on the dataset. After optimization and adjustment, the mean absolute error (MAE) of the model gradually decreases from 0.11 to 0.01 and tends to be stable, indicating that the model prediction effect is gradually close to the real value. The results on the test set show that the time series algorithm optimized based on Genetic Algorithm (GA)-Long Short-Term Memory Network (LSTM) is able to accurately predict the stock prices, and is highly consistent with the actual price trends and values, with strong generalization ability. The MAE on the test set is 2.41, the MSE is 9.84, the RMSE is 3.13, and the R2 is 0.87. This research result not only provides a novel stock price prediction method, but also provides a useful reference for financial market analysis using computer technology and big data.
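
For readers unfamiliar with the four reported error measures, the sketch below shows how MAE, MSE, RMSE, and R2 are conventionally computed from true and predicted values. The function name `regression_metrics` and the toy data are assumptions for illustration and are unrelated to the paper's dataset or results.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and the coefficient of determination (R2)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is by definition the square root of MSE
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Toy data (hypothetical): only the last prediction is off, by 2.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

On this toy input the MAE is 0.5, the MSE and RMSE are both 1.0, and R2 is 0.2. MAE and RMSE are expressed in the units of the predicted quantity, so their magnitude depends on how the prices are scaled.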

[425]  arXiv:2405.03153 [pdf, ps, other]
Title: Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines
Comments: 5 pages, 2 tables, 1st HEAL Workshop at CHI Conference on Human Factors in Computing Systems, May 12, Honolulu, HI, USA 2024
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)

In the digital age, the prevalence of misleading news headlines poses a significant challenge to information integrity, necessitating robust detection mechanisms. This study explores the efficacy of Large Language Models (LLMs) in identifying misleading versus non-misleading news headlines. Utilizing a dataset of 60 articles, sourced from both reputable and questionable outlets across health, science & tech, and business domains, we employ three LLMs (ChatGPT-3.5, ChatGPT-4, and Gemini) for classification. Our analysis reveals significant variance in model performance, with ChatGPT-4 demonstrating superior accuracy, especially in cases with unanimous annotator agreement on misleading headlines. The study emphasizes the importance of human-centered evaluation in developing LLMs that can navigate the complexities of misinformation detection, aligning technical proficiency with nuanced human judgment. Our findings contribute to the discourse on AI ethics, emphasizing the need for models that are not only technically advanced but also ethically aligned and sensitive to the subtleties of human interpretation.

[426]  arXiv:2405.03155 [pdf, other]
Title: CushSense: Soft, Stretchable, and Comfortable Tactile-Sensing Skin for Physical Human-Robot Interaction
Comments: 8 pages, 8 figures, ICRA2024
Subjects: Robotics (cs.RO)

Whole-arm tactile feedback is crucial for robots to ensure safe physical interaction with their surroundings. This paper introduces CushSense, a fabric-based soft and stretchable tactile-sensing skin designed for physical human-robot interaction (pHRI) tasks such as robotic caregiving. Using stretchable fabric and hyper-elastic polymer, CushSense identifies contacts by monitoring capacitive changes due to skin deformation. CushSense is cost-effective ($\sim$US\$7 per taxel) and easy to fabricate. We detail the sensor design and fabrication process and perform characterization, highlighting its high sensing accuracy (relative error of 0.58%) and durability (0.054% accuracy drop after 1000 interactions). We also present a user study underscoring its perceived safety and comfort for the assistive task of limb manipulation. We open source all sensor-related resources on https://emprise.cs.cornell.edu/cushsense.

[427]  arXiv:2405.03158 [pdf, other]
Title: Decentralized Online Learning in General-Sum Stackelberg Games
Comments: Accepted for the 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)
Subjects: Machine Learning (cs.LG)

We study an online learning problem in general-sum Stackelberg games, where players act in a decentralized and strategic manner. We study two settings depending on the type of information for the follower: (1) the limited information setting where the follower only observes its own reward, and (2) the side information setting where the follower has extra side information about the leader's reward. We show that for the follower, myopically best responding to the leader's action is the best strategy for the limited information setting, but not necessarily so for the side information setting -- the follower can manipulate the leader's reward signals with strategic actions, and hence induce the leader's strategy to converge to an equilibrium that is better off for itself. Based on these insights, we study decentralized online learning for both players in the two settings. Our main contribution is to derive last-iterate convergence and sample complexity results in both settings. Notably, we design a new manipulation strategy for the follower in the latter setting, and show that it has an intrinsic advantage against the best response strategy. Our theories are also supported by empirical results.

[428]  arXiv:2405.03159 [pdf, other]
Title: DeepMpMRI: Tensor-decomposition Regularized Learning for Fast and High-Fidelity Multi-Parametric Microstructural MR Imaging
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning has emerged as a promising approach for learning the nonlinear mapping between diffusion-weighted MR images and tissue parameters, which enables automatic and deep understanding of the brain microstructures. However, the efficiency and accuracy in the multi-parametric estimations are still limited since previous studies tend to estimate multi-parametric maps with dense sampling and isolated signal modeling. This paper proposes DeepMpMRI, a unified framework for fast and high-fidelity multi-parametric estimation from various diffusion models using sparsely sampled q-space data. DeepMpMRI is equipped with a newly designed tensor-decomposition-based regularizer to effectively capture fine details by exploiting the correlation across parameters. In addition, we introduce a Nesterov-based adaptive learning algorithm that optimizes the regularization parameter dynamically to enhance the performance. DeepMpMRI is an extendable framework capable of incorporating flexible network architecture. Experimental results demonstrate the superiority of our approach over 5 state-of-the-art methods in simultaneously estimating multi-parametric maps for various diffusion models with fine-grained details both quantitatively and qualitatively, achieving 4.5 - 22.5$\times$ acceleration compared to the dense sampling of a total of 270 diffusion gradients.

[429]  arXiv:2405.03162 [pdf, other]
Title: Advancing Multimodal Medical Capabilities of Gemini
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.

[430]  arXiv:2405.03164 [pdf, other]
Title: The Role of Predictive Uncertainty and Diversity in Embodied AI and Robot Learning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Uncertainty has long been a critical area of study in robotics, particularly when robots are equipped with analytical models. As we move towards the widespread use of deep neural networks in robots, which have demonstrated remarkable performance in research settings, understanding the nuances of uncertainty becomes crucial for their real-world deployment. This guide offers an overview of the importance of uncertainty and provides methods to quantify and evaluate it from an applications perspective.

[431]  arXiv:2405.03166 [pdf, other]
Title: An Efficient All-to-All GCD Algorithm for Low Entropy RSA Key Factorization
Authors: Elijah Pelofske
Subjects: Cryptography and Security (cs.CR)

RSA is an incredibly successful and useful asymmetric encryption algorithm. One of the types of implementation flaws in RSA is low entropy of the key generation, specifically the prime number creation stage. This can occur due to flawed usage of random prime number generator libraries, or on computers where there is a lack of a source of external entropy. These implementation flaws result in some RSA keys sharing prime factors, which means that the full factorization of the public modulus can be recovered incredibly efficiently by computing the GCD of the two public key moduli that share the prime factor. However, since one does not know a priori which of the composite moduli share a prime factor, to determine if any such shared prime factors exist, an all-to-all GCD attack (also known as a batch GCD attack, or a bulk GCD attack) can be performed on the available public keys so as to recover any shared prime factors. This study describes a novel all-to-all batch GCD algorithm, which will be referred to as the binary tree batch GCD algorithm, that is more efficient than the current best batch GCD algorithm (the remainder tree batch GCD algorithm). A comparison against the best existing batch GCD method (which is a product tree followed by a remainder tree computation) is given using a dataset of random RSA moduli that are constructed such that some of the moduli share prime factors. This proposed binary tree batch GCD algorithm has better runtime than the existing remainder tree batch GCD algorithm, although asymptotically it has nearly identical scaling and its complexity is dependent on how many shared prime factors exist in the set of RSA keys. In practice, the implementation of the proposed binary tree batch GCD algorithm has a roughly 6x speedup compared to the standard remainder tree batch GCD approach.
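
As a point of reference, the baseline mentioned above (a product tree followed by a remainder tree) can be sketched in a few lines. This is the commonly described remainder-tree batch GCD, not the binary tree algorithm proposed in the paper; the function names and the toy moduli are illustrative assumptions.

```python
from math import gcd, prod


def product_tree(values):
    """Build levels of pairwise products; the last level holds the full product."""
    tree = [list(values)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([prod(level[i:i + 2]) for i in range(0, len(level), 2)])
    return tree


def batch_gcd(moduli):
    """For each modulus N_i, return gcd(N_i, product of all other moduli)."""
    tree = product_tree(moduli)
    remainders = tree.pop()  # root level: [P], the product of all moduli
    while tree:
        level = tree.pop()
        # Push the parent's remainder down, reducing modulo the square of each node.
        remainders = [remainders[i // 2] % (n * n) for i, n in enumerate(level)]
    # At the leaves, remainders[i] == P mod N_i^2, so r // N_i == (P / N_i) mod N_i.
    return [gcd(n, r // n) for n, r in zip(moduli, remainders)]


# Toy moduli: the first and third share the prime factor 101.
toy_moduli = [101 * 103, 107 * 109, 101 * 113, 127 * 131]
shared = batch_gcd(toy_moduli)
```

A result greater than 1 for a modulus reveals a factor it shares with some other key in the set. If the same modulus appears twice, the result is the modulus itself, and a pairwise GCD against the other affected keys is needed to split it.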

[432]  arXiv:2405.03167 [pdf, other]
Title: TF4CTR: Twin Focus Framework for CTR Prediction via Adaptive Sample Differentiation
Subjects: Information Retrieval (cs.IR)

Effective feature interaction modeling is critical for enhancing the accuracy of click-through rate (CTR) prediction in industrial recommender systems. Most of the current deep CTR models resort to building complex network architectures to better capture intricate feature interactions or user behaviors. However, we identify two limitations in these models: (1) the samples given to the model are undifferentiated, which may lead the model to learn a larger number of easy samples in a single-minded manner while ignoring a smaller number of hard samples, thus reducing the model's generalization ability; (2) differentiated feature interaction encoders are designed to capture different interactions information but receive consistent supervision signals, thereby limiting the effectiveness of the encoder. To bridge the identified gaps, this paper introduces a novel CTR prediction framework by integrating the plug-and-play Twin Focus (TF) Loss, Sample Selection Embedding Module (SSEM), and Dynamic Fusion Module (DFM), named the Twin Focus Framework for CTR (TF4CTR). Specifically, the framework employs the SSEM at the bottom of the model to differentiate between samples, thereby assigning a more suitable encoder for each sample. Meanwhile, the TF Loss provides tailored supervision signals to both simple and complex encoders. Moreover, the DFM dynamically fuses the feature interaction information captured by the encoders, resulting in more accurate predictions. Experiments on five real-world datasets confirm the effectiveness and compatibility of the framework, demonstrating its capacity to enhance various representative baselines in a model-agnostic manner. To facilitate reproducible research, our open-sourced code and detailed running logs will be made available at: https://github.com/salmon1802/TF4CTR.

[433]  arXiv:2405.03169 [pdf, other]
Title: SOC-MartNet: A Martingale Neural Network for the Hamilton-Jacobi-Bellman Equation without Explicit inf H in Stochastic Optimal Controls
Subjects: Numerical Analysis (math.NA)

In this work, we propose a martingale-based neural network, SOC-MartNet, for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) equations where no explicit expression is needed for the Hamiltonian $\inf_{u \in U} H(t,x,u, z,p)$, and stochastic optimal control problems with controls on both drift and volatility. We reformulate the HJB equations into a stochastic neural network learning process, i.e., training a control network and a value network such that the associated Hamiltonian process is minimized and the cost process becomes a martingale. To enforce the martingale property for the cost process, we employ an adversarial network and construct a loss function based on the projection property of conditional expectations. Then, the control/value networks and the adversarial network are trained adversarially, such that the cost process is driven towards a martingale and the minimum principle is satisfied for the control. Numerical results show that the proposed SOC-MartNet is effective and efficient for solving HJB-type equations and SOCP with a dimension up to $500$ in a small number of training epochs.

[434]  arXiv:2405.03170 [pdf, other]
Title: Oracle-Checker Scheme for Evaluating a Generative Large Language Model
Subjects: Computation and Language (cs.CL)

This work presents a novel approach called oracle-checker scheme for evaluating the answer given by a generative large language model (LLM). Two types of checkers are presented. The first type of checker follows the idea of property testing. The second type of checker follows the idea of program checking. Their applications are demonstrated in two separate contexts, entity extraction and paraphrase decision, respectively.

[435]  arXiv:2405.03176 [pdf, other]
Title: FIMP-HGA: A Novel Approach to Addressing the Partitioning Min-Max Weighted Matching Problem
Subjects: Neural and Evolutionary Computing (cs.NE)

The Partitioning Min-Max Weighted Matching (PMMWM) problem, being a practical NP-hard problem, integrates the task of partitioning the vertices of a bipartite graph into disjoint sets of limited size with the classical Maximum-Weight Perfect Matching (MPWM) problem. The problem was introduced in 2015, and the state-of-the-art method for addressing it is MP$_{\text{LS}}$. In this paper, we present a novel approach, the Fast Iterative Match-Partition Hybrid Genetic Algorithm (FIMP-HGA), for addressing PMMWM. Similar to MP$_{\text{LS}}$, FIMP-HGA divides the solving process into match and partition stages, iteratively refining the solution. In the match stage, we propose the KM-M algorithm, which reduces matching complexity through incremental adjustments, significantly enhancing runtime efficiency. For the partition stage, we introduce a Hybrid Genetic Algorithm (HGA) incorporating an elite strategy and design a Greedy Partition Crossover (GPX) operator alongside a Multilevel Local Search (MLS) to optimize individuals in the population. Population initialization employs various methods, including the multi-way Karmarkar-Karp (KK) algorithm, ensuring both quality and diversity. At each iteration, the bipartite graph is adjusted based on the current solution, aiming for continuous improvement. To conduct comprehensive experiments, we develop a new instance generation method compatible with existing approaches, resulting in four benchmark groups. Extensive experiments evaluate various algorithm modules, accurately assessing each module's impact on improvement. Evaluation results on our benchmarks demonstrate that the proposed FIMP-HGA significantly enhances solution quality compared to MP$_{\text{LS}}$, while reducing runtime by 3 to 20 times.

[436]  arXiv:2405.03177 [pdf, other]
Title: Transformer-based RGB-T Tracking with Channel and Spatial Feature Fusion
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Complementary RGB and TIR modalities enable RGB-T tracking to achieve competitive performance in challenging scenarios. Therefore, how to better fuse cross-modal features is the core issue of RGB-T tracking. Some previous methods either insufficiently fuse RGB and TIR features, or depend on intermediaries containing information from both modalities to achieve cross-modal information interaction. The former does not fully exploit the potential of using only RGB and TIR information of the template or search region for channel and spatial feature fusion, and the latter lacks direct interaction between the template and search area, which limits the model's ability to fully exploit the original semantic information of both modalities. To alleviate these limitations, we explore how to improve the performance of a visual Transformer by using direct fusion of cross-modal channels and spatial features, and propose CSTNet. CSTNet uses ViT as a backbone and inserts cross-modal channel feature fusion modules (CFM) and cross-modal spatial feature fusion modules (SFM) for direct interaction between RGB and TIR features. The CFM performs parallel joint channel enhancement and joint multilevel spatial feature modeling of RGB and TIR features and sums the features, and then globally integrates the sum feature with the original features. The SFM uses cross-attention to model the spatial relationship of cross-modal features and then introduces a convolutional feedforward network for joint spatial and channel integration of multimodal features. Comprehensive experiments show that CSTNet achieves state-of-the-art performance on three public RGB-T tracking benchmarks. Code is available at https://github.com/LiYunfengLYF/CSTNet.

[437]  arXiv:2405.03178 [pdf, other]
Title: POPDG: Popular 3D Dance Generation with PopDanceSet
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. It surpasses the AIST++ dataset in music genre diversity and the intricacy and depth of dance movements. Moreover, the proposed POPDG model within the iDDPM framework enhances dance diversity and, through the Space Augmentation Algorithm, strengthens spatial physical connections between human body joints, ensuring that increased diversity does not compromise generation quality. A streamlined Alignment Module is also designed to improve the temporal alignment between dance and music. Extensive experiments show that POPDG achieves SOTA results on two datasets. Furthermore, the paper expands on current evaluation metrics. The dataset and code are available at https://github.com/Luke-Luo1/POPDG.

[438]  arXiv:2405.03181 [pdf, other]
Title: Collaborative Satellite Computing through Adaptive DNN Task Splitting and Offloading
Comments: Accepted by 29th IEEE Symposium on Computers and Communications (ISCC)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Satellite computing has emerged as a promising technology for next-generation wireless networks. This innovative technology provides data processing capabilities, which facilitate the widespread implementation of artificial intelligence (AI)-based applications, especially for image processing tasks involving deep neural networks (DNNs). With the limited computing resources of an individual satellite, independently handling DNN tasks generated by diverse user equipments (UEs) becomes a significant challenge. One viable solution is dividing a DNN task into multiple subtasks and subsequently distributing them across multiple satellites for collaborative computing. However, it is challenging to partition a DNN appropriately and allocate subtasks to suitable satellites while ensuring load balancing. To this end, we propose a collaborative satellite computing system designed to improve task processing efficiency in satellite networks. Based on this system, a workload-balanced adaptive task splitting scheme is developed to equitably distribute the workload of DNN slices for collaborative inference, consequently enhancing the utilization of satellite computing resources. Additionally, a self-adaptive task offloading scheme based on a genetic algorithm (GA) is introduced to determine optimal offloading decisions within dynamic network environments. The numerical results illustrate that our proposal can outperform comparable methods in terms of task completion rate, delay, and resource utilization.

[439]  arXiv:2405.03183 [pdf, other]
Title: Impact of EIP-4844 on Ethereum: Consensus Security, Ethereum Usage, Rollup Transaction Dynamics, and Blob Gas Fee Markets
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Numerical Analysis (math.NA)

On March 13, 2024, Ethereum implemented EIP-4844, designed to enhance its role as a data availability layer. While this upgrade reduces data posting costs for rollups, it also raises concerns about its impact on the consensus layer due to increased propagation sizes. Moreover, the broader effects on the overall Ethereum ecosystem remain largely unexplored. In this paper, we conduct an empirical analysis of the impact of EIP-4844 on consensus security, Ethereum usage, rollup transaction dynamics, and the blob gas fee mechanism. We explore changes in synchronization times, provide quantitative assessments of rollup and user behaviors, and deepen the understanding of the blob gas fee mechanism, highlighting both enhancements and areas of concern post-upgrade.
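
For context on the blob gas fee mechanism analyzed above, the following is a minimal sketch of the blob base fee update rule as described in the EIP-4844 specification. The constants are the values recalled from that specification at activation and should be checked against the EIP itself; the function signatures are simplified for illustration (the specification passes a block header rather than two integers).

```python
# Constants as recalled from the EIP-4844 specification (assumption: verify against the EIP).
MIN_BASE_FEE_PER_BLOB_GAS = 1
BLOB_BASE_FEE_UPDATE_FRACTION = 3338477
GAS_PER_BLOB = 2 ** 17
TARGET_BLOB_GAS_PER_BLOCK = 3 * GAS_PER_BLOB   # 393216
MAX_BLOB_GAS_PER_BLOCK = 6 * GAS_PER_BLOB      # 786432


def fake_exponential(factor, numerator, denominator):
    """Integer approximation of factor * e**(numerator / denominator) via a Taylor series."""
    i = 1
    output = 0
    numerator_accum = factor * denominator
    while numerator_accum > 0:
        output += numerator_accum
        numerator_accum = (numerator_accum * numerator) // (denominator * i)
        i += 1
    return output // denominator


def calc_excess_blob_gas(parent_excess_blob_gas, parent_blob_gas_used):
    """Excess accumulates whenever blocks use more blob gas than the target."""
    total = parent_excess_blob_gas + parent_blob_gas_used
    if total < TARGET_BLOB_GAS_PER_BLOCK:
        return 0
    return total - TARGET_BLOB_GAS_PER_BLOCK


def get_base_fee_per_blob_gas(excess_blob_gas):
    """Blob base fee grows exponentially in the accumulated excess blob gas."""
    return fake_exponential(
        MIN_BASE_FEE_PER_BLOB_GAS, excess_blob_gas, BLOB_BASE_FEE_UPDATE_FRACTION
    )


# A full block following a block with no excess leaves one target's worth of excess.
excess_after_full_block = calc_excess_blob_gas(0, MAX_BLOB_GAS_PER_BLOCK)
```

Under this rule the fee stays at its minimum while blocks remain at or below the target, and rises by a roughly constant factor per block when demand stays above it.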

[440]  arXiv:2405.03185 [pdf, other]
Title: Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner
Subjects: Machine Learning (cs.LG)

Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system. Existing methods aim to reconstruct STTD using low-dimensional models. However, they are limited to data-specific dimensions or source-dependent patterns, restricting them from unifying representations. Here, we present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation. To discern the underlying dynamics in low-dimensional regimes, coordinate-based neural networks that can encode high-frequency structures are employed to directly map coordinates to traffic variables. To unravel the entangled spatial-temporal interactions, the variability is decomposed into separate processes. We further enable modeling in irregular spaces such as sensor graphs using spectral embedding. Through continuous representations, our approach enables the modeling of a variety of STTD with a unified input, thereby serving as a generalized learner of the underlying traffic dynamics. It is also shown that it can learn implicit low-rank priors and smoothness regularization from the data, making it versatile for learning different dominating data patterns. We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales. Empirical results not only indicate that our model has significant superiority over conventional low-rank models, but also highlight that the versatility of the approach extends to different data domains, output resolutions, and network topologies. Comprehensive model analyses provide further insight into the inductive bias of STTD. We anticipate that this pioneering modeling perspective could lay the foundation for universal representation of STTD in various real-world tasks.

[441]  arXiv:2405.03188 [pdf, other]
Title: Hyperbolic Geometric Latent Diffusion Model for Graph Generation
Comments: Accepted by the 41st International Conference on Machine Learning (ICML 2024)
Subjects: Machine Learning (cs.LG)

Diffusion models have made significant contributions to computer vision, sparking a growing interest in the community recently regarding their application to graph generation. Existing discrete graph diffusion models exhibit heightened computational complexity and diminished training efficiency. A preferable and natural way is to directly diffuse the graph within the latent space. However, because the non-Euclidean structure of graphs is not isotropic in the latent space, existing latent diffusion models have difficulty capturing and preserving the topological information of graphs. To address the above challenges, we propose a novel geometrically latent diffusion framework HypDiff. Specifically, we first establish a geometrically latent space with interpretability measures based on hyperbolic geometry, to define anisotropic latent diffusion processes for graphs. Then, we propose a geometrically latent diffusion process that is constrained by both radial and angular geometric properties, thereby ensuring the preservation of the original topological properties in the generated graphs. Extensive experimental results demonstrate the superior effectiveness of HypDiff for graph generation with various topologies.

[442]  arXiv:2405.03190 [pdf, other]
Title: Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, dual-encoder vision-language models (e.g., CLIP) have achieved remarkable text-to-image retrieval performance. However, we discover that these models usually return very different retrievals for a pair of paraphrased queries. Such behavior might render the retrieval system less predictable and lead to user frustration. In this work, we consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries. To start with, we collect a dataset of paraphrased image descriptions to facilitate quantitative evaluation for this task. We then hypothesize that the undesired behavior of existing dual-encoder models is due to their text towers, which are trained on image-sentence pairs and lack the ability to capture the semantic similarity between paraphrased queries. To improve on this, we investigate multiple strategies for training a dual-encoder model starting from a language model pretrained on a large text corpus. Compared to public dual-encoder models such as CLIP and OpenCLIP, the model trained with our best adaptation strategy achieves a significantly higher ranking similarity for paraphrased queries while maintaining similar zero-shot classification and retrieval accuracy.

[443]  arXiv:2405.03191 [pdf, ps, other]
Title: Exploiting Matrix Information Geometry for Integrated Decoding of Massive Uncoupled Unsourced Random Access
Subjects: Information Theory (cs.IT)

In this paper, we explore an efficient uncoupled unsourced random access (UURA) scheme for 6G massive communication. UURA is a typical framework of unsourced random access that addresses the problems of codeword detection and message stitching, without the use of check bits. Firstly, we establish a framework for UURA, allowing for immediate decoding of sub-messages upon arrival. Thus, the processing delay is effectively reduced due to the shorter waiting time. Next, we propose an integrated decoding algorithm for sub-messages by leveraging matrix information geometry (MIG) theory. Specifically, MIG is applied to measure the feature similarities of codewords belonging to the same user equipment, and thus each sub-message can be stitched as soon as it is received. This enables the timely recovery of a portion of the original message by simultaneously detecting and stitching codewords within the current sub-slot. Furthermore, we analyze the performance of the proposed integrated decoding-based UURA scheme in terms of computational complexity and convergence rate. Finally, we present extensive simulation results to validate the effectiveness of the proposed scheme in 6G wireless networks.

[444]  arXiv:2405.03192 [pdf, other]
Title: QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Machine learning is evolving towards high-order models that necessitate pre-training on extensive datasets, a process associated with significant overheads. Traditional models, despite having pre-trained weights, are becoming obsolete due to architectural differences that obstruct the effective transfer and initialization of these weights. To address these challenges, we introduce a novel framework, QuadraNet V2, which leverages quadratic neural networks to create efficient and sustainable high-order learning models. Our method initializes the primary term of the quadratic neuron using a standard neural network, while the quadratic term is employed to adaptively enhance the learning of data non-linearity or shifts. This integration of pre-trained primary terms with quadratic terms, which possess advanced modeling capabilities, significantly augments the information characterization capacity of the high-order network. By utilizing existing pre-trained weights, QuadraNet V2 reduces the required GPU hours for training by 90\% to 98.4\% compared to training from scratch, demonstrating both efficiency and effectiveness.

[445]  arXiv:2405.03193 [pdf, other]
Title: Exploring Frequencies via Feature Mixing and Meta-Learning for Improving Adversarial Transferability
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent studies have shown that Deep Neural Networks (DNNs) are susceptible to adversarial attacks, with frequency-domain analysis underscoring the significance of high-frequency components in influencing model predictions. Conversely, targeting low-frequency components has been effective in enhancing attack transferability on black-box models. In this study, we introduce a frequency decomposition-based feature mixing method to exploit these frequency characteristics in both clean and adversarial samples. Our findings suggest that incorporating features of clean samples into adversarial features extracted from adversarial examples is more effective in attacking normally-trained models, while combining clean features with the adversarial features extracted from low-frequency parts decomposed from the adversarial samples yields better results in attacking defense models. However, a conflict issue arises when these two mixing approaches are employed simultaneously. To tackle the issue, we propose a cross-frequency meta-optimization approach comprising the meta-train step, meta-test step, and final update. In the meta-train step, we leverage the low-frequency components of adversarial samples to boost the transferability of attacks against defense models. Meanwhile, in the meta-test step, we utilize adversarial samples to stabilize gradients, thereby enhancing the attack's transferability against normally trained models. For the final update, we update the adversarial sample based on the gradients obtained from both meta-train and meta-test steps. Our proposed method is evaluated through extensive experiments on the ImageNet-Compatible dataset, affirming its effectiveness in improving the transferability of attacks on both normally-trained CNNs and defense models.
The source code is available at https://github.com/WJJLL/MetaSSA.

[446]  arXiv:2405.03194 [pdf, other]
Title: CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario
Comments: Accepted by AICITY2024 Workshop Track2 at CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the vast and dynamic landscape of urban settings, Traffic Safety Description and Analysis plays a pivotal role in applications ranging from insurance inspection to accident prevention. This paper introduces CityLLaVA, a novel fine-tuning framework for Visual Language Models (VLMs) designed for urban scenarios. CityLLaVA enhances model comprehension and prediction accuracy through (1) employing bounding boxes for optimal visual data preprocessing, including video best-view selection and visual prompt engineering during both training and testing phases; (2) constructing concise Question-Answer sequences and designing textual prompts to refine instruction comprehension; (3) implementing block expansion to fine-tune large VLMs efficiently; and (4) advancing prediction accuracy via a unique sequential questioning-based prediction augmentation. Demonstrating top-tier performance, our method achieved a benchmark score of 33.4308, securing the leading position on the leaderboard. The code can be found at https://github.com/alibaba/AICITY2024_Track2_AliOpenTrek_CityLLaVA

[447]  arXiv:2405.03196 [pdf, ps, other]
Title: Design and Analysis of Massive Uncoupled Unsourced Random Access with Bayesian Joint Decoding
Subjects: Information Theory (cs.IT)

In this paper, we investigate unsourced random access for massive machine-type communications (mMTC) in the sixth-generation (6G) wireless networks. Firstly, we establish a high-efficiency uncoupled framework for massive unsourced random access without extra parity check bits. Then, we design a low-complexity Bayesian joint decoding algorithm, including codeword detection and stitching. In particular, we present a Bayesian codeword detection approach by exploiting Bayes-optimal divergence-free orthogonal approximate message passing in the case of unknown priors. The output long-term channel statistic information is well leveraged to stitch codewords for recovering the original message. Thus, the spectral efficiency is improved by avoiding the use of parity bits. Moreover, we analyze the performance of the proposed Bayesian joint decoding-based massive uncoupled unsourced random access scheme in terms of computational complexity and error probability of decoding. Furthermore, by asymptotic analysis, we obtain some useful insights for the design of massive unsourced random access. Finally, extensive simulation results confirm the effectiveness of the proposed scheme in 6G wireless networks.

[448]  arXiv:2405.03197 [pdf, other]
Title: StyleSeg V2: Towards Robust One-shot Segmentation of Brain Tissue via Optimization-free Registration Error Perception
Comments: 9 pages, 8 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

One-shot segmentation of brain tissue requires training registration-segmentation (reg-seg) dual-model iteratively, where reg-model aims to provide pseudo masks of unlabeled images for seg-model by warping a carefully-labeled atlas. However, the imperfect reg-model induces image-mask misalignment, poisoning the seg-model subsequently. Recent StyleSeg bypasses this bottleneck by replacing the unlabeled images with their warped copies of atlas, but needs to borrow the diverse image patterns via style transformation. Here, we present StyleSeg V2, inherited from StyleSeg but granted the ability of perceiving the registration errors. The motivation is that good registration behaves in a mirrored fashion for mirrored images. Therefore, almost at no cost, StyleSeg V2 can have reg-model itself "speak out" incorrectly-aligned regions by simply mirroring (symmetrically flipping the brain) its input, and the registration errors are symmetric inconsistencies between the outputs of original and mirrored inputs. Consequently, StyleSeg V2 allows the seg-model to make use of correctly-aligned regions of unlabeled images and also enhances the fidelity of style-transformed warped atlas image by weighting the local transformation strength according to registration errors. The experimental results on three public datasets demonstrate that our proposed StyleSeg V2 outperforms other state-of-the-arts by considerable margins, and exceeds StyleSeg by increasing the average Dice by at least 2.4%.
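
The mirroring idea in this abstract can be illustrated with a toy sketch. It assumes a model that maps a 2-D image to a scalar output map of the same shape; the names `mirror` and `mirror_inconsistency` are hypothetical and are not taken from StyleSeg V2. A real registration model outputs a deformation field, whose horizontal component would also need a sign flip when un-mirrored, which this sketch does not cover.

```python
from typing import Callable, List

Image = List[List[float]]


def mirror(img: Image) -> Image:
    """Flip an image left-right (mirror across the vertical midline)."""
    return [row[::-1] for row in img]


def mirror_inconsistency(model: Callable[[Image], Image], img: Image) -> Image:
    """Per-pixel absolute difference between model(img) and the
    un-mirrored output of model(mirror(img)).

    A model that behaves in a perfectly mirrored fashion for mirrored
    inputs yields an all-zero map; large values flag suspect regions.
    """
    direct = model(img)
    via_mirror = mirror(model(mirror(img)))
    return [
        [abs(a - b) for a, b in zip(row_direct, row_mirror)]
        for row_direct, row_mirror in zip(direct, via_mirror)
    ]
```

In this toy setting the resulting map could serve as a per-pixel weight, in the spirit of the error-aware weighting the abstract describes.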

[449]  arXiv:2405.03199 [pdf, other]
Title: Boosting MLPs with a Coarsening Strategy for Long-Term Time Series Forecasting
Subjects: Machine Learning (cs.LG)

Deep learning methods have been exerting their strengths in long-term time series forecasting. However, they often struggle to strike a balance between expressive power and computational efficiency. Here, we propose the Coarsened Perceptron Network (CP-Net), a novel architecture that efficiently enhances the predictive capability of MLPs while maintaining linear computational complexity. It utilizes a coarsening strategy as the backbone that leverages two-stage convolution-based sampling blocks. Based purely on convolution, they provide the functionality of extracting short-term semantic and contextual patterns, which is relatively deficient in the global point-wise projection of the MLP layer. With the architectural simplicity and low runtime, our experiments on seven time series forecasting benchmarks demonstrate that CP-Net achieves an improvement of 4.1% compared to the SOTA method. The model further shows effective utilization of the exposed information with a consistent improvement as the look-back window expands.

[450]  arXiv:2405.03201 [pdf, other]
Title: Experimental Investigation of Repurposed Kaplan Turbines as Variable-Speed Propellers for Maximizing Frequency Containment Reserve
Comments: Submitted to PSCC 2024
Subjects: Systems and Control (eess.SY)

This study explores the practical viability of repurposing aging Kaplan turbines into variable-speed propellers by employing full-size frequency converters. The motivation behind this approach is to improve the provision of Frequency Containment Reserve (FCR) while reducing fatigue in the Kaplan blades servomechanism. We evaluate the performance of these modified Kaplan turbines against that of another hydro asset composed of the same Kaplan turbine hybridized with a Battery Energy Storage System (BESS). Experiments are conducted on a one-of-its-kind reduced-scale model testing platform. Our findings reveal that Kaplan turbines repurposed as variable-speed propellers exhibit dynamic response characteristics similar to those of standalone Kaplan operation, with the added benefit of effectively eliminating blade movements. Furthermore, the ability to control the speed increases the hydraulic efficiency for certain operating points. In summary, investment in variable speed technology emerges as a viable alternative to BESS-based hydropower hybridization.

[451]  arXiv:2405.03202 [pdf, other]
Title: Hierarchical Space-Time Attention for Micro-Expression Recognition
Comments: 9 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Micro-expression recognition (MER) aims to recognize the short and subtle facial movements from the Micro-expression (ME) video clips, which reveal real emotions. Recent MER methods mostly utilize only special frames from ME video clips or extract optical flow from these special frames. However, they neglect the relationship between movements and space-time, while facial cues are hidden within these relationships. To solve this issue, we propose the Hierarchical Space-Time Attention (HSTA). Specifically, we first process ME video frames and special frames or data in parallel with our cascaded Unimodal Space-Time Attention (USTA) to establish connections between subtle facial movements and specific facial areas. Then, we design Crossmodal Space-Time Attention (CSTA) to achieve a higher-quality fusion for crossmodal data. Finally, we hierarchically integrate USTA and CSTA to grasp the deeper facial cues. Our model emphasizes temporal modeling without neglecting the processing of special data, and it fuses the contents in different modalities while maintaining their respective uniqueness. Extensive experiments on the four benchmarks show the effectiveness of our proposed HSTA. Specifically, compared with the latest method on the CASME3 dataset, it achieves about 3% score improvement in seven-category classification.

[452]  arXiv:2405.03205 [pdf, other]
Title: Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions
Authors: Ruizhe Li, Yanjun Gao
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large Language Models (LLMs), such as the GPT-4 and LLaMA families, have demonstrated considerable success across diverse tasks, including multiple-choice questions (MCQs). However, these models exhibit a positional bias, with an even stronger anchored bias in the GPT-2 family, whose models consistently favour the first choice 'A' in MCQs during inference. This anchored bias challenges the integrity of GPT-2's decision-making process, as it skews performance based on the position rather than the content of the choices in MCQs. In this study, we utilise the mechanistic interpretability approach to identify the internal modules within GPT-2 models responsible for this bias. We focus on the Multi-Layer Perceptron (MLP) layers and attention heads, using the "logit lens" method to trace and modify the specific value vectors that contribute to the bias. By updating these vectors within MLP and recalibrating attention patterns to neutralise the preference for the first choice 'A', we effectively mitigate the anchored bias. Our interventions not only correct the bias but also improve the overall MCQ prediction accuracy for the GPT-2 family across various datasets. This work represents the first comprehensive mechanistic analysis of anchored bias in MCQs within the GPT-2 models, introducing targeted, minimal-intervention strategies that significantly enhance GPT-2 model robustness and accuracy in MCQs. Our code is available at https://github.com/ruizheliUOA/Anchored_Bias_GPT2.
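
As a rough illustration of the "logit lens" idea mentioned above, the sketch below projects an intermediate hidden vector through an unembedding matrix and reads off token probabilities. The function name and the toy two-token vocabulary are assumptions for illustration; the final layer normalisation that logit-lens analyses of GPT-2 usually apply before unembedding is omitted, and this is not the authors' code.

```python
import math
from typing import Dict, List


def logit_lens(hidden: List[float],
               unembed: List[List[float]],
               vocab: List[str]) -> Dict[str, float]:
    """Project `hidden` onto each vocabulary row of `unembed` and
    return softmax probabilities keyed by token."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembed]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(vocab, exps)}


# Toy example: a hidden state aligned with the row for token "A".
probs = logit_lens([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], ["A", "B"])
```

Applying such a projection to a single MLP value vector, rather than to the full residual stream, shows which tokens that vector promotes.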

[453]  arXiv:2405.03206 [pdf, other]
Title: Vietnamese AI Generated Text Detection
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In recent years, Large Language Models (LLMs) have become integrated into our daily lives, serving as invaluable assistants in completing tasks. Because LLMs are widely embraced by users, their abuse is inevitable, particularly in using them to generate text content for various purposes, leading to difficulties in distinguishing between text generated by LLMs and that written by humans. In this study, we present a dataset named ViDetect, comprising 6,800 samples of Vietnamese essays, with 3,400 samples authored by humans and the remainder generated by LLMs, serving the purpose of detecting text generated by AI. We conducted evaluations using state-of-the-art methods, including ViT5, BartPho, PhoBERT, mDeberta V3, and mBERT. These results not only contribute to the growing body of research on detecting text generated by AI but also demonstrate the adaptability and effectiveness of different methods in the Vietnamese language context. This research lays the foundation for future advancements in AI-generated text detection and provides valuable insights for researchers in the field of natural language processing.

[454]  arXiv:2405.03207 [pdf, other]
Title: A Philosophical Introduction to Language Models - Part II: The Way Forward
Subjects: Computation and Language (cs.CL)

In this paper, the second of two companion pieces, we explore novel philosophical questions raised by recent progress in large language models (LLMs) that go beyond the classical debates covered in the first part. We focus particularly on issues related to interpretability, examining evidence from causal intervention methods about the nature of LLMs' internal representations and computations. We also discuss the implications of multimodal and modular extensions of LLMs, recent debates about whether such systems may meet minimal criteria for consciousness, and concerns about secrecy and reproducibility in LLM research. Finally, we discuss whether LLM-like systems may be relevant to modeling aspects of human cognition, if their architectural characteristics and learning scenario are adequately constrained.

[455]  arXiv:2405.03215 [pdf, other]
Title: OMP-Engineer: Bridging Syntax Analysis and In-Context Learning for Efficient Automated OpenMP Parallelization
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In advancing parallel programming, particularly with OpenMP, the shift towards NLP-based methods marks a significant innovation beyond traditional S2S tools like Autopar and Cetus. These NLP approaches train on extensive datasets of examples to efficiently generate optimized parallel code, streamlining the development process. This method's strength lies in its ability to swiftly produce parallelized code that runs efficiently. However, this reliance on NLP models, without direct code analysis, can introduce inaccuracies, as these models might not fully grasp the nuanced semantics of the code they parallelize. We build OMP-Engineer, which balances the efficiency and scalability of NLP models with the accuracy and reliability of traditional methods, aiming to enhance the performance of automating parallelization while navigating its inherent challenges.

[456]  arXiv:2405.03217 [pdf, other]
Title: PCG: Mitigating Conflict-based Cache Side-channel Attacks with Prefetching
Comments: 12 pages, 9 figures, submitting to a journal
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)

To defend against conflict-based cache side-channel attacks, cache partitioning or remapping techniques were proposed to prevent set conflicts between different security domains or obfuscate the locations of such conflicts. But such techniques complicate cache design and may result in significant performance penalties. Therefore, lightweight prefetching-based schemes have been proposed to introduce noise that confuses attackers' observations. However, we have validated experimentally that relying on prefetching to only introduce noise is insufficient, as attackers can still reliably distinguish the victim's cache accesses. This paper proposes a novel prefetching-based scheme, called PCG. It combines adding victim-irrelevant cache occupancy changes and reducing victim-relevant cache occupancy changes to disrupt attackers by generating noisy and indistinguishable cache access patterns. Additionally, PCG can either work independently or be seamlessly integrated with most of the commonly used prefetchers. We have implemented and evaluated PCG in both gem5 and the open-source RISC-V core BOOMv3. The evaluation results show that PCG offers more robust security than the existing solutions without resulting in significant performance degradation. According to the evaluation based on the SPEC CPU 2017 benchmark suite, PCG even shows an average performance improvement of about 1.64%. Moreover, it incurs only 1.26% overhead on hardware resource consumption.

[457]  arXiv:2405.03218 [pdf, other]
Title: Elevator, Escalator or Neither? Classifying Pedestrian Conveyor State Using Inertial Navigation System
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Classifying a pedestrian into one of the three conveyor states of "elevator," "escalator" and "neither" is fundamental to many applications such as indoor localization and people flow analysis. We estimate, for the first time, the pedestrian conveyor state given the inertial navigation system (INS) readings of accelerometer, gyroscope and magnetometer sampled from the phone. Our problem is challenging because the INS signals of the conveyor state are coupled and perturbed by unpredictable arbitrary human actions, confusing the decision process. We propose ELESON, a novel, effective and lightweight INS-based deep learning approach to classify whether a pedestrian is in an elevator, escalator or neither. ELESON utilizes a motion feature extractor to decouple the conveyor state from human action in the feature space, and a magnetic feature extractor to account for the speed difference between elevator and escalator. Given the results of the extractors, it employs an evidential state classifier to estimate the confidence of the pedestrian states. Based on extensive experiments conducted on twenty hours of real pedestrian data, we demonstrate that ELESON significantly outperforms the state-of-the-art approaches (where combined INS signals of both the conveyor state and human actions are processed together), with 15% classification improvement in F1 score, stronger confidence discriminability with 10% increase in AUROC (Area Under the Receiver Operating Characteristics), and low computational and memory requirements on smartphones.

[458]  arXiv:2405.03221 [pdf, other]
Title: Spatial and Surface Correspondence Field for Interaction Transfer
Comments: Accepted to SIGGRAPH 2024, project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

In this paper, we introduce a new method for the task of interaction transfer. Given an example interaction between a source object and an agent, our method can automatically infer both surface and spatial relationships for the agent and target objects within the same category, yielding more accurate and valid transfers. Specifically, our method characterizes the example interaction using a combined spatial and surface representation. We correspond the agent points and object points related to the representation to the target object space using a learned spatial and surface correspondence field, which represents objects as deformed and rotated signed distance fields. With the corresponded points, an optimization is performed under the constraints of our spatial and surface interaction representation and additional regularization. Experiments conducted on human-chair and hand-mug interaction transfer tasks show that our approach can handle larger geometry and topology variations between source and target shapes, significantly outperforming state-of-the-art methods.

[459]  arXiv:2405.03223 [pdf, other]
Title: The Kansei Engineering Approach in Web Design: Case of Transportation Website
Comments: submitted to Springer journal for consideration
Subjects: Software Engineering (cs.SE)

Kansei Engineering (KE) is a user-centered design approach that emphasizes the emotional aspects of user experience. This paper explores the integration of KE in the case of a transportation company that focuses on connecting cargo owners with transportation providers. The methodology involves aligning the design process with the company's strategy, collecting and semantically scaling Kansei words, and evaluating website design through experimental and statistical analyses. Initially, we collaborated with the company to understand their strategic goals, using Use Case and Entity Relationship diagrams to learn about the website functionality. Subsequent steps involved collecting Kansei words that resonate with the company's vision. Website samples from comparable transportation companies were then evaluated by X subject in the survey. Participants were asked to arrange samples based on emotional feedback using a 5-point SD scale. We used Principal Component Analysis (PCA) to identify critical factors affecting users' perceptions of the design. Based on these results, we collaborated with designers to reformulate the website, ensuring the design features aligned with the Kansei principles. The outcome is a user-centric web design that enhances the site's user experience. This study shows that KE can be effective in creating more user-friendly web interfaces in the transportation industry.
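
The PCA step described above can be sketched with the standard library only. The code below estimates the first principal component of a ratings table (rows are respondents, columns are Kansei words scored on the 5-point scale) by power iteration on the sample covariance matrix. The function name and the example ratings are made up for illustration; a real analysis would more likely use a numerical library and inspect several components.

```python
from typing import List


def first_principal_component(rows: List[List[float]],
                              iters: int = 200) -> List[float]:
    """Return a unit vector along the direction of largest variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Asymmetric start vector; power iteration fails only if the start
    # happens to be orthogonal to the dominant eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variance along the current direction
            break
        v = [x / norm for x in w]
    return v


# Hypothetical ratings: two Kansei words that respondents score identically.
loadings = first_principal_component([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
```

Words with large loadings on the same component tend to be rated together, which is one way to read off the factors that shape users' perception.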

[460]  arXiv:2405.03224 [pdf, ps, other]
Title: A Two-Step Method Coupling Eddy Currents and Magneto-Statics
Comments: 12 pages, 6 figures
Subjects: Numerical Analysis (math.NA)

We present the mathematical theory and its numerical validation of a method tailored to include eddy-current effects only in a part of the domain. This results in a heterogeneous problem combining an eddy-current model in a subset of the computational domain with a magneto-static model in the remainder of the domain. We adopt a two-domain two-step approach in which the primary variables of the problem are the electric scalar potential and the magnetic vector potential. We show numerical results that validate the formulation.

[461]  arXiv:2405.03228 [pdf, other]
Title: TED: Accelerate Model Training by Internal Generalization
Subjects: Machine Learning (cs.LG)

Large language models have demonstrated strong performance in recent years, but the high cost of training drives the need for efficient methods to compress dataset sizes. We propose TED pruning, a method that addresses the challenge of overfitting under high pruning ratios by quantifying the model's ability to improve performance on pruned data while fitting retained data, known as Internal Generalization (IG). TED uses an optimization objective based on Internal Generalization Distance (IGD), measuring changes in IG before and after pruning to align with true generalization performance and achieve implicit regularization. The IGD optimization objective was verified to allow the model to achieve the smallest upper bound on generalization error. The impact of small mask fluctuations on IG is studied through masks and Taylor approximation, and fast estimation of IGD is enabled. In analyzing continuous training dynamics, the prior effect of IGD is validated, and a progressive pruning strategy is proposed. Experiments on image classification, natural language understanding, and large language model fine-tuning show TED achieves lossless performance with 60-70% of the data. Upon acceptance, our code will be made publicly available.

[462]  arXiv:2405.03232 [pdf, other]
Title: Successive Interference Cancellation for Optical Fiber Using Discrete Constellations
Comments: Submitted to European Conference on Optical Communications (ECOC) 2024
Subjects: Information Theory (cs.IT)

Successive interference cancellation is used to detect discrete modulation symbols transmitted over a 1000 km fiber-optic link. A transmitter and receiver are presented that have linear complexity in the number of transmitted symbols and achieve the information rates of previous studies that use continuous modulations.

[463]  arXiv:2405.03234 [pdf, other]
Title: A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series
Comments: The manuscript is currently under review
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Time series anomaly detection is a critical machine learning task for numerous applications, such as finance, healthcare, and industrial systems. However, even high-performing models may exhibit potential issues such as biases, leading to unreliable outcomes and misplaced confidence. While model explanation techniques, particularly visual explanations, offer valuable insights to detect such issues by elucidating model attributions of their decisions, many limitations still exist: they are primarily instance-based and not scalable across datasets, and they provide one-directional information from the model to the human side, lacking a mechanism for users to address detected issues. To fill these gaps, we introduce HILAD, a novel framework designed to foster a dynamic and bidirectional collaboration between humans and AI for enhancing anomaly detection models in time series. Through our visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale. Our evaluation with two time series datasets and user studies demonstrates the effectiveness of HILAD in fostering a deeper human understanding, immediate corrective actions, and the reliability enhancement of models.

[464]  arXiv:2405.03235 [pdf, ps, other]
Title: Cross-Modal Domain Adaptation in Brain Disease Diagnosis: Maximum Mean Discrepancy-based Convolutional Neural Networks
Authors: Xuran Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Brain disorders are a major challenge to global health, causing millions of deaths each year. Accurate diagnosis of these diseases relies heavily on advanced medical imaging techniques such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). However, the scarcity of annotated data poses a significant challenge in deploying machine learning models for medical diagnosis. To address this limitation, deep learning techniques have shown considerable promise. Domain adaptation techniques enhance a model's ability to generalize across imaging modalities by transferring knowledge from one domain (e.g., CT images) to another (e.g., MRI images). Such cross-modality adaptation is essential to improve the ability of models to consistently generalize across different imaging modalities. This study collected relevant resources from the Kaggle website and employed the Maximum Mean Discrepancy (MMD) method, a popular domain adaptation method, to reduce the differences between imaging domains. By combining MMD with Convolutional Neural Networks (CNNs), the accuracy and utility of the model are clearly enhanced. The excellent experimental results highlight the great potential of data-driven domain adaptation techniques to improve diagnostic accuracy and efficiency, especially in resource-limited environments. By bridging the gap between different imaging modalities, the study aims to provide clinicians with more reliable diagnostic tools.
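
For readers unfamiliar with maximum mean discrepancy, the following minimal sketch computes the standard biased estimate of squared MMD between two sets of feature vectors with a Gaussian kernel. The function names and the bandwidth parameter `sigma` are illustrative choices; the abstract does not state which kernel or estimator the study uses, nor how the term is weighted in the CNN training loss.

```python
import math
from typing import List, Sequence


def gaussian_kernel(a: Sequence[float], b: Sequence[float],
                    sigma: float) -> float:
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs: List[Sequence[float]], ys: List[Sequence[float]],
                sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between samples xs and ys:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_k(p, q):
        total = sum(gaussian_kernel(a, b, sigma) for a in p for b in q)
        return total / (len(p) * len(q))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


# Hypothetical 1-D features from two imaging domains.
source_feats = [[0.0], [1.0]]
target_feats = [[5.0], [6.0]]
gap = mmd_squared(source_feats, target_feats)
```

In a domain adaptation setting, a term like `gap` computed on intermediate network features is typically added to the classification loss so that source and target features are pulled together.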

[465]  arXiv:2405.03236 [pdf, other]
Title: Federated Reinforcement Learning with Constraint Heterogeneity
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity. In our setting, we aim to solve a reinforcement learning problem with multiple constraints while $N$ training agents are located in $N$ different environments with limited access to the constraint signals and they are expected to collaboratively learn a policy satisfying all constraint signals. Such learning problems are prevalent in scenarios of Large Language Model (LLM) fine-tuning and healthcare applications. To solve the problem, we propose federated primal-dual policy optimization methods based on traditional policy gradient methods. Specifically, we introduce $N$ local Lagrange functions for agents to perform local policy updates, and these agents are then scheduled to periodically communicate on their local policies. Taking natural policy gradient (NPG) and proximal policy optimization (PPO) as policy optimization methods, we mainly focus on two instances of our algorithms, i.e., FedNPG and FedPPO. We show that FedNPG achieves global convergence with an $\tilde{O}(1/\sqrt{T})$ rate, and FedPPO efficiently solves complicated learning tasks with the use of deep neural networks.
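
To make the primal-dual structure concrete, a generic constrained-RL formulation is written out below. Every symbol here ($V_r$, $V_{c_i}$, $b_i$, $\lambda_i$, $\eta$) is an assumption in the usual constrained-MDP style, as is the simplification that agent $i$ observes only constraint $i$; the paper's actual local Lagrange functions may differ.

```latex
% Constrained problem: maximise reward value subject to N constraints.
\max_{\pi} \; V_r(\pi)
\quad \text{s.t.} \quad V_{c_i}(\pi) \ge b_i, \qquad i = 1, \dots, N.

% Local Lagrangian held by agent i, with multiplier \lambda_i \ge 0.
L_i(\pi, \lambda_i) = V_r(\pi) + \lambda_i \bigl( V_{c_i}(\pi) - b_i \bigr).

% Local dual step (projected gradient descent on \lambda_i):
\lambda_i \leftarrow \bigl[ \lambda_i - \eta \, \bigl( V_{c_i}(\pi) - b_i \bigr) \bigr]_{+}.
```

Under this form the multiplier grows while constraint $i$ is violated ($V_{c_i}(\pi) < b_i$), which pushes the local policy update towards satisfying it before the policies are periodically communicated.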

[466]  arXiv:2405.03239 [pdf, other]
Title: Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time Series: A UK Biobank Study
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Chronic Obstructive Pulmonary Disease (COPD) is a chronic inflammatory lung condition that causes airflow obstruction. The existing methods can only detect patients who already have COPD based on obvious features shown in the spirogram (in this article, the spirogram specifically refers to the Volume-Flow curve time series). Early prediction of COPD risk is vital for monitoring COPD disease progression, slowing it down, or even preventing its onset. However, these methods fail to predict an individual's future probability of COPD from subtle features in the spirogram. To address this gap, for the first time, we propose DeepSpiro, a method based on deep learning for early prediction of future COPD risk. DeepSpiro consists of four parts. First, we construct Volume-Flow curves guided by Time-Volume instability smoothing (SpiroSmoother) to enhance the stability of the original Volume-Flow curves precisely. Second, we extract critical features from the evolution of varied-length key patches (SpiroEncoder) to capture the key temporal evolution from original high-dimensional dynamic sequences to a unified low-dimensional temporal representation. Third, we explain the model based on temporal attention and heterogeneous feature fusion (SpiroExplainer), which integrates information from heterogeneous data such as spirogram and demographic information. Fourth, we predict the risk of COPD based on the evolution of key patch concavity (SpiroPredictor), enabling accurate prediction of the risk of disease in high-risk patients who are not yet diagnosed, for up to 1, 2, 3, 4, 5 years, and beyond. We conduct experiments on the UK Biobank dataset. Results show that DeepSpiro achieves an AUC value of 0.8328 in the task of detecting COPD. In early prediction tasks, high-risk and low-risk groups show significant differences in the future, with a p-value of <0.001.

[467]  arXiv:2405.03243 [pdf, other]
Title: Mind the Gap Between Synthetic and Real: Utilizing Transfer Learning to Probe the Boundaries of Stable Diffusion Generated Data
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generative foundation models like Stable Diffusion comprise a diverse spectrum of knowledge in computer vision with the potential for transfer learning, e.g., via generating data to train student models for downstream tasks. This could circumvent the necessity of collecting labeled real-world data, thereby presenting a form of data-free knowledge distillation. However, the resultant student models show a significant drop in accuracy compared to models trained on real data. We investigate possible causes for this drop and focus on the role of the different layers of the student model. By training these layers using either real or synthetic data, we reveal that the drop mainly stems from the model's final layers. Further, we briefly investigate other factors, such as differences in data-normalization between synthetic and real, the impact of data augmentations, texture vs. shape learning, and assuming oracle prompts. While we find that some of those factors can have an impact, they are not sufficient to close the gap towards real data. Building upon our insights that mainly later layers are responsible for the drop, we investigate the data-efficiency of fine-tuning a synthetically trained model with real data applied to only those last layers. Our results suggest an improved trade-off between the amount of real training data used and the model's accuracy. Our findings contribute to the understanding of the gap between synthetic and real data and indicate solutions to mitigate the scarcity of labeled real data.

[468]  arXiv:2405.03244 [pdf, other]
Title: Examining Changes in Internal Representations of Continual Learning Models Through Tensor Decomposition
Subjects: Machine Learning (cs.LG)

Continual learning (CL) has spurred the development of several methods aimed at consolidating previous knowledge across sequential learning. Yet, the evaluations of these methods have primarily focused on the final output, such as changes in the accuracy of predicted classes, overlooking the issue of representational forgetting within the model. In this paper, we propose a novel representation-based evaluation framework for CL models. This approach involves gathering internal representations from throughout the continual learning process and formulating three-dimensional tensors. The tensors are formed by stacking representations, such as layer activations, generated from several inputs and model 'snapshots', throughout the learning process. By conducting tensor component analysis (TCA), we aim to uncover meaningful patterns about how the internal representations evolve, expecting to highlight the merits or shortcomings of examined CL strategies. We conduct our analyses across different model architectures and importance-based continual learning strategies, with a curated task selection. While the results of our approach mirror the difference in performance of various CL strategies, we found that our methodology did not directly highlight specialized clusters of neurons, nor provide an immediate understanding of the evolution of filters. We believe a scaled-down version of our approach will provide insight into the benefits and pitfalls of using TCA to study continual learning dynamics.

[469]  arXiv:2405.03245 [pdf, ps, other]
Title: How improving performance may imply losing consistency in event-triggered consensus
Subjects: Systems and Control (eess.SY)

Event-triggered control is often argued to lower the average triggering rate compared to time-triggered control while still achieving a desired control goal, e.g., the same performance level. However, this property, often called consistency, cannot be taken for granted and can be hard to analyze in many settings. In particular, although numerous decentralized event-triggered control schemes have been proposed in the past years, their performance properties with respect to time-triggered control remain mostly unexplored. In this paper, we therefore examine the performance properties of event-triggered control (relative to time-triggered control) for a single-integrator consensus problem with a level-triggering rule. We consider the long-term average quadratic deviation from consensus as a performance measure. For this setting, we show that enriching the information the local controllers use improves the performance of the consensus algorithm but renders a previously consistent event-triggered control scheme inconsistent. In addition, we do so while deploying optimal control inputs which we derive for both information cases and all triggering schemes. With this insight, we can furthermore explain the relationship between two contrasting consistency results from the literature on decentralized event-triggered control. We support our theoretical findings with simulation results.

[470]  arXiv:2405.03248 [pdf, other]
Title: Communication-Efficient Federated Learning with Adaptive Compression under Dynamic Bandwidth
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Federated learning can train models without directly providing local data to the server. However, the frequent updating of the local model brings the problem of large communication overhead. Recent work has improved the communication efficiency of federated learning mainly through model compression. However, these approaches ignore two problems: 1) the network state of each client changes dynamically; 2) the network state differs among clients. Clients with poor bandwidth update their local models slowly, which leads to low efficiency. To address this challenge, we propose a communication-efficient federated learning algorithm with adaptive compression under dynamic bandwidth (called AdapComFL). Concretely, each client performs bandwidth awareness and bandwidth prediction. Then, each client adaptively compresses its local model via the improved sketch mechanism based on its predicted bandwidth. Further, the server aggregates the received sketched models of different sizes. To verify the effectiveness of the proposed method, the experiments are based on real bandwidth data which are collected from the network topology we build, and benchmark datasets which are obtained from open repositories. We show the performance of the AdapComFL algorithm, and compare it with existing algorithms. The experimental results show that our AdapComFL achieves more efficient communication as well as competitive accuracy compared to existing algorithms.

[471]  arXiv:2405.03249 [pdf, other]
Title: V-line tensor tomography: numerical results
Subjects: Numerical Analysis (math.NA)

This article presents the numerical verification and validation of several inversion algorithms for V-line transforms (VLTs) acting on symmetric 2-tensor fields in the plane. The analysis of these transforms and the theoretical foundation of their inversion methods were studied in a recent work [G. Ambartsoumian, R. K. Mishra, and I. Zamindar, Inverse Problems, 40 (2024), 035003]. We demonstrate the efficient recovery of an unknown symmetric 2-tensor field from various combinations of the longitudinal, transverse, and mixed VLTs, their corresponding first moments, and the star VLT. The paper examines the performance of the proposed algorithms in different settings and illustrates the results with numerical simulations on smooth and non-smooth phantoms.

[472]  arXiv:2405.03250 [pdf, other]
Title: A survey to measure cognitive biases influencing mobility choices
Authors: Carole Adam
Subjects: Computers and Society (cs.CY); Multiagent Systems (cs.MA)

In this paper, we describe a survey about the perceptions of 4 mobility modes (car, bus, bicycle, walking) and the preferences of users for 6 modal choice factors. This survey gathered 650 answers in 2023, which are published as open data. In this study, we analyse these results to highlight the influence of 3 cognitive biases on mobility decisions: halo bias, choice-supportive bias, and reactance. These cognitive biases are proposed as plausible explanations of the observed behaviour, where the population tends to stick to individual cars despite urban policies aiming at favouring soft mobility. This model can serve as the basis for a simulator of mobility decisions in a virtual town, and the gathered data can be used to initialise this population with realistic attributes. Work is ongoing to design a simulation-based serious game where the player takes the role of an urban manager faced with planning choices to make their city more sustainable.

[473]  arXiv:2405.03251 [pdf, ps, other]
Title: Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond
Comments: 53 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The softmax activation function plays a crucial role in the success of large language models (LLMs), particularly in the self-attention mechanism of the widely adopted Transformer architecture. However, the underlying learning dynamics that contribute to the effectiveness of softmax remain largely unexplored. As a step towards better understanding, this paper provides a theoretical study of the optimization and generalization properties of two-layer softmax neural networks, providing theoretical insights into their superior performance compared with other activation functions, such as ReLU and exponential. Leveraging the Neural Tangent Kernel (NTK) framework, our analysis reveals that the normalization effect of the softmax function leads to a good perturbation property of the induced NTK matrix, resulting in a good convex region of the loss landscape. Consequently, softmax neural networks can learn the target function in the over-parametrization regime. To demonstrate the broad applicability of our theoretical findings, we apply them to the task of learning score estimation functions in diffusion models, a promising approach for generative modeling. Our analysis shows that gradient-based algorithms can learn the score function with a provable accuracy. Our work provides a deeper understanding of the effectiveness of softmax neural networks and their potential in various domains, paving the way for further advancements in natural language processing and beyond.

[474]  arXiv:2405.03252 [pdf, other]
Title: A Universal List Decoding Algorithm with Application to Decoding of Polar Codes
Comments: 47 pages, 24 figures
Subjects: Information Theory (cs.IT)

This paper is concerned with a guessing codeword decoding (GCD) of linear block codes. Compared with the guessing noise decoding (GND), which is only efficient for high-rate codes, the GCD is efficient for not only high-rate codes but also low-rate codes. We prove that the GCD typically requires fewer queries than the GND. Compared with the ordered statistics decoding (OSD), the GCD does not require the online Gaussian elimination (GE). In addition to limiting the maximum number of searches, we suggest limiting the radius of searches in terms of soft weights or tolerated performance loss to further reduce the decoding complexity, resulting in the so-called truncated GCD. The performance gap between the truncated GCD and the optimal decoding can be upper bounded approximately by the saddlepoint approach or other numerical approaches. The derived upper bound captures the relationship between the performance and the decoding parameters, enabling us to balance the performance and the complexity by optimizing the decoding parameters of the truncated GCD. We also introduce a parallel implementation of the (truncated) GCD algorithm to reduce decoding latency without compromising performance. Another contribution of this paper is the application of the GCD to the polar codes. We propose a multiple-bit-wise decoding algorithm over a pruned tree for the polar codes, referred to as the successive-cancellation list (SCL) decoding algorithm by GCD. First, we present a strategy for pruning the conventional polar decoding tree based on the complexity analysis rather than the specific bit patterns. Then we apply the GCD algorithm in parallel aided by the early stopping criteria to the leaves of the pruned tree. Simulation results show that, without any performance loss as justified by analysis, the proposed decoding algorithm can significantly reduce the decoding latency of the polar codes.

[475]  arXiv:2405.03255 [pdf, other]
Title: Multi-Modality Spatio-Temporal Forecasting via Self-Supervised Learning
Comments: Accepted by IJCAI 2024 Main Track
Subjects: Machine Learning (cs.LG)

Multi-modality spatio-temporal (MoST) data extends spatio-temporal (ST) data by incorporating multiple modalities, which is prevalent in monitoring systems, encompassing diverse traffic demands and air quality assessments. Despite significant strides in ST modeling in recent years, there remains a need to emphasize harnessing the potential of information from different modalities. Robust MoST forecasting is more challenging because it possesses (i) high-dimensional and complex internal structures and (ii) dynamic heterogeneity caused by temporal, spatial, and modality variations. In this study, we propose a novel MoST learning framework via Self-Supervised Learning, namely MoSSL, which aims to uncover latent patterns from temporal, spatial, and modality perspectives while quantifying dynamic heterogeneity. Experiment results on two real-world MoST datasets verify the superiority of our approach compared with the state-of-the-art baselines. Model implementation is available at https://github.com/beginner-sketch/MoSSL.

[476]  arXiv:2405.03256 [pdf, other]
Title: MARE: Multi-Agents Collaboration Framework for Requirements Engineering
Subjects: Software Engineering (cs.SE)

Requirements Engineering (RE) is a critical phase in the software development process that generates requirements specifications from stakeholders' needs. Recently, deep learning techniques have been successful in several RE tasks. However, obtaining high-quality requirements specifications requires collaboration across multiple tasks and roles. In this paper, we propose an innovative framework called MARE, which leverages collaboration among large language models (LLMs) throughout the entire RE process. MARE divides the RE process into four tasks: elicitation, modeling, verification, and specification. Each task is conducted by engaging one or two specific agents and each agent can conduct several actions. MARE has five agents and nine actions. To facilitate collaboration between agents, MARE has designed a workspace for agents to upload their generated intermediate requirements artifacts and obtain the information they need. We conduct experiments on five public cases, one dataset, and four new cases created by this work. We compared MARE with three baselines using three widely used metrics for the generated requirements models. Experimental results show that MARE can generate more correct requirements models and outperform the state-of-the-art approaches by 15.4%. For the generated requirements specifications, we conduct a human evaluation in three aspects and provide insights about the quality

[477]  arXiv:2405.03262 [pdf, other]
Title: End-to-End Reinforcement Learning of Curative Curtailment with Partial Measurement Availability
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In the course of the energy transition, the expansion of generation and consumption will change, and many of these technologies, such as PV systems, electric cars and heat pumps, will influence the power flow, especially in the distribution grids. Scalable methods that can make decisions for each grid connection are needed to enable congestion-free grid operation in the distribution grids. This paper presents a novel end-to-end approach to resolving congestion in distribution grids with deep reinforcement learning. Our architecture learns to curtail power and set appropriate reactive power to determine a non-congested and, thus, feasible grid state. State-of-the-art methods such as the optimal power flow (OPF) demand high computational costs and detailed measurements of every bus in a grid. In contrast, the presented method enables decisions under sparse information with just some buses observable in the grid. Distribution grids are generally not yet fully digitized and observable, so this method can be used for decision-making on the majority of low-voltage grids. On a real low-voltage grid the approach resolves 100% of violations in the voltage band and 98.8% of asset overloads. The results show that decisions can also be made on real grids that guarantee sufficient quality for congestion-free grid operation.

[478]  arXiv:2405.03264 [pdf, other]
Title: Delooping generated groups in homotopy type theory
Subjects: Logic in Computer Science (cs.LO); Category Theory (math.CT)

Homotopy type theory is a logical setting based on Martin-Löf type theory in which one can perform geometric constructions and proofs in a synthetic way. Namely, types can be interpreted as spaces (up to continuous deformation) and proofs as homotopy invariant constructions. In this context, the loop spaces of types with a distinguished element (more precisely, pointed connected groupoids), provide a natural representation of groups, which we call here internal groups. The construction which internalizes a given group is called delooping, because it is a formal inverse to the loop space operator. As we recall in the article, this delooping operation has a concrete definition for any group G given by the type of G-torsors. Those are particular sets together with an action of G, which means that they come equipped with an endomorphism for every element of G. We show that, when a generating set is known for the group, we can construct a smaller representation of the type of G-torsors, using the fact that we only need automorphisms for the elements of the generating set. We thus obtain a concise definition of (internal) groups in homotopy type theory, which can be useful to define deloopings without resorting to higher inductive types, or to perform computations on those. We also investigate an abstract construction for the Cayley group of a generated group. Most of the developments performed in the article have been formalized using the cubical version of the Agda proof assistant.

[479]  arXiv:2405.03266 [pdf, other]
Title: Efficient computation of Katz centrality for very dense networks via "negative parameter Katz"
Subjects: Social and Information Networks (cs.SI); Combinatorics (math.CO)

Katz centrality (and its limiting case, eigenvector centrality) is a frequently used tool to measure the importance of a node in a network, and to rank the nodes accordingly. One reason for its popularity is that Katz centrality can be computed very efficiently when the network is sparse, i.e., having only $O(n)$ edges between its $n$ nodes. While sparsity is common in practice, in some applications one faces the opposite situation of a very dense network, where only $O(n)$ potential edges are missing with respect to a complete graph. We explain why and how, even for very dense networks, it is possible to efficiently compute the ranking stemming from Katz centrality for unweighted graphs, possibly directed and possibly with loops, by working on the complement graph. Our approach also provides an interpretation, regardless of sparsity, of "Katz centrality with negative parameter" as usual Katz centrality on the complement graph. For weighted graphs, we provide instead an approximation method that is based on removing sufficiently many edges from the network (or from its complement), and we give sufficient conditions for this approximation to provide the correct ranking. We include numerical experiments to illustrate the advantages of the proposed approach.
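As a rough illustration of the complement idea, and not the paper's algorithm, the sketch below assumes the convention x = (I - αA)^{-1} 1 for Katz centrality and takes the complement relative to the complete graph with loops, so that A = J - Ā with J the all-ones matrix. Under these assumptions, (I + αĀ) x = (1 + α·sum(x)) 1, so x is a positive multiple of the "negative parameter" Katz vector y = (I + αĀ)^{-1} 1 on the complement, and the multiple is 1 / (1 - α·sum(y)). The function names, the small example graph, and the simple fixed-point solver are all hypothetical choices made for this sketch.

```python
def katz_direct(adj, alpha, iters=200):
    """Reference computation on the dense graph: iterate x <- 1 + alpha * A x.

    Converges when alpha times the spectral radius of A is below 1.
    """
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        x = [1.0 + alpha * sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


def katz_via_complement(missing, alpha, iters=200):
    """Same vector, computed by touching only the missing edges.

    missing[i] lists every j such that the edge i -> j is absent (including
    j == i when node i has no loop). Each sweep costs O(number of missing edges).
    """
    n = len(missing)
    y = [1.0] * n
    for _ in range(iters):
        # Fixed-point iteration for (I + alpha * A_complement) y = 1.
        y = [1.0 - alpha * sum(y[j] for j in missing[i]) for i in range(n)]
    scale = 1.0 / (1.0 - alpha * sum(y))  # positive rescaling, so rankings agree
    return [scale * v for v in y]


# Hypothetical 6-node directed graph that is complete except for a few edges.
missing = [[0, 3], [1], [2, 5], [3, 0], [4], [5, 2, 1]]
n = len(missing)
adj = [[0 if j in missing[i] else 1 for j in range(n)] for i in range(n)]
alpha = 0.1

x_direct = katz_direct(adj, alpha)
x_compl = katz_via_complement(missing, alpha)
print(max(abs(a - b) for a, b in zip(x_direct, x_compl)))
```

Because the rescaling factor is positive, ranking nodes by y already gives the Katz ranking in this setting; the rescaling is only needed to recover the actual centrality values.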

[480]  arXiv:2405.03267 [pdf, other]
Title: Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Information Retrieval (cs.IR)

Vector searches on large-scale datasets are critical to modern online services like web search and RAG, which necessitates storing the datasets and their index on secondary storage such as SSDs. In this paper, we are the first to characterize the trade-off of performance and index size in existing SSD-based graph and cluster indexes: to improve throughput by 5.7× and 1.7×, these indexes have to pay storage amplifications of 5.8× and 7.7× with respect to the dataset size, respectively. The root cause is that the coarse-grained access of SSD mismatches the fine-grained random read required by vector indexes with small amplification.
This paper argues that second-tier memory, such as remote DRAM/NVM connected via RDMA or CXL, is a powerful storage for addressing the problem from a system's perspective, thanks to its fine-grained access granularity. However, putting existing indexes -- primarily designed for SSD -- directly on second-tier memory cannot fully utilize its power. Meanwhile, second-tier memory still behaves more like storage, so using it as DRAM is also inefficient. To this end, we build a graph and cluster index that centers around the performance features of second-tier memory. With careful execution engine and index layout designs, we show that vector indexes can achieve optimal performance with orders of magnitude smaller index amplification, on a variety of second-tier memory devices.
Based on our improved graph and vector indexes on second-tier memory, we further conduct a systematic study between them to help developers choose the right index for their workloads. Interestingly, the findings on the second-tier memory contradict the ones on SSDs.

[481]  arXiv:2405.03272 [pdf, other]
Title: WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multimodal information, together with our knowledge, helps us to understand the complex and dynamic world. Large language models (LLM) and large multimodal models (LMM), however, still struggle to emulate this capability. In this paper, we present WorldQA, a video understanding dataset designed to push the boundaries of multimodal world models with three appealing properties: (1) Multimodal Inputs: The dataset comprises 1007 question-answer pairs and 303 videos, necessitating the analysis of both auditory and visual data for successful interpretation. (2) World Knowledge: We identify five essential types of world knowledge for question formulation. This approach challenges models to extend their capabilities beyond mere perception. (3) Long-Chain Reasoning: Our dataset introduces an average reasoning step of 4.45, notably surpassing other videoQA datasets. Furthermore, we introduce WorldRetriever, an agent designed to synthesize expert knowledge into a coherent reasoning chain, thereby facilitating accurate responses to WorldQA queries. Extensive evaluations of 13 prominent LLMs and LMMs reveal that WorldRetriever, although the most effective model, achieved only 70% of human-level performance in multiple-choice questions. This finding highlights the necessity for further advancement in the reasoning and comprehension abilities of models. Our experiments also yield several key insights. For instance, while humans tend to perform better with increased frames, current LMMs, including WorldRetriever, show diminished performance under similar conditions. We hope that WorldQA, our methodology, and these insights could contribute to the future development of multimodal world models.

[482]  arXiv:2405.03273 [pdf, other]
Title: Evaluation of Drivers' Interaction Ability at Social Scenarios: A Process-Based Framework
Subjects: Robotics (cs.RO)

Assessing drivers' interaction capabilities is crucial for understanding human driving behavior and enhancing the interactive abilities of autonomous vehicles. In scenarios involving strong interaction, existing metrics focused on interaction outcomes struggle to capture the evolutionary process of drivers' interactive behaviors, making it challenging for autonomous vehicles to dynamically assess and respond to other agents during interactions. To address this issue, we propose a framework for assessing drivers' interaction capabilities, oriented towards the interactive process itself, which includes three components: Interaction Risk Perception, Interaction Process Modeling, and Interaction Ability Scoring. We quantify interaction risks through motion state estimation and risk field theory, followed by introducing a dynamic action assessment benchmark based on a game-theoretical rational agent model, and designing a capability scoring metric based on morphological similarity distance. By calculating real-time differences between a driver's actions and the assessment benchmark, the driver's interaction capabilities are scored dynamically. We validated our framework at unsignalized intersections as a typical scenario. Validation analysis on driver behavior datasets from China and the USA shows that our framework effectively distinguishes and evaluates conservative and aggressive driving states during interactions, demonstrating good adaptability and effectiveness in various regional settings.

[483]  arXiv:2405.03278 [pdf, other]
Title: Approximate Realizations for Outerplanaric Degree Sequences
Comments: This paper has been published at the 35th IWOCA
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

We study the question of whether a sequence $d = (d_1, d_2, \ldots, d_n)$ of positive integers is the degree sequence of some outerplanar (a.k.a. 1-page book embeddable) graph $G$. If so, $G$ is an outerplanar realization of $d$ and $d$ is an outerplanaric sequence. The case where $\sum d \leq 2n - 2$ is easy, as $d$ has a realization by a forest (which is trivially an outerplanar graph). In this paper, we consider the family $\mathcal{D}$ of all sequences $d$ of even sum $2n \leq \sum d \leq 4n - 6 - 2\omega_1$, where $\omega_x$ is the number of $x$'s in $d$. (The second inequality is a necessary condition for a sequence $d$ with $\sum d \geq 2n$ to be outerplanaric.) We partition $\mathcal{D}$ into two disjoint subfamilies, $\mathcal{D} = \mathcal{D}_{NOP} \cup \mathcal{D}_{2PBE}$, such that every sequence in $\mathcal{D}_{NOP}$ is provably non-outerplanaric, and every sequence in $\mathcal{D}_{2PBE}$ is given a realizing graph $G$ enjoying a 2-page book embedding (and moreover, one of the pages is also bipartite).
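The case distinction stated in this abstract can be sketched as a simple pre-check on a sequence. This is only an illustration of the inequalities quoted above, assuming the input is a list of positive integers; it does not attempt the paper's partition of the family into its two subfamilies, and the function name and return labels are hypothetical.

```python
def classify_degree_sequence(d):
    """Place a sequence of positive integers into the cases named in the abstract."""
    if not d or any(x < 1 for x in d):
        raise ValueError("expected a non-empty sequence of positive integers")
    n = len(d)
    total = sum(d)
    ones = d.count(1)  # number of 1's in d

    if total % 2 == 1:
        # Every graph has an even degree sum, so no realization exists at all.
        return "odd sum: not a degree sequence"
    if total <= 2 * n - 2:
        # Easy case from the abstract: realizable by a forest.
        return "forest-realizable, hence outerplanaric"
    # From here on the sum is even and at least 2n.
    if total > 4 * n - 6 - 2 * ones:
        return "violates the necessary condition: not outerplanaric"
    return "in the family studied in the paper"


for seq in ([1, 1], [1, 1, 1, 1], [2, 2, 2], [3, 3, 3, 3], [1, 2]):
    print(seq, "->", classify_degree_sequence(seq))
```

For instance, the triangle sequence [2, 2, 2] meets both bounds with equality, while [3, 3, 3, 3] (the degree sequence of the complete graph on four vertices) has sum 12, which exceeds 4n - 6 = 10.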

[484]  arXiv:2405.03279 [pdf, other]
Title: Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning
Comments: 14 pages, 4 figures, 6 tables
Subjects: Computation and Language (cs.CL)

Model editing aims to correct outdated or erroneous knowledge in large language models (LLMs) without the need for costly retraining. Lifelong model editing is the most challenging task that caters to the continuous editing requirements of LLMs. Prior works primarily focus on single or batch editing; nevertheless, these methods fall short in lifelong editing scenarios due to catastrophic knowledge forgetting and the degradation of model performance. Although retrieval-based methods alleviate these issues, they are impeded by slow and cumbersome processes of integrating the retrieved knowledge into the model. In this work, we introduce RECIPE, a RetriEval-augmented ContInuous Prompt lEarning method, to boost editing efficacy and inference efficiency in lifelong learning. RECIPE first converts knowledge statements into short and informative continuous prompts, prefixed to the LLM's input query embedding, to efficiently refine the response grounded on the knowledge. It further integrates the Knowledge Sentinel (KS) that acts as an intermediary to calculate a dynamic threshold, determining whether the retrieval repository contains relevant knowledge. Our retriever and prompt encoder are jointly trained to achieve editing properties, i.e., reliability, generality, and locality. In our experiments, RECIPE is assessed extensively across multiple LLMs and editing datasets, where it achieves superior editing performance. RECIPE also demonstrates its capability to maintain the overall performance of LLMs alongside showcasing fast editing and inference speed.

[485]  arXiv:2405.03280 [pdf, other]
Title: Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance. The difficulty stems from two primary issues: (1) vision-processing mechanisms in the brain are highly intricate and not fully revealed, making it challenging to directly learn a mapping between fMRI and video; (2) the temporal resolution of fMRI is significantly lower than that of natural videos. To overcome these issues, this paper proposes a two-stage model named Mind-Animator, which achieves state-of-the-art performance on three public datasets. Specifically, during the fMRI-to-feature stage, we decouple semantic, structural, and motion features from fMRI through fMRI-vision-language tri-modal contrastive learning and sparse causal attention. In the feature-to-video stage, these features are merged into videos by an inflated Stable Diffusion. We substantiate that the reconstructed video dynamics are indeed derived from fMRI, rather than hallucinations of the generative model, through permutation tests. Additionally, the visualization of voxel-wise and ROI-wise importance maps confirms the neurobiological interpretability of our model.

[486]  arXiv:2405.03281 [pdf, other]
Title: FDSPC: Fast and Direct Smooth Path Planning via Continuous Curvature Integration
Authors: Zong Chen, Yiqun Li
Subjects: Robotics (cs.RO)

In recent decades, global path planning for robots has seen significant advancements. Both heuristic search-based methods and probability sampling-based methods have shown capabilities to find feasible solutions in complex scenarios. However, mainstream global path planning algorithms often produce paths with bends, requiring additional smoothing post-processing. In this work, we propose a fast and direct path planning method based on continuous curvature integration. This method ensures path feasibility while directly generating global smooth paths with constant velocity, thus eliminating the need for post-path-smoothing. Furthermore, we compare the proposed method with existing approaches in terms of solution time, path length, memory usage, and smoothness under multiple scenarios. The proposed method is vastly superior to the average performance of state-of-the-art (SOTA) methods, especially in terms of the self-defined $\mathcal{S}_2$ smoothness (mean angle of steering). These results demonstrate the effectiveness and superiority of our approach in several representative environments.

[487]  arXiv:2405.03285 [pdf, other]
Title: A continuous approach for computing the pseudospectra of linear operators
Comments: 22 pages, 6 figures
Subjects: Numerical Analysis (math.NA)

We propose a continuous approach for computing the pseudospectra of linear operators following a 'solve-then-discretize' strategy. Instead of taking a finite section approach or using a finite-dimensional matrix to approximate the operator of interest, the new method employs an operator analogue of the Lanczos process to work directly with operators and functions. The method is shown to be free of spectral pollution and spectral invisibility, fully adaptive, nearly optimal in accuracy, and well-conditioned. The advantages of the method are demonstrated by extensive numerical examples and comparison with the traditional method.

[488]  arXiv:2405.03287 [pdf, other]
Title: Evaluating Eye Movement Biometrics in Virtual Reality: A Comparative Analysis of VR Headset and High-End Eye-Tracker Collected Dataset
Comments: 9 pages, 6 figures
Subjects: Human-Computer Interaction (cs.HC)

Previous studies have shown that eye movement data recorded at 1000 Hz can be used to authenticate individuals. This study explores the effectiveness of eye movement-based biometrics (EMB) by utilizing data from an eye-tracking (ET)-enabled virtual reality (VR) headset (GazeBaseVR) and compares it to the performance using data from a high-end eye tracker (GazeBase) that has been downsampled to 250 Hz. The research also aims to assess the biometric potential of both binocular and monocular eye movement data. The GazeBaseVR dataset achieves an equal error rate (EER) of 1.67% and a false rejection rate (FRR) at 10^-4 false acceptance rate (FAR) of 22.73% in a binocular configuration. This study underscores the biometric viability of data obtained from an eye-tracking-enabled VR headset.
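For readers unfamiliar with the reported metric, the following is a minimal sketch of how an equal error rate can be estimated from similarity scores. It assumes that higher scores mean a more likely match and that a comparison is accepted when its score is at least the threshold; the function name and the toy scores are hypothetical and unrelated to the GazeBase or GazeBaseVR data.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping the threshold over all observed scores.

    FAR = fraction of impostor scores accepted (score >= threshold).
    FRR = fraction of genuine scores rejected (score < threshold).
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = None, None
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)
        frr = sum(s < thr for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy scores: one genuine comparison scores low, one impostor comparison scores high.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine_scores, impostor_scores))  # 0.25
```

With only a handful of scores the estimate is coarse; evaluations on real data typically interpolate between neighbouring thresholds rather than picking the closest one.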

[489]  arXiv:2405.03288 [pdf, other]
Title: Fundamental Bounds on Unequal Error Protection Codes
Comments: 8 pages, 4 figures
Subjects: Information Theory (cs.IT)

Unequal error protection (UEP) codes can facilitate the transmission of messages with different protection levels. In this paper, we study the achievability bounds on UEP via a generalization of the Gilbert-Varshamov (GV) bound. For the first time, we show that under certain conditions, UEP asymptotically enhances the code rate compared with time-sharing (TS) strategies.

[490]  arXiv:2405.03290 [pdf, other]
Title: Coordinating Cooperative Perception in Urban Air Mobility for Enhanced Environmental Awareness
Comments: If you cite this paper, please use the original reference: Timo Häckel, Luca von Roenn, Nemo Juchmann, Alexander Fay, Rinie Akkermans, Tim Tiedemann, and Thomas C. Schmidt. "Coordinating Cooperative Perception in Urban Air Mobility for Enhanced Environmental Awareness," In: 2024 International Conference on Unmanned Aircraft Systems (ICUAS). IEEE, June 2024
Subjects: Networking and Internet Architecture (cs.NI)

The trend for Urban Air Mobility (UAM) is growing with prospective air taxis, parcel deliverers, and medical and industrial services. Safe and efficient UAM operation relies on timely communication and reliable data exchange. In this paper, we explore Cooperative Perception (CP) for Unmanned Aircraft Systems (UAS), considering the unique communication needs involving high dynamics and a large number of UAS. We propose a hybrid approach combining local broadcast with a central CP service, inspired by centrally managed U-space and broadcast mechanisms from automotive and aviation domains. In a simulation study, we show that our approach significantly enhances the environmental awareness for UAS compared to fully distributed approaches, with an increased communication channel load, which we also evaluate. These findings prompt a discussion on communication strategies for CP in UAM and the potential of a centralized CP service in future research.

[491]  arXiv:2405.03296 [pdf, ps, other]
Title: Coefficient Decomposition for Spectral Graph Convolution
Authors: Feng Huang, Wen Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Spectral graph convolutional networks (SGCNs) are a kind of graph neural network (GNN) based on graph signal filters, and have shown compelling expressivity for modeling graph-structured data. Most SGCNs adopt polynomial filters and learn the coefficients from the training data. Many of them focus on which polynomial basis leads to optimal expressive power, while the models' architecture is little discussed. In this paper, we propose a general form in terms of spectral graph convolution, where the coefficients of the polynomial basis are stored in a third-order tensor. Then, we show that the convolution block in existing SGCNs can be derived by performing a certain coefficient decomposition operation on the coefficient tensor. Based on the generalized view, we develop novel spectral graph convolutions CoDeSGC-CP and -Tucker by applying the CP and Tucker tensor decompositions to the coefficient tensor. Extensive experimental results demonstrate that the proposed convolutions achieve favorable performance improvements.

[492]  arXiv:2405.03298 [pdf, other]
Title: Online Clustering of Known and Emerging Malware Families
Comments: arXiv admin note: text overlap with arXiv:2305.00605
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Malware attacks have become significantly more frequent and sophisticated in recent years. Therefore, malware detection and classification are critical components of information security. Due to the large amount of malware samples available, it is essential to categorize malware samples according to their malicious characteristics. Clustering algorithms are thus becoming more widely used in computer security to analyze the behavior of malware variants and discover new malware families. Online clustering algorithms help us to understand malware behavior and produce a quicker response to new threats. This paper introduces a novel machine learning-based model for the online clustering of malicious samples into malware families. Streaming data is divided according to the clustering decision rule into samples from known and new emerging malware families. The streaming data is classified using the weighted k-nearest neighbor classifier into known families, and the online k-means algorithm clusters the remaining streaming data and achieves a purity of clusters from 90.20% for four clusters to 93.34% for ten clusters. This work is based on static analysis of portable executable files for the Windows operating system. Experimental results indicate that the proposed online clustering model can create high-purity clusters corresponding to malware families. This allows malware analysts to receive similar malware samples, speeding up their analysis.
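To make the online k-means step and the purity figure more concrete, here is a minimal sketch under simplifying assumptions: samples are plain numeric feature vectors, the initial centroids are given and each counts as one observation, and the update is the standard sequential (running-mean) k-means rule. It leaves out the paper's decision rule and weighted k-nearest-neighbor stage, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict


def online_kmeans(stream, init_centroids):
    """Assign each arriving sample to its nearest centroid, then move that
    centroid toward the sample by a running-mean step."""
    centroids = [list(c) for c in init_centroids]
    counts = [1] * len(centroids)  # each initial centroid counts as one sample
    labels = []
    for x in stream:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
        k = dists.index(min(dists))
        counts[k] += 1
        centroids[k] = [c + (a - c) / counts[k] for a, c in zip(x, centroids[k])]
        labels.append(k)
    return labels, centroids


def purity(labels, families):
    """Fraction of samples that belong to the majority family of their cluster."""
    by_cluster = defaultdict(Counter)
    for k, fam in zip(labels, families):
        by_cluster[k][fam] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


stream = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0), (11.0, 10.0)]
labels, centroids = online_kmeans(stream, [(0.0, 0.0), (10.0, 10.0)])
print(labels)                                # [0, 1, 0, 1]
print(purity(labels, ["A", "B", "A", "B"]))  # 1.0
print(purity(labels, ["A", "B", "B", "B"]))  # 0.75
```

Because each sample is processed once and only one centroid is updated, the cost per arriving sample is proportional to the number of clusters, which is what makes this style of update suitable for streams.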

[493]  arXiv:2405.03299 [pdf, other]
Title: DarkFed: A Data-Free Backdoor Attack in Federated Learning
Comments: This paper has been accepted by IJCAI 2024
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated learning (FL) has been demonstrated to be susceptible to backdoor attacks. However, existing academic studies on FL backdoor attacks rely on a high proportion of real clients with main task-related data, which is impractical. In the context of real-world industrial scenarios, even the simplest defense suffices to defend against the state-of-the-art attack, 3DFed. A practical FL backdoor attack remains in a nascent stage of development.
To bridge this gap, we present DarkFed. Initially, we emulate a series of fake clients, thereby achieving the attacker proportion typical of academic research scenarios. Given that these emulated fake clients lack genuine training data, we further propose a data-free approach to backdoor FL. Specifically, we delve into the feasibility of injecting a backdoor using a shadow dataset. Our exploration reveals that impressive attack performance can be achieved, even when there is a substantial gap between the shadow dataset and the main task dataset. This holds true even when employing synthetic data devoid of any semantic information as the shadow dataset. Subsequently, we strategically construct a series of covert backdoor updates in an optimized manner, mimicking the properties of benign updates, to evade detection by defenses. A substantial body of empirical evidence validates the tangible effectiveness of DarkFed.

[494]  arXiv:2405.03300 [pdf, other]
Title: Active RIS-Aided Massive MIMO With Imperfect CSI and Phase Noise
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Active reconfigurable intelligent surface (RIS) has attracted significant attention as a recently proposed RIS architecture. Owing to its capability to amplify the incident signals, active RIS can mitigate the multiplicative fading effect inherent in the passive RIS-aided system. In this paper, we consider an active RIS-aided uplink multi-user massive multiple-input multiple-output (MIMO) system in the presence of phase noise at the active RIS. Specifically, we employ a two-timescale scheme, where the beamforming at the base station (BS) is adjusted based on the instantaneous aggregated channel state information (CSI) and the statistical CSI serves as the basis for designing the phase shifts at the active RIS, so that the feedback overhead and computational complexity can be significantly reduced. The aggregated channel composed of the cascaded and direct channels is estimated by utilizing the linear minimum mean square error (LMMSE) technique. Based on the estimated channel, we derive the analytical closed-form expression of a lower bound of the achievable rate. The power scaling laws in the active RIS-aided system are investigated based on the theoretical expressions. When the transmit power of each user is scaled down by the number of BS antennas M or reflecting elements N, we find that the thermal noise will cause the lower bound of the achievable rate to approach zero as M or N increases to infinity. Moreover, an optimization approach based on genetic algorithms (GA) is introduced to tackle the phase shift optimization problem. Numerical results reveal that the active RIS can greatly enhance the performance of the considered system under various settings.

[495]  arXiv:2405.03301 [pdf, other]
Title: Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification
Comments: International Joint Conference on Artificial Intelligence 2024 (to be published)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Transparency and explainability in image classification are essential for establishing trust in machine learning models and detecting biases and errors. State-of-the-art explainability methods generate saliency maps to show where a specific class is identified, without providing a detailed explanation of the model's decision process. Striving to address such a need, we introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network. These explanations include a layer-wise representation of the features the model extracts from the input. Such features are represented as saliency maps generated by clustering and merging similar feature maps, to which we associate a weight derived by generalizing Grad-CAM for the proposed methodology. To further enhance these explanations, we include a set of textual labels collected through a gamified crowdsourcing activity and processed using NLP techniques and Sentence-BERT. Finally, we show an approach to generate global explanations by aggregating labels across multiple images.

[496]  arXiv:2405.03302 [pdf, other]
Title: The number of random 2-SAT solutions is asymptotically log-normal
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO); Probability (math.PR)

We prove that throughout the satisfiable phase, the logarithm of the number of satisfying assignments of a random 2-SAT formula satisfies a central limit theorem. This implies that the log of the number of satisfying assignments exhibits fluctuations of order $\sqrt n$, with $n$ the number of variables. The formula for the variance can be evaluated effectively. By contrast, for numerous other random constraint satisfaction problems the typical fluctuations of the logarithm of the number of solutions are {\em bounded} throughout all or most of the satisfiable regime.
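
For readers who want to see the quantity being studied, the sketch below counts the satisfying assignments of a small 2-SAT formula by brute force. The clause encoding (signed, 1-indexed integers) and the function name are assumptions for illustration; the enumeration is exponential in the number of variables and is only meant for tiny instances, not as a tool for the asymptotic regime of the theorem.

```python
from itertools import product


def count_2sat_solutions(num_vars, clauses):
    """Count assignments satisfying every clause.

    Each clause is a pair of literals; literal +i means variable i is true,
    literal -i means variable i is false (variables are numbered from 1).
    """
    def holds(literal, assignment):
        value = assignment[abs(literal) - 1]
        return value if literal > 0 else not value

    return sum(
        all(holds(a, assignment) or holds(b, assignment) for a, b in clauses)
        for assignment in product([False, True], repeat=num_vars)
    )


# (x1 OR x2) rules out only the all-false assignment.
single_clause = count_2sat_solutions(2, [(1, 2)])

# (x1 OR x2) AND (NOT x1 OR x2) forces x2 to be true.
forced = count_2sat_solutions(2, [(1, 2), (-1, 2)])
```

The central limit theorem in the abstract concerns the logarithm of this count for random formulas as the number of variables grows.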

[497]  arXiv:2405.03303 [pdf, other]
Title: Explainability for Transparent Conversational Information-Seeking
Comments: This is the author's version of the work. The definitive version is published in: 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14-18, 2024, Washington, DC, USA
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)

The increasing reliance on digital information necessitates advancements in conversational search systems, particularly in terms of information transparency. While prior research in conversational information-seeking has concentrated on improving retrieval techniques, the challenge remains in generating responses useful from a user perspective. This study explores different methods of explaining the responses, hypothesizing that transparency about the source of the information, system confidence, and limitations can enhance users' ability to objectively assess the response. By exploring transparency across explanation type, quality, and presentation mode, this research aims to bridge the gap between system-generated responses and responses verifiable by the user. We design a user study to answer questions concerning the impact of (1) the quality of explanations enhancing the response on its usefulness and (2) ways of presenting explanations to users. The analysis of the collected data reveals lower user ratings for noisy explanations, although these scores seem insensitive to the quality of the response. Inconclusive results on the presentation format of explanations suggest that it may not be a critical factor in this setting.

[498]  arXiv:2405.03305 [pdf, other]
Title: Artificial Intelligence in the Autonomous Navigation of Endovascular Interventions: A Systematic Review
Comments: Abstract shortened for arXiv character limit
Journal-ref: Robertshaw H, Karstensen L, Jackson B, Sadati H, Rhode K, Ourselin S, Granados A and Booth TC (2023) Artificial intelligence in the autonomous navigation of endovascular interventions: a systematic review. Front. Hum. Neurosci. 17:1239374
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

Purpose: Autonomous navigation of devices in endovascular interventions can decrease operation times, improve decision-making during surgery, and reduce operator radiation exposure while increasing access to treatment. This systematic review explores recent literature to assess the impact, challenges, and opportunities of artificial intelligence (AI) for autonomous navigation in endovascular interventions.
Methods: PubMed and IEEEXplore databases were queried. Eligibility criteria included studies investigating the use of AI in enabling the autonomous navigation of catheters/guidewires in endovascular interventions. Following PRISMA, articles were assessed using QUADAS-2. PROSPERO: CRD42023392259.
Results: Among 462 studies, fourteen met inclusion criteria. Reinforcement learning (9/14, 64%) and learning from demonstration (7/14, 50%) were used as data-driven models for autonomous navigation. Studies predominantly utilised physical phantoms (10/14, 71%) and in silico (4/14, 29%) models. Experiments within or around the blood vessels of the heart were reported by the majority of studies (10/14, 71%), while simple non-anatomical vessel platforms were used in three studies (3/14, 21%), and the porcine liver venous system in one study. We observed that risk of bias and poor generalisability were present across studies. No procedures were performed on patients in any of the studies reviewed. Studies lacked patient selection criteria, reference standards, and reproducibility, resulting in low clinical evidence levels.
Conclusions: AI shows promise for autonomous endovascular navigation, but the field remains at an experimental proof-of-concept stage, with a technology readiness level of 3. We highlight that reference standards with well-identified performance metrics are crucial to allow for comparisons of data-driven algorithms proposed in the years to come.

[499]  arXiv:2405.03307 [pdf, other]
Title: Efficient Symbolic Planning with Views
Comments: 6 pages
Subjects: Robotics (cs.RO)

Robotic planning systems model spatial relations in detail as these are needed for manipulation tasks. In contrast, other physical attributes of objects and the effect of devices are usually oversimplified and expressed by abstract compound attributes. This limits the ability of planners to find alternative solutions. We propose to break these compound attributes down into a shared set of elementary attributes. This strongly facilitates generalization between different tasks and environments and thus helps to find innovative solutions. On the downside, this generalization comes with an increased complexity of the solution space. Therefore, as the main contribution of the paper, we propose a method that splits the planning problem into a sequence of views, where in each view only an increasing subset of attributes is considered. We show that this view-based strategy offers a good compromise between planning speed and quality of the found plan, and discuss its general applicability and limitations.

[500]  arXiv:2405.03309 [pdf, other]
Title: On de Bruijn Rings and Families of Almost Perfect Maps
Authors: Peer Stelldinger
Subjects: Discrete Mathematics (cs.DM)

De Bruijn tori, also called perfect maps, are two-dimensional periodic arrays of letters drawn from a given finite alphabet, such that each possible pattern of a given shape $(m,n)$ appears exactly once within one period of the torus. It is still unknown whether de Bruijn tori of certain sizes exist, e.g. square-shaped de Bruijn tori with odd $m=n\in\{3,5,7\}$ and an even alphabet size $k$. However, in certain applications like positional coding, sub-perfect maps are sufficient, i.e. one does not need every possible $(m,n)$-pattern to appear, as long as a sufficiently large number of such patterns is captured and every pattern occurs at most once. We show that, given any $m=n$ and a square alphabet size $k^2$, one can efficiently construct a sub-perfect map which is almost perfect, i.e. of almost maximal size. We do this by introducing de Bruijn rings, i.e. sub-perfect maps of minimal height, and providing an efficient construction method for them. We extend our results to non-square torus shapes and arbitrary non-prime alphabet sizes.
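
The definitions of perfect and sub-perfect maps above can be checked mechanically on small arrays. The sketch below is a brute-force checker; the function names, the convention that a pattern of shape (m, n) has m rows and n columns, and the 4 x 4 binary example are assumptions for illustration and are not taken from the paper, which is concerned with constructing such maps rather than verifying them.

```python
def window_counts(torus, m, n):
    """Count each m-by-n window of a periodic 2D array (wrapping at edges)."""
    rows, cols = len(torus), len(torus[0])
    counts = {}
    for i in range(rows):
        for j in range(cols):
            window = tuple(
                torus[(i + di) % rows][(j + dj) % cols]
                for di in range(m)
                for dj in range(n)
            )
            counts[window] = counts.get(window, 0) + 1
    return counts


def is_sub_perfect(torus, m, n):
    """True if every m-by-n pattern occurs at most once per period."""
    return all(c == 1 for c in window_counts(torus, m, n).values())


def is_perfect(torus, m, n, k):
    """True if every m-by-n pattern over a k-letter alphabet occurs exactly once."""
    counts = window_counts(torus, m, n)
    return len(counts) == k ** (m * n) and all(c == 1 for c in counts.values())


# A 4 x 4 binary array whose 16 wrapped 2 x 2 windows are all distinct.
torus_4x4 = [
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
]
perfect = is_perfect(torus_4x4, 2, 2, 2)

# A single-row example that is sub-perfect but misses the pattern (1, 1).
ring = [[0, 0, 1]]
sub_only = is_sub_perfect(ring, 1, 2) and not is_perfect(ring, 1, 2, 2)
```

The single-row case shows the gap the abstract exploits: a sub-perfect map may omit some patterns as long as none repeats.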

[501]  arXiv:2405.03311 [pdf, other]
Title: Federated Learning for Drowsiness Detection in Connected Vehicles
Comments: 14 pages, 8 figures, 1 table, EAI INTSYS 2023 conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Ensuring driver readiness poses challenges, yet driver monitoring systems can assist in determining the driver's state. By observing visual cues, such systems recognize various behaviors and associate them with specific conditions. For instance, yawning or eye blinking can indicate driver drowsiness. Consequently, an abundance of distributed data is generated for driver monitoring. Employing machine learning techniques, such as driver drowsiness detection, presents a potential solution. However, transmitting the data to a central machine for model training is impractical due to the large data size and privacy concerns. Conversely, training on a single vehicle would limit the available data and likely result in inferior performance. To address these issues, we propose a federated learning framework for drowsiness detection within a vehicular network, leveraging the YawDD dataset. Our approach achieves an accuracy of 99.2%, demonstrating its promise and comparability to conventional deep learning techniques. Lastly, we show how our model scales with varying numbers of federated clients.

[502]  arXiv:2405.03314 [pdf, other]
Title: Deep Learning-based Point Cloud Registration for Augmented Reality-guided Surgery
Comments: 5 pages, 4 figures; accepted at IEEE ISBI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Point cloud registration aligns 3D point clouds using spatial transformations. It is an important task in computer vision, with applications in areas such as augmented reality (AR) and medical imaging. This work explores the intersection of two research trends: the integration of AR into image-guided surgery and the use of deep learning for point cloud registration. The main objective is to evaluate the feasibility of applying deep learning-based point cloud registration methods for image-to-patient registration in augmented reality-guided surgery. We created a dataset of point clouds from medical imaging and corresponding point clouds captured with a popular AR device, the HoloLens 2. We evaluate three well-established deep learning models in registering these data pairs. While we find that some deep learning methods show promise, we show that a conventional registration pipeline still outperforms them on our challenging dataset.

[503]  arXiv:2405.03316 [pdf, other]
Title: Provably Unlearnable Examples
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

The exploitation of publicly accessible data has led to escalating concerns regarding data privacy and intellectual property (IP) breaches in the age of artificial intelligence. As a strategy to safeguard both data privacy and IP-related domain knowledge, efforts have been undertaken to render shared data unlearnable for unauthorized models in the wild. Existing methods apply empirically optimized perturbations to the data in the hope of disrupting the correlation between the inputs and the corresponding labels such that the data samples are converted into Unlearnable Examples (UEs). Nevertheless, the absence of mechanisms that can verify how robust the UEs are against unknown unauthorized models and train-time techniques engenders several problems. First, the empirically optimized perturbations may suffer from the problem of cross-model generalization, which echoes the fact that the unauthorized models are usually unknown to the defender. Second, UEs can be mitigated by train-time techniques such as data augmentation and adversarial training. Furthermore, we find that a simple recovery attack can restore the clean-task performance of the classifiers trained on UEs by slightly perturbing the learned weights. To mitigate the aforementioned problems, in this paper, we propose a mechanism for certifying the so-called $(q, \eta)$-Learnability of an unlearnable dataset via parametric smoothing. A lower certified $(q, \eta)$-Learnability indicates a more robust protection over the dataset. Finally, we try to 1) improve the tightness of certified $(q, \eta)$-Learnability and 2) design Provably Unlearnable Examples (PUEs) which have reduced $(q, \eta)$-Learnability. According to experimental results, PUEs demonstrate both decreased certified $(q, \eta)$-Learnability and enhanced empirical robustness compared to existing UEs.

[504]  arXiv:2405.03318 [pdf, other]
Title: Enhancing DETRs Variants through Improved Content Query and Similar Query Aggregation
Comments: 11 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

The design of the query is crucial for the performance of DETR and its variants. Each query consists of two components: a content part and a positional one. Traditionally, the content query is initialized with a zero or learnable embedding, lacking essential content information and resulting in sub-optimal performance. In this paper, we introduce a novel plug-and-play module, Self-Adaptive Content Query (SACQ), to address this limitation. The SACQ module utilizes features from the transformer encoder to generate content queries via self-attention pooling. This allows candidate queries to adapt to the input image, resulting in a more comprehensive content prior and better focus on target objects. However, this improved concentration poses a challenge for the training process that utilizes the Hungarian matching, which selects only a single candidate and suppresses other similar ones. To overcome this, we propose a query aggregation strategy to cooperate with SACQ. It merges similar predicted candidates from different queries, easing the optimization. Our extensive experiments on the COCO dataset demonstrate the effectiveness of our proposed approaches across six different DETR variants with multiple configurations, achieving an average improvement of over 1.0 AP.

[505]  arXiv:2405.03320 [pdf, other]
Title: Denoising of Geodetic Time Series Using Spatiotemporal Graph Neural Networks: Application to Slow Slip Event Extraction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Geophysics (physics.geo-ph)

Geospatial data has been transformative for the monitoring of the Earth, yet, as in the case of (geo)physical monitoring, the measurements can have variable spatial and temporal sampling and may be associated with a significant level of perturbations degrading the signal quality. Denoising geospatial data is, therefore, essential, yet often challenging because the observations may comprise noise coming from different origins, including both environmental signals and instrumental artifacts, which are spatially and temporally correlated, thus hard to disentangle. This study addresses the denoising of multivariate time series acquired by irregularly distributed networks of sensors, requiring specific methods to handle the spatiotemporal correlation of the noise and the signal of interest. Specifically, our method focuses on the denoising of geodetic position time series, used to monitor ground displacement worldwide with centimeter-to-millimeter precision. Among the signals affecting GNSS data, slow slip events (SSEs) are of interest to seismologists. These are transients of deformation that are weakly emerging compared to other signals. Here, we design SSEdenoiser, a multi-station spatiotemporal graph-based attentive denoiser that learns latent characteristics of GNSS noise to reveal SSE-related displacement with sub-millimeter precision. It is based on the key combination of graph recurrent networks and spatiotemporal Transformers. The proposed method is applied to the Cascadia subduction zone, where SSEs occur along with bursts of tectonic tremors, a seismic rumbling identified from independent seismic recordings. The extracted events match the spatiotemporal evolution of tremors. This good space-time correlation of the denoised GNSS signals with the tremors validates the proposed denoising procedure.

[506]  arXiv:2405.03321 [pdf, other]
Title: Distributed Model Checking on Graphs of Bounded Treedepth
Subjects: Data Structures and Algorithms (cs.DS)

We establish that every monadic second-order logic (MSO) formula on graphs with bounded treedepth is decidable in a constant number of rounds within the CONGEST model. To our knowledge, this marks the first meta-theorem regarding distributed model-checking. Various optimization problems on graphs are expressible in MSO. Examples include determining whether a graph $G$ has a clique of size $k$, whether it admits a coloring with $k$ colors, whether it contains a graph $H$ as a subgraph or minor, or whether terminal vertices in $G$ could be connected via vertex-disjoint paths. Our meta-theorem significantly enhances the work of Bousquet et al. [PODC 2022], which was focused on distributed certification of MSO on graphs with bounded treedepth. Moreover, our results can be extended to solving optimization and counting problems expressible in MSO, in graphs of bounded treedepth.

[507]  arXiv:2405.03322 [pdf, other]
Title: Enhancing Aeroacoustic Wind Tunnel Studies through Massive Channel Upscaling with MEMS Microphones
Comments: 30th AIAA/CEAS Aeroacoustics Conference
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Instrumentation and Detectors (physics.ins-det)

This paper presents a large 7200-microphone MEMS array with a 6 m x 3 m aperture. The array is designed so that sub-arrays with optimized point spread functions can be used for beamforming and thus enable research on source directivity in wind tunnel facilities. The total array consists of modular 800-microphone panels, each consisting of four unique PCB board designs. This modular architecture allows for the time-synchronized measurement of an arbitrary number of panels and thus an arbitrary aperture size and total number of sensors. The panels can be installed without a gap so that the array's microphone pattern avoids high sidelobes in the point spread function. The array's capabilities are evaluated on a 1:9.5 airframe half model in an open wind tunnel at DNW-NWB. The total source emission is quantified and the directivity is evaluated with beamforming. Additional far-field microphones are employed to validate the results.

[508]  arXiv:2405.03326 [pdf, other]
Title: PAFOT: A Position-Based Approach for Finding Optimal Tests of Autonomous Vehicles
Comments: Pre-print from AST 2024 conference
Subjects: Software Engineering (cs.SE)

Autonomous Vehicles (AVs) are poised to revolutionise the transportation industry. However, they must be thoroughly tested to avoid safety violations. Simulation testing plays a crucial role in finding safety violations of Automated Driving Systems (ADSs). This paper proposes PAFOT, a position-based testing framework, which generates adversarial driving scenarios to expose safety violations of ADSs. We introduce a 9-position grid which is virtually drawn around the Ego Vehicle (EV) and modify the driving behaviours of Non-Playable Characters (NPCs) to move within this grid. PAFOT utilises a single-objective genetic algorithm to search for adversarial test scenarios. We demonstrate PAFOT on a well-known high-fidelity simulator, CARLA. The experimental results show that PAFOT can effectively generate safety-critical scenarios to crash ADSs and is able to find collisions in a short simulation time. Furthermore, it outperforms other search-based testing techniques by finding more safety-critical scenarios under the same driving conditions in less effective simulation time.

[509]  arXiv:2405.03327 [pdf, other]
Title: Clustering of Disease Trajectories with Explainable Machine Learning: A Case Study on Postoperative Delirium Phenotypes
Subjects: Machine Learning (cs.LG)

The identification of phenotypes within complex diseases or syndromes is a fundamental component of precision medicine, which aims to adapt healthcare to individual patient characteristics. Postoperative delirium (POD) is a complex neuropsychiatric condition with significant heterogeneity in its clinical manifestations and underlying pathophysiology. We hypothesize that POD comprises several distinct phenotypes, which cannot be directly observed in clinical practice. Identifying these phenotypes could enhance our understanding of POD pathogenesis and facilitate the development of targeted prevention and treatment strategies. In this paper, we propose an approach that combines supervised machine learning for personalized POD risk prediction with unsupervised clustering techniques to uncover potential POD phenotypes. We first demonstrate our approach using synthetic data, where we simulate patient cohorts with predefined phenotypes based on distinct sets of informative features. We aim to mimic any clinical disease with our synthetic data generation method. By training a predictive model and applying SHAP, we show that clustering patients in the SHAP feature importance space successfully recovers the true underlying phenotypes, outperforming clustering in the raw feature space. We then present a case study using real-world data from a cohort of elderly surgical patients. The results showcase the utility of our approach in uncovering clinically relevant subtypes of complex disorders like POD, paving the way for more precise and personalized treatment strategies.

[510]  arXiv:2405.03328 [pdf, other]
Title: Enhancing Spatiotemporal Disease Progression Models via Latent Diffusion and Prior Knowledge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In this work, we introduce Brain Latent Progression (BrLP), a novel spatiotemporal disease progression model based on latent diffusion. BrLP is designed to predict the evolution of diseases at the individual level on 3D brain MRIs. Existing deep generative models developed for this task are primarily data-driven and face challenges in learning disease progressions. BrLP addresses these challenges by incorporating prior knowledge from disease models to enhance the accuracy of predictions. To implement this, we propose to integrate an auxiliary model that infers volumetric changes in various brain regions. Additionally, we introduce Latent Average Stabilization (LAS), a novel technique to improve spatiotemporal consistency of the predicted progression. BrLP is trained and evaluated on a large dataset comprising 11,730 T1-weighted brain MRIs from 2,805 subjects, collected from three publicly available, longitudinal Alzheimer's Disease (AD) studies. In our experiments, we compare the MRI scans generated by BrLP with the actual follow-up MRIs available from the subjects, in both cross-sectional and longitudinal settings. BrLP demonstrates significant improvements over existing methods, with an increase of 22% in volumetric accuracy across AD-related brain regions and 43% in image similarity to the ground-truth scans. The ability of BrLP to generate conditioned 3D scans at the subject level, along with the novelty of integrating prior knowledge to enhance accuracy, represents a significant advancement in disease progression modeling, opening new avenues for precision medicine. The code of BrLP is available at the following link: https://github.com/LemuelPuglisi/BrLP.

[511]  arXiv:2405.03329 [pdf, other]
Title: Policy Learning for Balancing Short-Term and Long-Term Rewards
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Empirical researchers and decision-makers spanning various domains frequently seek profound insights into the long-term impacts of interventions. While the significance of long-term outcomes is undeniable, an overemphasis on them may inadvertently overshadow short-term gains. Motivated by this, this paper formalizes a new framework for learning the optimal policy that effectively balances both long-term and short-term rewards, where some long-term outcomes are allowed to be missing. In particular, we first present the identifiability of both rewards under mild assumptions. Next, we deduce the semiparametric efficiency bounds, along with the consistency and asymptotic normality of their estimators. We also reveal that short-term outcomes, if associated, contribute to improving the estimator of the long-term reward. Based on the proposed estimators, we develop a principled policy learning approach and further derive the convergence rates of regret and estimation errors associated with the learned policy. Extensive experiments are conducted to validate the effectiveness of the proposed method, demonstrating its practical applicability.

[512]  arXiv:2405.03333 [pdf, other]
Title: Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, User-Generated Content (UGC) videos have gained popularity in our daily lives. However, UGC videos often suffer from poor exposure due to the limitations of photographic equipment and techniques. Therefore, Video Exposure Correction (VEC) algorithms have been proposed, including Low-Light Video Enhancement (LLVE) and Over-Exposed Video Recovery (OEVR). Equally important to VEC is Video Quality Assessment (VQA). Unfortunately, almost all existing VQA models are built generally, measuring the quality of a video from a comprehensive perspective. As a result, Light-VQA, trained on LLVE-QA, was proposed for assessing LLVE. We extend the work of Light-VQA by expanding the LLVE-QA dataset into the Video Exposure Correction Quality Assessment (VEC-QA) dataset with over-exposed videos and their corresponding corrected versions. In addition, we propose Light-VQA+, a VQA model specialized in assessing VEC. Light-VQA+ differs from Light-VQA mainly in its use of the CLIP model and vision-language guidance during feature extraction, followed by a new module referring to the Human Visual System (HVS) for more accurate assessment. Extensive experimental results show that our model achieves the best performance against the current State-Of-The-Art (SOTA) VQA models on the VEC-QA dataset and other public datasets.

[513]  arXiv:2405.03334 [pdf, other]
Title: On the constrained feedback linearization control based on the MILP representation of a ReLU-ANN
Subjects: Systems and Control (eess.SY)

In this work, we explore the efficacy of rectified linear unit artificial neural networks in addressing the intricate challenges of convoluted constraints arising from feedback linearization mapping. Our approach involves a comprehensive procedure, encompassing the approximation of constraints through a regression process. Subsequently, we transform these constraints into an equivalent representation of mixed-integer linear constraints, seamlessly integrating them into other stabilizing control architectures. The advantage resides in the compatibility with the linear control design and the constraint satisfaction in the model predictive control setup, even for forecasted trajectories. Simulations are provided to validate the proposed constraint reformulation.

[514]  arXiv:2405.03340 [pdf, other]
Title: Functional Equivalence with NARS
Subjects: Artificial Intelligence (cs.AI)

This study explores the concept of functional equivalence within the framework of the Non-Axiomatic Reasoning System (NARS), specifically through OpenNARS for Applications (ONA). Functional equivalence allows organisms to categorize and respond to varied stimuli based on their utility rather than perceptual similarity, thus enhancing cognitive efficiency and adaptability. In this study, ONA was modified to allow the derivation of functional equivalence. This paper provides practical examples of the capability of ONA to apply learned knowledge across different functional situations, demonstrating its utility in complex problem-solving and decision-making. An extended example is included, in which ONA is trained to learn basic human-like language abilities, using a systematic procedure in relating spoken words, objects and written words. The research carried out as part of this study extends the understanding of functional equivalence in AGI systems, and argues that it is necessary for the level of flexibility in learning and adapting required for human-level AGI.

[515]  arXiv:2405.03341 [pdf, other]
Title: Enhancing Q-Learning with Large Language Model Heuristics
Authors: Xiefeng Wu
Comments: arXiv admin note: text overlap with arXiv:2009.06799 by other authors
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Q-learning excels in learning from feedback within sequential decision-making tasks but requires extensive sampling for significant improvements. Although reward shaping is a powerful technique for enhancing learning efficiency, it can introduce biases that affect agent performance. Furthermore, potential-based reward shaping is constrained as it does not allow for reward modifications based on actions or terminal states, potentially limiting its effectiveness in complex environments. Additionally, large language models (LLMs) can achieve zero-shot learning, but this is generally limited to simpler tasks. They also exhibit low inference speeds and occasionally produce hallucinations. To address these issues, we propose LLM-guided Q-learning, which employs LLMs as a heuristic to aid in learning the Q-function for reinforcement learning. It combines the advantages of both technologies without introducing performance bias. Our theoretical analysis demonstrates that the LLM heuristic provides action-level guidance. Additionally, our architecture has the capability to convert the impact of hallucinations into exploration costs. Moreover, the converged Q function corresponds to the MDP optimal Q function. Experimental results demonstrate that our algorithm enables agents to avoid ineffective exploration, enhances sampling efficiency, and is well-suited for complex control tasks.

[516]  arXiv:2405.03342 [pdf, other]
Title: Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning
Comments: Accepted by ICML 2024
Subjects: Machine Learning (cs.LG)

Causal effect estimation under networked interference is an important but challenging problem. Available parametric methods are limited in their model space, while previous semiparametric methods, e.g., leveraging neural networks to fit only one single nuisance function, may still encounter misspecification problems under networked interference without appropriate assumptions on the data generation process. To mitigate bias stemming from misspecification, we propose a novel doubly robust causal effect estimator under networked interference, by adapting the targeted learning technique to the training of neural networks. Specifically, we generalize the targeted learning technique into the networked interference setting and establish the condition under which an estimator achieves double robustness. Based on the condition, we devise an end-to-end causal effect estimator by transforming the identified theoretical condition into a targeted loss. Moreover, we provide a theoretical analysis of our designed estimator, revealing a faster convergence rate compared to a single nuisance model. Extensive experimental results on two real-world networks with semisynthetic data demonstrate the effectiveness of our proposed estimators.

[517]  arXiv:2405.03343 [pdf, other]
Title: An efficient hierarchical Bayesian method for the Kuopio tomography challenge 2023
Subjects: Numerical Analysis (math.NA)

The aim of Electrical Impedance Tomography (EIT) is to determine the electrical conductivity distribution inside a domain by applying currents and measuring voltages on its boundary. Mathematically, the EIT reconstruction task can be formulated as a nonlinear inverse problem. The Bayesian inverse problems framework has been applied extensively to solutions of the EIT inverse problem, in particular in the cases when the unknown conductivity is believed to be blocky. Recently, the Sparsity Promoting Iterative Alternating Sequential (SP-IAS) algorithm, originally proposed for the solution of linear inverse problems, has been adapted for the nonlinear case of EIT reconstruction in a computationally efficient manner. Here we introduce a hybrid version of the SP-IAS algorithms for the nonlinear EIT inverse problem, providing a detailed description of the implementation, with a specific focus on parameter selection. The method is applied to the 2023 Kuopio Tomography Challenge dataset, with a comprehensive report of the running times for the different cases and parameter selections.

[518]  arXiv:2405.03345 [pdf, ps, other]
Title: FAIR 2.0: Extending the FAIR Guiding Principles to Address Semantic Interoperability
Subjects: Databases (cs.DB)

FAIR data presupposes their successful communication between machines and humans while preserving their meaning and reference, requiring all parties involved to share the same background knowledge. Inspired by English as a natural language, we investigate the linguistic structure that ensures reliable communication of information and draw parallels with data structures, understanding both as models of systems of interest. We conceptualize semantic interoperability as comprising terminological and propositional interoperability. The former includes ontological (i.e., same meaning) and referential (i.e., same referent/extension) interoperability and the latter schema (i.e., same data schema) and logical (i.e., same logical framework) interoperability. Since no best ontology and no best data schema exist, establishing semantic interoperability and FAIRness of data and metadata requires the provision of a comprehensive set of relevant ontological and referential entity mappings and schema crosswalks. We therefore propose appropriate additions to the FAIR Guiding Principles, leading to FAIR 2.0. Furthermore, achieving FAIRness of data requires the provision of FAIR services in addition to organizing data into FAIR Digital Objects. FAIR services include a terminology, a schema, and an operations service.

[519]  arXiv:2405.03348 [pdf, other]
Title: Evolution of the 5G New Radio Two-Step Random Access towards 6G Unsourced MAC
Comments: Version 1.0 of the report
Subjects: Information Theory (cs.IT)

This report summarizes some considerations on possible evolutions of grant-free random access in the next generation of the 3GPP wireless cellular standard. The analysis is carried out by mapping the problem to the recently-introduced unsourced multiple access channel (UMAC) setup. By doing so, the performance of existing solutions can be benchmarked with information-theoretic bounds, assessing the potential gains that can be achieved over legacy 3GPP schemes. The study focuses on the two-step random access (2SRA) protocol introduced by Release 16 of the 5G New Radio standard, investigating its applicability to support large MTC / IoT terminal populations in a grant-free fashion. The analysis shows that the existing 2SRA scheme may not succeed in providing energy-efficient support to large user populations. Modifications to the protocol are proposed that enable remarkable gains in both energy and spectral efficiency while retaining a strong resemblance to the legacy protocol.

[520]  arXiv:2405.03349 [pdf, other]
Title: Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the field of low-light image enhancement, both traditional Retinex methods and advanced deep learning techniques such as Retinexformer have shown distinct advantages and limitations. Traditional Retinex methods, designed to mimic the human eye's perception of brightness and color, decompose images into illumination and reflection components but struggle with noise management and detail preservation under low light conditions. Retinexformer enhances illumination estimation through traditional self-attention mechanisms, but faces challenges with insufficient interpretability and suboptimal enhancement effects. To overcome these limitations, this paper introduces the RetinexMamba architecture. RetinexMamba not only captures the physical intuitiveness of traditional Retinex methods but also integrates the deep learning framework of Retinexformer, leveraging the computational efficiency of State Space Models (SSMs) to enhance processing speed. This architecture features innovative illumination estimators and damage restorer mechanisms that maintain image quality during enhancement. Moreover, RetinexMamba replaces the IG-MSA (Illumination-Guided Multi-Head Attention) in Retinexformer with a Fused-Attention mechanism, improving the model's interpretability. Experimental evaluations on the LOL dataset show that RetinexMamba outperforms existing deep learning approaches based on Retinex theory in both quantitative and qualitative metrics, confirming its effectiveness and superiority in enhancing low-light images.

[521]  arXiv:2405.03351 [pdf, other]
Title: Modality Prompts for Arbitrary Modality Salient Object Detection
Comments: 13 pages, 7 Figures, 3 Tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper delves into the task of arbitrary modality salient object detection (AM SOD), aiming to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD, i.e., more diverse modality discrepancies caused by varying modality types that need to be processed, and dynamic fusion design caused by an uncertain number of modalities present in the inputs of multimodal fusion strategy. Specifically, inspired by prompt learning's ability to align the distributions of pre-trained models to the characteristic of downstream tasks by learning some prompts, MAT will first present a modality-adaptive feature extractor (MAFE) to tackle the diverse modality discrepancies by introducing a modality prompt for each modality. In the training stage, a new modality translation contractive (MTC) loss will be further designed to assist MAFE in learning those modality-distinguishable modality prompts. Accordingly, in the testing stage, MAFE can employ those learned modality prompts to adaptively adjust its feature space according to the characteristics of the input modalities, thus being able to extract discriminative unimodal features. Then, MAFE will present a channel-wise and spatial-wise fusion hybrid (CSFH) strategy to meet the demand for dynamic fusion. For that, CSFH dedicates a channel-wise dynamic fusion module (CDFM) and a novel spatial-wise dynamic fusion module (SDFM) to fuse the unimodal features from varying numbers of modalities and meanwhile effectively capture cross-modal complementary semantic and detail information, respectively. Moreover, CSFH will carefully align CDFM and SDFM to different levels of unimodal features based on their characteristics for more effective complementary information exploitation.

[522]  arXiv:2405.03352 [pdf, other]
Title: Salient Object Detection From Arbitrary Modalities
Comments: 15 Pages, 7 Figures, 8 Tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Toward desirable saliency prediction, the types and numbers of inputs for a salient object detection (SOD) algorithm may dynamically change in many real-life applications. However, existing SOD algorithms are mainly designed or trained for one particular type of inputs and fail to generalize to other types of inputs. Consequently, more types of SOD algorithms need to be prepared in advance for handling different types of inputs, raising huge hardware and research costs. In contrast, in this paper, we propose a new type of SOD task, termed Arbitrary Modality SOD (AM SOD). The most prominent characteristics of AM SOD are that the modality types and modality numbers will be arbitrary or dynamically changed. The former means that the inputs to the AM SOD algorithm may be arbitrary modalities such as RGB, depths, or even any combination of them. The latter indicates that the inputs may have arbitrary modality numbers as the input type is changed, e.g., single-modality RGB image, dual-modality RGB-Depth (RGB-D) images or triple-modality RGB-Depth-Thermal (RGB-D-T) images. Accordingly, a preliminary solution to the above challenges, i.e., a modality switch network (MSN), is proposed in this paper. In particular, a modality switch feature extractor (MSFE) is first designed to extract discriminative features from each modality effectively by introducing some modality indicators, which will generate some weights for modality switching. Subsequently, a dynamic fusion module (DFM) is proposed to adaptively fuse features from a variable number of modalities based on a novel Transformer structure. Finally, a new dataset, named AM-XD, is constructed to facilitate research on AM SOD. Extensive experiments demonstrate that our AM SOD method can effectively cope with changes in the type and number of input modalities for robust salient object detection.

[523]  arXiv:2405.03353 [pdf, other]
Title: Markov Chain-based Optimization Time Analysis of Bivalent Ant Colony Optimization for Sorting and LeadingOnes
Subjects: Neural and Evolutionary Computing (cs.NE); Computational Complexity (cs.CC)

So far, only a few bounds on the runtime behavior of Ant Colony Optimization (ACO) have been reported. To alleviate this situation, we investigate the ACO variant we call Bivalent ACO (BACO) that uses exactly two pheromone values. We provide and successfully apply a new Markov chain-based approach to calculate the expected optimization time, i.e., the expected number of iterations until the algorithm terminates. This approach allows us to derive exact formulae for the expected optimization time for the problems Sorting and LeadingOnes. It turns out that the ratio of the two pheromone values significantly governs the runtime behavior of BACO. To the best of our knowledge, for the first time, we can present tight bounds for Sorting ($\Theta(n^3)$) with a specifically chosen objective function and prove the missing lower bound $\Omega(n^2)$ for LeadingOnes which, thus, is tightly bounded by $\Theta(n^2)$. We show that although we have a drastically simplified ant algorithm with respect to the influence of the pheromones on the solving process, known bounds on the expected optimization time for the problems OneMax ($O(n\log n)$) and LeadingOnes ($O(n^2)$) can be reproduced as a by-product of our approach. Experiments validate our theoretical findings.
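
For readers unfamiliar with the benchmark problems named in the abstract, the following minimal sketch gives the usual textbook definitions of the OneMax and LeadingOnes objective functions on bit strings. The function names are illustrative, and the "specifically chosen objective function" used for Sorting is not stated in the abstract, so it is not reproduced here.

```python
def one_max(bits):
    """OneMax: the number of ones in the bit string."""
    return sum(bits)


def leading_ones(bits):
    """LeadingOnes: the length of the longest prefix consisting only of ones."""
    count = 0
    for b in bits:
        if b != 1:
            break
        count += 1
    return count


if __name__ == "__main__":
    candidate = [1, 1, 0, 1]
    print(one_max(candidate), leading_ones(candidate))
```

Both functions are maximized by the all-ones string; LeadingOnes is harder for simple randomized search because only the first zero bit influences progress, which is consistent with the quadratic bound quoted above.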

[524]  arXiv:2405.03354 [pdf, other]
Title: VACO: a Multi-perspective Development of a Therapeutic and Motivational Virtual Robotic Agent for Concentration for children with ADHD
Subjects: Human-Computer Interaction (cs.HC)

In this work, we present (i) a novel approach to how artificial intelligence can support therapy for better concentration in children with Attention Deficit Hyperactivity Disorder (ADHD) through motivational attention training with a virtual robotic agent and (ii) a development process in which different stakeholders are included with their perspectives. To this end, we present three participative approaches to include the perspectives of different stakeholders. An online survey (Study I) was conducted with parents in Germany with the aim of ascertaining whether they would use software to promote their children's attention, what influences their attitude towards using it, and what requirements it would have to meet. About half of the parents would be willing to use software to promote attention. To develop the software as close to practice as possible, one of the developers took part in an intensive training for ADHD with the aim of testing which of the elements are technically feasible. Afterward, a first prototype was presented to clinicians (Study II) to make further adjustments. A first feasibility test (Study III) was conducted with the end users to check if the system works and if children and adolescents can use it. Attentional performance software offers multiple opportunities in the treatment of ADHD if the system is adapted to the needs of the practitioner and end user. This development process requires a lot of time and close interdisciplinary collaboration.

[525]  arXiv:2405.03355 [pdf, other]
Title: On the Theory of Cross-Modality Distillation with Contrastive Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Cross-modality distillation arises as an important topic for data modalities containing limited knowledge such as depth maps and high-quality sketches. Such techniques are of great importance, especially for memory and privacy-restricted scenarios where labeled training data is generally unavailable. To solve the problem, existing label-free methods leverage a few pairwise unlabeled data to distill the knowledge by aligning features or statistics between the source and target modalities. For instance, one typically aims to minimize the L2 distance or contrastive loss between the learned features of pairs of samples in the source (e.g. image) and the target (e.g. sketch) modalities. However, most algorithms in this domain only focus on the experimental results but lack theoretical insight. To bridge the gap between the theory and practical method of cross-modality distillation, we first formulate a general framework of cross-modality contrastive distillation (CMCD), built upon contrastive learning that leverages both positive and negative correspondence, towards a better distillation of generalizable features. Furthermore, we establish a thorough convergence analysis that reveals that the distance between source and target modalities significantly impacts the test error on downstream tasks within the target modality which is also validated by the empirical results. Extensive experimental results show that our algorithm outperforms existing algorithms consistently by a margin of 2-3% across diverse modalities and tasks, covering modalities of image, sketch, depth map, and audio and tasks of recognition and segmentation.
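
The two alignment objectives mentioned in the abstract (L2 distance and a contrastive loss over paired features) can be sketched as follows. This is only an illustration under stated assumptions: the contrastive term uses the common InfoNCE form with cosine similarity and a hypothetical `temperature` parameter, and it is not claimed to be the CMCD objective, which the abstract does not specify. Features are plain Python lists, paired by index.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]


def mean_l2_distance(source, target):
    """Average Euclidean distance between paired source/target features."""
    return sum(math.dist(s, t) for s, t in zip(source, target)) / len(source)


def info_nce(source, target, temperature=0.1):
    """InfoNCE-style loss: each source feature should be most similar to its
    own paired target feature (positive) among all target features (negatives)."""
    src = [_normalize(s) for s in source]
    tgt = [_normalize(t) for t in target]
    total = 0.0
    for i, s in enumerate(src):
        logits = [_dot(s, t) / temperature for t in tgt]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(src)


if __name__ == "__main__":
    image_features = [[1.0, 0.0], [0.0, 1.0]]
    sketch_features = [[0.9, 0.1], [0.2, 0.8]]
    print(mean_l2_distance(image_features, sketch_features))
    print(info_nce(image_features, sketch_features))
```

The L2 term uses only positive pairs, whereas the contrastive term also pushes each source feature away from the non-matching targets, which corresponds to the "positive and negative correspondence" the abstract refers to.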

[526]  arXiv:2405.03356 [pdf, other]
Title: An Overview of Intelligent Meta-surfaces for 6G and Beyond: Opportunities, Trends, and Challenges
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

With the impending arrival of the sixth generation (6G) of wireless communication technology, the telecommunications landscape is poised for another revolutionary transformation. At the forefront of this evolution are intelligent meta-surfaces (IS), emerging as a disruptive physical layer technology with the potential to redefine the capabilities and performance metrics of future wireless networks. As 6G evolves from concept to reality, industry stakeholders, standards organizations, and regulatory bodies are collaborating to define the specifications, protocols, and interoperability standards governing IS deployment. Against this background, this article delves into the ongoing standardization efforts, emerging trends, potential opportunities, and prevailing challenges surrounding the integration of IS into the framework of 6G and beyond networks. Specifically, it provides a tutorial-style overview of recent advancements in IS and explores their potential applications within future networks beyond 6G. Additionally, the article identifies key challenges in the design and implementation of various types of intelligent surfaces, along with considerations for their practical standardization. Finally, it highlights potential future prospects in this evolving field.

[527]  arXiv:2405.03357 [pdf, ps, other]
Title: A Game Theoretic Analysis of Validator Strategies in Ethereum 2.0
Comments: This work has been accepted for publication in BSCI 2024
Subjects: Computer Science and Game Theory (cs.GT)

Ethereum 2.0 is the second-largest cryptocurrency by market capitalization and a widely used smart contract platform. Therefore, examining the reliability of Ethereum 2.0's incentive mechanism is crucial, particularly its effectiveness in encouraging validators to adhere to the Ethereum 2.0 protocol. This paper studies the incentive mechanism of Ethereum 2.0 and evaluates its robustness by analyzing the interaction between block proposers and attesters in a single slot. To this end, we use Bayesian games to model the strategies of block proposers and attesters and calculate their expected utilities. Our results demonstrate that the Ethereum 2.0 incentive mechanism is incentive-compatible and promotes cooperation among validators. We prove that a Bayesian Nash equilibrium and an ex ante dominant strategy exist between the block proposer and attesters in a single slot. Our research provides a solid foundation for further analysis of Ethereum 2.0's incentive mechanism and insights for individuals considering participation as a validator in Ethereum 2.0.

[528]  arXiv:2405.03358 [pdf, other]
Title: Pinching Tactile Display: A Cloth that Changes Tactile Sensation by Electrostatic Adsorption
Comments: 9 pages, 7 figures, International Conference on Advanced Visual Interfaces 2024 (AVI 2024)
Subjects: Human-Computer Interaction (cs.HC); Hardware Architecture (cs.AR)

Haptic displays play an important role in enhancing the sense of presence in VR and telepresence. Displaying the tactile properties of fabrics has potential in the fashion industry, but there are difficulties in dynamically displaying different types of tactile sensations while maintaining their flexible properties. The vibrotactile stimulation of fabrics is an important element in the tactile properties of fabrics, as it greatly affects the way a garment feels when rubbed against the skin. To dynamically change the vibrotactile stimuli, many studies have used mechanical actuators. However, when combined with fabric, the soft properties of the fabric are compromised by the stiffness of the actuator. In addition, because the vibration generated by such actuators is applied to a single point, it is not possible to provide a uniform tactile sensation over the entire surface of the fabric, resulting in an uneven tactile sensation. In this study, we propose a Pinching Tactile Display: a conductive cloth that changes the tactile sensation by controlling electrostatic adsorption. By controlling the voltage and frequency applied to the conductive cloth, different tactile sensations can be dynamically generated. This makes it possible to create a tactile device in which tactile sensations are applied to the entire fabric while maintaining the thin and soft characteristics of the fabric. As a result, users could experiment with tactile sensations by picking up and rubbing the fabric in the same way they normally touch it. This mechanism has the potential for dynamic tactile transformation of soft materials.

[529]  arXiv:2405.03359 [pdf, ps, other]
Title: MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline
Comments: © 2024 IEEE. This work has been accepted for publication and presentation at the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, to be held in Orlando, Florida, USA, July 15-19, 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

This research focuses on evaluating the non-commercial open-source large language models (LLMs) Meditron, MedAlpaca, Mistral, and Llama-2 for their efficacy in interpreting medical guidelines saved in PDF format. As a specific test scenario, we applied these models to the guidelines for hypertension in children and adolescents provided by the European Society of Cardiology (ESC). Leveraging Streamlit, a Python library, we developed a user-friendly medical document chatbot tool (MedDoc-Bot). This tool enables authorized users to upload PDF files and pose questions, generating interpretive responses from four locally stored LLMs. A pediatric expert provides a benchmark for evaluation by formulating questions and responses extracted from the ESC guidelines. The expert rates the model-generated responses based on their fidelity and relevance. Additionally, we evaluated the METEOR and chrF metric scores to assess the similarity of model responses to reference answers. Our study found that Llama-2 and Mistral performed well in metrics evaluation. However, Llama-2 was slower when dealing with text and tabular data. In our human evaluation, we observed that responses created by Mistral, Meditron, and Llama-2 exhibited reasonable fidelity and relevance. This study provides valuable insights into the strengths and limitations of LLMs for future developments in medical document interpretation. Open-Source Code: https://github.com/yaseen28/MedDoc-Bot

[530]  arXiv:2405.03360 [pdf, other]
Title: Embedded Distributed Inference of Deep Neural Networks: A Systematic Review
Comments: 32 pages, 12 tables, 11 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Embedded distributed inference of Neural Networks has emerged as a promising approach for deploying machine-learning models on resource-constrained devices in an efficient and scalable manner. The inference task is distributed across a network of embedded devices, with each device contributing to the overall computation by performing a portion of the workload. In some cases, more powerful devices such as edge or cloud servers can be part of the system and be responsible for the most demanding layers of the network. As the demand for intelligent systems and the complexity of the deployed neural network models increase, this approach is becoming more relevant in a variety of applications such as robotics, autonomous vehicles, smart cities, Industry 4.0 and smart health. We present a systematic review of papers published during the last six years which describe techniques and methods to distribute Neural Networks across these kinds of systems. We provide an overview of the current state-of-the-art by analysing more than 100 papers, present a new taxonomy to characterize them, and discuss trends and challenges in the field.

[531]  arXiv:2405.03361 [pdf, other]
Title: Secure Semantic Communication over Wiretap Channel
Subjects: Information Theory (cs.IT)

Semantic communication, an emerging feature for future networks like 6G, emphasizes message meaning. Yet, the open nature of a wireless channel poses security risks for semantic communications. In this paper we derive information-theoretic limits, considering the semantic source model within a wiretap channel framework. Under separate equivocation and distortion conditions for semantics and observed data, we present the general outer and inner bounds of the region. We also reduce the general region to a case of Gaussian source and channel and provide numerical evaluation.

[532]  arXiv:2405.03363 [pdf, other]
Title: Telextiles: End-to-end Remote Transmission of Fabric Tactile Sensation
Comments: 10 pages, 8 figures, Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology
Journal-ref: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (2023)
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

The tactile sensation of textiles is critical in determining the comfort of clothing. For remote use, such as online shopping, users cannot physically touch the textile of clothes, making it difficult to evaluate its tactile sensation. Tactile sensing and actuation devices are required to transmit the tactile sensation of textiles. The sensing device needs to recognize different garments, even with hand-held sensors. In addition, the existing actuation device can only present a limited number of known patterns and cannot transmit unknown tactile sensations of textiles. To address these issues, we propose Telextiles, an interface that can remotely transmit tactile sensations of textiles by creating a latent space that reflects the proximity of textiles through contrastive self-supervised learning. We confirm that textiles with similar tactile features are located close to each other in the latent space through a two-dimensional plot. We then compress the latent features for known textile samples into the 1D distance and apply the 16 textile samples to the rollers in the order of the distance. The roller is rotated to select the textile with the closest feature if an unknown textile is detected.

[533]  arXiv:2405.03367 [pdf, ps, other]
Title: On the (In-)Completeness of Destructive Equality Resolution in the Superposition Calculus
Authors: Uwe Waldmann
Comments: 22 pages; shortened version to appear in Proc. IJCAR 2024, Springer
Subjects: Logic in Computer Science (cs.LO)

Bachmair's and Ganzinger's abstract redundancy concept for the Superposition Calculus justifies almost all operations that are used in superposition provers to delete or simplify clauses, and thus to keep the clause set manageable. Typical examples are tautology deletion, subsumption deletion, and demodulation, and with a more refined definition of redundancy, joinability and connectedness can be covered as well. The notable exception is Destructive Equality Resolution, that is, the replacement of a clause $x \not\approx t \lor C$ with $x \notin \mathrm{vars}(t)$ by $C\{x \mapsto t\}$. This operation is implemented in state-of-the-art provers, and it is clearly useful in practice, but little is known about how it affects refutational completeness. We demonstrate on the one hand that the naive addition of Destructive Equality Resolution to the standard abstract redundancy concept renders the calculus refutationally incomplete. On the other hand, we present several restricted variants of the Superposition Calculus that are refutationally complete even with Destructive Equality Resolution.
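
The Destructive Equality Resolution step defined in the abstract can be illustrated with a small sketch. The term representation is an assumption chosen for brevity: a variable is a Python string, a function application or constant is a tuple `(symbol, arg1, ...)`, a literal is a triple `(is_positive, lhs, rhs)`, and a clause is a list of literals. The sketch only shows the syntactic operation; it says nothing about the completeness questions the paper studies.

```python
def term_vars(term):
    """Return the set of variables occurring in a term."""
    if isinstance(term, str):
        return {term}
    result = set()
    for arg in term[1:]:
        result |= term_vars(arg)
    return result


def substitute(term, var, replacement):
    """Apply the substitution {var -> replacement} to a term."""
    if isinstance(term, str):
        return replacement if term == var else term
    return (term[0],) + tuple(substitute(a, var, replacement) for a in term[1:])


def destructive_equality_resolution(clause):
    """Apply one DER step: find a negative literal x != t with x not in vars(t),
    drop it, and apply {x -> t} to the rest. Return the clause unchanged if
    no such literal exists."""
    for i, (positive, lhs, rhs) in enumerate(clause):
        if positive:
            continue
        # Equations are symmetric, so try both orientations.
        for x, t in ((lhs, rhs), (rhs, lhs)):
            if isinstance(x, str) and x not in term_vars(t):
                rest = clause[:i] + clause[i + 1:]
                return [
                    (p, substitute(l, x, t), substitute(r, x, t))
                    for p, l, r in rest
                ]
    return clause


if __name__ == "__main__":
    # X != f(a)  or  g(X) = b
    clause = [(False, "X", ("f", ("a",))), (True, ("g", "X"), ("b",))]
    print(destructive_equality_resolution(clause))
```

The occurs check `x not in term_vars(t)` mirrors the side condition $x \notin \mathrm{vars}(t)$: a literal such as X != f(X) is left alone.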

[534]  arXiv:2405.03371 [pdf, other]
Title: Explainable Fake News Detection With Large Language Model via Defense Among Competing Wisdom
Comments: 12 pages, WWW'2024
Subjects: Computation and Language (cs.CL)

Most fake news detection methods learn latent feature representations based on neural networks, which makes them black boxes that classify a piece of news without giving any justification. Existing explainable systems generate veracity justifications from investigative journalism, which suffer from delayed debunking and low efficiency. Recent studies simply assume that the justification is equivalent to the majority opinions expressed in the wisdom of crowds. However, the opinions typically contain some inaccurate or biased information since the wisdom of crowds is uncensored. To detect fake news from a sea of diverse, crowded and even competing narratives, in this paper, we propose a novel defense-based explainable fake news detection framework. Specifically, we first propose an evidence extraction module to split the wisdom of crowds into two competing parties and respectively detect salient evidence. To gain concise insights from the evidence, we then design a prompt-based module that utilizes a large language model to generate justifications by inferring reasons towards two possible veracities. Finally, we propose a defense-based inference module to determine veracity via modeling the defense among these justifications. Extensive experiments conducted on two real-world benchmarks demonstrate that our proposed method outperforms state-of-the-art baselines in terms of fake news detection and provides high-quality justifications.

[535]  arXiv:2405.03372 [pdf, other]
Title: Snake Learning: A Communication- and Computation-Efficient Distributed Learning Framework for 6G
Comments: 7 pages, 6 figures
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

In the evolution towards 6G, integrating Artificial Intelligence (AI) with advanced network infrastructure emerges as a pivotal strategy for enhancing network intelligence and resource utilization. Existing distributed learning frameworks like Federated Learning and Split Learning often struggle with significant challenges in dynamic network environments including high synchronization demands, costly communication overheads, severe computing resource consumption, and data heterogeneity across network nodes. These obstacles hinder the applications of ubiquitous computing capabilities of 6G networks, especially in light of the trend of escalating model parameters and training data volumes. To address these challenges effectively, this paper introduces "Snake Learning", a cost-effective distributed learning framework. Specifically, Snake Learning respects the heterogeneity of inter-node computing capability and local data distribution in 6G networks, and sequentially trains the designated part of model layers on individual nodes. This layer-by-layer serpentine update mechanism contributes to significantly reducing the requirements for storage, memory and communication during the model training phase, and demonstrates superior adaptability and efficiency for both Computer Vision (CV) training and Large Language Model (LLM) fine-tuning tasks across homogeneous and heterogeneous data distributions.

[536]  arXiv:2405.03373 [pdf, other]
Title: Knowledge-aware Text-Image Retrieval for Remote Sensing Images
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image-based retrieval in large Earth observation archives is challenging because one needs to navigate across thousands of candidate matches only with the query image as a guide. By using text as information supporting the visual query, the retrieval system gains in usability, but at the same time faces difficulties due to the diversity of visual signals that cannot be summarized by a short caption only. For this reason, as a matching-based task, cross-modal text-image retrieval often suffers from information asymmetry between texts and images. To address this challenge, we propose a Knowledge-aware Text-Image Retrieval (KTIR) method for remote sensing images. By mining relevant information from an external knowledge graph, KTIR enriches the text scope available in the search query and alleviates the information gaps between texts and images for better matching. Moreover, by integrating domain-specific knowledge, KTIR also enhances the adaptation of pre-trained vision-language models to remote sensing applications. Experimental results on three commonly used remote sensing text-image retrieval benchmarks show that the proposed knowledge-aware method leads to varied and consistent retrievals, outperforming state-of-the-art retrieval methods.

[537]  arXiv:2405.03376 [pdf, other]
Title: CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer
Comments: Main text and supplementary, 22 pages
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The advent of data-driven weather forecasting models, which learn from hundreds of terabytes (TB) of reanalysis data, has significantly advanced forecasting capabilities. However, the substantial costs associated with data storage and transmission present a major challenge for data providers and users, affecting resource-constrained researchers and limiting their accessibility to participate in AI-based meteorological research. To mitigate this issue, we introduce an efficient neural codec, the Variational Autoencoder Transformer (VAEformer), for extreme compression of climate data to significantly reduce data storage cost, making AI-based meteorological research portable to researchers. Our approach diverges from recent complex neural codecs by utilizing a low-complexity Auto-Encoder transformer. This encoder produces a quantized latent representation through variance inference, which reparameterizes the latent space as a Gaussian distribution. This method improves the estimation of distributions for cross-entropy coding. Extensive experiments demonstrate that our VAEformer outperforms existing state-of-the-art compression methods in the context of climate data. By applying our VAEformer, we compressed the most popular ERA5 climate dataset (226 TB) into a new dataset, CRA5 (0.7 TB). This translates to a compression ratio of over 300 while retaining the dataset's utility for accurate scientific analysis. Further, downstream experiments show that global weather forecasting models trained on the compact CRA5 dataset achieve forecasting accuracy comparable to the model trained on the original dataset. Code, the CRA5 dataset, and the pre-trained model are available at https://github.com/taohan10200/CRA5.
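
As a quick check of the sizes quoted in the abstract above, the compression ratio can be computed directly from 226 TB and 0.7 TB. This is only a worked arithmetic sketch; the function name is illustrative and is not taken from the CRA5 repository.

```python
def compression_ratio(original_tb: float, compressed_tb: float) -> float:
    """Return how many times smaller the compressed dataset is."""
    if compressed_tb <= 0:
        raise ValueError("compressed size must be positive")
    return original_tb / compressed_tb


# Sizes quoted in the abstract: ERA5 at 226 TB, CRA5 at 0.7 TB.
ratio = compression_ratio(226.0, 0.7)
print(f"compression ratio: {ratio:.1f}")
```

The result is roughly 323, which is consistent with the stated ratio of over 300.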

[538]  arXiv:2405.03379 [pdf, other]
Title: Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning
Comments: Accepted at The Twelfth International Conference on Learning Representations (ICLR 2024). Website: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction, but often requires an infeasible amount of interaction data to solve complex tasks from sparse rewards. One direction includes augmenting RL with offline data demonstrating desired tasks, but past work often requires a lot of high-quality demonstration data that is difficult to obtain, especially for domains such as robotics. Our approach consists of a reverse curriculum followed by a forward curriculum. Unique to our approach compared to past work is the ability to efficiently leverage more than one demonstration via a per-demonstration reverse curriculum generated via state resets. The result of our reverse curriculum is an initial policy that performs well on a narrow initial state distribution and helps overcome difficult exploration problems. A forward curriculum is then used to accelerate the training of the initial policy to perform well on the full initial state distribution of the task and improve demonstration and sample efficiency. We show how the combination of a reverse curriculum and forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency compared against various state-of-the-art learning-from-demonstration baselines, even solving previously unsolvable tasks that require high precision and control.

[539]  arXiv:2405.03381 [pdf, other]
Title: Statistical Edge Detection And UDF Learning For Shape Representation
Authors: Virgile Foy (IMT), Fabrice Gamboa (IMT), Reda Chhaibi (IMT)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Applications (stat.AP)

In the field of computer vision, the numerical encoding of 3D surfaces is crucial. It is classical to represent surfaces with their Signed Distance Functions (SDFs) or Unsigned Distance Functions (UDFs). For tasks like representation learning, surface classification, or surface reconstruction, this function can be learned by a neural network, called Neural Distance Function. This network, and in particular its weights, may serve as a parametric and implicit representation for the surface. The network must represent the surface as accurately as possible. In this paper, we propose a method for learning UDFs that improves the fidelity of the obtained Neural UDF to the original 3D surface. The key idea of our method is to concentrate the learning effort of the Neural UDF on surface edges. More precisely, we show that sampling more training points around surface edges allows better local accuracy of the trained Neural UDF, and thus improves the global expressiveness of the Neural UDF in terms of Hausdorff distance. To detect surface edges, we propose a new statistical method based on the calculation of a $p$-value at each point on the surface. Our method is shown to detect surface edges more accurately than a commonly used local geometric descriptor.

[540]  arXiv:2405.03382 [pdf, ps, other]
Title: Improving (Re-)Usability of Musical Datasets: An Overview of the DOREMUS Project
Journal-ref: Bibliothek Forschung und Praxis, 2018, 42 (2), pp.194-205.
Subjects: Information Retrieval (cs.IR)

DOREMUS works on a better description of music by building new tools to link and explore the data of three French institutions. This paper gives an overview of the data model based on FRBRoo, explains the conversion and linking processes using linked data technologies and presents the prototypes created to consume the data according to the web users' needs.

[541]  arXiv:2405.03384 [pdf, other]
Title: GLIP: Electromagnetic Field Exposure Map Completion by Deep Generative Networks
Subjects: Machine Learning (cs.LG)

In Spectrum cartography (SC), the generation of exposure maps for radio frequency electromagnetic fields (RF-EMF) spans dimensions of frequency, space, and time, which relies on a sparse collection of sensor data, posing a challenging ill-posed inverse problem. Cartography methods based on models integrate designed priors, such as sparsity and low-rank structures, to refine the solution of this inverse problem. In our previous work, EMF exposure map reconstruction was achieved by Generative Adversarial Networks (GANs) where physical laws or structural constraints were employed as a prior, but they require a large amount of labeled data or simulated full maps for training to produce efficient results. In this paper, we present a method to reconstruct EMF exposure maps using only the generator network in GANs which does not require explicit training, thus overcoming the limitations of GANs, such as using reference full exposure maps. This approach uses a prior from sensor data as Local Image Prior (LIP) captured by deep convolutional generative networks independent of learning the network parameters from images in an urban environment. Experimental results show that, even when only sparse sensor data are available, our method can produce accurate estimates.

[542]  arXiv:2405.03385 [pdf, other]
Title: Fully Reversing the Shoebox Image Source Method: From Impulse Responses to Room Parameters
Authors: Tom Sprunck (IRMA), Antoine Deleforge (IRMA), Yannick Privat (IECL, SPHINX, IUF), Cédric Foy (UMRAE, Cerema Direction Est)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Classical Physics (physics.class-ph)

We present an algorithm that fully reverses the shoebox image source method (ISM), a popular and widely used room impulse response (RIR) simulator for cuboid rooms introduced by Allen and Berkley in 1979. More precisely, given a discrete multichannel RIR generated by the shoebox ISM for a microphone array of known geometry, the algorithm reliably recovers the 18 input parameters. These are the 3D source position, the 3 dimensions of the room, the 6-degrees-of-freedom room translation and orientation, and an absorption coefficient for each of the 6 room boundaries. The approach builds on a recently proposed gridless image source localization technique combined with new procedures for room axes recovery and first-order-reflection identification. Extensive simulated experiments reveal that near-exact recovery of all parameters is achieved for a 32-element, 8.4-cm-wide spherical microphone array and a sampling rate of 16 kHz using fully randomized input parameters within rooms of size 2×2×2 to 10×10×5 meters. Estimation errors decay towards zero when increasing the array size and sampling rate. The method is also shown to strongly outperform a known baseline, and its ability to extrapolate RIRs at new positions is demonstrated. Crucially, the approach is strictly limited to low-passed discrete RIRs simulated using the vanilla shoebox ISM. Nonetheless, it represents to our knowledge the first algorithmic demonstration that this difficult inverse problem is in principle fully solvable over a wide range of configurations.
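
To make the forward model in the abstract above more concrete, the following minimal sketch computes the six first-order image sources of a shoebox room by mirroring the source across each wall. It assumes an axis-aligned room with one corner at the origin, which is a simplification (the paper also treats room translation and orientation). It does not implement the authors' inversion algorithm. All names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def first_order_image_sources(source: Vec3, room: Vec3) -> List[Vec3]:
    """Mirror the source across the six walls of an axis-aligned room.

    Assumption: the room spans [0, room[i]] along each axis i.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            mirrored = list(source)
            # Reflection of a coordinate c across the plane at `wall` is 2*wall - c.
            mirrored[axis] = 2.0 * wall - source[axis]
            images.append(tuple(mirrored))
    return images


# Parameter count stated in the abstract:
# source position (3) + room dimensions (3) + room pose (6) + absorption coefficients (6).
N_PARAMETERS = 3 + 3 + 6 + 6

images = first_order_image_sources((1.0, 2.0, 1.5), (4.0, 5.0, 3.0))
```

Each image source corresponds to one first-order reflection, which is why identifying first-order reflections helps recover the wall positions.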

[543]  arXiv:2405.03386 [pdf, other]
Title: Annot-Mix: Learning with Noisy Class Labels from Multiple Annotators via a Mixup Extension
Comments: Under review
Subjects: Machine Learning (cs.LG)

Training with noisy class labels impairs neural networks' generalization performance. In this context, mixup is a popular regularization technique to improve training robustness by making memorizing false class labels more difficult. However, mixup neglects that, typically, multiple annotators, e.g., crowdworkers, provide class labels. Therefore, we propose an extension of mixup, which handles multiple class labels per instance while considering which class label originates from which annotator. Integrated into our multi-annotator classification framework annot-mix, it performs superiorly to eight state-of-the-art approaches on eleven datasets with noisy class labels provided either by human or simulated annotators. Our code is publicly available through our repository at https://github.com/ies-research/annot-mix.
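
For readers unfamiliar with the baseline technique, here is a minimal sketch of standard mixup, which forms convex combinations of two inputs and of their one-hot labels. It shows plain mixup only, not the multi-annotator extension proposed in the paper. The function names and the default `alpha` are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_lambda(alpha: float = 0.4, rng: Optional[random.Random] = None) -> float:
    """Draw the mixing coefficient from Beta(alpha, alpha)."""
    generator = rng if rng is not None else random
    return generator.betavariate(alpha, alpha)


def mixup_pair(
    x1: Sequence[float],
    y1: Sequence[float],
    x2: Sequence[float],
    y2: Sequence[float],
    lam: float,
) -> Tuple[List[float], List[float]]:
    """Return the convex combination of two (input, one-hot label) pairs."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y


# Mixing a class-0 example with a class-1 example using a fixed coefficient.
mixed_x, mixed_y = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
```

Because the mixed label is soft, a network cannot fit it by memorizing a single (possibly wrong) class label. This is the regularizing effect the abstract refers to.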

[544]  arXiv:2405.03387 [pdf, ps, other]
Title: The high dimensional psychological profile and cultural bias of ChatGPT
Authors: Hang Yuan (1), Zhongyue Che (1), Shao Li (1), Yue Zhang, Xiaomeng Hu (2), Siyang Luo (1) ((1) Sun Yat-Sen University, (2) Renmin University of China)
Subjects: Computation and Language (cs.CL)

Given the rapid advancement of large-scale language models, artificial intelligence (AI) models, like ChatGPT, are playing an increasingly prominent role in human society. However, to ensure that artificial intelligence models benefit human society, we must first fully understand the similarities and differences between the human-like characteristics exhibited by artificial intelligence models and real humans, as well as the cultural stereotypes and biases that artificial intelligence models may exhibit in the process of interacting with humans. This study first measured ChatGPT in 84 dimensions of psychological characteristics, revealing differences between ChatGPT and human norms in most dimensions as well as in high-dimensional psychological representations. Additionally, through the measurement of ChatGPT in 13 dimensions of cultural values, it was revealed that ChatGPT's cultural value patterns are dissimilar to those of various countries/regions worldwide. Finally, an analysis of ChatGPT's performance in eight decision-making tasks involving interactions with humans from different countries/regions revealed that ChatGPT exhibits clear cultural stereotypes in most decision-making tasks and shows significant cultural bias in third-party punishment and ultimatum games. The findings indicate that, compared to humans, ChatGPT exhibits a distinct psychological profile and cultural value orientation, and it also shows cultural biases and stereotypes in interpersonal decision-making. Future research endeavors should emphasize enhanced technical oversight and augmented transparency in the database and algorithmic training procedures to foster more efficient cross-cultural communication and mitigate social disparities.

[545]  arXiv:2405.03388 [pdf, other]
Title: 3D LiDAR Mapping in Dynamic Environments Using a 4D Implicit Neural Representation
Comments: 10 pages, CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Building accurate maps is a key building block to enable reliable localization, planning, and navigation of autonomous vehicles. We propose a novel approach for building accurate maps of dynamic environments utilizing a sequence of LiDAR scans. To this end, we propose encoding the 4D scene into a novel spatio-temporal implicit neural map representation by fitting a time-dependent truncated signed distance function to each point. Using our representation, we extract the static map by filtering the dynamic parts. Our neural representation is based on sparse feature grids, a globally shared decoder, and time-dependent basis functions, which we jointly optimize in an unsupervised fashion. To learn this representation from a sequence of LiDAR scans, we design a simple yet efficient loss function to supervise the map optimization in a piecewise way. We evaluate our approach on various scenes containing moving objects in terms of the reconstruction quality of static maps and the segmentation of dynamic point clouds. The experimental results demonstrate that our method is capable of removing the dynamic part of the input point clouds while reconstructing accurate and complete 3D maps, outperforming several state-of-the-art methods. Codes are available at: https://github.com/PRBonn/4dNDF

[546]  arXiv:2405.03389 [pdf, other]
Title: Don't Waste Your Time: Early Stopping Cross-Validation
Comments: Accepted at Third International Conference on Automated Machine Learning (AutoML 2024); for code, see this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

State-of-the-art automated machine learning systems for tabular data often employ cross-validation, ensuring that measured performances generalize to unseen data, or that subsequent ensembling does not overfit. However, using k-fold cross-validation instead of holdout validation drastically increases the computational cost of validating a single configuration. While ensuring better generalization and, by extension, better performance, the additional cost is often prohibitive for effective model selection within a time budget. We aim to make model selection with cross-validation more effective. Therefore, we study early stopping the process of cross-validation during model selection. We investigate the impact of early stopping on random search for two algorithms, MLP and random forest, across 36 classification datasets. We further analyze the impact of the number of folds by considering 3-, 5-, and 10-folds. In addition, we investigate the impact of early stopping with Bayesian optimization instead of random search and also repeated cross-validation. Our exploratory study shows that even a simple-to-understand and easy-to-implement method consistently allows model selection to converge faster: in ~94% of all datasets, on average by ~214%. Moreover, stopping cross-validation enables model selection to explore the search space more exhaustively by considering 167% more configurations on average within one hour, while also obtaining better overall performance.
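
The idea of stopping cross-validation early can be sketched as follows: folds are evaluated one at a time, and a configuration is abandoned once its running mean score falls below that of the best configuration seen so far. The stopping rule, function name, and signature are assumptions made for illustration. The abstract does not specify the exact rule the paper uses.

```python
from typing import Callable, List, Optional, Tuple


def cross_validate_with_early_stop(
    evaluate_fold: Callable[[int], float],
    n_folds: int,
    incumbent_mean: Optional[float],
) -> Tuple[float, int, bool]:
    """Evaluate folds sequentially; stop when the running mean trails the incumbent.

    Returns (mean score over evaluated folds, folds evaluated, stopped early?).
    Higher scores are assumed to be better.
    """
    scores: List[float] = []
    for fold in range(n_folds):
        scores.append(evaluate_fold(fold))
        running_mean = sum(scores) / len(scores)
        is_last_fold = fold == n_folds - 1
        if incumbent_mean is not None and running_mean < incumbent_mean and not is_last_fold:
            return running_mean, len(scores), True
    return sum(scores) / len(scores), len(scores), False


# Hypothetical per-fold accuracies of one configuration.
fold_scores = [0.5, 0.9, 0.9]
stopped = cross_validate_with_early_stop(lambda i: fold_scores[i], 3, incumbent_mean=0.8)
full = cross_validate_with_early_stop(lambda i: fold_scores[i], 3, incumbent_mean=None)
```

In this toy run the first call stops after a single fold, saving two fold evaluations. It also shows the trade-off: the later folds would have scored well, so an aggressive rule can discard a configuration prematurely.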

[547]  arXiv:2405.03393 [pdf, other]
Title: On-site scale factor linearity calibration of MEMS triaxial gyroscopes
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

The calibration of MEMS triaxial gyroscopes is crucial for achieving precise attitude estimation for various wearable health monitoring applications. However, gyroscope calibration poses greater challenges compared to accelerometers and magnetometers. This paper introduces an efficient method for calibrating MEMS triaxial gyroscopes via only a servo motor, making it well-suited for field environments. The core strategy of the method involves utilizing the fact that the dot product of the measured gravity and the rotational speed in a fixed frame remains constant. To eliminate the influence of rotating centrifugal force on the accelerometer, the accelerometer data is measured while stationary. The proposed calibration experiment scheme allows gyroscopic measurements when operating each axis at a specific rotation speed, making it easier to evaluate the linearity across a related speed range constituted by a series of rotation speeds. Moreover, the classical least squares algorithm alone proves adequate for estimating the scale factor, notably streamlining the analysis of the calibration process. Extensive numerical simulations were conducted to analyze the proposed method's performance in calibrating a triaxial gyroscope model. Experimental validation was also carried out using a commercially available MEMS inertial measurement unit (LSM9DS1 from Arduino nano 33 BLE SENSE) and a servo motor capable of precise speed control. The experimental results effectively demonstrate the efficacy of the proposed calibration approach.
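
As a rough illustration of least-squares scale-factor estimation, the sketch below fits a single-axis linear model, measured = scale × reference + bias, across a series of rotation speeds. The single-axis model, the variable names, and the synthetic data are assumptions made for illustration. The paper's actual formulation, based on the dot product of gravity and rotational speed, is not reproduced here.

```python
from typing import Sequence, Tuple


def fit_scale_and_bias(
    reference: Sequence[float], measured: Sequence[float]
) -> Tuple[float, float]:
    """Ordinary least-squares fit of measured = scale * reference + bias."""
    n = len(reference)
    if n != len(measured) or n < 2:
        raise ValueError("need two or more paired samples")
    mean_x = sum(reference) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    if sxx == 0.0:
        raise ValueError("reference speeds must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, measured))
    scale = sxy / sxx
    bias = mean_y - scale * mean_x
    return scale, bias


# Synthetic example: servo speeds in deg/s and readings from a hypothetical gyroscope
# with a 2% scale error and a 0.5 deg/s bias.
reference_speeds = [10.0, 20.0, 30.0, 40.0]
gyro_readings = [1.02 * w + 0.5 for w in reference_speeds]
scale, bias = fit_scale_and_bias(reference_speeds, gyro_readings)
```

Residuals of this fit at each rotation speed give a simple indication of how linear the scale factor is over the tested range.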

[548]  arXiv:2405.03401 [pdf, other]
Title: E2GNN: Efficient Graph Neural Network Ensembles for Semi-Supervised Classification
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This work studies ensemble learning for graph neural networks (GNNs) under the popular semi-supervised setting. Ensemble learning has shown superiority in improving the accuracy and robustness of traditional machine learning by combining the outputs of multiple weak learners. However, adopting a similar idea to integrate different GNN models is challenging for two reasons. First, GNN is notorious for its poor inference ability, so naively assembling multiple GNN models would deteriorate the inference efficiency. Second, when GNN models are trained with few labeled nodes, their performance is limited. In this case, the vanilla ensemble approach, e.g., majority vote, may be sub-optimal since most base models, i.e., GNNs, may make the wrong predictions. To this end, in this paper, we propose an efficient ensemble learner, E2GNN, to assemble multiple GNNs in a learnable way by leveraging both labeled and unlabeled nodes. Specifically, we first pre-train different GNN models on a given data scenario according to the labeled nodes. Next, instead of directly combining their outputs for label inference, we train a simple multi-layer perceptron (MLP) model to mimic their predictions on both labeled and unlabeled nodes. Then the unified MLP model is deployed to infer labels for unlabeled or new nodes. Since the predictions of unlabeled nodes from different GNN models may be incorrect, we develop a reinforced discriminator to effectively filter out those wrongly predicted nodes to boost the performance of MLP. By doing this, we suggest a principled approach to tackle the inference issues of GNN ensembles and maintain the merit of ensemble learning: improved performance. Comprehensive experiments over both transductive and inductive settings, across different GNN backbones and 8 benchmark datasets, demonstrate the superiority of E2GNN.

[549]  arXiv:2405.03403 [pdf, ps, other]
Title: Improved scalar auxiliary variable schemes for original energy stability of gradient flows
Subjects: Numerical Analysis (math.NA)

Scalar auxiliary variable (SAV) methods are a class of linear schemes for solving gradient flows that are known for the stability of a `modified' energy. In this paper, we propose an improved SAV (iSAV) scheme that not only retains the complete linearity but also ensures rigorously the stability of the original energy. The convergence and optimal error bound are rigorously established for the iSAV scheme and discussions are made for its high-order extension. Extensive numerical experiments are done to validate the convergence, robustness and energy stability of iSAV, and some comparisons are made.

[550]  arXiv:2405.03406 [pdf, other]
Title: Automated Computation of Therapies Using Failure Mode and Effects Analysis in the Medical Domain
Comments: Accepted to the German Journal of Artificial Intelligence
Subjects: Artificial Intelligence (cs.AI)

Failure mode and effects analysis (FMEA) is a systematic approach to identify and analyse potential failures and their effects in a system or process. The FMEA approach, however, requires domain experts to manually analyse the FMEA model to derive risk-reducing actions that should be applied. In this paper, we provide a formal framework to allow for automatic planning and acting in FMEA models. More specifically, we cast the FMEA model into a Markov decision process which can then be solved by existing solvers. We show that the FMEA approach can not only be used to support medical experts during the modelling process but also to automatically derive optimal therapies for the treatment of patients.
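
The step of casting a model into a Markov decision process and solving it can be illustrated with a toy value-iteration example. The two-state model below (a failure mode that can be treated or left alone) is entirely hypothetical, including its states, actions, probabilities, and costs. It does not reproduce the paper's formal framework or any particular solver.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward)
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(
    transitions: Transitions, gamma: float = 0.9, tol: float = 1e-10
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Compute optimal state values and a greedy policy for a small MDP."""

    def action_value(outcomes: List[Tuple[float, str, float]]) -> float:
        return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)

    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:  # terminal state: value stays at zero
                continue
            best = max(action_value(outcomes) for outcomes in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: max(actions, key=lambda a: action_value(actions[a]))
        for state, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical model: treating costs 1 and succeeds with probability 0.8;
# waiting costs 2 and never resolves the failure.
toy_model: Transitions = {
    "failure": {
        "treat": [(0.8, "healthy", -1.0), (0.2, "failure", -1.0)],
        "wait": [(1.0, "failure", -2.0)],
    },
    "healthy": {},
}
values, policy = value_iteration(toy_model)
```

In this toy model the computed policy selects "treat" in the failure state, since repeated waiting accumulates a much larger discounted cost.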

[551]  arXiv:2405.03409 [pdf, other]
Title: LightTR: A Lightweight Framework for Federated Trajectory Recovery
Comments: The paper was accepted by ICDE 2024
Subjects: Machine Learning (cs.LG)

With the proliferation of GPS-equipped edge devices, huge amounts of trajectory data are generated and accumulated in various domains, motivating a variety of urban applications. Due to the limited acquisition capabilities of edge devices, a lot of trajectories are recorded at a low sampling rate, which may reduce the effectiveness of urban applications. We aim to recover a high-sampled trajectory based on the low-sampled trajectory in free space, i.e., without road network information, to enhance the usability of trajectory data and support urban applications more effectively. Recent proposals targeting trajectory recovery often assume that trajectories are available at a central location, which fails to handle decentralized trajectories and hurts privacy. To bridge the gap between decentralized training and trajectory recovery, we propose a lightweight framework, LightTR, for federated trajectory recovery based on a client-server architecture, while keeping the data decentralized and private in each client/platform center (e.g., each data center of a company). Specifically, considering the limited processing capabilities of edge devices, LightTR encompasses a light local trajectory embedding module that offers improved computational efficiency without compromising its feature extraction capabilities. LightTR also features a meta-knowledge enhanced local-global training scheme to reduce communication costs between the server and clients and thus further improve efficiency. Extensive experiments demonstrate the effectiveness and efficiency of the proposed framework.

[552]  arXiv:2405.03411 [pdf, other]
Title: Greedy Heuristics for Sampling-based Motion Planning in High-Dimensional State Spaces
Comments: To be published at the International Journal of Robotics Research (IJRR)
Subjects: Robotics (cs.RO)

Sampling-based motion planning algorithms are very effective at finding solutions in high-dimensional continuous state spaces as they do not require prior approximations of the problem domain compared to traditional discrete graph-based searches. The anytime version of the Rapidly-exploring Random Trees (RRT) algorithm, denoted as RRT*, often finds high-quality solutions by incrementally approximating and searching the problem domain through random sampling. However, due to its low sampling efficiency and slow convergence rate, research has proposed many variants of RRT*, incorporating different heuristics and sampling strategies to overcome the constraints in complex planning problems. Yet, these approaches address specific convergence aspects of RRT* limitations, leaving a need for a sampling-based algorithm that can quickly find better solutions in complex high-dimensional state spaces with a faster convergence rate for practical motion planning applications. This article unifies and leverages the greedy search and heuristic techniques used in various RRT* variants to develop a greedy version of the anytime Rapidly-exploring Random Trees algorithm, denoted as Greedy RRT* (G-RRT*). It improves the initial solution-finding time of RRT* by maintaining two trees rooted at both the start and goal ends, advancing toward each other using greedy connection heuristics. It also accelerates the convergence rate of RRT* by introducing a greedy version of direct informed sampling procedure, which guides the sampling towards the promising region of the problem domain based on heuristics. We validate our approach on simulated planning problems, manipulation problems on Barrett WAM Arms, and on a self-reconfigurable robot, Panthera. Results show that G-RRT* produces asymptotically optimal solution paths and outperforms state-of-the-art RRT* variants, especially in high-dimensional planning problems.

[553]  arXiv:2405.03413 [pdf, other]
Title: SL-SLAM: A robust visual-inertial SLAM based deep feature extraction and matching
Subjects: Robotics (cs.RO)

This paper explores how deep learning techniques can improve visual-based SLAM performance in challenging environments. By combining deep feature extraction and deep matching methods, we introduce a versatile hybrid visual SLAM system designed to enhance adaptability in challenging scenarios, such as low-light conditions, dynamic lighting, weak-texture areas, and severe jitter. Our system supports multiple modes, including monocular, stereo, monocular-inertial, and stereo-inertial configurations. We also analyze how to combine visual SLAM with deep learning methods to inform other research. Through extensive experiments on both public datasets and self-sampled data, we demonstrate the superiority of the SL-SLAM system over traditional approaches. The experimental results show that SL-SLAM outperforms state-of-the-art SLAM algorithms in terms of localization accuracy and tracking robustness. For the benefit of the community, we make the source code public at https://github.com/zzzzxxxx111/SLslam.

[554]  arXiv:2405.03415 [pdf, other]
Title: Unique solvability and error analysis of the Lagrange multiplier approach for gradient flows
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

The unique solvability and error analysis of the original Lagrange multiplier approach proposed in [8] for gradient flows are studied in this paper. We identify a necessary and sufficient condition that must be satisfied for the nonlinear algebraic equation arising from the original Lagrange multiplier approach to admit a unique solution in the neighborhood of its exact solution, and propose a modified Lagrange multiplier approach so that the computation can continue even if the aforementioned condition is not satisfied. Using the Cahn-Hilliard equation as an example, we prove rigorously the unique solvability and establish optimal error estimates of a second-order Lagrange multiplier scheme assuming this condition and that the time step is sufficiently small. We also present numerical results to demonstrate that the modified Lagrange multiplier approach is much more robust and can use a much larger time step than the original Lagrange multiplier approach.

[555]  arXiv:2405.03416 [pdf, ps, other]
Title: Mental health of computing professionals and students: A systematic literature review
Comments: 16 pages, 7 figures
Subjects: Computers and Society (cs.CY)

The intersection of mental health and computing education is under-examined. In this systematic literature review, we evaluate the state of the art of research in mental health and well-being interventions, assessments, and concerns like anxiety and depression in computer science and computing education. The studies evaluated occurred across the computing education pipeline from introductory to PhD courses and found some commonalities contributing to high reporting of anxiety and depression in those studied. In addition, interventions that were designed to address mental health topics often revolved around self-guidance. Based on our review of the literature, we recommend increasing sample sizes and focusing on the design and development of tools and interventions specifically designed for computing professionals and students.

[556]  arXiv:2405.03417 [pdf, other]
Title: Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review
Comments: 24 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Image-based 3D reconstruction is a challenging task that involves inferring the 3D shape of an object or scene from a set of input images. Learning-based methods have gained attention for their ability to directly estimate 3D shapes. This review paper focuses on state-of-the-art techniques for 3D reconstruction, including the generation of novel, unseen views. An overview of recent developments in the Gaussian Splatting method is provided, covering input types, model structures, output representations, and training strategies. Unresolved challenges and future directions are also discussed. Given the rapid progress in this domain and the numerous opportunities for enhancing 3D reconstruction methods, a comprehensive examination of algorithms appears essential. Consequently, this study offers a thorough overview of the latest advancements in Gaussian Splatting.

[557]  arXiv:2405.03419 [pdf, other]
Title: Automated Metaheuristic Algorithm Design with Autoregressive Learning
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Automated design of metaheuristic algorithms offers an attractive avenue to reduce human effort and gain enhanced performance beyond human intuition. Current automated methods design algorithms within a fixed structure and operate from scratch. This leaves a clear gap towards fully exploring the potential of the metaheuristic family and benefiting from prior design experience. To bridge the gap, this paper proposes an autoregressive learning-based designer for automated design of metaheuristic algorithms. Our designer formulates metaheuristic algorithm design as a sequence generation task, and harnesses an autoregressive generative network to handle the task. This offers two advances. First, through autoregressive inference, the designer generates algorithms with diverse lengths and structures, enabling it to fully explore the potential of the metaheuristic family. Second, prior design knowledge learned and accumulated in neurons of the designer can be retrieved for designing algorithms for future problems, paving the way to continual design of algorithms for open-ended problem-solving. Extensive experiments on numerical benchmarks and real-world problems reveal that the proposed designer generates algorithms that outperform all human-created baselines on 24 out of 25 test problems. The generated algorithms display various structures and behaviors, reasonably fitting for different problem-solving contexts. Code will be released after paper publication.

[558]  arXiv:2405.03420 [pdf, ps, other]
Title: Implantable Adaptive Cells: differentiable architecture search to improve the performance of any trained U-shaped network
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a novel approach to enhance the performance of pre-trained neural networks in medical image segmentation using Neural Architecture Search (NAS) methods, specifically Differentiable Architecture Search (DARTS). We present the concept of Implantable Adaptive Cells (IACs), small but powerful modules identified through Partially-Connected DARTS, designed to be injected into the skip connections of an existing and already trained U-shaped model. Our strategy allows for the seamless integration of IACs into the pre-existing architecture, thereby enhancing its performance without necessitating a complete retraining from scratch. The empirical studies, focusing on medical image segmentation tasks, demonstrate the efficacy of this method. The integration of specialized IACs into various configurations of the U-Net model increases segmentation accuracy by almost 2 percentage points on average for the validation dataset and over 3 percentage points for the training dataset. The findings of this study not only offer a cost-effective alternative to the complete overhaul of complex models for performance upgrades but also indicate the potential applicability of our method to other architectures and problem domains.

[559]  arXiv:2405.03421 [pdf, other]
Title: Homotopy methods for higher order shape optimization: A globalized shape-Newton method and Pareto-front tracing
Subjects: Numerical Analysis (math.NA)

First order shape optimization methods, in general, require a large number of iterations until they reach a locally optimal design. While higher order methods can significantly reduce the number of iterations, they exhibit only local convergence properties, necessitating a sufficiently close initial guess. In this work, we present an unregularized shape-Newton method and combine shape optimization with homotopy (or continuation) methods in order to allow for the use of higher order methods even if the initial design is far from a solution. The idea of homotopy methods is to continuously connect the problem of interest with a simpler problem and to follow the corresponding solution path by a predictor-corrector scheme. We use a shape-Newton method as a corrector and arbitrary order shape derivatives for the predictor. Moreover, we apply homotopy methods also to the case of multi-objective shape optimization to efficiently obtain well-distributed points on a Pareto front. Finally, our results are substantiated with a set of numerical experiments.
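
The predictor-corrector idea described above can be illustrated on a scalar root-finding problem. The following is a minimal sketch, not the shape-optimization method of the paper: the convex homotopy H(x, t) = (1 - t)(x - x0) + t f(x), the Euler predictor, and the names `trace_homotopy`, `f`, and `df` are all assumptions chosen for illustration.

```python
def trace_homotopy(f, df, x0, steps=20, newton_iters=5):
    """Follow the root of H(x, t) = (1 - t) * (x - x0) + t * f(x) from t = 0 to t = 1.

    Illustrative scalar analogue only: x0 is the trivial root of the simple
    problem H(., 0), and the root of H(., 1) solves f(x) = 0.
    """
    x = x0
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        # Predictor: Euler step along the solution path, dx/dt = -H_t / H_x.
        h_x = (1.0 - t) + t * df(x)
        h_t = f(x) - (x - x0)
        x = x + dt * (-h_t / h_x)
        t_next = t + dt
        # Corrector: a few Newton iterations on H(., t_next).
        for _ in range(newton_iters):
            h = (1.0 - t_next) * (x - x0) + t_next * f(x)
            h_x = (1.0 - t_next) + t_next * df(x)
            x = x - h / h_x
    return x


# Hypothetical test problem: f(x) = x^3 + x - 10 has the root x = 2.
root = trace_homotopy(lambda x: x**3 + x - 10.0, lambda x: 3.0 * x**2 + 1.0, x0=0.0)
print(root)
```

The test function is monotone, so the solution path has no turning points; a path with turning points would need arclength parametrization, which this sketch omits.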

[560]  arXiv:2405.03425 [pdf, other]
Title: Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models
Comments: 14 pages, 1 figure, 2 tables
Subjects: Computation and Language (cs.CL)

Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration, particularly when fine-tuned on small datasets. To address these challenges, we propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG), facilitating approximate Bayesian inference in LLMs. Through extensive testing across several Natural Language Processing (NLP) benchmarks, we demonstrate that our straightforward and computationally efficient approach improves model generalization and calibration. We further show that our method exhibits greater robustness against distribution shift, as reflected in its performance on out-of-distribution tasks.
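
As a rough illustration of the weight-averaging idea, the sketch below keeps running first and second moments of weight snapshots and samples from the resulting diagonal Gaussian. It is a simplified sketch under stated assumptions: only the diagonal covariance is modeled, the class name `DiagonalSWAG` is hypothetical, and plain vectors stand in for the LoRA adapter parameters that the abstract refers to.

```python
import numpy as np


class DiagonalSWAG:
    """Running Gaussian approximation over weight snapshots (diagonal covariance only)."""

    def __init__(self, num_params):
        self.n = 0
        self.mean = np.zeros(num_params)
        self.sq_mean = np.zeros(num_params)

    def collect(self, weights):
        """Update the running mean and running mean of squares with one snapshot."""
        w = np.asarray(weights, dtype=float)
        self.n += 1
        self.mean += (w - self.mean) / self.n
        self.sq_mean += (w**2 - self.sq_mean) / self.n

    def variance(self):
        # Clip to guard against small negative values from floating-point error.
        return np.clip(self.sq_mean - self.mean**2, 0.0, None)

    def sample(self, rng):
        """Draw one weight vector from N(mean, diag(variance))."""
        noise = rng.standard_normal(self.mean.shape)
        return self.mean + np.sqrt(self.variance()) * noise


swag = DiagonalSWAG(num_params=2)
for snapshot in ([1.0, 2.0], [3.0, 6.0]):  # stand-ins for weights saved during training
    swag.collect(snapshot)
sampled = swag.sample(np.random.default_rng(0))
```

Predictions from several sampled weight vectors would then be averaged to approximate Bayesian model averaging.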

[561]  arXiv:2405.03426 [pdf, other]
Title: EdgeAlpha: Bringing Process Discovery to the Data Sources
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

Process Mining is moving beyond mining traditional event logs and nowadays includes, for example, data sourced from sensors in the Internet of Things (IoT). The volume and velocity of data generated by such sensors make it increasingly challenging for traditional process discovery algorithms to store and mine such data in traditional event logs. Further, privacy considerations often prevent data collection at a central location in the first place. To address this challenge, this paper introduces EdgeAlpha, a distributed algorithm for process discovery operating directly on sensor nodes and edge devices on a stream of real-time event data. Based on the Alpha Miner, EdgeAlpha tracks each event and its predecessor and successor events directly on the sensor node where the event is sensed and recorded. From this local view, each node in EdgeAlpha derives a partial footprint matrix, which we then merge at a central location, whenever we query the system to compute a process model. EdgeAlpha enables (a) scalable mining, as a node, for each event, only interacts with its predecessors and, when queried, only exchanges aggregates, i.e., partial footprint matrices, with the central location and (b) privacy-preserving process mining, as nodes only store their own as well as predecessor and successor events. On the Sepsis Cases event log, for example, a node queries on average 18.7% of all nodes. For the Hospital Log, we can even reduce the overall querying to 3.87% of the nodes.
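
The merge of partial footprint matrices can be sketched as follows. This is a simplified illustration rather than the EdgeAlpha implementation: the names `SensorNode`, `replay`, and `merge_footprints` are hypothetical, case identifiers and node-to-node communication are omitted, and each node is assumed to sense exactly one activity.

```python
class SensorNode:
    """Hypothetical node that senses one activity and records its direct successors."""

    def __init__(self, activity):
        self.activity = activity
        self.successors = set()

    def partial_footprint(self):
        """Directly-follows pairs known to this node alone."""
        return {(self.activity, b) for b in self.successors}


def replay(trace, nodes):
    """Feed one trace through the nodes; each node only learns what follows it."""
    for a, b in zip(trace, trace[1:]):
        nodes[a].successors.add(b)


def merge_footprints(partials, activities):
    """Merge partial footprints centrally into the Alpha Miner relation matrix."""
    follows = set().union(*partials)
    matrix = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in follows, (b, a) in follows
            matrix[a, b] = "||" if ab and ba else "->" if ab else "<-" if ba else "#"
    return matrix


activities = ["a", "b", "c", "d"]
nodes = {act: SensorNode(act) for act in activities}
for trace in (["a", "b", "c", "d"], ["a", "c", "b", "d"]):
    replay(trace, nodes)
footprint = merge_footprints([n.partial_footprint() for n in nodes.values()], activities)
```

Only the aggregated pairs leave each node at query time, which matches the abstract's point that nodes exchange partial footprint matrices rather than raw events.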

[562]  arXiv:2405.03427 [pdf, other]
Title: Geometry-aware framework for deep energy method: an application to structural mechanics with hyperelastic materials
Comments: 28 pages, 26 figures, 4 tables
Subjects: Machine Learning (cs.LG)

Physics-Informed Neural Networks (PINNs) have gained considerable interest in diverse engineering domains thanks to their capacity to integrate physical laws into deep learning models. Recently, geometry-aware PINN-based approaches that employ the strong form of underlying physical system equations have been developed with the aim of integrating geometric information into PINNs. Despite ongoing research, the assessment of PINNs in problems with various geometries remains an active area of investigation. In this work, we introduce a novel physics-informed framework named the Geometry-Aware Deep Energy Method (GADEM) for solving structural mechanics problems on different geometries. As the weak form of the physical system equation (or the energy-based approach) has demonstrated clear advantages compared to the strong form for solving solid mechanics problems, GADEM employs the weak form and aims to infer the solution on multiple shapes of geometries. Integrating a geometry-aware framework into an energy-based method results in an effective physics-informed deep learning model in terms of accuracy and computational cost. Different ways to represent the geometric information and to encode the geometric latent vectors are investigated in this work. We introduce a loss function of GADEM which is minimized based on the potential energy of all considered geometries. An adaptive learning method is also employed for the sampling of collocation points to enhance the performance of GADEM. We present some applications of GADEM to solve solid mechanics problems, including a loading simulation of a toy tire involving contact mechanics and large deformation hyperelasticity. The numerical results of this work demonstrate the remarkable capability of GADEM to infer the solution on various and new shapes of geometries using only one trained model.

[563]  arXiv:2405.03429 [pdf, ps, other]
Title: ReCycle: Fast and Efficient Long Time Series Forecasting with Residual Cyclic Transformers
Comments: 9 pages, 3 figures, to be published at IEEE CAI 2024, Associated code available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Transformers have recently gained prominence in long time series forecasting by elevating accuracies in a variety of use cases. Regrettably, in the race for better predictive performance the overhead of model architectures has grown onerous, leading to models with computational demand infeasible for most practical applications. To bridge the gap between high method complexity and realistic computational resources, we introduce the Residual Cyclic Transformer, ReCycle. ReCycle utilizes primary cycle compression to address the computational complexity of the attention mechanism in long time series. By learning residuals from refined smoothing average techniques, ReCycle surpasses state-of-the-art accuracy in a variety of application use cases. The reliable and explainable fallback behavior ensured by simple, yet robust, smoothing average techniques additionally lowers the barrier for user acceptance. At the same time, our approach reduces the run time and energy consumption by more than an order of magnitude, making both training and inference feasible on low-performance, low-power and edge computing devices. Code is available at https://github.com/Helmholtz-AI-Energy/ReCycle

[564]  arXiv:2405.03432 [pdf, other]
Title: Improved Forward-Forward Contrastive Learning
Authors: Gananath R
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The backpropagation algorithm, or backprop, is a widely utilized optimization technique in deep learning. While there's growing evidence suggesting that models trained with backprop can accurately explain neuronal data, no backprop-like method has yet been discovered in the biological brain for learning. Moreover, employing a naive implementation of backprop in the brain has several drawbacks. In 2022, Geoffrey Hinton proposed a biologically plausible learning method known as the Forward-Forward (FF) algorithm. Shortly after this paper, a modified version called FFCL was introduced. However, FFCL had limitations, notably being a three-stage learning system where the final stage still relied on regular backpropagation. In our approach, we address these drawbacks by eliminating the last two stages of FFCL and completely removing regular backpropagation. Instead, we rely solely on local updates, offering a more biologically plausible alternative.

[565]  arXiv:2405.03433 [pdf, other]
Title: Annealed adaptive importance sampling method in PINNs for solving high dimensional partial differential equations
Subjects: Numerical Analysis (math.NA)

Physics-informed neural networks (PINNs) have emerged as powerful tools for solving a wide range of partial differential equations (PDEs). However, despite their user-friendly interface and broad applicability, PINNs encounter challenges in accurately resolving PDEs, especially when dealing with singular cases that may lead to unsatisfactory local minima. To address these challenges and improve solution accuracy, we propose an innovative approach called Annealed Adaptive Importance Sampling (AAIS) for computing the discretized PDE residuals of the cost functions, inspired by the Expectation Maximization algorithm used in finite mixtures to mimic target density. Our objective is to approximate discretized PDE residuals by strategically sampling additional points in regions with elevated residuals, thus enhancing the effectiveness and accuracy of PINNs. Implemented together with a straightforward resampling strategy within PINNs, our AAIS algorithm demonstrates significant improvements in efficiency across a range of tested PDEs, even with limited training datasets. Moreover, our proposed AAIS-PINN method shows promising capabilities in solving high-dimensional singular PDEs. The adaptive sampling framework introduced here can be integrated into various PINN frameworks.

[566]  arXiv:2405.03436 [pdf, other]
Title: DBDH: A Dual-Branch Dual-Head Neural Network for Invisible Embedded Regions Localization
Comments: 7 pages, 6 figures (accepted by IJCNN 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Embedding invisible hyperlinks or hidden codes in images to replace QR codes has become a hot topic recently. This technology requires first localizing the embedded region in the captured photos before decoding. Existing methods that train models to find the invisible embedded region struggle to obtain accurate localization results, leading to degraded decoding accuracy. This limitation is primarily because the CNN network is sensitive to low-frequency signals, while the embedded signal is typically in the high-frequency form. Based on this, this paper proposes a Dual-Branch Dual-Head (DBDH) neural network tailored for the precise localization of invisible embedded regions. Specifically, DBDH uses a low-level texture branch containing 62 high-pass filters to capture the high-frequency signals induced by embedding. A high-level context branch is used to extract discriminative features between the embedded and normal regions. DBDH employs a detection head to directly detect the four vertices of the embedding region. In addition, we introduce an extra segmentation head to segment the mask of the embedding region during training. The segmentation head provides pixel-level supervision for model learning, facilitating better learning of the embedded signals. Based on two state-of-the-art invisible offline-to-online messaging methods, we construct two datasets and augmentation strategies for training and testing localization models. Extensive experiments demonstrate the superior performance of the proposed DBDH over existing methods.

[567]  arXiv:2405.03437 [pdf, other]
Title: pyCFS-data: Data Processing Framework in Python for openCFS
Subjects: Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA)

Many numerical simulation tools have been developed and are on the market, but there is still a strong need for appropriate tools capable of simulating multi-field problems, especially in aeroacoustics. Therefore, openCFS provides an open-source framework for implementing partial differential equations using the finite element method. Since 2000, the software has been developed continuously. The result is openCFS (before 2020, known as CFS++ Coupled Field Simulations written in C++). In this paper, we present pyCFS-data, a data processing framework written in Python to provide a flexible and easy-to-use toolbox to access and manipulate, pre- and postprocess data generated by or for usage with openCFS.

[568]  arXiv:2405.03440 [pdf, other]
Title: Robotic Constrained Imitation Learning for the Peg Transfer Task in Fundamentals of Laparoscopic Surgery
Comments: Accepted at ICRA2024, website - this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this study, we present an implementation strategy for a robot that performs peg transfer tasks in Fundamentals of Laparoscopic Surgery (FLS) via imitation learning, aimed at the development of an autonomous robot for laparoscopic surgery. Robotic laparoscopic surgery presents two main challenges: (1) the need to manipulate forceps using ports established on the body surface as fulcrums, and (2) difficulty in perceiving depth information when working with a monocular camera that displays its images on a monitor. In particular, regarding issue (2), most prior research has assumed the availability of depth images or models of a target to be operated on. Therefore, in this study, we achieve more accurate imitation learning with only monocular images by extracting motion constraints from one exemplary motion of skilled operators, collecting data based on these constraints, and conducting imitation learning based on the collected data. We implemented an overall system using two Franka Emika Panda Robot Arms and validated its effectiveness.

[569]  arXiv:2405.03442 [pdf, other]
Title: Behavioral analysis in immersive learning environments: A systematic literature review and research agenda
Authors: Yu Liu, Kang Yue, Yue Liu
Comments: 29 pages, 5 figures
Subjects: Human-Computer Interaction (cs.HC)

The rapid growth of immersive technologies in educational areas has increased research interest in analyzing the specific behavioral patterns of learners in immersive learning environments. Considering the fact that research on the technical affordances of immersive technologies and the pedagogical affordances of behavioral analysis remains fragmented, this study first contributes by developing a conceptual framework that amalgamates learning requirements, specification, evaluation, and iteration into an integrated model to identify learning benefits and potential hurdles of behavioral analysis in immersive learning environments. Then, a systematic review was conducted underpinning the proposed conceptual framework to retrieve valuable empirical evidence from 40 eligible articles published during the last decade. The review findings suggest that (1) there is an essential need to sufficiently prepare the salient pedagogical requirements to define the specific learning stage, envisage intended cognitive objectives, and specify an appropriate set of learning activities, when developing comprehensive plans on behavioral analysis in immersive learning environments; (2) researchers could customize the unique immersive experimental implementation by considering factors from four dimensions: learner, pedagogy, context, and representation; (3) the behavioral patterns constructed in immersive learning environments vary by considering the influence of behavioral analysis techniques, research themes, and immersive technical features; and (4) the use of behavioral analysis in immersive learning environments faces several challenges from technical, implementation, and data processing perspectives. This study also articulates a critical research agenda that could drive future investigation on behavioral analysis in immersive learning environments.

[570]  arXiv:2405.03446 [pdf, other]
Title: SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence
Subjects: Cryptography and Security (cs.CR)

To address the increasing complexity and frequency of cybersecurity incidents emphasized by the recent cybersecurity threat reports with over 10 billion instances, cyber threat intelligence (CTI) plays a critical role in the modern cybersecurity landscape by offering the insights required to understand and combat the constantly evolving nature of cyber threats. Inspired by the powerful capability of large language models (LLMs) in handling complex tasks, in this paper, we introduce a framework to benchmark, elicit, and improve cybersecurity incident analysis and response abilities in LLMs for Security Events (SEvenLLM). Specifically, we create a high-quality bilingual instruction corpus by crawling cybersecurity raw text from cybersecurity websites to overcome the lack of effective data for information extraction. Then, we design a pipeline to auto-select tasks from the task pool and convert the raw text into supervised corpora composed of questions and responses. The instruction dataset SEvenLLM-Instruct is used to train cybersecurity LLMs with the multi-task learning objective (27 well-designed tasks) for augmenting the analysis of cybersecurity events. Extensive experiments in our curated benchmark (SEvenLLM-bench) demonstrate that SEvenLLM performs more sophisticated threat analysis and fortifies defenses against the evolving landscape of cyber threats.

[571]  arXiv:2405.03449 [pdf, other]
Title: Byzantine-Robust Gossip: Insights from a Dual Approach
Comments: 9 pages, 1 figure
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Distributed approaches have many computational benefits, but they are vulnerable to attacks from a subset of devices transmitting incorrect information. This paper investigates Byzantine-resilient algorithms in a decentralized setting, where devices communicate directly with one another. We leverage the so-called dual approach to design a general robust decentralized optimization method. We provide both global and local clipping rules in the special case of average consensus, with tight convergence guarantees. These clipping rules are practical, and yield results that finely characterize the impact of Byzantine nodes, highlighting for instance a qualitative difference in convergence between global and local clipping thresholds. Lastly, we demonstrate that they can serve as a basis for designing efficient attacks.

[572]  arXiv:2405.03452 [pdf, ps, other]
Title: Large Language Models (LLMs) as Agents for Augmented Democracy
Comments: 15 pages main manuscript with 3 figures. 12 pages of supplementary material
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We explore the capabilities of an augmented democracy system built on off-the-shelf LLMs fine-tuned on data summarizing individual preferences across 67 policy proposals collected during the 2022 Brazilian presidential elections. We use a train-test cross-validation setup to estimate the accuracy with which the LLMs predict both a subject's individual political choices and the aggregate preferences of the full sample of participants. At the individual level, the accuracy of the out-of-sample predictions lies in the range 69%-76%, and the predictions are significantly better for liberal and college-educated participants. At the population level, we aggregate preferences using an adaptation of the Borda score and compare the ranking of policy proposals obtained from a probabilistic sample of participants and from data augmented using LLMs. We find that the augmented data predicts the preferences of the full population of participants better than probabilistic samples alone when these represent less than 30% to 40% of the total population. These results indicate that LLMs are potentially useful for the construction of systems of augmented democracy.
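
For readers unfamiliar with Borda aggregation, a minimal sketch of the standard Borda count follows. The abstract mentions an adaptation of the Borda score whose details are not given here, so this is the textbook version only; the function name `borda_ranking` and the example ballots are hypothetical.

```python
def borda_ranking(ballots):
    """Aggregate ranked ballots with the standard Borda count.

    Each ballot lists proposals from most to least preferred. A proposal in
    position i of a ballot with m entries receives m - 1 - i points.
    Returns (ranking, scores); ties are broken alphabetically.
    """
    scores = {}
    for ballot in ballots:
        m = len(ballot)
        for position, proposal in enumerate(ballot):
            scores[proposal] = scores.get(proposal, 0) + (m - 1 - position)
    ranking = sorted(scores, key=lambda p: (-scores[p], p))
    return ranking, scores


ballots = [["A", "B", "C"], ["B", "A", "C"], ["B", "C", "A"]]
ranking, scores = borda_ranking(ballots)
print(ranking, scores)  # ['B', 'A', 'C'] {'A': 3, 'B': 5, 'C': 1}
```

Comparing the ranking obtained from a small sample, or from LLM-augmented data, against the ranking from the full set of ballots is one way to quantify how well a subset predicts population-level preferences.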

[573]  arXiv:2405.03456 [pdf, other]
Title: Performance of H-Matrix-Vector Multiplication with Floating Point Compression
Authors: Ronald Kriemann
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)

Matrix-vector multiplication forms the basis of many iterative solution algorithms and as such is an important algorithm also for hierarchical matrices. However, due to its low computational intensity, its performance is typically limited by the available memory bandwidth. By optimizing the storage representation of the data within such matrices, this limitation can be lifted and the performance increased. This applies not only to hierarchical matrices but also to other low-rank approximation schemes, e.g. block low-rank matrices.

[574]  arXiv:2405.03458 [pdf, other]
Title: SSyncOA: Self-synchronizing Object-aligned Watermarking to Resist Cropping-paste Attacks
Comments: 7 pages, 5 figures (accepted by ICME 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Modern image processing tools have made it easy for attackers to crop the region or object of interest in images and paste it into other images. The challenge this cropping-paste attack poses to the watermarking technology is that it breaks the synchronization of the image watermark, introducing multiple superimposed desynchronization distortions, such as rotation, scaling, and translation. However, current watermarking methods can only resist a single type of desynchronization and cannot be applied to protect the object's copyright under the cropping-paste attack. With the finding that the key to resisting the cropping-paste attack lies in robust features of the object to protect, this paper proposes a self-synchronizing object-aligned watermarking method, called SSyncOA. Specifically, we first constrain the watermarked region to be aligned with the protected object, and then synchronize the watermark's translation, rotation, and scaling distortions by normalizing the object invariant features, i.e., its centroid, principal orientation, and minimum bounding square, respectively. To make the watermark embedded in the protected object, we introduce the object-aligned watermarking model, which incorporates the real cropping-paste attack into the encoder-noise layer-decoder pipeline and is optimized end-to-end. Besides, we illustrate the effect of different desynchronization distortions on the watermark training, which confirms the necessity of the self-synchronization process. Extensive experiments demonstrate the superiority of our method over other SOTAs.

[575]  arXiv:2405.03462 [pdf, ps, other]
Title: A Lightweight Neural Architecture Search Model for Medical Image Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Accurate classification of medical images is essential for modern diagnostics. Deep learning advancements led clinicians to increasingly use sophisticated models to make faster and more accurate decisions, sometimes replacing human judgment. However, model development is costly and repetitive. Neural Architecture Search (NAS) provides solutions by automating the design of deep learning architectures. This paper presents ZO-DARTS+, a differentiable NAS algorithm that improves search efficiency through a novel method of generating sparse probabilities by bi-level optimization. Experiments on five public medical datasets show that ZO-DARTS+ matches the accuracy of state-of-the-art solutions while reducing search times by up to three times.

[576]  arXiv:2405.03467 [pdf, ps, other]
Title: Welfare Loss in Connected Resource Allocation
Comments: Appears in the 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024
Subjects: Computer Science and Game Theory (cs.GT)

We study the allocation of indivisible goods that form an undirected graph and investigate the worst-case welfare loss when requiring that each agent must receive a connected subgraph. Our focus is on both egalitarian and utilitarian welfare. Specifically, we introduce the concept of egalitarian (resp., utilitarian) price of connectivity, which captures the worst-case ratio between the optimal egalitarian (resp., utilitarian) welfare among all allocations and that among the connected allocations. We provide tight or asymptotically tight bounds on the price of connectivity for various large classes of graphs when there are two agents as well as for paths, stars and cycles in the general case. Many of our results are supplemented with algorithms which find connected allocations with a welfare guarantee corresponding to the price of connectivity.
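
The ratio behind the egalitarian price of connectivity can be computed by brute force on a small instance. The sketch below assumes two agents with additive utilities and goods arranged on a path, treats an empty bundle as connected, and uses hypothetical names (`egalitarian`, `best_welfare`); it evaluates a single instance, whereas the price of connectivity is the worst case over all instances.

```python
from itertools import product


def egalitarian(bundles, utils):
    """Egalitarian welfare: the utility of the worst-off agent (additive utilities)."""
    return min(sum(u[g] for g in bundle) for u, bundle in zip(utils, bundles))


def best_welfare(n_goods, utils, connected_only):
    """Best egalitarian welfare for two agents over goods 0..n_goods-1 on a path."""
    best = 0
    for owners in product((0, 1), repeat=n_goods):
        switches = sum(owners[i] != owners[i + 1] for i in range(n_goods - 1))
        # On a path, both bundles are connected iff ownership switches at most once.
        if connected_only and switches > 1:
            continue
        bundles = [[g for g in range(n_goods) if owners[g] == k] for k in (0, 1)]
        best = max(best, egalitarian(bundles, utils))
    return best


# Hypothetical instance: agent 0 values the two end goods, agent 1 the middle one.
utils = [[1, 0, 1], [0, 2, 0]]
unrestricted = best_welfare(3, utils, connected_only=False)
connected = best_welfare(3, utils, connected_only=True)
print(unrestricted, connected, unrestricted / connected)  # 2 1 2.0
```

In this instance the unrestricted optimum gives agent 0 both end goods, which is not a connected bundle, so requiring connectivity halves the egalitarian welfare.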

[577]  arXiv:2405.03470 [pdf, other]
Title: Motion Planning under Uncertainty: Integrating Learning-Based Multi-Modal Predictors into Branch Model Predictive Control
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

In complex traffic environments, autonomous vehicles face multi-modal uncertainty about other agents' future behavior. To address this, recent advancements in learning-based motion predictors output multi-modal predictions. We present our novel framework that leverages Branch Model Predictive Control (BMPC) to account for these predictions. The framework includes an online scenario-selection process guided by topology and collision risk criteria. This efficiently selects a minimal set of predictions, rendering the BMPC real-time capable. Additionally, we introduce an adaptive decision postponing strategy that delays the planner's commitment to a single scenario until the uncertainty is resolved. Our comprehensive evaluations in traffic intersection and random highway merging scenarios demonstrate enhanced comfort and safety through our method.

[578]  arXiv:2405.03473 [pdf, ps, other]
Title: A Minimum-Jerk Approach to Handle Singularities in Virtual Fixtures
Comments: 8 pages, 6 figures
Subjects: Robotics (cs.RO)

Implementing virtual fixtures in guiding tasks constrains the movement of the robot's end effector to specific curves within its workspace. However, incorporating guiding frameworks may encounter discontinuities when optimizing the reference target position to the nearest point relative to the current robot position. This article aims to give a geometric interpretation of such discontinuities, with specific reference to the commonly adopted Gauss-Newton algorithm. The effect of such discontinuities, defined as Euclidean Distance Singularities, is demonstrated experimentally. We then propose a solution that is based on a Linear Quadratic Tracking problem with minimum jerk command, then compare and validate the performance of the proposed framework in two different human-robot interaction scenarios.

[579]  arXiv:2405.03474 [pdf, other]
Title: Fast Approximate Determinants Using Rational Functions
Comments: 22 pages, 17 figures
Subjects: Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)

We show how rational function approximations to the logarithm, such as $\log z \approx (z^2 - 1)/(z^2 + 6z + 1)$, can be turned into fast algorithms for approximating the determinant of a very large matrix. We empirically demonstrate that when combined with a good preconditioner, the third order rational function approximation offers a very good trade-off between speed and accuracy when measured on matrices coming from Mat\'ern-$5/2$ and radial basis function Gaussian process kernels. In particular, it is significantly more accurate on those matrices than the state-of-the-art stochastic Lanczos quadrature method for approximating determinants while running at about the same speed.
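
The link between a rational approximation of the logarithm and a determinant is the identity log det A = tr(log A) for a symmetric positive definite A, so tr(r(A)) approximates log det A whenever r(z) approximates log z on the spectrum of A. The sketch below checks this with dense linear algebra; it is not the fast algorithm of the paper. The prefactor 4 in `rational_log` is an assumption of this sketch (the formula as printed in the abstract has no prefactor), added so that the expression agrees with log z to second order at z = 1. The function names and the test matrix are hypothetical.

```python
import numpy as np


def rational_log(z):
    """r(z) = 4 (z^2 - 1) / (z^2 + 6 z + 1); the factor 4 is an assumption of this sketch."""
    return 4.0 * (z**2 - 1.0) / (z**2 + 6.0 * z + 1.0)


def approx_logdet(a):
    """Approximate log det(A) as tr(r(A)) for a symmetric positive definite A.

    Uses a dense solve purely to check the identity on a small matrix.
    """
    eye = np.eye(a.shape[0])
    a2 = a @ a
    # r(A) = (A^2 + 6A + I)^{-1} * 4 (A^2 - I); both factors are polynomials in A.
    r = np.linalg.solve(a2 + 6.0 * a + eye, 4.0 * (a2 - eye))
    return np.trace(r)


# Hypothetical well-conditioned test matrix with eigenvalues close to 1.
a = np.diag([0.8, 0.9, 1.0, 1.1, 1.2]) + 0.01 * np.ones((5, 5))
exact = np.linalg.slogdet(a)[1]
approx = approx_logdet(a)
print(exact, approx)
```

The approximation is only accurate when the eigenvalues lie near 1, which is consistent with the abstract's emphasis on combining the method with a good preconditioner.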

[580]  arXiv:2405.03476 [pdf, other]
Title: DexSkills: Skill Segmentation Using Haptic Data for Learning Autonomous Long-Horizon Robotic Manipulation Tasks
Subjects: Robotics (cs.RO)

Effective execution of long-horizon tasks with dexterous robotic hands remains a significant challenge in real-world problems. While learning from human demonstrations has shown encouraging results, it requires extensive data collection for training. Hence, decomposing long-horizon tasks into reusable primitive skills is a more efficient approach. To achieve this, we developed DexSkills, a novel supervised learning framework that addresses long-horizon dexterous manipulation tasks using primitive skills. DexSkills is trained to recognize and replicate a select set of skills using human demonstration data, which can then segment a demonstrated long-horizon dexterous manipulation task into a sequence of primitive skills to achieve one-shot execution by the robot directly. Significantly, DexSkills operates solely on proprioceptive and tactile data, i.e., haptic data. Our real-world robotic experiments show that DexSkills can accurately segment skills, thereby enabling autonomous robot execution of a diverse range of tasks.

[581]  arXiv:2405.03478 [pdf, other]
Title: Synthetic Datasets for Program Similarity Research
Subjects: Cryptography and Security (cs.CR)

Program similarity has become an increasingly popular area of research with various security applications such as plagiarism detection, author identification, and malware analysis. However, program similarity research faces a few unique dataset quality problems in evaluating the effectiveness of novel approaches. First, few high-quality datasets for binary program similarity exist and are widely used in this domain. Second, there are potentially many different, disparate definitions of what makes one program similar to another, and in many cases there is a large semantic gap between the labels provided by a dataset and any useful notion of behavioral or semantic similarity. In this paper, we present HELIX - a framework for generating large, synthetic program similarity datasets. We also introduce Blind HELIX, a tool built on top of HELIX for extracting HELIX components from library code automatically using program slicing. We evaluate HELIX and Blind HELIX by comparing the performance of program similarity tools on a HELIX dataset to a hand-crafted dataset built from multiple, disparate notions of program similarity. Using Blind HELIX, we show that HELIX can generate realistic and useful datasets of virtually infinite size for program similarity research with ground truth labels that embody practical notions of program similarity. Finally, we discuss the results and reason about relative tool ranking.

[582]  arXiv:2405.03480 [pdf, other]
Title: Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search
Comments: Accepted at SIGIR 2024 (Full Paper)
Subjects: Information Retrieval (cs.IR)

Future conversational agents will provide users with personalized information responses. However, a significant challenge in developing models is the lack of large-scale dialogue datasets that span multiple sessions and reflect real-world user preferences. Previous approaches rely on experts in a wizard-of-oz setup that is difficult to scale, particularly for personalized tasks. Our method, LAPS, addresses this by using large language models (LLMs) to guide a single human worker in generating personalized dialogues. This method has proven to speed up the creation process and improve quality. LAPS can collect large-scale, human-written, multi-session, and multi-domain conversations, including extracting user preferences. When compared to existing datasets, LAPS-produced conversations are as natural and diverse as expert-created ones, which stands in contrast to fully synthetic methods. The collected dataset is suited to train preference extraction and personalized response generation. Our results show that responses generated explicitly using extracted preferences better match users' actual preferences, highlighting the value of using extracted preferences over simple dialogue history. Overall, LAPS introduces a new method to leverage LLMs to create realistic personalized conversational data more efficiently and effectively than previous methods.

[583]  arXiv:2405.03481 [pdf, other]
Title: AnchorGT: Efficient and Flexible Attention Architecture for Scalable Graph Transformers
Subjects: Machine Learning (cs.LG)

Graph Transformers (GTs) have significantly advanced the field of graph representation learning by overcoming the limitations of message-passing graph neural networks (GNNs) and demonstrating promising performance and expressive power. However, the quadratic complexity of the self-attention mechanism in GTs has limited their scalability, and previous approaches to address this issue often suffer from expressiveness degradation or lack of versatility. To address this issue, we propose AnchorGT, a novel attention architecture for GTs with a global receptive field and almost linear complexity, which serves as a flexible building block to improve the scalability of a wide range of GT models. Inspired by anchor-based GNNs, we employ a structurally important $k$-dominating node set as anchors and design an attention mechanism that focuses on the relationship between individual nodes and anchors, while retaining the global receptive field for all nodes. With its intuitive design, AnchorGT can easily replace the attention module in various GT models with different network architectures and structural encodings, resulting in reduced computational overhead without sacrificing performance. In addition, we theoretically prove that AnchorGT attention can be strictly more expressive than the Weisfeiler-Lehman test, showing its superiority in representing graph structures. Our experiments on three state-of-the-art GT models demonstrate that their AnchorGT variants can achieve better results while being faster and significantly more memory efficient.

[584]  arXiv:2405.03482 [pdf, ps, other]
Title: Managing Renewable Energy Resources Using Equity-Market Risk Tools - the Efficient Frontiers
Comments: 9 pages, 3 figures, 10 ref
Subjects: Systems and Control (eess.SY)

The energy market, and specifically the renewable sector, carries volatility and risks similar to those of the financial market. Here, we leverage a well-established return-risk approach, commonly used by equity portfolio managers, and apply it to energy resources. We visualize the relationship between the resources' costs and their risks in terms of efficient frontiers. We apply this analysis to publicly available data for various US regions: Central, Eastern and Western coasts. Since risk management is contingent on costs, this approach sheds useful light on assessing dynamic pricing in modern electrical grids. By integrating geographical and temporal dimensions into our research, we aim to provide more nuanced and context-specific recommendations for energy resource allocation. This approach may help decision-makers in the renewable energy sector to make informed choices that account for regional variations, climatic conditions, and long-term performance trends.
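
As a rough illustration of the return-risk idea mentioned in this abstract, the sketch below traces the cost-risk curve for a mix of two resources, from which an efficient frontier can be read off. It is the generic two-asset mean-variance calculation and is not taken from the paper. The function names, the cost and risk figures, and the correlation value are all hypothetical.

```python
import math


def mix_stats(w, cost, risk, rho):
    """Expected cost and risk (standard deviation) of a two-resource mix.

    w    -- share of resource 0 (resource 1 gets 1 - w)
    cost -- pair of expected costs (hypothetical units)
    risk -- pair of standard deviations of those costs
    rho  -- correlation between the two resources' costs
    """
    mean = w * cost[0] + (1.0 - w) * cost[1]
    cov = rho * risk[0] * risk[1]
    var = (w * risk[0]) ** 2 + ((1.0 - w) * risk[1]) ** 2 + 2.0 * w * (1.0 - w) * cov
    return mean, math.sqrt(max(var, 0.0))


def trace_curve(cost, risk, rho, steps=50):
    """Sample the cost-risk curve for shares w = 0, 1/steps, ..., 1."""
    return [mix_stats(i / steps, cost, risk, rho) for i in range(steps + 1)]


def min_risk_weight(risk, rho):
    """Share of resource 0 that minimises risk, restricted to [0, 1]."""
    cov = rho * risk[0] * risk[1]
    denom = risk[0] ** 2 + risk[1] ** 2 - 2.0 * cov
    if denom == 0:
        raise ValueError("resources are indistinguishable in risk")
    w = (risk[1] ** 2 - cov) / denom
    return min(1.0, max(0.0, w))


if __name__ == "__main__":
    cost = (40.0, 55.0)   # hypothetical expected costs
    risk = (0.20, 0.10)   # hypothetical standard deviations
    w_star = min_risk_weight(risk, rho=0.0)
    print("minimum-risk share of resource 0:", round(w_star, 3))
    print("cost and risk at that share:", mix_stats(w_star, cost, risk, 0.0))
```

Because the variance is a convex quadratic in the share w, clamping the unconstrained minimiser to [0, 1] gives the minimum-risk mix when no negative shares are allowed.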

[585]  arXiv:2405.03483 [pdf, ps, other]
Title: On certain matrix algebras related to quasi-Toeplitz matrices
Subjects: Numerical Analysis (math.NA)

Let $A_\alpha$ be the semi-infinite tridiagonal matrix having subdiagonal and superdiagonal unit entries, $(A_\alpha)_{11}=\alpha$, where $\alpha\in\mathbb C$, and zero elsewhere. A basis $\{P_0,P_1,P_2,\ldots\}$ of the linear space $\mathcal P_\alpha$ spanned by the powers of $A_\alpha$ is determined, where $P_0=I$, $P_n=T_n+H_n$, $T_n$ is the symmetric Toeplitz matrix having ones in the $n$th super- and sub-diagonal, zeros elsewhere, and $H_n$ is the Hankel matrix with first row $[\theta\alpha^{n-2}, \theta\alpha^{n-3}, \ldots, \theta, \alpha, 0, \ldots]$, where $\theta=\alpha^2-1$. The set $\mathcal P_\alpha$ is an algebra, and for $\alpha\in\{-1,0,1\}$, $H_n$ has only one nonzero anti-diagonal. This fact is exploited to provide a better representation of symmetric quasi-Toeplitz matrices $\mathcal {QT}_S$, where, instead of representing a generic matrix $A\in\mathcal{QT}_S$ as $A=T+K$, where $T$ is Toeplitz and $K$ is compact, it is represented as $A=P+H$, where $P\in\mathcal P_\alpha$ and $H$ is compact. It is shown experimentally that the matrix arithmetic obtained this way is much more effective than that implemented in the CQT-Toolbox of Numer.~Algo. 81(2):741--769, 2019.
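
The definitions in this abstract can be sanity-checked on finite truncations. The sketch below builds leading blocks of $A_\alpha$, $T_n$ and $H_n$ as described and tests two small identities: $P_1=A_\alpha$ and $A_\alpha^2=P_2+2P_0$. The second identity is worked out here by hand from the definitions and is not quoted from the paper. Function names are hypothetical, and the comparison skips the last row and column because truncation alters the bottom-right corner.

```python
def a_alpha(alpha, n):
    """Leading n x n block of the semi-infinite tridiagonal matrix A_alpha."""
    a = [[0] * n for _ in range(n)]
    a[0][0] = alpha
    for i in range(n - 1):
        a[i][i + 1] = 1
        a[i + 1][i] = 1
    return a


def toeplitz_t(k, n):
    """Symmetric Toeplitz block with ones on the k-th super- and sub-diagonal."""
    return [[1 if abs(i - j) == k else 0 for j in range(n)] for i in range(n)]


def hankel_h(k, alpha, n):
    """Hankel block with first row [theta*alpha^(k-2), ..., theta, alpha, 0, ...]."""
    theta = alpha * alpha - 1
    row = [theta * alpha ** (k - 2 - j) for j in range(k - 1)] + [alpha]
    row += [0] * (2 * n)  # padding so every anti-diagonal index i + j exists
    return [[row[i + j] for j in range(n)] for i in range(n)]


def p_basis(k, alpha, n):
    """Leading block of P_k: identity for k = 0, otherwise T_k + H_k."""
    if k == 0:
        return [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    t, h = toeplitz_t(k, n), hankel_h(k, alpha, n)
    return [[t[i][j] + h[i][j] for j in range(n)] for i in range(n)]


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def check_square_identity(alpha, n=8):
    """Check A^2 == P_2 + 2*P_0 on the leading (n-1) x (n-1) block."""
    a = a_alpha(alpha, n)
    a2 = matmul(a, a)
    p2, p0 = p_basis(2, alpha, n), p_basis(0, alpha, n)
    return all(a2[i][j] == p2[i][j] + 2 * p0[i][j]
               for i in range(n - 1) for j in range(n - 1))


if __name__ == "__main__":
    for alpha in (-1, 0, 1, 3):
        print(alpha, p_basis(1, alpha, 6) == a_alpha(alpha, 6), check_square_identity(alpha))
```

Integer values of alpha keep the arithmetic exact, so the comparison needs no tolerance.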

[586]  arXiv:2405.03484 [pdf, other]
Title: Whispy: Adapting STT Whisper Models to Real-Time Environments
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Large general-purpose transformer models have recently become the mainstay in the realm of speech analysis. In particular, Whisper achieves state-of-the-art results in relevant tasks such as speech recognition, translation, language identification, and voice activity detection. However, Whisper models are not designed to be used in real-time conditions, and this limitation makes them unsuitable for a wide range of practical applications. In this paper, we introduce Whispy, a system intended to bring live capabilities to the Whisper pretrained models. As a result of a number of architectural optimisations, Whispy is able to consume live audio streams and generate high-level, coherent voice transcriptions, while still maintaining a low computational cost. We evaluate the performance of our system on a large repository of publicly available speech datasets, investigating how the transcription mechanism introduced by Whispy affects the Whisper output. Experimental results show how Whispy excels in robustness, promptness, and accuracy.

[587]  arXiv:2405.03485 [pdf, other]
Title: LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model
Comments: 9 pages, 7 figures, SIGGRAPH 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

In this paper, we introduce LGTM, a novel Local-to-Global pipeline for Text-to-Motion generation. LGTM utilizes a diffusion-based architecture and aims to address the challenge of accurately translating textual descriptions into semantically coherent human motion in computer animation. Specifically, traditional methods often struggle with semantic discrepancies, particularly in aligning specific motions to the correct body parts. To address this issue, we propose a two-stage pipeline: it first employs large language models (LLMs) to decompose global motion descriptions into part-specific narratives, which are then processed by independent body-part motion encoders to ensure precise local semantic alignment. Finally, an attention-based full-body optimizer refines the motion generation results and guarantees the overall coherence. Our experiments demonstrate that LGTM gains significant improvements in generating locally accurate, semantically-aligned human motion, marking a notable advancement in text-to-motion applications. Code and data for this paper are available at https://github.com/L-Sun/LGTM

[588]  arXiv:2405.03486 [pdf, other]
Title: UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

Image safety classifiers play an important role in identifying and mitigating the spread of unsafe images online (e.g., images including violence, hateful rhetoric, etc.). At the same time, with the advent of text-to-image models and increasing concerns about the safety of AI models, developers are increasingly relying on image safety classifiers to safeguard their models. Yet, the performance of current image safety classifiers remains unknown for real-world and AI-generated images. To bridge this research gap, in this work, we propose UnsafeBench, a benchmarking framework that evaluates the effectiveness and robustness of image safety classifiers. First, we curate a large dataset of 10K real-world and AI-generated images that are annotated as safe or unsafe based on a set of 11 unsafe categories of images (sexual, violent, hateful, etc.). Then, we evaluate the effectiveness and robustness of five popular image safety classifiers, as well as three classifiers that are powered by general-purpose visual language models. Our assessment indicates that existing image safety classifiers are not comprehensive and effective enough in mitigating the multifaceted problem of unsafe images. Also, we find that classifiers trained only on real-world images tend to have degraded performance when applied to AI-generated images. Motivated by these findings, we design and implement a comprehensive image moderation tool called PerspectiveVision, which effectively identifies 11 categories of real-world and AI-generated unsafe images. The best PerspectiveVision model achieves an overall F1-Score of 0.810 on six evaluation datasets, which is comparable with closed-source and expensive state-of-the-art models like GPT-4V. UnsafeBench and PerspectiveVision can aid the research community in better understanding the landscape of image safety classification in the era of generative AI.

[589]  arXiv:2405.03488 [pdf, other]
Title: Accurate and Fast Approximate Graph Pattern Mining at Scale
Comments: 15 pages, 12 figures
Subjects: Performance (cs.PF); Data Structures and Algorithms (cs.DS)

Approximate graph pattern mining (A-GPM) is an important data analysis tool for many graph-based applications. There exist sampling-based A-GPM systems to provide automation and generalization over a wide variety of use cases. However, there are two major obstacles that prevent existing A-GPM systems from being adopted in practice. First, the termination mechanism that decides when to end sampling lacks theoretical backing on confidence, and is unstable and slow in practice. Second, they suffer from poor performance when dealing with the "needle-in-the-hay" cases, because a huge number of samples are required to converge, given the extremely low hit rate of their fixed sampling schemes. We build ScaleGPM, an accurate and fast A-GPM system that removes the two obstacles. First, we propose a novel on-the-fly convergence detection mechanism to achieve stable termination and provide a theoretical guarantee on the confidence, with negligible overhead. Second, we propose two techniques to deal with the "needle-in-the-hay" problem, eager-verify and hybrid sampling. Our eager-verify method improves sampling hit rate by pruning unpromising candidates as early as possible. Hybrid sampling improves performance by automatically choosing the better scheme between fine-grained and coarse-grained sampling schemes. Experiments show that our online convergence detection mechanism detects convergence and results in stable and rapid termination with theoretically guaranteed confidence. We show the effectiveness of eager-verify in improving the hit rate, and the scheme-selection mechanism in correctly choosing the better scheme for various cases. Overall, ScaleGPM achieves a geomean speedup of 565x (up to 610169x) over the state-of-the-art A-GPM system, Arya. In particular, ScaleGPM handles billion-scale graphs in seconds, where existing systems either run out of memory or fail to complete in hours.

[590]  arXiv:2405.03489 [pdf, other]
Title: On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations
Comments: 15 pages, 2 figures
Subjects: Software Engineering (cs.SE)

Numerous DL-based approaches have garnered considerable attention in the field of software Log Anomaly Detection. However, a practical challenge persists: the class imbalance in the public data commonly used to train the DL models. This imbalance is characterized by a substantial disparity in the number of abnormal log sequences compared to normal ones; for example, anomalies represent less than 1% of one of the most popular datasets. Previous research has indicated that existing DLLAD approaches may exhibit unsatisfactory performance, particularly when confronted with datasets featuring severe class imbalances. Mitigating class imbalance through data resampling has proven effective for other software engineering tasks; however, it has been unexplored for LAD thus far. This study aims to fill this gap by providing an in-depth analysis of the impact of diverse data resampling methods on existing DLLAD approaches from two distinct perspectives. Firstly, we assess the performance of these DLLAD approaches across three datasets and explore the impact of resampling ratios of normal to abnormal data on ten data resampling methods. Secondly, we evaluate the effectiveness of the data resampling methods when utilizing optimal resampling ratios of normal to abnormal data. Our findings indicate that oversampling methods generally outperform undersampling and hybrid methods. Data resampling on raw data yields superior results compared to data resampling in the feature space. In most cases, certain undersampling and hybrid methods show limited effectiveness. Additionally, by exploring the resampling ratio of normal to abnormal data, we suggest generating more data for minority classes through oversampling while removing less data from majority classes through undersampling. In conclusion, our study provides valuable insights into the intricate relationship between data resampling methods and DLLAD.
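
To make the notion of a resampling ratio of normal to abnormal data concrete, here is a minimal sketch of plain random oversampling on raw log sequences. It is a generic textbook technique and may not match any of the ten methods the study evaluates. The function name, the 0/1 label convention, and the toy data are assumptions made for illustration.

```python
import math
import random


def random_oversample(sequences, labels, ratio=1.0, seed=0):
    """Duplicate abnormal samples until (#normal / #abnormal) <= ratio.

    sequences -- list of log sequences (any objects)
    labels    -- list of 0 (normal) or 1 (abnormal), same length
    ratio     -- target ratio of normal to abnormal samples
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rng = random.Random(seed)
    normal = [s for s, y in zip(sequences, labels) if y == 0]
    abnormal = [s for s, y in zip(sequences, labels) if y == 1]
    if not abnormal:
        raise ValueError("need at least one abnormal sample to oversample")
    target = math.ceil(len(normal) / ratio)
    # Draw extra abnormal samples with replacement from the existing ones.
    extra = [rng.choice(abnormal) for _ in range(max(0, target - len(abnormal)))]
    out_x = normal + abnormal + extra
    out_y = [0] * len(normal) + [1] * (len(abnormal) + len(extra))
    return out_x, out_y


if __name__ == "__main__":
    # Toy data: 99 normal sequences and 1 abnormal one (about 1% anomalies).
    xs = [["E1", "E2"]] * 99 + [["E1", "E9"]]
    ys = [0] * 99 + [1]
    new_x, new_y = random_oversample(xs, ys, ratio=4.0)
    print(new_y.count(0), new_y.count(1))
```

Only the minority class is touched here; an undersampling variant would instead drop normal sequences until the same ratio is reached.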

[591]  arXiv:2405.03491 [pdf, other]
Title: Jointly Learning Cost and Constraints from Demonstrations for Safe Trajectory Generation
Comments: Submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Subjects: Robotics (cs.RO)

Learning from Demonstration allows robots to mimic human actions. However, these methods do not model constraints crucial to ensure safety of the learned skill. Moreover, even when explicitly modelling constraints, they rely on the assumption of a known cost function, which limits their practical usability for tasks with unknown cost. In this work we propose a two-step optimization process that allows cost and constraints to be estimated by decoupling the learning of cost functions from the identification of unknown constraints within the demonstrated trajectories. Initially, we identify the cost function by isolating the effect of constraints on parts of the demonstrations. Subsequently, a constraint learning method is used to identify the unknown constraints. Our approach is validated both on simulated trajectories and a real robotic manipulation task. Our experiments show the impact that incorrect cost estimation has on the learned constraints and illustrate how the proposed method is able to infer unknown constraints, such as obstacles, from demonstrated trajectories without any initial knowledge of the cost.

[592]  arXiv:2405.03500 [pdf, other]
Title: A Rate-Distortion-Classification Approach for Lossy Image Compression
Authors: Yuefeng Zhang
Comments: 15 pages
Journal-ref: Digital Signal Processing Volume 141, September 2023, 104163
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

In lossy image compression, the objective is to achieve minimal signal distortion while compressing images to a specified bit rate. The increasing demand for visual analysis applications, particularly in classification tasks, has emphasized the significance of considering semantic distortion in compressed images. To bridge the gap between image compression and visual analysis, we propose a Rate-Distortion-Classification (RDC) model for lossy image compression, offering a unified framework to optimize the trade-off between rate, distortion, and classification accuracy. The RDC model is extensively analyzed both statistically on a multi-distribution source and experimentally on the widely used MNIST dataset. The findings reveal that the RDC model exhibits desirable properties, including monotonic non-increasing and convex functions, under certain conditions. This work provides insights into the development of human-machine friendly compression methods and Video Coding for Machine (VCM) approaches, paving the way for end-to-end image compression techniques in real-world applications.

[593]  arXiv:2405.03501 [pdf, other]
Title: Boosting Single Positive Multi-label Classification with Generalized Robust Loss
Comments: 14 pages, 5 figures, 6 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Multi-label learning (MLL) requires comprehensive multi-semantic annotations that are hard to fully obtain, thus often resulting in missing-label scenarios. In this paper, we investigate Single Positive Multi-label Learning (SPML), where each image is associated with merely one positive label. Existing SPML methods only focus on designing losses using mechanisms such as hard pseudo-labeling and robust losses, mostly leading to unacceptable false negatives. To address this issue, we first propose a generalized loss framework based on expected risk minimization to provide soft pseudo labels, and point out that the former losses can be seamlessly converted into our framework. In particular, we design a novel robust loss based on our framework, which enjoys flexible coordination between false positives and false negatives, and can additionally deal with the imbalance between positive and negative samples. Extensive experiments show that our approach can significantly improve SPML performance and outperform the vast majority of state-of-the-art methods on all four benchmarks.

[594]  arXiv:2405.03502 [pdf, ps, other]
Title: Human-Variability-Respecting Optimal Control for Physical Human-Machine Interaction
Subjects: Systems and Control (eess.SY)

Physical Human-Machine Interaction plays a pivotal role in facilitating collaboration across various domains. When designing appropriate model-based controllers to assist a human in the interaction, the accuracy of the human model is crucial for the resulting overall behavior of the coupled system. Most state-of-the-art control approaches rely on a deterministic model of the human behavior or on no model at all. This leaves a gap to the current neuroscientific standard regarding human movement modeling, which uses stochastic optimal control models that include signal-dependent noise processes and therefore describe the human behavior much more accurately than the deterministic counterparts. To close this gap by including these stochastic human models in the control design, we introduce a novel design methodology resulting in a Human-Variability-Respecting Optimal Control that explicitly incorporates the human noise processes and their influence on the mean and variability behavior of a physically coupled human-machine system. Our approach results in an improved overall system performance, i.e. higher accuracy and lower variability in target point reaching, while allowing the joint variability to be shaped, for example to preserve human natural variability patterns.

[595]  arXiv:2405.03506 [pdf, other]
Title: Spin-Wave Voices: Sonification of Nanoscale Spin Waves as an Engagement and Research Tool
Comments: Accepted to ICAD2024 conference proceedings
Subjects: Human-Computer Interaction (cs.HC)

Magnonics is an emerging research field that addresses the use of spin waves (magnons), purely magnetic waves, for information transport and processing. Spin waves are a potential replacement for electric current in modern computational devices that would make them more compact and energy efficient. The field is still little known, even among physicists. Additionally, with the development of new measuring techniques and computational physics, the obtained magnetic data becomes more complex, in some cases including 3D vector fields and time resolution. This work presents an approach to the audio-visual representation of spin waves and discusses its use as a tool for science communication exhibits and as a possible data analysis tool. The work also details an instance of such an exhibit presented at the annual international digital art exhibition Ars Electronica Festival in 2022.

[596]  arXiv:2405.03509 [pdf, other]
Title: Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning
Subjects: Software Engineering (cs.SE)

Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper, we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require additional model training or any manually crafted rules and can be easily deployed on personal computers without relying on other external tools. Specifically, Code2API guides the LLMs through well-designed prompts to generate well-formed APIs for given code snippets. To elicit knowledge and logical reasoning from LLMs, we used chain-of-thought (CoT) reasoning and few-shot in-context learning, which can help the LLMs fully understand the APIzation task and solve it step by step in a manner similar to a developer. Our evaluations show that Code2API achieves remarkable accuracy in identifying method parameters (65%) and return statements (66%) equivalent to human-generated ones, surpassing the current state-of-the-art approach, APIzator, by 15.0% and 16.5% respectively. Moreover, compared with APIzator, our user study demonstrates that Code2API exhibits superior performance in generating meaningful method names, even surpassing human-level performance, and developers are more willing to use APIs generated by our approach, highlighting the applicability of our tool in practice. Finally, we successfully extend our framework to the Python dataset, achieving performance comparable to that on Java, which verifies the generalizability of our tool.

[597]  arXiv:2405.03511 [pdf, ps, other]
Title: Extremal Separation Problems for Temporal Instance Queries
Comments: Accepted for IJCAI 2024
Subjects: Databases (cs.DB); Logic in Computer Science (cs.LO)

The separation problem for a class Q of database queries is to find a query in Q that distinguishes between a given set of `positive' and `negative' data examples. Separation provides explanations of examples and underpins the query-by-example paradigm to support database users in constructing and refining queries. As the space of all separating queries can be large, it is helpful to succinctly represent this space by means of its most specific (logically strongest) and general (weakest) members. We investigate this extremal separation problem for classes of instance queries formulated in linear temporal logic LTL with the operators conjunction, next, and eventually. Our results range from tight complexity bounds for verifying and counting extremal separators to algorithms computing them.
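
The separation problem described in this abstract can be illustrated with a tiny evaluator for queries built from atoms, conjunction, next, and eventually. This is only a sketch under stated assumptions: data examples are finite sequences of sets of atoms, queries are evaluated at time 0, and "eventually" includes the current time point, which may differ from the semantics used in the paper. The tuple encoding of queries and all names are hypothetical.

```python
def holds(query, trace, t=0):
    """Evaluate a query at position t of a finite trace (list of sets of atoms)."""
    if t >= len(trace):
        return False  # nothing holds beyond the end of the trace
    kind = query[0]
    if kind == "atom":
        return query[1] in trace[t]
    if kind == "and":
        return holds(query[1], trace, t) and holds(query[2], trace, t)
    if kind == "next":
        return holds(query[1], trace, t + 1)
    if kind == "eventually":
        # Assumption: non-strict reading, i.e. "now or at some later point".
        return any(holds(query[1], trace, i) for i in range(t, len(trace)))
    raise ValueError("unknown operator: %r" % (kind,))


def separates(query, positives, negatives):
    """True if the query holds on every positive and on no negative example."""
    return (all(holds(query, p) for p in positives)
            and not any(holds(query, n) for n in negatives))


if __name__ == "__main__":
    positives = [[{"A"}, {"B"}], [set(), {"A"}, {"B"}]]
    negatives = [[{"B"}, {"A"}]]
    a_then_b = ("eventually", ("and", ("atom", "A"), ("next", ("atom", "B"))))
    some_a = ("eventually", ("atom", "A"))
    print(separates(a_then_b, positives, negatives))
    print(separates(some_a, positives, negatives))
```

In this toy instance the first query separates the examples while the second does not, since "eventually A" also holds on the negative example.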

[598]  arXiv:2405.03513 [pdf, other]
Title: QBER: Quantifying Cyber Risks for Strategic Decisions
Comments: 10 pages, 9 equations, 3 tables, 2 figures
Subjects: Cryptography and Security (cs.CR); Computational Engineering, Finance, and Science (cs.CE)

Quantifying cyber risks is essential for organizations to grasp their vulnerability to threats and make informed decisions. However, current approaches still fall short in incorporating economic viewpoints to provide insightful analysis. To bridge this gap, we introduce the QBER approach to offer decision-makers measurable risk metrics. QBER evaluates losses from cyberattacks, performs detailed risk analyses based on existing cybersecurity measures, and provides thorough cost assessments. Our contributions involve outlining cyberattack probabilities and risks, identifying Technical, Economic, and Legal (TEL) impacts, creating a model to gauge impacts, suggesting risk mitigation strategies, and examining trends and challenges in implementing widespread Cyber Risk Quantification (CRQ). The QBER approach serves as a guide for organizations to assess risks and strategically invest in cybersecurity.

[599]  arXiv:2405.03514 [pdf, other]
Title: Development of Ultra-Portable 3D Mapping Systems for Emergency Services
Comments: 6 pages, 20 figures, accepted to the IEEE ICRA Workshop on Field Robotics 2024
Subjects: Robotics (cs.RO)

Miniaturization of cameras and LiDAR sensors has enabled the development of wearable 3D mapping systems for emergency responders. These systems have the potential to revolutionize response capabilities by providing real-time, high-fidelity maps of dynamic and hazardous environments. We present our recent efforts towards the development of such ultra-portable 3D mapping systems. We review four different sensor configurations, either helmet-mounted or body-worn, with two different mapping algorithms that were implemented and evaluated during field trials. The paper discusses the experimental results with the aim of stimulating further discussion within the portable 3D mapping research community.

[600]  arXiv:2405.03516 [pdf, other]
Title: GI-SMN: Gradient Inversion Attack against Federated Learning without Prior Knowledge
Comments: 18 pages, 10 figures, conference
Subjects: Machine Learning (cs.LG)

Federated learning (FL) has emerged as a privacy-preserving machine learning approach where multiple parties share gradient information rather than original user data. Recent work has demonstrated that gradient inversion attacks can exploit the gradients of FL to recreate the original user data, posing significant privacy risks. However, these attacks make strong assumptions about the attacker, such as altering the model structure or parameters, gaining batch normalization statistics, or acquiring prior knowledge of the original training set. Consequently, these attacks are not possible in real-world scenarios. To this end, we propose a novel Gradient Inversion attack based on Style Migration Network (GI-SMN), which breaks through the strong assumptions made by previous gradient inversion attacks. The optimization space is reduced by the refinement of the latent code and the use of regular terms to facilitate gradient matching. GI-SMN enables the reconstruction of user data with high similarity in batches. Experimental results have demonstrated that GI-SMN outperforms state-of-the-art gradient inversion attacks in both visual effect and similarity metrics. Additionally, it can also overcome gradient pruning and differential privacy defenses.

[601]  arXiv:2405.03518 [pdf, other]
Title: Reinforcement Nash Equilibrium Solver
Comments: IJCAI 2024
Subjects: Computer Science and Game Theory (cs.GT)

Nash Equilibrium (NE) is the canonical solution concept of game theory, which provides an elegant tool to understand the rationalities. Though mixed strategy NE exists in any game with finite players and actions, computing NE in two- or multi-player general-sum games is PPAD-Complete. Various alternative solutions, e.g., Correlated Equilibrium (CE), and learning methods, e.g., fictitious play (FP), are proposed to approximate NE. For convenience, we call these methods "inexact solvers", or "solvers" for short. However, the alternative solutions differ from NE and the learning methods generally fail to converge to NE. Therefore, in this work, we propose REinforcement Nash Equilibrium Solver (RENES), which trains a single policy to modify the games with different sizes and applies the solvers on the modified games where the obtained solution is evaluated on the original games. Specifically, our contributions are threefold. i) We represent the games as $\alpha$-rank response graphs and leverage a graph neural network (GNN) to handle the games with different sizes as inputs; ii) We use tensor decomposition, e.g., canonical polyadic (CP), to make the dimension of modifying actions fixed for games with different sizes; iii) We train the modifying strategy for games with the widely-used proximal policy optimization (PPO) and apply the solvers to solve the modified games, where the obtained solution is evaluated on original games. Extensive experiments on large-scale normal-form games show that our method can further improve the approximation of NE of different solvers, i.e., $\alpha$-rank, CE, FP and PRD, and can be generalized to unseen games.

[602]  arXiv:2405.03519 [pdf, other]
Title: Low-light Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this competition we employed a model fusion approach to achieve object detection results close to those of real images. Our method is based on the CO-DETR model, which was trained on two sets of data: one containing images under dark conditions and another containing images enhanced with low-light conditions. We used various enhancement techniques on the test data to generate multiple sets of prediction results. Finally, we applied a clustering aggregation method guided by IoU thresholds to select the optimal results.
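
The IoU-guided aggregation step mentioned in this abstract can be pictured with the sketch below, which computes intersection over union for axis-aligned boxes and greedily groups overlapping predictions, keeping the highest-scoring box of each group. This is a generic illustration rather than the authors' procedure. The box format (x1, y1, x2, y2), the 0.5 threshold, and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cluster_by_iou(detections, threshold=0.5):
    """Greedily group (box, score) pairs and return one box per group.

    Detections are visited in order of decreasing score. A detection joins
    the first cluster whose top-scoring box overlaps it with IoU >= threshold;
    otherwise it starts a new cluster.
    """
    clusters = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        for members in clusters:
            if iou(members[0][0], box) >= threshold:
                members.append((box, score))
                break
        else:
            clusters.append([(box, score)])
    return [members[0] for members in clusters]


if __name__ == "__main__":
    # Hypothetical predictions pooled from several enhanced copies of one image.
    preds = [((0, 0, 10, 10), 0.9), ((1, 1, 10, 10), 0.8), ((50, 50, 60, 60), 0.7)]
    for box, score in cluster_by_iou(preds):
        print(box, score)
```

Keeping only the top-scoring member is one simple selection rule; averaging the coordinates of each cluster's members would be another.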

[603]  arXiv:2405.03520 [pdf, other]
Title: Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Comments: This survey will be regularly updated at: https://github.com/GigaAI-research/General-World-Models-Survey
Subjects: Computer Vision and Pattern Recognition (cs.CV)

General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. Recently, the emergence of the Sora model has attracted significant attention due to its remarkable simulation capabilities, which exhibit an incipient comprehension of physical laws. In this survey, we embark on a comprehensive exploration of the latest advancements in world models. Our analysis navigates through the forefront of generative methodologies in video generation, where world models stand as pivotal constructs facilitating the synthesis of highly realistic visual content. Additionally, we scrutinize the burgeoning field of autonomous-driving world models, meticulously delineating their indispensable role in reshaping transportation and urban mobility. Furthermore, we delve into the intricacies inherent in world models deployed within autonomous agents, shedding light on their profound significance in enabling intelligent interactions within dynamic environmental contexts. Finally, we examine challenges and limitations of world models, and discuss their potential future directions. We hope this survey can serve as a foundational reference for the research community and inspire continued innovation. This survey will be regularly updated at: https://github.com/GigaAI-research/General-World-Models-Survey.

[604]  arXiv:2405.03523 [pdf, other]
Title: Basilisk: Achieving Competitive Performance with Open EDA Tools on an Open-Source Linux-Capable RISC-V SoC
Comments: 2 pages, 1 figure, accepted as a poster at the RISC-V Summit Europe 2024
Subjects: Hardware Architecture (cs.AR)

We introduce Basilisk, an optimized application-specific integrated circuit (ASIC) implementation and design flow building on the end-to-end open-source Iguana system-on-chip (SoC). We present enhancements to synthesis tools and logic optimization scripts improving quality of results (QoR), as well as an optimized physical design with an improved power grid and cell placement integration enabling a higher core utilization. The tapeout-ready version of Basilisk implemented in IHP's open 130 nm technology achieves an operation frequency of 77 MHz (51 logic levels) under typical conditions, a 2.3x improvement compared to the baseline open-source EDA design flow presented in Iguana, and a higher 55 % core utilization compared to 50 % in the baseline design. Through collaboration with EDA tool developers and domain experts, Basilisk exemplifies a synergistic effort towards competitive open-source electronic design automation (EDA) tools for research and industry applications.

[605]  arXiv:2405.03524 [pdf, other]
Title: Exploring knowledge graph-based neural-symbolic system from application perspective
Authors: Shenzhe Zhu
Subjects: Artificial Intelligence (cs.AI)

The rapid advancement in artificial intelligence (AI), particularly through deep neural networks, has catalyzed significant progress in fields such as vision and text processing. Nonetheless, the pursuit of AI systems that exhibit human-like reasoning and interpretability continues to pose a substantial challenge. The Neural-Symbolic paradigm, which integrates the deep learning prowess of neural networks with the reasoning capabilities of symbolic systems, presents a promising pathway toward developing more transparent and comprehensible AI systems. Within this paradigm, the Knowledge Graph (KG) emerges as a crucial element, offering a structured and dynamic method for representing knowledge through interconnected entities and relationships, predominantly utilizing the triple (subject, predicate, object). This paper explores recent advancements in neural-symbolic integration based on KG, elucidating how KG underpins this integration across three key categories: enhancing the reasoning and interpretability of neural networks through the incorporation of symbolic knowledge (Symbol for Neural), refining the completeness and accuracy of symbolic systems via neural network methodologies (Neural for Symbol), and facilitating their combined application in Hybrid Neural-Symbolic Integration. It highlights current trends and proposes directions for future research in the domain of Neural-Symbolic AI.

[606]  arXiv:2405.03526 [pdf, other]
Title: ReinWiFi: A Reinforcement-Learning-Based Framework for the Application-Layer QoS Optimization of WiFi Networks
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

In this paper, a reinforcement-learning-based scheduling framework is proposed and implemented to optimize the application-layer quality-of-service (QoS) of a practical wireless local area network (WLAN) suffering from unknown interference. Particularly, application-layer tasks of file delivery and delay-sensitive communication, e.g., screen projection, in a WLAN with enhanced distributed channel access (EDCA) mechanism, are jointly scheduled by adjusting the contention window sizes and application-layer throughput limitation, such that their QoS, including the throughput of file delivery and the round trip time of the delay-sensitive communication, can be optimized. Due to the unknown interference and vendor-dependent implementation of the network interface card, the relation between the scheduling policy and the system QoS is unknown. Hence, a reinforcement learning method is proposed, in which a novel Q-network is trained to map from the historical scheduling parameters and QoS observations to the current scheduling action. It is demonstrated on a testbed that the proposed framework can achieve a significantly better QoS than the conventional EDCA mechanism.

[607]  arXiv:2405.03529 [pdf, ps, other]
Title: Quasi-Monte Carlo for Bayesian design of experiment problems governed by parametric PDEs
Comments: 43 pages, 3 figures
Subjects: Numerical Analysis (math.NA)

This paper contributes to the study of optimal experimental design for Bayesian inverse problems governed by partial differential equations (PDEs). We derive estimates for the parametric regularity of multivariate double integration problems over high-dimensional parameter and data domains arising in Bayesian optimal design problems. We provide a detailed analysis for these double integration problems using two approaches: a full tensor product and a sparse tensor product combination of quasi-Monte Carlo (QMC) cubature rules over the parameter and data domains. Specifically, we show that the latter approach significantly improves the convergence rate, exhibiting performance comparable to that of QMC integration of a single high-dimensional integral. Furthermore, we numerically verify the predicted convergence rates for an elliptic PDE problem with an unknown diffusion coefficient in two spatial dimensions, offering empirical evidence supporting the theoretical results and highlighting practical applicability.

[608]  arXiv:2405.03530 [pdf, ps, other]
Title: Semi-autonomous Robotic Disassembly Enhanced by Mixed Reality
Subjects: Robotics (cs.RO)

In this study, we introduce "SARDiM," a modular semi-autonomous platform enhanced with mixed reality for industrial disassembly tasks. Through a case study focused on EV battery disassembly, SARDiM integrates Mixed Reality, object segmentation, teleoperation, force feedback, and variable autonomy. Utilising the ROS, Unity, and MATLAB platforms, alongside a joint impedance controller, SARDiM facilitates teleoperated disassembly. The approach combines FastSAM for real-time object segmentation, generating data which is subsequently processed through a cluster analysis algorithm to determine the centroid and orientation of the components, categorizing them by size and disassembly priority. This data guides the MoveIt platform in trajectory planning for the Franka Robot arm. SARDiM provides the capability to switch between two teleoperation modes: manual and semi-autonomous with variable autonomy. Each was evaluated using four different Interface Methods (IM): direct view, monitor feed, mixed reality with monitor feed, and point cloud mixed reality. Evaluations across the eight mode-IM combinations demonstrated a 40.61% decrease in joint limit violations using Mode 2. Moreover, Mode 2-IM4 outperformed Mode 1-IM1 by achieving a 2.33% time reduction while considerably increasing safety, making it optimal for operating in hazardous environments at a safe distance, with the same ease of use as teleoperation with a direct view of the environment.

[609]  arXiv:2405.03534 [pdf, other]
Title: Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer
Comments: ICLR 2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We investigate the problem of transferring an expert policy from a source robot to multiple different robots. To solve this problem, we propose a method named $Meta$-$Evolve$ that uses continuous robot evolution to efficiently transfer the policy to each target robot through a set of tree-structured evolutionary robot sequences. The robot evolution tree allows the robot evolution paths to be shared, so our approach can significantly outperform naive one-to-one policy transfer. We present a heuristic approach to determine an optimized robot evolution tree. Experiments have shown that our method is able to improve the efficiency of one-to-three transfer of manipulation policy by up to 3.2$\times$ and one-to-six transfer of agile locomotion policy by 2.4$\times$ in terms of simulation cost over the baseline of launching multiple independent one-to-one policy transfers.

[610]  arXiv:2405.03535 [pdf, ps, other]
Title: Asymptotic-preserving hybridizable discontinuous Galerkin method for the Westervelt quasilinear wave equation
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

We discuss the asymptotic-preserving properties of a hybridizable discontinuous Galerkin method for the Westervelt model of ultrasound waves. More precisely, we show that the proposed method is robust with respect to small values of the sound diffusivity damping parameter $\delta$ by deriving low- and high-order energy stability estimates, and a priori error bounds that are independent of $\delta$. Such bounds are then used to show that, when $\delta \rightarrow 0^+$, the method remains stable and the discrete acoustic velocity potential $\psi_h^{(\delta)}$ converges to $\psi_h^{(0)}$, where the latter is the singular vanishing dissipation limit. Moreover, we prove optimal convergence for the approximation of the acoustic particle velocity variable $\boldsymbol{v} = \nabla \psi$. The established theoretical results are illustrated with some numerical experiments.

[611]  arXiv:2405.03536 [pdf, other]
Title: Extensional and Non-extensional Functions as Processes
Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL)

Following Milner's seminal paper, the representation of functions as processes has received considerable attention. For pure $\lambda$-calculus, the process representations yield (at best) non-extensional $\lambda$-theories (i.e., the $\beta$ rule holds, whereas $\eta$ does not).
In this paper, we study how to obtain extensional representations, and how to move between extensional and non-extensional representations. Using Internal $\pi$, $\mathrm{I}\pi$ (a subset of the $\pi$-calculus in which all outputs are bound), we develop a refinement of Milner's original encoding of functions as processes that is parametric on certain abstract components called wires. These are, intuitively, processes whose task is to connect two end-point channels. We show that when a few algebraic properties of wires hold, the encoding yields a $\lambda$-theory. Exploiting the symmetries and dualities of $\mathrm{I}\pi$, we isolate three main classes of wires. The first two have a sequential behaviour and are dual to each other; the third has a parallel behaviour and is its own dual. We show that the adoption of the parallel wires yields an extensional $\lambda$-theory; in fact, it yields an equality that coincides with that of Böhm trees with infinite $\eta$. In contrast, the other two classes of wires yield non-extensional $\lambda$-theories whose equalities are those of the Lévy-Longo and Böhm trees.

[612]  arXiv:2405.03537 [pdf, other]
Title: Exploring the Efficacy of Federated-Continual Learning Nodes with Attention-Based Classifier for Robust Web Phishing Detection: An Empirical Investigation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Web phishing poses a dynamic threat, requiring detection systems to quickly adapt to the latest tactics. Traditional approaches of accumulating data and periodically retraining models are outpaced. We propose a novel paradigm combining federated learning and continual learning, enabling distributed nodes to continually update models on streams of new phishing data, without accumulating data. These locally adapted models are then aggregated at a central server via federated learning. To enhance detection, we introduce a custom attention-based classifier model with residual connections, tailored for web phishing, leveraging attention mechanisms to capture intricate phishing patterns. We evaluate our hybrid learning paradigm across continual learning strategies (cumulative, replay, MIR, LwF) and model architectures through an empirical investigation. Our main contributions are: (1) a new hybrid federated-continual learning paradigm for robust web phishing detection, and (2) a novel model based on attention and residual connections, explicitly designed for this task, attaining 0.93 accuracy, 0.90 precision, 0.96 recall and 0.93 F1-score with the LwF strategy, outperforming traditional approaches in detecting emerging phishing threats while retaining past knowledge.
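
The reported figures can be cross-checked against each other, since the F1-score is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that arithmetic only; the function name `f1_score` is illustrative and is not taken from the paper's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the LwF strategy.
reported_f1 = f1_score(0.90, 0.96)
print(round(reported_f1, 2))
```

With precision 0.90 and recall 0.96, the harmonic mean rounds to the 0.93 stated in the abstract.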

[613]  arXiv:2405.03541 [pdf, other]
Title: RepVGG-GELAN: Enhanced GELAN with VGG-STYLE ConvNets for Brain Tumour Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Object detection algorithms, particularly those based on YOLO, have demonstrated remarkable efficiency in balancing speed and accuracy. However, their application in brain tumour detection remains underexplored. This study proposes RepVGG-GELAN, a novel YOLO architecture enhanced with RepVGG, a reparameterized convolutional approach for object detection tasks, with a particular focus on brain tumour detection within medical images. RepVGG-GELAN leverages the RepVGG architecture to improve both speed and accuracy in detecting brain tumours. Integrating RepVGG into the YOLO framework aims to achieve a balance between computational efficiency and detection performance. This study includes a spatial pyramid pooling-based Generalized Efficient Layer Aggregation Network (GELAN) architecture, which further enhances the capability of RepVGG. Experimental evaluation conducted on a brain tumour dataset demonstrates the effectiveness of RepVGG-GELAN, which surpasses the existing RCS-YOLO in terms of precision and speed. Specifically, RepVGG-GELAN achieves an increased precision of 4.91% and an increased AP50 of 2.54% over the latest existing approach while operating at 240.7 GFLOPs. The proposed RepVGG-GELAN with GELAN architecture presents promising results, establishing itself as a state-of-the-art solution for accurate and efficient brain tumour detection in medical images. The implementation code is publicly available at https://github.com/ThensiB/RepVGG-GELAN.

[614]  arXiv:2405.03544 [pdf, ps, other]
Title: A Formal Model of Security Controls' Capabilities and Its Applications to Policy Refinement and Incident Management
Subjects: Cryptography and Security (cs.CR)

Enforcing security requirements in networked information systems relies on security controls to mitigate the risks from increasingly dangerous threats. Configuring security controls is challenging; even nowadays, administrators must perform it without adequate tool support. Hence, this process is plagued by errors that translate to insecure postures, security incidents, and a lack of promptness in answering threats. This paper presents the Security Capability Model (SCM), a formal model that abstracts the features that security controls offer for enforcing security policies, which includes an Information Model that depicts the basic concepts related to rules (i.e., conditions, actions, events) and policies (i.e., conditions' evaluation, resolution strategies, default actions), and a Data Model that covers the capabilities needed to describe different types of filtering and channel protection controls. Following state-of-the-art design patterns, the model allows for generating abstract versions of the security controls' languages and a model-driven approach for translating abstract policies into device-specific configuration settings. By validating its effectiveness in real-world scenarios, we show that SCM enables the automation of different and complex security tasks, i.e., accurate and granular security control comparison, policy refinement, and incident response. Lastly, we present opportunities for extensions and integration with other frameworks and models.

[615]  arXiv:2405.03545 [pdf, other]
Title: Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors
Authors: Amit Moryossef
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper addresses a critical flaw in MediaPipe Holistic's hand Region of Interest (ROI) prediction, which struggles with non-ideal hand orientations, affecting sign language recognition accuracy. We propose a data-driven approach to enhance ROI estimation, leveraging an enriched feature set including additional hand keypoints and the z-dimension. Our results demonstrate better estimates, with higher Intersection-over-Union compared to the current method. Our code and optimizations are available at https://github.com/sign-language-processing/mediapipe-hand-crop-fix.
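
For readers unfamiliar with the metric, Intersection-over-Union compares a predicted region with a reference region. The following is a minimal sketch for axis-aligned boxes, assuming an `(x_min, y_min, x_max, y_max)` coordinate convention; the function name `iou` and that convention are illustrative assumptions and do not reflect the linked repository's implementation.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed convention


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamp to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0
```

A value of 1.0 means the predicted ROI coincides with the reference box, and 0.0 means the two do not overlap at all.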

[616]  arXiv:2405.03546 [pdf, other]
Title: CCDM: Continuous Conditional Diffusion Models for Image Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Continuous Conditional Generative Modeling (CCGM) aims to estimate the distribution of high-dimensional data, typically images, conditioned on scalar continuous variables known as regression labels. While Continuous conditional Generative Adversarial Networks (CcGANs) were initially designed for this task, their adversarial training mechanism remains vulnerable to extremely sparse or imbalanced data, resulting in suboptimal outcomes. To enhance the quality of generated images, a promising alternative is to replace CcGANs with Conditional Diffusion Models (CDMs), renowned for their stable training process and ability to produce more realistic images. However, existing CDMs encounter challenges when applied to CCGM tasks due to several limitations such as inadequate U-Net architectures and deficient model fitting mechanisms for handling regression labels. In this paper, we introduce Continuous Conditional Diffusion Models (CCDMs), the first CDM designed specifically for the CCGM task. CCDMs address the limitations of existing CDMs by introducing specially designed conditional diffusion processes, a modified denoising U-Net with a custom-made conditioning mechanism, a novel hard vicinal loss for model fitting, and an efficient conditional sampling procedure. With comprehensive experiments on four datasets with varying resolutions ranging from 64x64 to 192x192, we demonstrate the superiority of the proposed CCDM over state-of-the-art CCGM models, establishing new benchmarks in CCGM. Extensive ablation studies validate the model design and implementation configuration of the proposed CCDM. Our code is publicly available at https://github.com/UBCDingXin/CCDM.

[617]  arXiv:2405.03547 [pdf, other]
Title: Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions
Comments: Accepted to International Conference on Machine Learning (ICML) 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Undeniably, Large Language Models (LLMs) have stirred an extraordinary wave of innovation in the machine learning research domain, resulting in substantial impact across diverse fields such as reinforcement learning, robotics, and computer vision. Their incorporation has been rapid and transformative, marking a significant paradigm shift in the field of machine learning research.
However, the field of experimental design, grounded in black-box optimization, has been much less affected by such a paradigm shift, even though integrating LLMs with optimization presents a unique landscape ripe for exploration. In this position paper, we frame the field of black-box optimization around sequence-based foundation models and organize their relationship with previous literature. We discuss the most promising ways foundational language models can revolutionize optimization, which include harnessing the vast wealth of information encapsulated in free-form text to enrich task comprehension, utilizing highly flexible sequence models such as Transformers to engineer superior optimization strategies, and enhancing performance prediction over previously unseen search spaces.

[618]  arXiv:2405.03548 [pdf, other]
Title: MAmmoTH2: Scaling Instructions from the Web
Comments: Work in Progress
Subjects: Computation and Language (cs.CL)

Instruction tuning improves the reasoning abilities of large language models (LLMs), with data quality and scalability being the crucial factors. Most instruction tuning data come from human crowd-sourcing or GPT-4 distillation. We propose a paradigm to efficiently harvest 10 million naturally occurring instruction examples from the pre-training web corpus to enhance LLM reasoning. Our approach involves (1) recalling relevant documents, (2) extracting instruction-response pairs, and (3) refining the extracted pairs using open-source LLMs. Fine-tuning base LLMs on this dataset, we build MAmmoTH2 models, which significantly boost performance on reasoning benchmarks. Notably, MAmmoTH2-7B's (Mistral) performance increases from 11% to 34% on MATH and from 36% to 67% on GSM8K without training on any in-domain data. Further training MAmmoTH2 on public instruction tuning datasets yields MAmmoTH2-Plus, achieving state-of-the-art performance on several reasoning and chatbot benchmarks. Our work demonstrates how to harvest large-scale, high-quality instruction data without costly human annotation or GPT-4 distillation, providing a new paradigm for building better instruction tuning data.

[619]  arXiv:2405.03553 [pdf, other]
Title: AlphaMath Almost Zero: process Supervision without process
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Recent advancements in large language models (LLMs) have substantially enhanced their mathematical reasoning abilities. However, these models still struggle with complex problems that require multiple reasoning steps, frequently leading to logical or numerical errors. While numerical mistakes can largely be addressed by integrating a code interpreter, identifying logical errors within intermediate steps is more challenging. Moreover, manually annotating these steps for training is not only expensive but also demands specialized expertise. In this study, we introduce an innovative approach that eliminates the need for manual annotation by leveraging the Monte Carlo Tree Search (MCTS) framework to generate both the process supervision and evaluation signals automatically. Essentially, when an LLM is well pre-trained, only the mathematical questions and their final answers are required to generate our training data, without requiring the solutions. We proceed to train a step-level value model designed to improve the LLM's inference process in mathematical domains. Our experiments indicate that using automatically generated solutions by LLMs enhanced with MCTS significantly improves the model's proficiency in dealing with intricate mathematical reasoning tasks.

[620]  arXiv:2405.03555 [pdf, other]
Title: A Comprehensive Overview and Survey of O-RAN: Exploring Slicing-aware Architecture, Deployment Options, and Use Cases
Comments: 45 pages, 12 figures, 4 tables, and 190 references
Subjects: Networking and Internet Architecture (cs.NI)

Open-radio access network (O-RAN) seeks to establish principles of openness, programmability, automation, intelligence, and hardware-software disaggregation with interoperable interfaces. It advocates for multi-vendorism and multi-stakeholderism within a cloudified and virtualized wireless infrastructure, aimed at enhancing the deployment, operation, and maintenance of RAN architecture. This enhancement promises increased flexibility, performance optimization, service innovation, energy efficiency, and cost efficiency in fifth-generation (5G), sixth-generation (6G), and future networks. One of the key features of the O-RAN architecture is its support for network slicing, which entails interaction with other slicing domains within a mobile network, notably the transport network (TN) domain and the core network (CN) domain, to realize end-to-end (E2E) network slicing. The study of this feature requires exploring the stances and contributions of diverse standards development organizations (SDOs). In this context, we note that despite the ongoing industrial deployments and standardization efforts, the research and standardization communities have yet to comprehensively address network slicing in O-RAN. To address this gap, this survey paper provides a comprehensive exploration of network slicing in O-RAN through an in-depth review of specification documents from O-RAN Alliance and research papers from leading industry and academic institutions. The paper commences with an overview of the ongoing standardization efforts and open-source contributions associated with O-RAN, subsequently delving into the latest O-RAN architecture with an emphasis on its slicing aspects. Further, the paper explores deployment scenarios for network slicing within O-RAN, examining options for the deployment and orchestration of O-RAN and TN network slice subnets...

[621]  arXiv:2405.03561 [pdf, other]
Title: Model- and Data-Based Control of Self-Balancing Robots: Practical Educational Approach with LabVIEW and Arduino
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

A two-wheeled self-balancing robot (TWSBR) is a non-linear and unstable system. This study compares the performance of model-based and data-based control strategies for TWSBRs, with an explicit practical educational approach. Model-based control (MBC) algorithms such as Lead-Lag and PID control require proficient dynamic modeling and mathematical manipulation to derive the linearized equations of motion and develop the appropriate controller. On the other hand, data-based control (DBC) methods, like fuzzy control, provide a simpler and quicker approach to designing effective controllers without needing an in-depth understanding of the system model. In this paper, the advantages and disadvantages of both MBC and DBC using a TWSBR are illustrated. All controllers were implemented and tested on the OSOYOO self-balancing kit, including an Arduino microcontroller, MPU-6050 sensor, and DC motors. The control law and the user interface are constructed using the LabVIEW-LINX toolkit. A real-time hardware-in-the-loop experiment validates the results, highlighting controllers that can be implemented on a cost-effective platform.
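
As a rough illustration of the model-based side, a textbook discrete-time PID update can be sketched as follows. This is not the paper's LabVIEW-LINX implementation: the class name `PID`, the gain values, and the sampling interval are hypothetical, and practical details such as anti-windup and derivative filtering are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (illustrative sketch only)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float) -> float:
        # Accumulate the integral term with a rectangular (Euler) rule.
        self.integral += error * dt
        # Backward-difference approximation of the error derivative.
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains; 'error' would be the tilt-angle deviation from upright.
controller = PID(kp=2.0, ki=0.5, kd=0.1)
command = controller.update(error=1.0, dt=0.1)
```

Tuning the three gains presupposes a linearized model of the robot, which is the modeling effort the abstract contrasts with data-based methods such as fuzzy control.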

[622]  arXiv:2405.03562 [pdf, other]
Title: ID-centric Pre-training for Recommendation
Subjects: Information Retrieval (cs.IR)

Classical sequential recommendation models generally adopt ID embeddings to store knowledge learned from user historical behaviors and represent items. However, these unique IDs are difficult to transfer to new domains. With the rise of pre-trained language models (PLMs), some pioneering works adopt PLMs for pre-trained recommendation, where modality information (e.g., text) is considered universal across domains via the PLM. Unfortunately, the behavioral information in ID embeddings has been shown to still dominate modality information in PLM-based recommendation models, which limits these models' performance. In this work, we propose a novel ID-centric recommendation pre-training paradigm (IDP), which directly transfers informative ID embeddings learned in pre-training domains to item representations in new domains. Specifically, in the pre-training stage, besides the ID-based sequential model for recommendation, we also build a Cross-domain ID-matcher (CDIM) learned from both behavioral and modality information. In the tuning stage, modality information of new domain items is regarded as a cross-domain bridge built by CDIM. We first leverage the textual information of downstream domain items to retrieve behaviorally and semantically similar items from pre-training domains using CDIM. Next, these retrieved pre-trained ID embeddings, rather than certain textual embeddings, are directly adopted to generate downstream new items' embeddings. Through extensive experiments on real-world datasets, both in cold and warm settings, we demonstrate that our proposed model significantly outperforms all baselines. Code will be released upon acceptance.

[623]  arXiv:2405.03565 [pdf, other]
Title: Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing
Comments: Accepted to AAAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Few-shot and zero-shot text classification aim to recognize samples from novel classes with limited labeled samples or no labeled samples at all. While prevailing methods have shown promising performance via transferring knowledge from seen classes to unseen classes, they are still limited in two ways: (1) inherent dissimilarities among classes make the transfer of features learned from seen classes to unseen classes both difficult and inefficient; (2) the few labeled novel samples usually cannot provide enough supervision signals to enable the model to adjust from the source distribution to the target distribution, especially in complicated scenarios. To alleviate these issues, we propose a simple and effective strategy for few-shot and zero-shot text classification. We aim to liberate the model from the confines of seen classes, thereby enabling it to predict unseen categories without the necessity of training on seen classes. Specifically, to mine more knowledge related to unseen categories, we utilize a large pre-trained language model to generate pseudo novel samples, and select the most representative ones as category anchors. After that, we convert the multi-class classification task into a binary classification task and use the similarities of query-anchor pairs for prediction to fully leverage the limited supervision signals. Extensive experiments on six widely used public datasets show that our proposed method can outperform other strong baselines significantly in few-shot and zero-shot tasks, even without using any seen class samples.

[624]  arXiv:2405.03567 [pdf, other]
Title: Deep Space Separable Distillation for Lightweight Acoustic Scene Classification
Authors: ShuQi Ye, Yuan Tian
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Acoustic scene classification (ASC) is highly important in the real world. Recently, deep learning-based methods have been widely employed for acoustic scene classification. However, these methods are currently neither lightweight enough nor satisfactory in performance. To solve these problems, we propose a deep space separable distillation network. Firstly, the network performs high-low frequency decomposition on the log-mel spectrogram, significantly reducing computational complexity while maintaining model performance. Secondly, we specially design three lightweight operators for ASC, including Separable Convolution (SC), Orthonormal Separable Convolution (OSC), and Separable Partial Convolution (SPC). These operators exhibit highly efficient feature extraction capabilities in acoustic scene classification tasks. The experimental results demonstrate that the proposed method achieves a performance gain of 9.8% compared to the currently popular deep learning methods, while also having a smaller parameter count and lower computational complexity.

[625]  arXiv:2405.03568 [pdf, other]
Title: Majority consensus thresholds in competitive Lotka--Volterra populations
Comments: 32 pages, to appear in PODC 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

One of the key challenges in synthetic biology is devising robust signaling primitives for engineered microbial consortia. In such systems, a fundamental signal amplification problem is the majority consensus problem: given a system with two input species with initial difference of $\Delta$ in population sizes, what is the probability that the system reaches a state in which only the initial majority species is present?
In this work, we consider a discrete and stochastic version of competitive Lotka--Volterra dynamics, a standard model of microbial community dynamics. We identify new threshold properties for majority consensus under different types of interference competition:
- We show that under so-called self-destructive interference competition between the two input species, majority consensus can be reached with high probability if the initial difference satisfies $\Delta \in \Omega(\log^2 n)$, where $n$ is the initial population size. This gives an exponential improvement compared to the previously known bound of $\Omega(\sqrt{n \log n})$ by Cho et al. [Distributed Computing, 2021] given for a special case of the competitive Lotka--Volterra model. In contrast, we show that an initial gap of $\Delta \in \Omega(\sqrt{\log n})$ is necessary.
- On the other hand, we prove that under non-self-destructive interference competition, an initial gap of $\Omega(\sqrt{n})$ is necessary to succeed with high probability and that a $\Omega(\sqrt{n \log n})$ gap is sufficient.
This shows a strong qualitative gap between the performance of self-destructive and non-self-destructive interference competition. Moreover, we show that if in addition the populations exhibit interference competition between the individuals of the same species, then majority consensus cannot always be solved with high probability, no matter what the difference in the initial population counts.
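
The qualitative gap described above can be illustrated with a deliberately simplified toy simulation. This is a sketch under stated assumptions, not the model analyzed in the paper: births and natural deaths are omitted, only encounters between the two species are simulated, and the two rules (both participants die versus one participant chosen by a fair coin dies) are assumed stand-ins for self-destructive and non-self-destructive interference. The function names `majority_survives` and `success_rate` are hypothetical.

```python
import random


def majority_survives(a, b, self_destructive, rng):
    """Run interference events until one species is extinct.

    a, b: initial counts of species A and B (toy model: no births, no natural deaths).
    Returns True if A is the only species left at the end.
    """
    while a > 0 and b > 0:
        if self_destructive:
            # Assumed rule: an A-B encounter kills both individuals.
            a -= 1
            b -= 1
        elif rng.random() < 0.5:
            # Assumed rule: exactly one participant dies, chosen by a fair coin.
            a -= 1
        else:
            b -= 1
    return a > 0 and b == 0


def success_rate(n, delta, self_destructive, trials=200, seed=0):
    """Fraction of trials in which the initial majority species A wins."""
    rng = random.Random(seed)
    a, b = (n + delta) // 2, (n - delta) // 2
    wins = sum(majority_survives(a, b, self_destructive, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    for delta in (2, 10, 30, 100):
        print(delta, success_rate(1000, delta, True), success_rate(1000, delta, False))
```

In this toy setting the self-destructive rule preserves the gap between the two counts at every event, so any positive initial gap suffices. Under the coin-flip rule the gap performs a random walk, so a small initial gap is easily lost. The thresholds stated in the abstract concern the full model with reproduction and death, which this sketch does not capture.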

[626]  arXiv:2405.03572 [pdf, other]
Title: RoboCar: A Rapidly Deployable Open-Source Platform for Autonomous Driving Research
Subjects: Robotics (cs.RO)

This paper introduces RoboCar, an open-source research platform for autonomous driving developed at the University of Luxembourg. RoboCar provides a modular, cost-effective framework for the development of experimental Autonomous Driving Systems (ADS), utilizing the 2018 KIA Soul EV. The platform integrates a robust hardware and software architecture that aligns with the vehicle's existing systems, minimizing the need for extensive modifications. It supports various autonomous driving functions and has undergone real-world testing on public roads in Luxembourg City. This paper outlines the platform's architecture, integration challenges, and initial test results, offering insights into its application in advancing autonomous driving research. RoboCar is available to anyone at https://github.com/sntubix/robocar and is released under an open-source MIT license.

[627]  arXiv:2405.03573 [pdf, other]
Title: Demystifying Anonymity: Uncovering the Structure Underlying "Read-Write Wait-Free Covering"
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The study of particular synchronization problems in anonymous shared-memory models -- be it processor anonymity, memory anonymity, or full anonymity -- has produced ad hoc, so-called covering arguments in which processors overwrite each other's writes. Those arguments give us proverbial fish, but they do not teach us how to fish. In this paper, we take a step back to ask more general questions.
First, what does it mean to solve a task under processor anonymity? With tasks such as renaming, the traditional notion obviously does not apply. Instead of restricting ourselves to colorless tasks, we propose using the notion of group solvability, which allows transferring any task to processor-anonymous models.
Second, we consider solving tasks read-write wait-free under full anonymity, and we ask what we call the eventual-pattern question: if anonymous processors forever read and write in anonymous shared-memory, learning about inputs of other processors, what is the structure of the eventually-stable sets of inputs that processors learn? Solving the eventual-pattern question leads us to a group solution to the snapshot task and to M(M-1)/2-renaming, where M is the number of distinct inputs. Finally, using the snapshot solution, we easily obtain a solution to obstruction-free consensus.

[628]  arXiv:2405.03574 [pdf, other]
Title: ILILT: Implicit Learning of Inverse Lithography Technologies
Comments: 12 pages, 10 figures, accepted by International Conference on Machine Learning (ICML24)
Subjects: Machine Learning (cs.LG)

Lithography, transferring chip design masks to the silicon wafer, is the most important phase in modern semiconductor manufacturing flow. Due to the limitations of lithography systems, extensive design optimizations are required to tackle the design and silicon mismatch. Inverse lithography technology (ILT) is one of the promising solutions to perform pre-fabrication optimization, termed mask optimization. Because of mask optimization problems' constrained non-convexity, numerical ILT solvers rely heavily on good initialization to avoid getting stuck on sub-optimal solutions. Machine learning (ML) techniques are hence proposed to generate mask initialization for ILT solvers with one-shot inference, targeting faster and better convergence during ILT. This paper addresses the question of \textit{whether ML models can directly generate high-quality optimized masks without engaging ILT solvers in the loop}. We propose an implicit learning ILT framework: ILILT, which leverages the implicit layer learning method and lithography-conditioned inputs to ground the model. Trained to understand the ILT optimization procedure, ILILT can outperform the state-of-the-art machine learning solutions, significantly improving efficiency and quality.

[629]  arXiv:2405.03575 [pdf, other]
Title: A Valuation Framework for Customers Impacted by Extreme Temperature-Related Outages
Subjects: Systems and Control (eess.SY)

Extreme temperature outages can lead to not just economic losses but also various non-energy impacts (NEI) due to significant degradation of indoor operating conditions caused by service disruptions. However, existing resilience assessment approaches lack specificity for extreme temperature conditions. They often overlook temperature-related mortality and neglect the customer characteristics and grid response in the calculation, despite the significant influence of these factors on NEI-related economic losses. This paper aims to address these gaps by introducing a comprehensive framework to estimate the impact of resilience enhancement not only on the direct economic losses incurred by customers but also on potential NEI, including mortality and the value of statistical life during extreme temperature-related outages. The proposed resilience valuation integrates customer characteristics and grid response variables based on a scalable grid simulation environment. This study adopts a holistic approach to quantify customer-oriented economic impacts, utilizing probabilistic loss scenarios that incorporate health-related factors and damage/loss models as a function of exposure for valuation. The proposed methodology is demonstrated through comparative resilient outage planning, using grid response models emulating a Texas weather zone during the 2021 winter storm Uri. The case study results show that enhanced outage planning with hardened infrastructure can improve the system resilience and thereby reduce the relative risk of mortality by 16% and save the total costs related to non-energy impacts by 74%. These findings underscore the efficacy of the framework by assessing the financial implications of each case, providing valuable insights for decision-makers and stakeholders involved in extreme-weather related resilience planning for risk management and mitigation strategies.

[630]  arXiv:2405.03582 [pdf, other]
Title: Functional Latent Dynamics for Irregularly Sampled Time Series Forecasting
Subjects: Machine Learning (cs.LG)

Irregularly sampled time series with missing values are often observed in multiple real-world applications such as healthcare, climate and astronomy. They pose a significant challenge to standard deep learning models that operate only on fully observed and regularly sampled time series. In order to capture the continuous dynamics of the irregular time series, many models rely on solving an Ordinary Differential Equation (ODE) in the hidden state. These ODE-based models tend to be slow and require large memory due to sequential operations and a complex ODE solver. As an alternative to complex ODE-based models, we propose a family of models called Functional Latent Dynamics (FLD). Instead of solving the ODE, we use simple curves which exist at all time points to specify the continuous latent state in the model. The coefficients of these curves are learned only from the observed values in the time series, ignoring the missing values. Through extensive experiments, we demonstrate that FLD achieves better performance compared to the best ODE-based model while reducing the runtime and memory overhead. Specifically, FLD requires an order of magnitude less time to infer the forecasts compared to the best performing forecasting model.
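
The idea of a simple curve that is defined at every time point, with coefficients obtained from observed values only, can be sketched as follows. This is not the FLD model: the abstract does not specify the curve family or how the coefficients are learned, so the sketch swaps in an ordinary closed-form least-squares line fit. The names `fit_line` and `evaluate`, and the use of `None` to mark missing values, are assumptions made for illustration.

```python
def fit_line(times, values):
    """Least-squares line through the observed points; None marks a missing value.

    Requires at least two observed points with distinct times.
    Returns (intercept, slope).
    """
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    n = len(observed)
    mean_t = sum(t for t, _ in observed) / n
    mean_v = sum(v for _, v in observed) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in observed)
    cov_tv = sum((t - mean_t) * (v - mean_v) for t, v in observed)
    slope = cov_tv / var_t
    intercept = mean_v - slope * mean_t
    return intercept, slope


def evaluate(coeffs, t):
    """Evaluate the fitted curve at any time t, observed or not."""
    intercept, slope = coeffs
    return intercept + slope * t


if __name__ == "__main__":
    # Irregular sampling times; two of the five values are missing.
    times = [0.0, 0.7, 1.5, 4.0, 4.2]
    values = [1.0, None, 4.0, 9.0, None]
    coeffs = fit_line(times, values)
    print(evaluate(coeffs, 10.0))
```

Because the curve is a closed-form function of time, a forecast at an arbitrary time is a single evaluation rather than a sequential solver run, which is the kind of saving the abstract attributes to replacing the ODE.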

[631]  arXiv:2405.03585 [pdf, other]
Title: The Sociotechnical Stack: Opportunities for Social Computing Research in Non-consensual Intimate Media
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

Non-consensual intimate media (NCIM) involves sharing intimate content without the depicted person's consent, including "revenge porn" and sexually explicit deepfakes. While NCIM has received attention in legal, psychological, and communication fields over the past decade, it is not sufficiently addressed in computing scholarship. This paper addresses this gap by linking NCIM harms to the specific technological components that facilitate them. We introduce the sociotechnical stack, a conceptual framework designed to map the technical stack to its corresponding social impacts. The sociotechnical stack allows us to analyze sociotechnical problems like NCIM, and points toward opportunities for computing research. We propose a research roadmap for computing and social computing communities to deter NCIM perpetration and support victim-survivors through building and rebuilding technologies.

[632]  arXiv:2405.03588 [pdf, other]
Title: Effective Quadratic Error Bounds for Floating-Point Algorithms Computing the Hypotenuse Function
Subjects: Numerical Analysis (math.NA); Symbolic Computation (cs.SC)

We provide tools to help automate the error analysis of algorithms that evaluate simple functions over the floating-point numbers. The aim is to obtain tight relative error bounds for these algorithms, expressed as a function of the unit round-off. Due to the discrete nature of the set of floating-point numbers, the largest errors are often intrinsically "arithmetic" in the sense that their appearance may depend on specific bit patterns in the binary representations of intermediate variables, which may be present only for some precisions. We focus on generic (i.e., parameterized by the precision) and analytic over-estimations that still capture the correlations between the errors made at each step of the algorithms. Using methods from computer algebra, which we adapt to the particular structure of the polynomial systems that encode the errors, we obtain bounds with a linear term in the unit round-off that is sharp in many cases. An explicit quadratic bound is given, rather than the $O()$-estimate that is more common in this area. This is particularly important when using low precision formats, which are increasingly common in modern processors. Using this approach, we compare five algorithms for computing the hypotenuse function, ranging from elementary to quite challenging.
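
As a rough illustration of what measuring relative error in units of the unit round-off looks like, the sketch below compares two textbook hypotenuse formulas in binary64 against a high-precision reference. The abstract does not list the five algorithms studied, so these two (`naive_hypot` and `scaled_hypot`, both hypothetical names) are assumptions chosen for illustration and may not be among them. The sketch measures errors empirically and does not reproduce the paper's analytic bounds.

```python
import math
from decimal import Decimal, localcontext


def naive_hypot(x, y):
    """sqrt(x*x + y*y); the intermediate squares can overflow or underflow."""
    return math.sqrt(x * x + y * y)


def scaled_hypot(x, y):
    """Scale by the larger magnitude so the squared ratio stays in [0, 1]."""
    big, small = max(abs(x), abs(y)), min(abs(x), abs(y))
    if big == 0.0:
        return 0.0
    ratio = small / big
    return big * math.sqrt(1.0 + ratio * ratio)


def relative_error_in_u(approx, x, y):
    """Relative error of approx against a 60-digit reference, in multiples of u = 2**-53."""
    with localcontext() as ctx:
        ctx.prec = 60
        # Decimal(float) converts exactly, so the reference sees the same inputs.
        exact = (Decimal(x) * Decimal(x) + Decimal(y) * Decimal(y)).sqrt()
        error = abs(Decimal(approx) - exact) / exact
        return float(error / Decimal(2) ** -53)


if __name__ == "__main__":
    for x, y in [(3.0, 4.0), (1.0, 1.0), (0.1, 0.7), (1e200, 1e200)]:
        for f in (naive_hypot, scaled_hypot):
            print(f.__name__, x, y, relative_error_in_u(f(x, y), x, y))
```

The scaled variant avoids the overflow of the naive one at the cost of an extra division and multiplication, each of which adds its own rounding error. Trade-offs of this kind are what a bound expressed in the unit round-off makes comparable across algorithms.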

[633]  arXiv:2405.03590 [pdf, other]
Title: Deep Clustering with Self-Supervision using Pairwise Similarities
Subjects: Machine Learning (cs.LG)

Deep clustering incorporates embedding into clustering to find a lower-dimensional space appropriate for clustering. In this paper, we propose a novel deep clustering framework with self-supervision using pairwise similarities (DCSS). The proposed method consists of two successive phases. In the first phase, we propose to form hypersphere-like groups of similar data points, i.e. one hypersphere per cluster, employing an autoencoder that is trained using cluster-specific losses. The hyper-spheres are formed in the autoencoder's latent space. In the second phase, we propose to employ pairwise similarities to create a $K$-dimensional space that is capable of accommodating more complex cluster distributions, hence providing more accurate clustering performance. $K$ is the number of clusters. The autoencoder's latent space obtained in the first phase is used as the input of the second phase. The effectiveness of both phases is demonstrated on seven benchmark datasets by conducting a rigorous set of experiments.

[634]  arXiv:2405.03594 [pdf, other]
Title: Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks. We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs that achieve full accuracy recovery for fine-tuning tasks at up to 70% sparsity. We achieve this for the LLaMA-2 7B model by combining the SparseGPT one-shot pruning method and sparse pretraining of those models on a subset of the SlimPajama dataset mixed with a Python subset of The Stack dataset. We exhibit training acceleration due to sparsity on Cerebras CS-3 chips that closely matches theoretical scaling. In addition, we establish inference acceleration of up to 3x on CPUs by utilizing Neural Magic's DeepSparse engine and 1.7x on GPUs through Neural Magic's nm-vllm engine. The above gains are realized via sparsity alone, thus enabling further gains through additional use of quantization. Specifically, we show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x. We demonstrate these results across diverse, challenging tasks, including chat, instruction following, code generation, arithmetic reasoning, and summarization to prove their generality. This work paves the way for rapidly creating smaller and faster LLMs without sacrificing accuracy.

[635]  arXiv:2405.03595 [pdf, other]
Title: GREEN: Generative Radiology Report Evaluation and Error Notation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to the need for accurate medical communication about medical images. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that leverages the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and qualitatively. Compared to current metrics, GREEN offers: 1) a score aligned with expert preferences, 2) human interpretable explanations of clinically significant errors, enabling feedback loops with end-users, and 3) a lightweight open-source method that reaches the performance of commercial counterparts. We validate our GREEN metric by comparing it to GPT-4, as well as to error counts of 6 experts and preferences of 2 experts. Our method demonstrates not only higher correlation with expert error counts, but simultaneously higher alignment with expert preferences when compared to previous approaches.

[636]  arXiv:2405.03605 [pdf, other]
Title: Trackable Island-model Genetic Algorithms at Wafer Scale
Comments: arXiv admin note: substantial text overlap with arXiv:2404.10861
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC)

Emerging ML/AI hardware accelerators, like the 850,000 processor Cerebras Wafer-Scale Engine (WSE), hold great promise to scale up the capabilities of evolutionary computation. However, challenges remain in maintaining visibility into underlying evolutionary processes while efficiently utilizing these platforms' large processor counts. Here, we focus on the problem of extracting phylogenetic information from digital evolution on the WSE platform. We present a tracking-enabled asynchronous island-based genetic algorithm (GA) framework for WSE hardware. Emulated and on-hardware GA benchmarks with a simple tracking-enabled agent model clock upwards of 1 million generations a minute for population sizes reaching 16 million. This pace enables quadrillions of evaluations a day. We validate phylogenetic reconstructions from these trials and demonstrate their suitability for inference of underlying evolutionary conditions. In particular, we demonstrate extraction of clear phylometric signals that differentiate wafer-scale runs with adaptive dynamics enabled versus disabled. Together, these benchmark and validation trials reflect strong potential for highly scalable evolutionary computation that is both efficient and observable. Kernel code implementing the island-model GA supports drop-in customization to support any fixed-length genome content and fitness criteria, allowing it to be leveraged to advance research interests across the community.

[637]  arXiv:2405.03609 [pdf, ps, other]
Title: Decision algorithms for reversibility of one-dimensional non-linear cellular automata under null boundary conditions
Comments: in Chinese language
Subjects: Computational Complexity (cs.CC)

Reversibility is an important property of cellular automata (CA), a classic model in theoretical computer science. For the reversibility problem of CA under null boundary conditions, linear rules have been studied extensively, but non-linear rules remain unexplored. The paper investigates the reversibility problem of general one-dimensional CA over a finite field $\mathbb{Z}_p$, and proposes an approach to optimize Amoroso's surjectivity detection algorithm for infinite CA. This paper proposes algorithms for deciding the reversibility of one-dimensional CA under null boundary conditions. We propose a method to decide the strict reversibility of one-dimensional CA under null boundary conditions. We also provide a bucket chain based algorithm for calculating the reversibility function of one-dimensional CA under null boundary conditions. These decision algorithms work not only for linear rules but also for non-linear rules. In addition, it has been confirmed that the reversibility function always has a period, and its periodicity is related to the periodicity of the corresponding bucket chain. Some experimental results on reversible CA are presented in the paper, complementing and validating the theoretical analysis.

[638]  arXiv:2405.03613 [pdf, other]
Title: Dual Relation Mining Network for Zero-Shot Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Zero-shot learning (ZSL) aims to recognize novel classes through transferring shared semantic knowledge (e.g., attributes) from seen classes to unseen classes. Recently, attention-based methods, which align visual features and attributes via a spatial attention mechanism, have exhibited significant progress. However, these methods only explore the visual-semantic relationship in the spatial dimension, which can lead to classification ambiguity when different attributes share similar attention regions, and the semantic relationship between attributes is rarely discussed. To alleviate the above problems, we propose a Dual Relation Mining Network (DRMN) to enable more effective visual-semantic interactions and learn semantic relationships among attributes for knowledge transfer. Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion and conducts spatial attention for visual to semantic embedding. Moreover, an attribute-guided channel attention is utilized to decouple entangled semantic features. For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images. Additionally, a global classification branch is introduced as a complement to human-defined semantic attributes, and we then combine the results with attribute-based classification. Extensive experiments demonstrate that the proposed DRMN leads to new state-of-the-art performances on three standard ZSL benchmarks, i.e., CUB, SUN, and AwA2.

[639]  arXiv:2405.03614 [pdf, ps, other]
Title: Repairing with Zero Skip Cost
Subjects: Information Theory (cs.IT)

To measure repair latency at helper nodes, we introduce a new metric called skip cost that quantifies the number of contiguous sections accessed on a disk. We provide explicit constructions of zigzag codes and fractional repetition codes that incur zero skip cost.
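
The quantity the metric is built on, the number of contiguous sections touched by a read, can be counted as in the minimal sketch below. The abstract gives no formal definition of skip cost, so the sketch only counts contiguous runs. How that count maps to the metric, for instance whether a single run corresponds to zero skip cost, is an assumption not confirmed here. The function name `contiguous_sections` and the representation of accesses as integer offsets are hypothetical.

```python
def contiguous_sections(accessed):
    """Count maximal runs of consecutive positions among the accessed integer offsets."""
    positions = sorted(set(accessed))
    if not positions:
        return 0
    # Every gap between neighbouring positions starts a new contiguous section.
    gaps = sum(1 for prev, cur in zip(positions, positions[1:]) if cur != prev + 1)
    return 1 + gaps


if __name__ == "__main__":
    print(contiguous_sections([0, 1, 2, 5, 6, 9]))  # three separate runs
    print(contiguous_sections([4, 5, 6, 7]))        # one run
```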

[640]  arXiv:2405.03615 [pdf, other]
Title: Nonnegative Matrix Factorization in Dimensionality Reduction: A Survey
Comments: 10 pages, 2 figures, to appear in ACM Computing Surveys
Subjects: Machine Learning (cs.LG)

Dimensionality Reduction plays a pivotal role in improving feature learning accuracy and reducing training time by eliminating redundant features, noise, and irrelevant data. Nonnegative Matrix Factorization (NMF) has emerged as a popular and powerful method for dimensionality reduction. Despite its extensive use, there remains a need for a comprehensive analysis of NMF in the context of dimensionality reduction. To address this gap, this paper presents a comprehensive survey of NMF, focusing on its applications in both feature extraction and feature selection. We introduce a classification of dimensionality reduction, enhancing understanding of the underlying concepts. Subsequently, we delve into a thorough summary of diverse NMF approaches used for feature extraction and selection. Furthermore, we discuss the latest research trends and potential future directions of NMF in dimensionality reduction, aiming to highlight areas that need further exploration and development.

[641]  arXiv:2405.03616 [pdf, other]
Title: A Controlled Experiment on the Energy Efficiency of the Source Code Generated by Code Llama
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Context. Nowadays, 83% of software developers use Large Language Models (LLMs) to generate code. LLMs recently became essential to increase the productivity of software developers and decrease the time and cost of software development. Developers ranging from novices to experts use LLM tools not only to detect and patch bugs, but also to integrate generated code into their software. However, as of today there is no objective assessment of the energy efficiency of the source code generated by LLM tools. Released in August 2023, Code Llama is one of the most recent LLM tools.
Goal. In this paper, we present an empirical study that assesses the energy efficiency of Code Llama with respect to human-written source code.
Method. We design an experiment involving three human-written benchmarks implemented in C++, JavaScript, and Python. We ask Code Llama to generate the code of the benchmarks using different prompts and temperatures. We then execute both implementations and profile their energy efficiency.
Results. Our study shows that the energy efficiency of code generated by Code Llama is heavily dependent on the chosen programming language and the specific code problem at hand. Also, human implementations tend to be more energy efficient overall, although generated JavaScript code outperforms its human counterpart. Moreover, explicitly asking Code Llama to generate energy-efficient code results in equal or worse energy efficiency, and using different temperatures does not seem to affect the energy efficiency of generated code.
Conclusions. According to our results, code generated using Code Llama does not guarantee energy efficiency, even when prompted to do so. Therefore, software developers should evaluate the energy efficiency of generated code before integrating it into the software system under development.

[642]  arXiv:2405.03619 [pdf, ps, other]
Title: The trade-offs between Monolithic vs. Distributed Architectures
Subjects: Software Engineering (cs.SE)

Software architects frequently engage in trade-off analysis, often confronting sub-optimal solutions due to unforeseen or overlooked disadvantages. Such outcomes can detrimentally affect a company's business operations and resource allocation. This article conducts a critical review of architectural styles, particularly focusing on the strengths and weaknesses of both monolithic and distributed architectures, and their relationship to architectural characteristics. It also explores the role of cloud computing in transitioning from monolithic to distributed-based applications. Utilizing a broad range of sources, including papers and books from both industry and academia, this research provides an overview from theoretical foundations to practical applications. A notable trend observed is a shift back from distributed to monolithic architectures, possibly due to factors such as cost, complexity, and performance.

[643]  arXiv:2405.03620 [pdf, other]
Title: Detecting Android Malware: From Neural Embeddings to Hands-On Validation with BERTroid
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

As cyber threats and malware attacks increasingly alarm both individuals and businesses, the urgency for proactive malware countermeasures intensifies. This has driven a rising interest in automated machine learning solutions. Transformers, a cutting-edge category of attention-based deep learning methods, have demonstrated remarkable success. In this paper, we present BERTroid, an innovative malware detection model built on the BERT architecture. Overall, BERTroid emerged as a promising solution for combating Android malware. Its ability to outperform state-of-the-art solutions demonstrates its potential as a proactive defense mechanism against malicious software attacks. Additionally, we evaluate BERTroid on multiple datasets to assess its performance across diverse scenarios. In the dynamic landscape of cybersecurity, our approach has demonstrated promising resilience against the rapid evolution of malware on Android systems. While the machine learning model captures broad patterns, we emphasize the role of manual validation for deeper comprehension and insight into these behaviors. This human intervention is critical for discerning intricate and context-specific behaviors, thereby validating and reinforcing the model's findings.

[644]  arXiv:2405.03624 [pdf, ps, other]
Title: $ε$-Policy Gradient for Online Pricing
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Statistical Finance (q-fin.ST); Machine Learning (stat.ML)

Combining model-based and model-free reinforcement learning approaches, this paper proposes and analyzes an $\epsilon$-policy gradient algorithm for the online pricing learning task. The algorithm extends the $\epsilon$-greedy algorithm by replacing greedy exploitation with a gradient descent step and facilitates learning via model inference. We optimize the regret of the proposed algorithm by quantifying the exploration cost in terms of the exploration probability $\epsilon$ and the exploitation cost in terms of the gradient descent optimization and gradient estimation errors. The algorithm achieves an expected regret of order $\mathcal{O}(\sqrt{T})$ (up to a logarithmic factor) over $T$ trials.

[645]  arXiv:2405.03628 [pdf, ps, other]
Title: State-Aware Timeliness in Energy Harvesting IoT Systems Monitoring a Markovian Source
Comments: Submitted for journal publication. arXiv admin note: text overlap with arXiv:1907.03826
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

In this study, we investigate the optimal transmission policies within an energy harvesting status update system, where the demand for status updates depends on the state of the source. The system monitors a two-state Markovian source that characterizes a stochastic process, which can be in either a normal state or an alarm state, with a higher demand for fresh updates when the source is in the alarm state. We propose a metric to capture the freshness of status updates for each state of the stochastic process by introducing two Age of Information (AoI) variables, extending the definition of AoI to account for the state changes of the stochastic process. We formulate the problem as a Markov Decision Process (MDP), utilizing a transition cost function that applies linear and non-linear penalties based on AoI and the state of the stochastic process. Through analytical investigation, we delve into the structure of the optimal transmission policy for the resulting MDP problem. Furthermore, we evaluate the derived policies via numerical results and demonstrate their effectiveness in reserving energy in anticipation of forthcoming alarm states.

[646]  arXiv:2405.03629 [pdf, other]
Title: Configuration-Constrained Tube MPC for Tracking
Comments: 14 pages, paper presented to the L-CSS/CDC combined submission
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper proposes a novel tube-based Model Predictive Control (MPC) framework for tracking varying setpoint references with linear systems subject to additive and multiplicative uncertainties. The MPC controllers designed using this framework exhibit recursive feasibility for changing references, and robust asymptotic stability for piecewise constant references. The framework leverages configuration-constrained polytopes to parameterize the tubes, offering flexibility to optimize their shape. The efficacy of the approach is demonstrated through two numerical examples. The first example illustrates the theoretical results, and the second uses the framework to design a lane-change controller for an autonomous vehicle.

[647]  arXiv:2405.03632 [pdf, other]
Title: LaserEscape: Detecting and Mitigating Optical Probing Attacks
Subjects: Cryptography and Security (cs.CR)

The security of integrated circuits (ICs) can be broken by sophisticated physical attacks relying on failure analysis methods. Optical probing is one of the most prominent examples of such attacks, which can be accomplished in a matter of days, even with limited knowledge of the IC under attack. Unfortunately, few countermeasures are proposed in the literature, and none has been fabricated and tested in practice. These countermeasures usually require changing the standard cell libraries and, thus, are incompatible with digital and programmable platforms, such as field programmable gate arrays (FPGAs). In this work, we shift our attention from preventing the attack to detecting and responding to it. We introduce LaserEscape, the first fully digital and FPGA-compatible countermeasure to detect and mitigate optical probing attacks. LaserEscape incorporates digital delay-based sensors to reliably detect the physical alteration on the fabric caused by laser beam irradiations in real time. Furthermore, as a response to the attack, LaserEscape deploys real-time hiding approaches using randomized hardware reconfigurability. It realizes 1) moving target defense (MTD) to physically move the sensitive circuitry under attack out of the probing field of focus to protect secret keys and 2) polymorphism to logically obfuscate the functionality of the targeted circuit to counter function extraction and reverse engineering attempts. We demonstrate the effectiveness and resiliency of our approach by performing optical probing attacks on protected and unprotected designs on a 28-nm FPGA. Our results show that optical probing attacks can be reliably detected and mitigated without interrupting the chip's operation.

[648]  arXiv:2405.03633 [pdf, other]
Title: Neural Graph Mapping for Dense SLAM with Efficient Loop Closure
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Existing neural field-based SLAM methods typically employ a single monolithic field as their scene representation. This prevents efficient incorporation of loop closure constraints and limits scalability. To address these shortcomings, we propose a neural mapping framework which anchors lightweight neural fields to the pose graph of a sparse visual SLAM system. Our approach shows the ability to integrate large-scale loop closures, while limiting necessary reintegration. Furthermore, we verify the scalability of our approach by demonstrating successful building-scale mapping taking multiple loop closures into account during the optimization, and show that our method outperforms existing state-of-the-art approaches on large scenes in terms of quality and runtime. Our code is available at https://kth-rpl.github.io/neural_graph_mapping/.

[649]  arXiv:2405.03636 [pdf, other]
Title: Federated Learning Privacy: Attacks, Defenses, Applications, and Policy Landscape - A Survey
Comments: Submitted to ACM Computing Surveys
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Deep learning has shown incredible potential across a vast array of tasks and accompanying this growth has been an insatiable appetite for data. However, a large amount of data needed for enabling deep learning is stored on personal devices and recent concerns on privacy have further highlighted challenges for accessing such data. As a result, federated learning (FL) has emerged as an important privacy-preserving technology enabling collaborative training of machine learning models without the need to send the raw, potentially sensitive, data to a central server. However, the fundamental premise that sending model updates to a server is privacy-preserving only holds if the updates cannot be "reverse engineered" to infer information about the private training data. It has been shown under a wide variety of settings that this premise for privacy does not hold.
In this survey paper, we provide a comprehensive literature review of the different privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which FL client privacy can be broken. We dissect some of the successful industry applications of FL and draw lessons for future successful adoption. We survey the emerging landscape of privacy regulation for FL. We conclude with future directions for taking FL toward the cherished goal of generating accurate models while preserving the privacy of the data from its participants.

[650]  arXiv:2405.03637 [pdf, other]
Title: Collage: Light-Weight Low-Precision Strategy for LLM Training
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)

Training large models is plagued by intense compute cost and limited hardware memory. A practical solution is low-precision representation, but it suffers from loss of numerical accuracy and unstable training, rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at the critical locations in the training process. We propose Collage, which utilizes a multi-component float representation in low precision to accurately perform operations with numerical errors accounted for. To understand the impact of imprecision on training, we propose a simple and novel metric which tracks the lost information during training as well as differentiates various precision strategies. Our method works with commonly used low-precision formats such as half-precision ($16$-bit floating points) and can be naturally extended to work with even lower precision such as $8$-bit. Experimental results show that pre-training using Collage removes the requirement of using $32$-bit floating-point copies of the model and attains similar/better training performance compared to $(16, 32)$-bit mixed-precision strategy, with up to $3.7\times$ speedup and $\sim 15\%$ to $23\%$ less memory usage in practice.
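
The idea of compensating rounding error with a multi-component representation can be illustrated with a generic two-component accumulator. The sketch below is not Collage's algorithm, which the abstract does not spell out; it is a textbook Fast2Sum-style compensation, with half precision simulated through Python's `struct` format `"e"`. The helper names (`to_half`, `naive_add`, `two_component_add`) and the update size `1e-4` are assumptions chosen for illustration.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def naive_add(w, update):
    """Single-component 16-bit accumulation: small updates can be rounded away."""
    return to_half(w + to_half(update))

def two_component_add(hi, lo, update):
    """Accumulate into a (hi, lo) pair of 16-bit values.

    `lo` collects what `hi` cannot represent; the Fast2Sum step recovers
    the rounding error of hi + lo (valid here because |hi| >= |lo|).
    """
    lo = to_half(lo + to_half(update))
    s = to_half(hi + lo)
    err = to_half(lo - to_half(s - hi))
    return s, err

w = 1.0
hi, lo = 1.0, 0.0
for _ in range(1000):
    w = naive_add(w, 1e-4)
    hi, lo = two_component_add(hi, lo, 1e-4)

print(w)        # the naive 16-bit weight never moves from 1.0
print(hi + lo)  # the two-component value tracks the true sum much more closely
```

Near 1.0 the spacing of half-precision values is about 9.8e-4, so each 1e-4 update is lost by the single-component accumulator, while the pair keeps the residue in `lo` until it is large enough to move `hi`.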

[651]  arXiv:2405.03638 [pdf, other]
Title: Cosine Annealing Optimized Denoising Diffusion Error Correction Codes
Subjects: Information Theory (cs.IT)

To address the issue of increased bit error rates during the later stages of linear search in denoising diffusion error correction codes, we propose a novel method that optimizes denoising diffusion error correction codes (ECC) using cosine annealing. In response to the challenge of decoding long codewords, the proposed method employs a variance adjustment strategy during the reverse diffusion process, rather than maintaining a constant variance. By leveraging cosine annealing, this method effectively lowers the bit error rate and enhances decoding efficiency. This letter extensively validates the approach through experiments and demonstrates significant improvements in bit error rate reduction and iteration efficiency compared to existing methods. This advancement offers a promising solution for improving ECC decoding performance, potentially impacting secure digital communication practices.
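
For readers unfamiliar with cosine annealing, the following is a minimal sketch of the standard cosine schedule applied to a scalar such as a variance. How the letter couples this schedule to the reverse diffusion process is not stated in the abstract; the function name `cosine_annealed`, the endpoints `1.0` and `0.1`, and the step count are assumptions for illustration only.

```python
import math

def cosine_annealed(step, total_steps, start, end):
    """Cosine interpolation from `start` (at step 0) to `end` (at total_steps)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so values outside the schedule hold the endpoint value.
    t = min(max(step, 0), total_steps) / total_steps
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t))

# Hypothetical schedule: anneal a variance from 1.0 down to 0.1 over 10 steps.
schedule = [cosine_annealed(s, 10, 1.0, 0.1) for s in range(11)]
print(schedule)
```

The schedule changes slowly near both ends and fastest in the middle, in contrast to a constant variance or a linear ramp.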

[652]  arXiv:2405.03642 [pdf, other]
Title: Classification of Breast Cancer Histopathology Images using a Modified Supervised Contrastive Learning Method
Comments: 16 pages, 3 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep neural networks have reached remarkable achievements in medical image processing tasks, specifically classifying and detecting various diseases. However, when confronted with limited data, these networks face a critical vulnerability, often succumbing to overfitting by excessively memorizing the limited information available. This work addresses the challenge mentioned above by improving the supervised contrastive learning method to reduce the impact of false positives. Unlike most existing methods that rely predominantly on fully supervised learning, our approach leverages the advantages of self-supervised learning in conjunction with employing the available labeled data. We evaluate our method on the BreakHis dataset, which consists of breast cancer histopathology images, and demonstrate an increase in classification accuracy by 1.45% at the image level and 1.42% at the patient level compared to the state-of-the-art method. This improvement corresponds to 93.63% absolute accuracy, highlighting our approach's effectiveness in leveraging data properties to learn more appropriate representation space.

[653]  arXiv:2405.03643 [pdf, other]
Title: Collecting Consistently High Quality Object Tracks with Minimal Human Involvement by Using Self-Supervised Learning to Detect Tracker Errors
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a hybrid framework for consistently producing high-quality object tracks by combining an automated object tracker with little human input. The key idea is to tailor a module for each dataset to intelligently decide when an object tracker is failing and so humans should be brought in to re-localize an object for continued tracking. Our approach leverages self-supervised learning on unlabeled videos to learn a tailored representation for a target object that is then used to actively monitor its tracked region and decide when the tracker fails. Since labeled data is not needed, our approach can be applied to novel object categories. Experiments on three datasets demonstrate our method outperforms existing approaches, especially for small, fast moving, or occluded objects.

[654]  arXiv:2405.03644 [pdf, other]
Title: When LLMs Meet Cybersecurity: A Systematic Literature Review
Comments: 36 pages, 7 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

The rapid advancements in large language models (LLMs) have opened new avenues across various fields, including cybersecurity, which faces an ever-evolving threat landscape and need for innovative technologies. Despite initial explorations into the application of LLMs in cybersecurity, there is a lack of a comprehensive overview of this research area. This paper bridges this gap by providing a systematic literature review, encompassing an analysis of over 180 works, spanning across 25 LLMs and more than 10 downstream scenarios. Our comprehensive overview addresses three critical research questions: the construction of cybersecurity-oriented LLMs, LLMs' applications in various cybersecurity tasks, and the existing challenges and further research in this area. This study aims to shed light on the extensive potential of LLMs in enhancing cybersecurity practices, and serve as a valuable resource for applying LLMs in this domain. We also maintain a regularly updated list of practical guides on LLMs for cybersecurity at https://github.com/tmylla/Awesome-LLM4Cybersecurity.

[655]  arXiv:2405.03646 [pdf, other]
Title: Content-Oblivious Leader Election on Rings
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

In content-oblivious computation, n nodes wish to compute a given task over an asynchronous network that suffers from an extremely harsh type of noise, which corrupts the content of all messages across all channels. In a recent work, Censor-Hillel, Cohen, Gelles, and Sela (Distributed Computing, 2023) showed how to perform arbitrary computations in a content-oblivious way in 2-edge connected networks but only if the network has a distinguished node (called root) to initiate the computation.
Our goal is to remove this assumption, which was conjectured to be necessary. Achieving this goal essentially reduces to performing a content-oblivious leader election since an elected leader can then serve as the root required to perform arbitrary content-oblivious computations. We focus on ring networks, which are the simplest 2-edge connected graphs. On oriented rings, we obtain a leader election algorithm with message complexity O(n*ID_max), where ID_max is the maximal assigned ID. As it turns out, this dependency on ID_max is inherent: we show a lower bound of Omega(n*log(ID_max/n)) messages for content-oblivious leader election algorithms. We also extend our results to non-oriented rings, where nodes cannot tell which channel leads to which neighbor. In this case, however, the algorithm does not terminate but only reaches quiescence.

[656]  arXiv:2405.03649 [pdf, other]
Title: Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation
Comments: Accepted to IJCAI 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Deep neural classifiers tend to rely on spurious correlations between spurious attributes of inputs and targets to make predictions, which could jeopardize their generalization capability. Training classifiers robust to spurious correlations typically relies on annotations of spurious correlations in data, which are often expensive to get. In this paper, we tackle an annotation-free setting and propose a self-guided spurious correlation mitigation framework. Our framework automatically constructs fine-grained training labels tailored for a classifier obtained with empirical risk minimization to improve its robustness against spurious correlations. The fine-grained training labels are formulated with different prediction behaviors of the classifier identified in a novel spuriousness embedding space. We construct the space with automatically detected conceptual attributes and a novel spuriousness metric which measures how likely a class-attribute correlation is exploited for predictions. We demonstrate that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori and outperforms prior methods on five real-world datasets.

[657]  arXiv:2405.03650 [pdf, other]
Title: Generated Contents Enrichment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this paper, we investigate a novel artificial intelligence generation task, termed generated contents enrichment (GCE). Unlike the conventional artificial intelligence content generation task, which enriches the given textual description implicitly with limited semantics for generating visually real content, our proposed GCE strives to perform content enrichment explicitly in both the visual and textual domains, so that the enriched contents are visually real, structurally reasonable, and semantically abundant. To solve GCE, we propose a deep end-to-end method that explicitly explores the semantics and inter-semantic relationships during the enrichment. Specifically, we first model the input description as a semantic graph, wherein each node represents an object and each edge corresponds to the inter-object relationship. We then adopt Graph Convolutional Networks on top of the input scene description to predict the enriching objects and their relationships with the input objects. Finally, the enriched graph is fed into an image synthesis model to carry out the visual contents generation. Our experiments conducted on the Visual Genome dataset exhibit promising and visually plausible results.

[658]  arXiv:2405.03651 [pdf, other]
Title: Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders
Comments: ICLR 2024
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Cross-encoder (CE) models which compute similarity by jointly encoding a query-item pair perform better than embedding-based models (dual-encoders) at estimating query-item relevance. Existing approaches perform k-NN search with CE by approximating the CE similarity with a vector embedding space fit either with dual-encoders (DE) or CUR matrix factorization. DE-based retrieve-and-rerank approaches suffer from poor recall on new domains and the retrieval with DE is decoupled from the CE. While CUR-based approaches can be more accurate than the DE-based approach, they require a prohibitively large number of CE calls to compute item embeddings, thus making it impractical for deployment at scale. In this paper, we address these shortcomings with our proposed sparse-matrix factorization based method that efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity. We compute item embeddings offline by factorizing a sparse matrix containing query-item CE scores for a set of train queries. Our method produces a high-quality approximation while requiring only a fraction of CE calls as compared to CUR-based methods, and allows for leveraging DE to initialize the embedding space while avoiding compute- and resource-intensive finetuning of DE via distillation. At test time, the item embeddings remain fixed and retrieval occurs over rounds, alternating between a) estimating the test query embedding by minimizing error in approximating CE scores of items retrieved thus far, and b) using the updated test query embedding for retrieving more items. Our k-NN search method improves recall by up to 5% (k=1) and 54% (k=100) over DE-based approaches. Additionally, our indexing approach achieves a speedup of up to 100x over CUR-based and 5x over DE distillation methods, while matching or improving k-NN search recall over baselines.

[659]  arXiv:2405.03652 [pdf, ps, other]
Title: Field-of-View Extension for Diffusion MRI via Deep Generative Models
Comments: 20 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Purpose: In diffusion MRI (dMRI), the volumetric and bundle analyses of whole-brain tissue microstructure and connectivity can be severely impeded by an incomplete field-of-view (FOV). This work aims to develop a method for imputing the missing slices directly from existing dMRI scans with an incomplete FOV. We hypothesize that the imputed image with complete FOV can improve the whole-brain tractography for corrupted data with incomplete FOV. Therefore, our approach provides a desirable alternative to discarding the valuable dMRI data, enabling subsequent tractography analyses that would otherwise be challenging or unattainable with corrupted data. Approach: We propose a framework based on a deep generative model that estimates the absent brain regions in dMRI scans with incomplete FOV. The model is capable of learning both the diffusion characteristics in diffusion-weighted images (DWI) and the anatomical features evident in the corresponding structural images for efficiently imputing missing slices of DWI outside of incomplete FOV. Results: For evaluating the imputed slices, on the WRAP dataset the proposed framework achieved PSNRb0=22.397, SSIMb0=0.905, PSNRb1300=22.479, SSIMb1300=0.893; on the NACC dataset it achieved PSNRb0=21.304, SSIMb0=0.892, PSNRb1300=21.599, SSIMb1300= 0.877. The proposed framework improved the tractography accuracy, as demonstrated by an increased average Dice score for 72 tracts (p < 0.001) on both the WRAP and NACC datasets. Conclusions: Results suggest that the proposed framework achieved sufficient imputation performance in dMRI data with incomplete FOV for improving whole-brain tractography, thereby repairing the corrupted data. Our approach achieved more accurate whole-brain tractography results with extended and complete FOV and reduced the uncertainty when analyzing bundles associated with Alzheimer's Disease.

[660]  arXiv:2405.03654 [pdf, other]
Title: Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures. We detail two implementations under this framework: "Obscure Intention" and "Create Ambiguity", which manipulate query complexity and ambiguity to evade malicious intent detection effectively. We empirically validate the effectiveness of the IntentObfuscator method across several models, including ChatGPT-3.5, ChatGPT-4, Qwen and Baichuan, achieving an average jailbreak success rate of 69.21%. Notably, our tests on ChatGPT-3.5, which claims 100 million weekly active users, achieved a remarkable success rate of 83.65%. We also extend our validation to diverse types of sensitive content like graphic violence, racism, sexism, political sensitivity, cybersecurity threats, and criminal skills, further proving the substantial impact of our findings on enhancing 'Red Team' strategies against LLM content security frameworks.

[661]  arXiv:2405.03658 [pdf, other]
Title: A review on data-driven constitutive laws for solids
Comments: 57 pages, 7 Figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Applied Physics (physics.app-ph)

This review article highlights state-of-the-art data-driven techniques to discover, encode, surrogate, or emulate constitutive laws that describe the path-independent and path-dependent response of solids. Our objective is to provide an organized taxonomy to a large spectrum of methodologies developed in the past decades and to discuss the benefits and drawbacks of the various techniques for interpreting and forecasting mechanics behavior across different scales. Distinguishing between machine-learning-based and model-free methods, we further categorize approaches based on their interpretability and on their learning process/type of required data, while discussing the key problems of generalization and trustworthiness. We attempt to provide a road map of how these can be reconciled in a data-availability-aware context. We also touch upon relevant aspects such as data sampling techniques, design of experiments, verification, and validation.

[662]  arXiv:2405.03659 [pdf, other]
Title: A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance field algorithms usually do not produce good results because of the coupling between poses and depths, and inaccuracies in monocular depth estimation. In this paper, we leverage the recent 3D Gaussian splatting method to develop a novel construct-and-optimize method for sparse view synthesis without camera poses. Specifically, we construct a solution progressively by using monocular depth and projecting pixels back into the 3D world. During construction, we optimize the solution by detecting 2D correspondences between training views and the corresponding rendered images. We develop a unified differentiable pipeline for camera registration and adjustment of both camera poses and depths, followed by back-projection. We also introduce a novel notion of an expected surface in Gaussian splatting, which is critical to our optimization. These steps enable a coarse solution, which can then be low-pass filtered and refined using standard optimization methods. We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views, showing significantly better quality than competing methods, including those with approximate camera pose information. Moreover, our results improve with more views and outperform previous InstantNGP and Gaussian Splatting algorithms even when using half the dataset.

[663]  arXiv:2405.03660 [pdf, other]
Title: CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification
Comments: 18 Pages, 4 Figures and Accepted in ICDAR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Zero-shot learning has been extensively investigated in the broader field of visual recognition, attracting significant interest recently. However, the current work on zero-shot learning in document image classification remains scarce. The existing studies either focus exclusively on zero-shot inference, or their evaluation does not align with the established criteria of zero-shot evaluation in the visual recognition domain. We provide a comprehensive document image classification analysis in Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) settings to address this gap. Our methodology and evaluation align with the established practices of this domain. Additionally, we propose zero-shot splits for the RVL-CDIP dataset. Furthermore, we introduce CICA (pronounced 'ki-ka'), a framework that enhances the zero-shot learning capabilities of CLIP. CICA consists of a novel 'content module' designed to leverage any generic document-related textual information. The discriminative features extracted by this module are aligned with CLIP's text and image features using a novel 'coupled-contrastive' loss. Our module improves CLIP's ZSL top-1 accuracy by 6.7% and GZSL harmonic mean by 24% on the RVL-CDIP dataset. Our module is lightweight and adds only 3.3% more parameters to CLIP. Our work sets the direction for future research in zero-shot document classification.
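
The "GZSL harmonic mean" reported above is conventionally the harmonic mean of the accuracy on seen classes and the accuracy on unseen classes. The sketch below assumes that conventional definition; the abstract itself does not restate the formula, and the function name and the example accuracies are hypothetical, not results from the paper.

```python
def gzsl_harmonic_mean(acc_seen, acc_unseen):
    """Harmonic mean of seen-class and unseen-class accuracy (both in [0, 1])."""
    if acc_seen + acc_unseen == 0:
        return 0.0  # avoid division by zero when both accuracies are zero
    return 2.0 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# Hypothetical numbers: a model that is strong on seen classes but weak on
# unseen ones is penalized far more than by an arithmetic mean (0.5 here).
print(gzsl_harmonic_mean(0.8, 0.2))
```

Because the harmonic mean is dominated by the smaller of the two accuracies, it rewards models that balance seen and unseen performance rather than overfitting to seen classes.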

[664]  arXiv:2405.03661 [pdf, ps, other]
Title: Competitive strategies to use "warm start" algorithms with predictions
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

We consider the problem of learning and using predictions for warm start algorithms with predictions. In this setting, an algorithm is given an instance of a problem, and a prediction of the solution. The runtime of the algorithm is bounded by the distance from the predicted solution to the true solution of the instance. Previous work has shown that when instances are drawn iid from some distribution, it is possible to learn an approximately optimal fixed prediction (Dinitz et al, NeurIPS 2021), and in the adversarial online case, it is possible to compete with the best fixed prediction in hindsight (Khodak et al, NeurIPS 2022).
In this work we give competitive guarantees against stronger benchmarks that consider a set of $k$ predictions $\mathbf{P}$. That is, the "optimal offline cost" to solve an instance with respect to $\mathbf{P}$ is the distance from the true solution to the closest member of $\mathbf{P}$. This is analogous to the $k$-medians objective function. In the distributional setting, we show a simple strategy that incurs cost that is at most an $O(k)$ factor worse than the optimal offline cost. We then show a way to leverage learnable coarse information, in the form of partitions of the instance space into groups of "similar" instances, that allows us to potentially avoid this $O(k)$ factor.
Finally, we consider an online version of the problem, where we compete against offline strategies that are allowed to maintain a moving set of $k$ predictions or "trajectories," and are charged for how much the predictions move. We give an algorithm that does at most $O(k^4 \ln^2 k)$ times as much work as any offline strategy of $k$ trajectories. This algorithm is deterministic (robust to an adaptive adversary), and oblivious to the setting of $k$. Thus the guarantee holds for all $k$ simultaneously.
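
The benchmark described above, the "optimal offline cost" with respect to a set of $k$ predictions, can be sketched as follows. The abstract does not fix a distance function, so Euclidean distance between solution vectors is an assumption here, as are the function names and the toy data.

```python
import math

def offline_cost(true_solution, predictions):
    """Distance from the true solution to the closest prediction in the set.

    Euclidean distance is an assumption; the abstract leaves the metric open.
    """
    return min(math.dist(true_solution, p) for p in predictions)

def total_offline_cost(true_solutions, predictions):
    """Sum of per-instance costs, analogous to a k-medians objective."""
    return sum(offline_cost(s, predictions) for s in true_solutions)

# Hypothetical example with k = 2 predictions in the plane.
predictions = [(3.0, 4.0), (1.0, 0.0)]
solutions = [(0.0, 0.0), (3.0, 4.0)]
print(offline_cost(solutions[0], predictions))
print(total_offline_cost(solutions, predictions))
```

Summing the per-instance minimum over a sample of instances gives the quantity that a learned set of $k$ predictions would be compared against.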

[665]  arXiv:2405.03662 [pdf, other]
Title: Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We describe a method for recovering the irradiance underlying a collection of images corrupted by atmospheric turbulence. Since supervised data is often technically impossible to obtain, assumptions and biases have to be imposed to solve this inverse problem, and we choose to model them explicitly. Rather than initializing a latent irradiance ("template") by heuristics to estimate deformation, we select one of the images as a reference, and model the deformation in this image by the aggregation of the optical flow from it to other images, exploiting a prior imposed by the Central Limit Theorem. Then with a novel flow inversion module, the model registers each image TO the template but WITHOUT the template, avoiding artifacts related to poor template initialization. To illustrate the robustness of the method, we simply (i) select the first frame as the reference and (ii) use the simplest optical flow to estimate the warpings, yet the improvement in registration is decisive in the final reconstruction, as we achieve state-of-the-art performance despite its simplicity. The method establishes a strong baseline that can be further improved by integrating it seamlessly into more sophisticated pipelines, or with domain-specific methods if so desired.

[666]  arXiv:2405.03664 [pdf, other]
Title: A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The $2$-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the $2$-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical $2$-Wasserstein distance on $n$ samples in $\mathbb{R}^2$ to converge to the true distance at a rate of $n^{-1/4}$, which is significantly slower than the rate of $n^{-1/2}$ for $1$-Wasserstein distance.
We introduce a new family of distances parameterized by $k \ge 0$, called $k$-RPW, that is based on computing the partial $2$-Wasserstein distance. We show that (1) $k$-RPW satisfies the metric properties, (2) $k$-RPW is robust to small outlier mass while retaining the sensitivity of $2$-Wasserstein distance to minor geometric differences, and (3) when $k$ is a constant, $k$-RPW distance between empirical distributions on $n$ samples in $\mathbb{R}^2$ converges to the true distance at a rate of $n^{-1/3}$, which is faster than the convergence rate of $n^{-1/4}$ for the $2$-Wasserstein distance.
Using the partial $p$-Wasserstein distance, we extend our distance to any $p \in [1,\infty]$. By setting parameters $k$ or $p$ appropriately, we can reduce our distance to the total variation, $p$-Wasserstein, and the L\'evy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy in comparison to the $1$-Wasserstein, $2$-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.

[667]  arXiv:2405.03666 [pdf, other]
Title: ScrewMimic: Bimanual Imitation from Human Videos with Screw Space Projection
Comments: 16 pages
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Bimanual manipulation is a longstanding challenge in robotics due to the large number of degrees of freedom and the strict spatial and temporal synchronization required to generate meaningful behavior. Humans learn bimanual manipulation skills by watching other humans and by refining their abilities through play. In this work, we aim to enable robots to learn bimanual manipulation behaviors from human video demonstrations and fine-tune them through interaction. Inspired by seminal work in psychology and biomechanics, we propose modeling the interaction between two hands as a serial kinematic linkage -- as a screw motion, in particular, that we use to define a new action space for bimanual manipulation: screw actions. We introduce ScrewMimic, a framework that leverages this novel action representation to facilitate learning from human demonstration and self-supervised policy fine-tuning. Our experiments demonstrate that ScrewMimic is able to learn several complex bimanual behaviors from a single human video demonstration, and that it outperforms baselines that interpret demonstrations and fine-tune directly in the original space of motion of both arms. For more information and video results, see https://robin-lab.cs.utexas.edu/ScrewMimic/

[668]  arXiv:2405.03669 [pdf, other]
Title: IMELL Cut Elimination with Linear Overhead
Comments: Version with proofs of the FSCD 2024 paper with the same title
Subjects: Logic in Computer Science (cs.LO)

Recently, Accattoli introduced the Exponential Substitution Calculus (ESC) given by untyped proof terms for Intuitionistic Multiplicative Exponential Linear Logic (IMELL), endowed with rewriting rules at-a-distance for cut elimination. He also introduced a new cut elimination strategy, dubbed the good strategy, and showed that its number of steps is a time cost model with polynomial overhead for the ESC/IMELL, and the first such one.
Here, we refine Accattoli's result by introducing an abstract machine for ESC and proving that it implements the good strategy and computes cut-free terms/proofs within a linear overhead.

[669]  arXiv:2405.03671 [pdf, ps, other]
Title: Prompting Task Trees using Gemini: Methodologies and Insights
Authors: Pallavi Tandra
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Robots are the future of every technology: every advanced technology will eventually be used to make robots more efficient. The major challenge today is to train robots exactly and empathetically using knowledge representation. This paper offers insights into how unstructured knowledge representations can be converted into meaningful structured representations with the help of prompt engineering, which can eventually be used in robots to help them understand how the human brain can make wonders with the minimal data or objects provided to them.

[670]  arXiv:2405.03672 [pdf, other]
Title: Cutting through buggy adversarial example defenses: fixing 1 line of code breaks Sabre
Authors: Nicholas Carlini
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Sabre is a defense to adversarial examples that was accepted at IEEE S&P 2024. We first reveal significant flaws in the evaluation that point to clear signs of gradient masking. We then show the cause of this gradient masking: a bug in the original evaluation code. By fixing a single line of code in the original repository, we reduce Sabre's robust accuracy to 0%. In response to this, the authors modify the defense and introduce a new defense component not described in the original paper. But this fix contains a second bug; modifying one more line of code reduces robust accuracy to below baseline levels.

[671]  arXiv:2405.03673 [pdf, other]
Title: MemoryMamba: Memory-Augmented State Space Model for Defect Recognition
Comments: 15 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

As automation advances in manufacturing, the demand for precise and sophisticated defect detection technologies grows. Existing vision models for defect recognition are insufficient for handling the complexities and variations of defects in contemporary manufacturing settings. These models especially struggle in scenarios involving limited or imbalanced defect data. In this work, we introduce MemoryMamba, a novel memory-augmented state space model (SSM), designed to overcome the limitations of existing defect recognition models. MemoryMamba integrates the state space model with the memory augmentation mechanism, enabling the system to maintain and retrieve essential defect-specific information in training. Its architecture is designed to capture dependencies and intricate defect characteristics, which are crucial for effective defect detection. In the experiments, MemoryMamba was evaluated across four industrial datasets with diverse defect types and complexities. The model consistently outperformed other methods, demonstrating its capability to adapt to various defect recognition scenarios.

[672]  arXiv:2405.03674 [pdf, other]
Title: Anti-Heroes: An Ethics-focused Method for Responsible Designer Intentions
Subjects: Human-Computer Interaction (cs.HC)

HCI and design researchers have designed, adopted, and customized a range of ethics-focused methods to inscribe values and support ethical decision making in a design process. In this work-in-progress, we add to this body of resources, constructing a method that surfaces the designer's intentions in an action-focused way, encouraging consideration of both manipulative and value-centered roles. Anti-Heroes is a card deck that allows a designer to playfully take on pairs of manipulative (Anti-Hero) and value-centered (Hero) roles during design ideation/conceptualization, evaluation, and ethical dialogue. The card deck includes twelve cards with Anti-Hero and Hero faces, along with three action cards that include reflective questions for different play modes. Alongside the creation of the Anti-Heroes card deck, we describe the evaluation and iteration of the card deck through playtesting sessions with four groups of three design students. We propose implications of Anti-Heroes for technology and design education and practice.

[673]  arXiv:2405.03676 [pdf, other]
Title: Why is SAM Robust to Label Noise?
Subjects: Machine Learning (cs.LG)

Sharpness-Aware Minimization (SAM) is best known for achieving state-of-the-art performance on natural image and language tasks. However, its most pronounced improvements (of tens of percent) arise instead in the presence of label noise. Understanding SAM's label noise robustness requires a departure from characterizing the robustness of minima lying in "flatter" regions of the loss landscape. In particular, the peak performance under label noise occurs with early stopping, far before the loss converges. We decompose SAM's robustness into two effects: one induced by changes to the logit term and the other induced by changes to the network Jacobian. The first can be observed in linear logistic regression where SAM provably up-weights the gradient contribution from clean examples. Although this explicit up-weighting is also observable in neural networks, when we intervene and modify SAM to remove this effect, surprisingly, we see no visible degradation in performance. We infer that SAM's effect in deeper networks is instead explained entirely by the effect SAM has on the network Jacobian. We theoretically derive the implicit regularization induced by this Jacobian effect in two-layer linear networks. Motivated by our analysis, we see that cheaper alternatives to SAM that explicitly induce these regularization effects largely recover the benefits in deep networks trained on real-world datasets.
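
For readers unfamiliar with SAM, the following is a minimal sketch of the commonly described two-step SAM update applied to linear logistic regression, the setting the abstract mentions. It is not the authors' code: the function names, the learning rate `lr`, and the perturbation radius `rho` are illustrative assumptions.

```python
import math


def logistic_grad(w, xs, ys):
    """Gradient of the mean logistic loss log(1 + exp(-y * w.x)), labels in {-1, +1}."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coeff = -y / (1.0 + math.exp(margin))  # derivative of the loss w.r.t. w.x
        for j, xj in enumerate(x):
            grad[j] += coeff * xj / len(xs)
    return grad


def sam_step(w, xs, ys, lr=0.1, rho=0.05):
    """One SAM update: ascend to a nearby perturbed point, then descend using its gradient."""
    g = logistic_grad(w, xs, ys)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # guard against a zero gradient
    # Step 1: perturb the weights by rho along the normalized gradient direction.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: apply the gradient evaluated at the perturbed weights to the original weights.
    g_adv = logistic_grad(w_adv, xs, ys)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]
```

Because the gradient is taken at the perturbed weights, each example's coefficient is evaluated at a different margin than in plain gradient descent; this re-weighting of per-example contributions is the effect the abstract analyzes.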

[674]  arXiv:2405.03677 [pdf, other]
Title: Towards A Human-in-the-Loop LLM Approach to Collaborative Discourse Analysis
Comments: In press at the 25th international conference on Artificial Intelligence in Education (AIED) Late-Breaking Results (LBR) track
Subjects: Computation and Language (cs.CL)

LLMs have demonstrated proficiency in contextualizing their outputs using human input, often matching or beating human-level performance on a variety of tasks. However, LLMs have not yet been used to characterize synergistic learning in students' collaborative discourse. In this exploratory work, we take a first step towards adopting a human-in-the-loop prompt engineering approach with GPT-4-Turbo to summarize and categorize students' synergistic learning during collaborative discourse. Our preliminary findings suggest GPT-4-Turbo may be able to characterize students' synergistic learning in a manner comparable to humans and that our approach warrants further investigation.

[675]  arXiv:2405.03682 [pdf, other]
Title: An Empty Room is All We Want: Automatic Defurnishing of Indoor Panoramas
Comments: Accepted at CVPR 2024 workshops. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a pipeline that leverages Stable Diffusion to improve inpainting results in the context of defurnishing -- the removal of furniture items from indoor panorama images. Specifically, we illustrate how increased context, domain-specific model fine-tuning, and improved image blending can produce high-fidelity inpaints that are geometrically plausible without needing to rely on room layout estimation. We demonstrate qualitative and quantitative improvements over other furniture removal techniques.

[676]  arXiv:2405.03685 [pdf, other]
Title: Language-Image Models with 3D Understanding
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and reason about images in 3-dimensional space. To that end, we first develop a large-scale pre-training dataset for 2D and 3D called LV3D by combining multiple existing 2D and 3D recognition datasets under a common task formulation: multi-turn question-answering. Next, we introduce a new MLLM named Cube-LLM and pre-train it on LV3D. We show that pure data scaling yields a strong 3D perception capability without 3D-specific architectural design or training objectives. Cube-LLM exhibits intriguing properties similar to LLMs: (1) Cube-LLM can apply chain-of-thought prompting to improve 3D understanding from 2D context information. (2) Cube-LLM can follow complex and diverse instructions and adapt to versatile input and output formats. (3) Cube-LLM can be visually prompted, for example with a 2D box or a set of candidate 3D boxes from specialists. Our experiments on outdoor benchmarks demonstrate that Cube-LLM significantly outperforms existing baselines, by 21.3 points of AP-BEV on the Talk2Car dataset for 3D grounded reasoning and by 17.7 points on the DriveLM dataset for complex reasoning about driving scenarios. Cube-LLM also shows competitive results on general MLLM benchmarks such as refCOCO for 2D grounding, with an average score of 87.0, as well as on visual question answering benchmarks such as VQAv2, GQA, SQA, POPE, etc. for complex reasoning. Our project is available at https://janghyuncho.github.io/Cube-LLM.

[677]  arXiv:2405.03687 [pdf, other]
Title: Monotone Randomized Apportionment
Subjects: Computer Science and Game Theory (cs.GT); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)

Apportionment is the act of distributing the seats of a legislature among political parties (or states) in proportion to their vote shares (or populations). A famous impossibility by Balinski and Young (2001) shows that no apportionment method can be proportional up to one seat (quota) while also responding monotonically to changes in the votes (population monotonicity). Grimmett (2004) proposed to overcome this impossibility by randomizing the apportionment, which can achieve quota as well as perfect proportionality and monotonicity -- at least in terms of the expected number of seats awarded to each party. Still, the correlations between the seats awarded to different parties may exhibit bizarre non-monotonicities. When parties or voters care about joint events, such as whether a coalition of parties reaches a majority, these non-monotonicities can cause paradoxes, including incentives for strategic voting.
In this paper, we propose monotonicity axioms ruling out these paradoxes, and study which of them can be satisfied jointly with Grimmett's axioms. Essentially, we require that, if a set of parties all receive more votes, the probability of those parties jointly receiving more seats should increase. Our work draws on a rich literature on unequal probability sampling in statistics (studied as dependent randomized rounding in computer science). Our main result shows that a sampling scheme due to Sampford (1967) satisfies Grimmett's axioms and a notion of higher-order correlation monotonicity.
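
To make the idea of randomized apportionment concrete, here is a minimal sketch that awards each party the floor of its exact quota and then distributes the leftover seats by systematic sampling over a randomly shuffled party order. This is a textbook unequal probability sampling design, not Sampford's scheme, and it makes no claim about the correlation-monotonicity axioms studied in the paper. The function name and arguments are hypothetical.

```python
import math
import random
from fractions import Fraction


def randomized_apportionment(votes, seats, rng):
    """Return a random seat allocation in which every party gets floor or ceil of its quota."""
    total = sum(votes)
    quotas = [Fraction(v * seats, total) for v in votes]  # exact proportional shares
    alloc = [math.floor(q) for q in quotas]
    residues = [q - a for q, a in zip(quotas, alloc)]     # fractional parts, they sum to an integer

    order = list(range(len(votes)))
    rng.shuffle(order)     # random party order
    u = rng.random()       # uniform offset in [0, 1)

    cum = Fraction(0)
    for i in order:
        lo = cum
        cum += residues[i]
        # Party i wins an extra seat if some point u + j (j an integer) lies in [lo, cum).
        if math.ceil(cum - u) - math.ceil(lo - u) >= 1:
            alloc[i] += 1
    return alloc
```

For example, `randomized_apportionment([417, 309, 274], 10, random.Random(0))` always returns an allocation summing to 10 seats, and each party's chance of receiving the extra seat equals the fractional part of its quota, so expected seats match the quotas.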

[678]  arXiv:2405.03688 [pdf, other]
Title: Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames
Comments: 15 pages, 9 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Adversarial information operations can destabilize societies by undermining fair elections, manipulating public opinions on policies, and promoting scams. Despite their widespread occurrence and potential impacts, our understanding of influence campaigns is limited by manual analysis of messages and subjective interpretation of their observable behavior. In this paper, we explore whether these limitations can be mitigated with large language models (LLMs), using GPT-3.5 as a case-study for coordinated campaign annotation. We first use GPT-3.5 to scrutinize 126 identified information operations spanning over a decade. We utilize a number of metrics to quantify the close (if imperfect) agreement between LLM and ground truth descriptions. We next extract coordinated campaigns from two large multilingual datasets from X (formerly Twitter) that respectively discuss the 2022 French election and the 2023 Balikatan Philippine-U.S. military exercise. For each coordinated campaign, we use GPT-3.5 to analyze posts related to a specific concern and extract goals, tactics, and narrative frames, both before and after critical events (such as the date of an election). While GPT-3.5 sometimes disagrees with subjective interpretation, its ability to summarize and interpret demonstrates LLMs' potential to extract higher-order indicators from text to provide a more complete picture of the information campaigns compared to previous methods.

[679]  arXiv:2405.03689 [pdf, other]
Title: Pose Priors from Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

We present a zero-shot pose optimization method that enforces accurate physical contact constraints when estimating the 3D pose of humans. Our central insight is that since language is often used to describe physical interaction, large pretrained text-based models can act as priors on pose estimation.
We can thus leverage this insight to improve pose estimation by converting natural language descriptors, generated by a large multimodal model (LMM), into tractable losses to constrain the 3D pose optimization. Despite its simplicity, our method produces surprisingly compelling pose reconstructions of people in close contact, correctly capturing the semantics of the social and physical interactions. We demonstrate that our method rivals more complex state-of-the-art approaches that require expensive human annotation of contact points and training specialized models. Moreover, unlike previous approaches, our method provides a unified framework for resolving self-contact and person-to-person contact.

[680]  arXiv:2405.03690 [pdf, other]
Title: Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks. These models have the potential to be deployed in real-world applications such as robotics, AI assistants, medical imaging, and autonomous vehicles. The widespread adoption of Video-LMMs in our daily lives underscores the importance of ensuring and evaluating their robust performance in mirroring human-like reasoning and interaction capabilities in complex, real-world contexts. However, existing benchmarks for Video-LMMs primarily focus on general video comprehension abilities and neglect assessing their reasoning capabilities over complex videos in the real-world context, and robustness of these models through the lens of user prompts as text queries. In this paper, we present the Complex Video Reasoning and Robustness Evaluation Suite (CVRR-ES), a novel benchmark that comprehensively assesses the performance of Video-LMMs across 11 diverse real-world video dimensions. We evaluate 9 recent models, including both open-source and closed-source variants, and find that most of the Video-LMMs, especially open-source ones, struggle with robustness and reasoning when dealing with complex videos. Based on our analysis, we develop a training-free Dual-Step Contextual Prompting (DSCP) technique to enhance the performance of existing Video-LMMs. Our findings provide valuable insights for building the next generation of human-centric AI systems with advanced robustness and reasoning capabilities. Our dataset and code are publicly available at: https://mbzuai-oryx.github.io/CVRR-Evaluation-Suite/.

Cross-lists for Tue, 7 May 24

[681]  arXiv:2003.05708 (cross-list from q-fin.CP) [pdf, ps, other]
Title: Multilevel Monte Carlo with Numerical Smoothing for Robust and Efficient Computation of Probabilities and Densities
Subjects: Computational Finance (q-fin.CP); Numerical Analysis (math.NA)

The multilevel Monte Carlo (MLMC) method is highly efficient for estimating expectations of a functional of a solution to a stochastic differential equation (SDE). However, MLMC estimators may be unstable and have a poor (noncanonical) complexity in the case of low regularity of the functional. To overcome this issue, we extend our previously introduced idea of numerical smoothing in (Quantitative Finance, 23(2), 209-227, 2023), in the context of deterministic quadrature methods to the MLMC setting. The numerical smoothing technique is based on root-finding methods combined with one-dimensional numerical integration with respect to a single well-chosen variable. This study is motivated by the computation of probabilities of events, pricing options with a discontinuous payoff, and density estimation problems for dynamics where the discretization of the underlying stochastic processes is necessary. The analysis and numerical experiments reveal that the numerical smoothing significantly improves the strong convergence, and consequently, the complexity and robustness (by making the kurtosis at deep levels bounded) of the MLMC method. In particular, we show that numerical smoothing enables recovering the MLMC complexities obtained for Lipschitz functionals due to the optimal variance decay rate when using the Euler--Maruyama scheme. For the Milstein scheme, numerical smoothing recovers the canonical MLMC complexity even for the nonsmooth integrand mentioned above. Finally, our approach efficiently estimates univariate and multivariate density functions.

[682]  arXiv:2405.02339 (cross-list from astro-ph.IM) [pdf, ps, other]
Title: Noise Models in the LISA Mission
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Systems and Control (eess.SY)

This document briefly describes the noise models and shapes used for the synthesis of the Drag-Free and Attitude Control System in the LISA space mission. LISA (Laser Interferometer Space Antenna) is one of the next large-class missions from the European Space Agency (ESA), expected to be launched in 2034. The main goal of the mission is to detect gravitational waves, undulatory perturbations of the space-time fabric that are extremely important for collecting experimental evidence for the theory of General Relativity. In the 90s, different international collaborations of institutes laid the foundations for the first ground-based interferometers (see, e.g., LIGO and Virgo). However, ground-based interferometers have a limited bandwidth due to the Earth's environmental noise and their short arm-lengths of a few kilometers. Therefore, they cannot observe gravitational waves belonging to the portion of the spectrum below 1 Hz. This issue can be overcome by means of space-based interferometers, which can have arm-lengths up to millions of kilometers and exploit a quieter environment than the Earth's surface. The LISA system is affected by actuation, sensing and environmental disturbances and noises. Among the actuation noises we have those given by the Micro Propulsion System (MPS), the Gravitational Reference Sensor (GRS) and the Optical Assembly (OA) motor. Among the sensing noises we consider the interferometer, the Differential Wavefront Sensor (DWS) and the GRS. The environmental disturbances are given by the solar radiation pressure, the test-mass stiffness and self-gravity, and the environmental noises acting directly on the test-mass.

[683]  arXiv:2405.02340 (cross-list from stat.AP) [pdf, other]
Title: A Comprehensive Approach to Carbon Dioxide Emission Analysis in High Human Development Index Countries using Statistical and Machine Learning Techniques
Subjects: Applications (stat.AP); Machine Learning (cs.LG)

Reducing carbon dioxide (CO2) emissions is vital at both global and national levels, given their significant role in exacerbating climate change. CO2 emissions, stemming from a variety of industrial and economic activities, are major contributors to the greenhouse effect and global warming, posing substantial obstacles in addressing climate issues. It is imperative to forecast CO2 emission trends and classify countries based on their emission patterns to effectively mitigate worldwide carbon emissions. This paper presents an in-depth comparative study on the determinants of CO2 emissions in twenty countries with high Human Development Index (HDI), exploring factors related to economy, environment, energy use, and renewable resources over a span of 25 years. The study unfolds in two distinct phases: initially, statistical techniques such as Ordinary Least Squares (OLS), fixed effects, and random effects models are applied to pinpoint significant determinants of CO2 emissions. Following this, the study leverages supervised and unsupervised machine learning (ML) methods to further scrutinize and understand the factors influencing CO2 emissions. Seasonal AutoRegressive Integrated Moving Average with eXogenous variables (SARIMAX), a supervised ML model, is first used to predict emission trends from historical data, offering practical insights for policy formulation. Subsequently, Dynamic Time Warping (DTW), an unsupervised learning approach, is used to group countries by similar emission patterns. The dual-phase approach utilized in this study significantly improves the accuracy of CO2 emission predictions while also providing a deeper insight into global emission trends. By adopting this thorough analytical framework, nations can develop more focused and effective carbon reduction policies, playing a vital role in the global initiative to combat climate change.
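
As an illustration of the DTW step, the sketch below computes the classic dynamic-programming DTW distance between two numeric series, using absolute difference as the local cost. It is a generic textbook formulation, not the study's pipeline; the function name and the choice of local cost are assumptions, and the grouping step that would consume these pairwise distances is omitted.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Two series with the same shape but different timing, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, have distance zero under this measure, which is why DTW suits comparing emission trajectories that evolve at different speeds.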

[684]  arXiv:2405.02366 (cross-list from astro-ph.IM) [pdf, other]
Title: Bayesian and Convolutional Networks for Hierarchical Morphological Classification of Galaxies
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Astrophysics of Galaxies (astro-ph.GA); Machine Learning (cs.LG)

This work is focused on the morphological classification of galaxies following the Hubble sequence in which the different classes are arranged in a hierarchy. The proposed method, BCNN, is composed of two main modules. First, a convolutional neural network (CNN) is trained with images of the different classes of galaxies (image augmentation is carried out to balance some classes); the CNN outputs the probability for each class of the hierarchy, and its outputs/predictions feed the second module. The second module consists of a Bayesian network that represents the hierarchy and helps to improve the prediction accuracy by combining the predictions of the first phase while maintaining the hierarchical constraint (in a hierarchy, an instance associated with a node must be associated with all its ancestors), through probabilistic inference over the Bayesian network so that a consistent prediction is obtained. Different images from the Hubble telescope have been collected and labeled by experts, which are used to perform the experiments. The results show that BCNN performed better than several CNNs in multiple evaluation measures, reaching the following scores: 67% in exact match, 78% in accuracy, and 83% in hierarchical F-measure.

[685]  arXiv:2405.02372 (cross-list from stat.ML) [pdf, ps, other]
Title: Triadic-OCD: Asynchronous Online Change Detection with Provable Robustness, Optimality, and Convergence
Comments: Accepted at ICML2024
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The primary goal of online change detection (OCD) is to promptly identify changes in the data stream. OCD problems find a wide variety of applications in diverse areas, e.g., security detection in smart grids and intrusion detection in communication networks. Prior research usually assumes precise knowledge of the parameters linked to the data stream. Nevertheless, this presumption often proves unattainable in practical scenarios due to factors such as estimation errors, system updates, etc. This paper makes a first attempt to develop a triadic-OCD framework with certifiable robustness, provable optimality, and guaranteed convergence. In addition, the proposed triadic-OCD algorithm can be realized in a fully asynchronous distributed manner, easing the necessity of transmitting the data to a single server. This asynchronous mechanism could also mitigate the straggler issue faced by traditional synchronous algorithms. We then analyze the non-asymptotic convergence property of triadic-OCD and derive its iteration complexity to achieve an $\epsilon$-optimal point. Finally, extensive experiments have been conducted to elucidate the effectiveness of the proposed method.

[686]  arXiv:2405.02373 (cross-list from math.OC) [pdf, other]
Title: Exponentially Weighted Algorithm for Online Network Resource Allocation with Long-Term Constraints
Comments: arXiv admin note: text overlap with arXiv:2305.15558
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper studies an online optimal resource reservation problem in communication networks with job transfers where the goal is to minimize the reservation cost while maintaining the blocking cost under a certain budget limit. To tackle this problem, we propose a novel algorithm based on a randomized exponentially weighted method that encompasses long-term constraints. We then analyze the performance of our algorithm by establishing an upper bound for the associated regret and the cumulative constraint violations. Finally, we present numerical experiments comparing the performance of our algorithm with that of reinforcement learning, and show that our algorithm surpasses it.

[687]  arXiv:2405.02374 (cross-list from q-bio.QM) [pdf, other]
Title: Protein binding affinity prediction under multiple substitutions applying eGNNs on Residue and Atomic graphs combined with Language model information: eGRAL
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Protein-protein interactions (PPIs) play a crucial role in numerous biological processes. Developing methods that predict binding affinity changes under substitution mutations is fundamental for modelling and re-engineering biological systems. Deep learning is increasingly recognized as a powerful tool capable of bridging the gap between in-silico predictions and in-vitro observations. With this contribution, we propose eGRAL, a novel SE(3) equivariant graph neural network (eGNN) architecture designed for predicting binding affinity changes from multiple amino acid substitutions in protein complexes. eGRAL leverages residue, atomic and evolutionary scales, thanks to features extracted from protein large language models. To address the limited availability of large-scale affinity assays with structural information, we generate a simulated dataset comprising approximately 500,000 data points. Our model is pre-trained on this dataset, then fine-tuned and tested on experimental data.

[688]  arXiv:2405.02383 (cross-list from stat.ML) [pdf, other]
Title: A Fresh Look at Sanity Checks for Saliency Maps
Comments: arXiv admin note: text overlap with arXiv:2401.06465
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The Model Parameter Randomisation Test (MPRT) is highly recognised in the eXplainable Artificial Intelligence (XAI) community due to its fundamental evaluative criterion: explanations should be sensitive to the parameters of the model they seek to explain. However, recent studies have raised several methodological concerns for the empirical interpretation of MPRT. In response, we propose two modifications to the original test: Smooth MPRT and Efficient MPRT. The former reduces the impact of noise on evaluation outcomes via sampling, while the latter avoids the need for biased similarity measurements by re-interpreting the test through the increase in explanation complexity after full model randomisation. Our experiments show that these modifications enhance the metric reliability, facilitating a more trustworthy deployment of explanation methods.

[689]  arXiv:2405.02406 (cross-list from quant-ph) [pdf, other]
Title: Analysis of Asynchronous Protocols for Entanglement Distribution in Quantum Networks
Subjects: Quantum Physics (quant-ph); Networking and Internet Architecture (cs.NI)

The distribution of entanglement in quantum networks is typically approached under idealized assumptions such as perfect synchronization and centralized control, while classical communication is often neglected. However, these assumptions prove impractical in large-scale networks. In this paper, we present a pragmatic perspective by exploring two minimal asynchronous protocols: a parallel scheme generating entanglement independently at the link level, and a sequential scheme extending entanglement iteratively from one party to the other. Our analysis incorporates non-uniform repeater spacings and classical communications and accounts for quantum memory decoherence. We evaluate network performance using metrics such as entanglement bit rate, end-to-end fidelity, and secret key rate for entanglement-based quantum key distribution. Our findings suggest the sequential scheme's superiority due to comparable performance with the parallel scheme, coupled with simpler implementation. Additionally, we propose a cutoff strategy to improve performance by discarding attempts with prolonged memory idle time, effectively eliminating low-quality entanglement links. Finally, we apply our methods to the real-world topology of SURFnet and report the performance as a function of memory coherence time.

[690]  arXiv:2405.02436 (cross-list from physics.app-ph) [pdf, other]
Title: Internet of Paint (IoP): Channel Modeling and Capacity Analysis for Terahertz Electromagnetic Nanonetworks Embedded in Paint
Comments: 15 pages, 17 figures, accepted for publication in the IEEE Journal on Selected Areas in Communications Special Issue on Electromagnetic Nanonetworks
Subjects: Applied Physics (physics.app-ph); Emerging Technologies (cs.ET)

This work opens a new chapter in the 100,000-year-old concept of paint, by leveraging innovations in nano-technology in the sub-THz frequency range. More specifically, the groundbreaking concept of Internet of Paint (IoP) is introduced along with a comprehensive channel model and a capacity analysis for nano-scale radios embedded in paint and communicating through paint. Nano-network devices, integrated within a paint medium, communicate via a multipath strategy, encompassing direct waves, reflections from interfaces, and lateral wave propagation. The evaluation incorporates three distinct paint types to assess path losses, received powers, and channel capacity. Analysis of path loss indicates a slight non-linear increase with both frequency and Line of Sight (LoS) distance between transceivers. Notably, paints with high refractive indexes result in the highest path loss. Moreover, burying transceivers at similar depths near the Air-Paint (A-P) interface showcases promising performance of lateral waves with increasing LoS distance. Increasing paint layer depth leads to amplified attenuation, while total received power exhibits promising results when in close proximity to the A-P interface but steeply declines with burial depth. Additionally, a substantial reduction in channel capacity is observed with LoS distance and burial depth, so transceivers need to be close together and close to the A-P interface to communicate effectively. Compared to air-based communication channels, IoP shows a reduction in channel capacity of approximately two orders of magnitude. This paper provides valuable insights into the potential of IoP communication within paint mediums and offers a foundation for further advancements in this emerging field.

[691]  arXiv:2405.02449 (cross-list from stat.ML) [pdf, other]
Title: Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design
Comments: Published in International Conference on Machine Learning, ICML 2024. Code can be found in the Vertaix GitHub: this https URL Paper dedicated to Kwame Nkrumah
Subjects: Machine Learning (stat.ML); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG); Biomolecules (q-bio.BM)

Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery. However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima. This "collapse" problem prevents experimental design algorithms from yielding diverse high-quality data. In this paper, we extend the Vendi scores -- a family of interpretable similarity-based diversity metrics -- to account for quality. We then leverage these quality-weighted Vendi scores to tackle experimental design problems across various applications, including drug discovery, materials discovery, and reinforcement learning. We found that quality-weighted Vendi scores allow us to construct policies for experimental design that flexibly balance quality and diversity, and ultimately assemble rich and diverse sets of high-performing data points. Our algorithms led to a 70%-170% increase in the number of effective discoveries compared to baselines.

[692]  arXiv:2405.02456 (cross-list from math.OC) [pdf, ps, other]
Title: Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time. This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average performance of the policy across tasks subject to bounds on the performance in each task. We consider solving this problem both in the centralized setting, where information for all tasks is accessible to a single server, and in the decentralized setting, where a network of agents, each given one task and observing local information, cooperate to find the solution of the globally constrained objective using local communication.
We first propose a primal-dual algorithm that provably converges to the globally optimal solution of this constrained formulation under exact gradient evaluations. When the gradient is unknown, we further develop a sample-based actor-critic algorithm that finds the optimal policy using online samples of state, action, and reward. Finally, we study the extension of the algorithm to the linear function approximation setting.

[693]  arXiv:2405.02462 (cross-list from math.ST) [pdf, other]
Title: Finite Sample Analysis and Bounds of Generalization Error of Gradient Descent in In-Context Linear Regression
Subjects: Statistics Theory (math.ST); Numerical Analysis (math.NA); Probability (math.PR)

Recent studies show that transformer-based architectures emulate gradient descent during a forward pass, contributing to in-context learning capabilities - an ability where the model adapts to new tasks based on a sequence of prompt examples without being explicitly trained or fine tuned to do so. This work investigates the generalization properties of a single step of gradient descent in the context of linear regression with well-specified models. A random design setting is considered and analytical expressions are derived for the statistical properties of generalization error in a non-asymptotic (finite sample) setting. These expressions are notable for avoiding arbitrary constants, and thus offer robust quantitative information and scaling relationships. These results are contrasted with those from classical least squares regression (for which analogous finite sample bounds are also derived), shedding light on systematic and noise components, as well as optimal step sizes. Additionally, identities involving high-order products of Gaussian random matrices are presented as a byproduct of the analysis.
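
For orientation, a single gradient-descent step in linear regression can be sketched as below. The loss normalization (1/(2n) times the squared residual norm), the zero initialization, and the function name are assumptions made for illustration; the abstract does not fix them.

```python
def gd_step(X, y, w, eta):
    """One gradient step on L(w) = (1 / (2n)) * sum_i (y_i - x_i . w)^2."""
    n, d = len(X), len(w)
    residual = [y[i] - sum(X[i][j] * w[j] for j in range(d)) for i in range(n)]
    grad = [-sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(d)]
    return [w[j] - eta * grad[j] for j in range(d)]

# Toy design (made up): starting from w = 0, the step equals (eta / n) * X^T y.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 4.0]
w_one_step = gd_step(X, y, [0.0, 0.0], eta=1.0)
w_larger_step = gd_step(X, y, [0.0, 0.0], eta=2.0)
```

In this orthogonal toy design the step size eta = 2 lands exactly on the least-squares solution, which gives a feel for why the choice of step size matters for a one-step estimator.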

[694]  arXiv:2405.02480 (cross-list from econ.EM) [pdf, other]
Title: A Network Simulation of OTC Markets with Multiple Agents
Comments: 20 pages, 17 figures
Subjects: Econometrics (econ.EM); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

We present a novel agent-based approach to simulating an over-the-counter (OTC) financial market in which trades are intermediated solely by market makers and agent visibility is constrained to a network topology. Dynamics, such as changes in price, result from agent-level interactions that ubiquitously occur via market maker agents acting as liquidity providers. Two additional agents are considered: trend investors use a deep convolutional neural network paired with a deep Q-learning framework to inform trading decisions by analysing price history; and value investors use a static price-target to determine their trade directions and sizes. We demonstrate that our novel inclusion of a network topology with market makers facilitates explorations into various market structures. First, we present the model and an overview of its mechanics. Second, we validate our findings via comparison to the real world: we demonstrate a fat-tailed distribution of price changes, auto-correlated volatility, a skew negatively correlated to market maker positioning, predictable price-history patterns and more. Finally, we demonstrate that our network-based model can lend insights into the effect of market structure on price action. For example, we show that markets with sparsely connected intermediaries can have a critical point of fragmentation, beyond which the market forms distinct clusters and arbitrage rapidly becomes possible between the prices of different market makers. A discussion is provided on future work that would be beneficial.

[695]  arXiv:2405.02488 (cross-list from stat.ML) [pdf, other]
Title: Modelling Sampling Distributions of Test Statistics with Autograd
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); Computation (stat.CO)

Simulation-based inference methods that feature correct conditional coverage of confidence sets based on observations that have been compressed to a scalar test statistic require accurate modelling of either the p-value function or the cumulative distribution function (cdf) of the test statistic. If the model of the cdf, which is typically a deep neural network, is a function of the test statistic then the derivative of the neural network with respect to the test statistic furnishes an approximation of the sampling distribution of the test statistic. We explore whether this approach to modelling conditional 1-dimensional sampling distributions is a viable alternative to the probability density-ratio method, also known as the likelihood-ratio trick. Relatively simple, yet effective, neural network models are used whose predictive uncertainty is quantified through a variety of methods.
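
The central idea, that differentiating a cdf model with respect to the test statistic yields a density, can be illustrated with a tiny forward-mode automatic differentiation sketch. The one-parameter sigmoid used as the "cdf model" is a stand-in chosen for illustration, not the network from the paper, and all names here are hypothetical.

```python
import math

class Dual:
    """Value plus derivative, val + der * eps with eps**2 == 0."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)
    __rmul__ = __mul__

def sigmoid(z):
    """Sigmoid on dual numbers, propagating the derivative by the chain rule."""
    s = 1.0 / (1.0 + math.exp(-z.val))
    return Dual(s, s * (1.0 - s) * z.der)

def model_cdf(t, scale=2.0, shift=0.0):
    """Stand-in cdf model F(t) = sigmoid(scale * t + shift); t is a Dual."""
    return sigmoid(scale * t + shift)

def model_pdf(t, scale=2.0, shift=0.0):
    """Density approximation dF/dt, obtained by seeding dt/dt = 1."""
    return model_cdf(Dual(t, 1.0), scale, shift).der

density_at_zero = model_pdf(0.0)
```

A reverse-mode autograd framework would compute the same derivative for a deep network; the dual-number version is used here only to keep the example dependency-free.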

[696]  arXiv:2405.02497 (cross-list from math.OC) [pdf, other]
Title: Prediction techniques for dynamic imaging with online primal-dual methods
Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV)

Online optimisation facilitates the solution of dynamic inverse problems, such as image stabilisation, fluid flow monitoring, and dynamic medical imaging. In this paper, we improve upon previous work on predictive online primal-dual methods on two fronts. Firstly, we provide a more concise analysis that symmetrises previously unsymmetric regret bounds, and relaxes previous restrictive conditions on the dual predictor. Secondly, based on the latter, we develop several improved dual predictors. We numerically demonstrate their efficacy in image stabilisation and dynamic positron emission tomography.

[697]  arXiv:2405.02504 (cross-list from eess.IV) [pdf, other]
Title: Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Magnetic resonance imaging (MRI) and positron emission tomography (PET) are increasingly used in multimodal analysis of neurodegenerative disorders. While MRI is broadly utilized in clinical settings, PET is less accessible. Many studies have attempted to use deep generative models to synthesize PET from MRI scans. However, they often suffer from unstable training and inadequately preserve brain functional information conveyed by PET. To this end, we propose a functional imaging constrained diffusion (FICD) framework for 3D brain PET image synthesis with paired structural MRI as input condition, through a new constrained diffusion model (CDM). The FICD introduces noise to PET and then progressively removes it with CDM, ensuring high output fidelity throughout a stable training phase. The CDM learns to predict denoised PET with a functional imaging constraint introduced to ensure voxel-wise alignment between each denoised PET and its ground truth. Quantitative and qualitative analyses conducted on 293 subjects with paired T1-weighted MRI and 18F-fluorodeoxyglucose (FDG)-PET scans suggest that FICD achieves superior performance in generating FDG-PET data compared to state-of-the-art methods. We further validate the effectiveness of the proposed FICD on data from a total of 1,262 subjects through three downstream tasks, with experimental results suggesting its utility and generalizability.

[698]  arXiv:2405.02523 (cross-list from quant-ph) [pdf, other]
Title: Optimal Toffoli-Depth Quantum Adder
Comments: This paper is under review in ACM Transactions on Quantum Computing
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

Efficient quantum arithmetic circuits are commonly found in numerous quantum algorithms of practical significance. To date, logarithmic-depth quantum adders include a constant coefficient k >= 2, achieving a Toffoli-Depth of k log n + O(1). In this work, 160 alternative compositions of the carry-propagation structure are comprehensively explored to determine the optimal depth structure for a quantum adder. By extensively studying these structures, it is shown that an exact Toffoli-Depth of log n + O(1) is achievable. This presents a reduction of Toffoli-Depth by almost 50% compared to the best known quantum adder circuits presented to date. We demonstrate a further possible design by incorporating a different expansion of propagate and generate forms, as well as an extension of the modular framework. Our paper elaborates on these designs, supported by detailed theoretical analyses and simulation-based studies, firmly substantiating our claims of optimality. The results also mirror similar improvements recently reported in classical adder circuit complexity.
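
As background on propagate and generate forms, the sketch below shows a classical Kogge-Stone parallel-prefix adder, whose number of prefix levels grows as the ceiling of log2 n. It is a classical analogue chosen for illustration only; it is not the quantum circuit from the paper and says nothing about Toffoli counts.

```python
def kogge_stone_add(a, b, n):
    """Add two n-bit integers with a parallel-prefix carry network.

    Returns (sum, levels), where levels is the number of prefix stages.
    """
    generate = [(a >> i) & (b >> i) & 1 for i in range(n)]
    propagate = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = generate[:], propagate[:]
    levels = 0
    distance = 1
    while distance < n:
        new_G, new_P = G[:], P[:]
        for i in range(distance, n):
            # Combine the span ending at i with the span ending at i - distance.
            new_G[i] = G[i] | (P[i] & G[i - distance])
            new_P[i] = P[i] & P[i - distance]
        G, P = new_G, new_P
        distance *= 2
        levels += 1
    carries = [0] + G[:-1]  # carry into bit i is the group generate of bits 0..i-1
    total = sum((propagate[i] ^ carries[i]) << i for i in range(n))
    total |= G[n - 1] << n  # carry out of the top bit
    return total, levels

result, levels = kogge_stone_add(11, 6, 4)
```

All updates within one level are independent of each other, which is why the level count, rather than the bit width, governs the depth of this kind of carry network.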

[699]  arXiv:2405.02545 (cross-list from astro-ph.SR) [pdf, other]
Title: Prediction of Space Weather Events through Analysis of Active Region Magnetograms using Convolutional Neural Network
Authors: Shlesh Sakpal
Comments: 6 pages, 12 figures
Subjects: Solar and Stellar Astrophysics (astro-ph.SR); Earth and Planetary Astrophysics (astro-ph.EP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Although space weather events may not directly affect human life, they have the potential to inflict significant harm upon our communities. Harmful space weather events can trigger atmospheric changes that result in physical and economic damage on a global scale. In 1989, Earth experienced the effects of a powerful geomagnetic storm that caused satellites to malfunction, while triggering power blackouts in Canada, along with electricity disturbances in the United States and Europe. With the solar cycle peak rapidly approaching, there is an ever-increasing need to prepare for and prevent the damage that can occur, especially to modern-day technology, which calls for a comprehensive prediction system. This study aims to leverage machine learning techniques to predict instances of space weather (solar flares, coronal mass ejections, geomagnetic storms), based on active region magnetograms of the Sun. This was done by using the NASA DONKI service to determine when these solar events occur, then using data from the NASA Solar Dynamics Observatory to compile a dataset that includes magnetograms of active regions of the Sun 24 hours before the events. When the magnetograms are input into a convolutional neural network (CNN) trained on this dataset, the network can predict whether a space weather event will occur, and what type of event it will be. The model was designed using a custom CNN architecture, and returned an accuracy of 90.27%, a precision of 85.83%, a recall of 91.78%, and an average F1 score of 92.14% across each class (Solar flare [Flare], geomagnetic storm [GMS], coronal mass ejection [CME]). Our results show that using magnetogram data as an input for a CNN is a viable method for space weather prediction. Future work can involve prediction of the magnitude of solar events.
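
For readers comparing the reported metrics, the sketch below shows one common way to compute per-class precision, recall, and F1 and then average F1 across classes. The averaging scheme is an assumption, since the abstract does not say how its averages were formed, and the labels in the toy data are made up.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics

def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)

# Toy predictions (made up, not the paper's data).
labels = ["Flare", "GMS", "CME"]
y_true = ["Flare", "Flare", "GMS", "CME"]
y_pred = ["Flare", "GMS", "GMS", "CME"]
scores = per_class_metrics(y_true, y_pred, labels)
average_f1 = macro_f1(scores)
```

Because the averaged F1 is a mean of per-class harmonic means, it need not lie between an overall precision and an overall recall that were aggregated in a different way.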

[700]  arXiv:2405.02563 (cross-list from eess.SP) [pdf, other]
Title: Deep Representation Learning-Based Dynamic Trajectory Phenotyping for Acute Respiratory Failure in Medical Intensive Care Units
Comments: 9 pages
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Sepsis-induced acute respiratory failure (ARF) is a serious complication with a poor prognosis. This paper presents a deep representation learning-based phenotyping method to identify distinct groups of clinical trajectories of septic patients with ARF. For this retrospective study, we created a dataset from electronic medical records (EMR) consisting of data from sepsis patients admitted to medical intensive care units who required at least 24 hours of invasive mechanical ventilation at a quaternary care academic hospital in southeast USA for the years 2016-2021. A total of N=3349 patient encounters were included in this study. The Clustering Representation Learning on Incomplete Time Series Data (CRLI) algorithm was applied to a parsimonious set of EMR variables in this data set. To validate the optimal number of clusters, the K-means algorithm was used in conjunction with dynamic time warping. Our model yielded four distinct patient phenotypes that were characterized as liver dysfunction/heterogeneous, hypercapnia, hypoxemia, and multiple organ dysfunction syndrome by a critical care expert. A Kaplan-Meier analysis to compare the 28-day mortality trends exhibited significant differences (p < 0.005) between the four phenotypes. The study demonstrates the utility of our deep representation learning-based approach in unraveling phenotypes that reflect the heterogeneity in sepsis-induced ARF in terms of different mortality outcomes and severity. These phenotypes might reveal important clinical insights into an effective prognosis and tailored treatment strategies.
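
Dynamic time warping, mentioned above as the distance used with K-means, can be sketched for univariate series as follows. This is the textbook dynamic program with an absolute-difference local cost; the study's actual variables, preprocessing, and any windowing constraints are not given in the abstract and are not modeled here.

```python
def dtw_distance(series_a, series_b):
    """Dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i and first j samples.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat a sample of b
                                     cost[i][j - 1],      # repeat a sample of a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Toy trajectories (made up): same shape, different sampling.
same_shape = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
offset = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

Unlike a pointwise Euclidean distance, the warping lets trajectories of different lengths or speeds be compared, which is the usual motivation for pairing it with clustering of clinical time series.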

[701]  arXiv:2405.02570 (cross-list from q-fin.CP) [pdf, ps, other]
Title: Gradient-enhanced sparse Hermite polynomial expansions for pricing and hedging high-dimensional American options
Subjects: Computational Finance (q-fin.CP); Numerical Analysis (math.NA)

We propose an efficient and easy-to-implement gradient-enhanced least squares Monte Carlo method for computing price and Greeks (i.e., derivatives of the price function) of high-dimensional American options. It employs the sparse Hermite polynomial expansion as a surrogate model for the continuation value function, and essentially exploits the fast evaluation of gradients. The expansion coefficients are computed by solving a linear least squares problem that is enhanced by gradient information of simulated paths. We analyze the convergence of the proposed method, and establish an error estimate in terms of the best approximation error in the weighted $H^1$ space, the statistical error of solving discrete least squares problems, and the time step size. We present comprehensive numerical experiments to illustrate the performance of the proposed method. The results show that it outperforms the state-of-the-art least squares Monte Carlo method with more accurate price, Greeks, and optimal exercise strategies in high dimensions but with nearly identical computational cost, and it can deliver comparable results with recent neural network-based methods up to dimension 100.

[702]  arXiv:2405.02588 (cross-list from math.OC) [pdf, ps, other]
Title: Inexact Adaptive Cubic Regularization Algorithms on Riemannian Manifolds and Application
Authors: Z. Y. Li, X. M. Wang
Comments: 15 pages, 1 table
Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)

An adaptive cubic regularization algorithm employing an inexact gradient and Hessian is proposed on general Riemannian manifolds, together with its iteration complexity for reaching approximate second-order optimality under certain assumptions on the accuracy of the inexact gradient and Hessian. The algorithm extends the inexact adaptive cubic regularization algorithm under true gradient in [Math. Program., 184(1-2): 35-70, 2020] to more general cases, even in Euclidean settings. As an application, the algorithm is applied to solve the joint diagonalization problem on the Stiefel manifold. Numerical experiments illustrate that the algorithm performs better than the inexact trust-region algorithm in [Advances in Neural Information Processing Systems, 31, 2018].

[703]  arXiv:2405.02603 (cross-list from physics.comp-ph) [pdf, other]
Title: A Massively Parallel Performance Portable Free-space Spectral Poisson Solver
Comments: 18 pages, 11 figures
Subjects: Computational Physics (physics.comp-ph); Distributed, Parallel, and Cluster Computing (cs.DC)

Vico et al. (2016) suggest a fast algorithm for computing volume potentials, beneficial to fields with problems requiring the solution of Poisson's equation with free-space boundary conditions, such as the beam and plasma physics communities. Currently, the standard method for solving the free-space Poisson equation is the algorithm of Hockney and Eastwood (1988), which is second order in convergence at best. The algorithm proposed by Vico et al. converges spectrally for sufficiently smooth functions, i.e., faster than any fixed order in the number of grid points. In this paper, we implement performance portable versions of the traditional Hockney-Eastwood and the novel Vico-Greengard Poisson solvers as part of the IPPL (Independent Parallel Particle Layer) library. For sufficiently smooth source functions, the Vico-Greengard algorithm achieves higher accuracy than the Hockney-Eastwood method with the same grid size, reducing the computational demands of high resolution simulations since one could use coarser grids to achieve them. More concretely, to get a relative error of $10^{-4}$ between the numerical and analytical solution, one requires only $16^3$ grid points in the former, but $128^3$ in the latter, more than a 99% memory footprint reduction. Additionally, we propose an algorithmic improvement to the Vico-Greengard method which further reduces its memory footprint. This is particularly important for GPUs which have limited memory resources, and should be taken into account when selecting numerical algorithms for performance portable codes. Finally, we showcase performance through GPU and CPU scaling studies on the Perlmutter (NERSC) supercomputer, with efficiencies staying above 50% in the strong scaling case.
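
The quoted memory reduction can be checked with a short calculation. It counts grid points only, as the abstract does, and assumes equal storage per point for both solvers; padding or auxiliary arrays used by either method are not modeled.

```python
# Grid sizes quoted in the abstract for a relative error of 1e-4.
vico_greengard_points = 16 ** 3
hockney_eastwood_points = 128 ** 3

# Ratio of grid points and the implied fractional reduction.
ratio = hockney_eastwood_points // vico_greengard_points
reduction = 1.0 - vico_greengard_points / hockney_eastwood_points
```

Since 128/16 = 8 per dimension, the ratio is 8 cubed, so the coarser grid holds under 1% of the points, consistent with the "more than a 99%" figure.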

[704]  arXiv:2405.02621 (cross-list from math.CO) [pdf, ps, other]
Title: Intersecting families with covering number $3$
Authors: Andrey Kupavskii
Comments: arXiv admin note: substantial text overlap with arXiv:1810.00920
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

A covering number of a family is the size of the smallest set that intersects all sets from the family. In 1978 Frankl determined for $n\ge n_0(k)$ the largest intersecting family of $k$-element subsets of $[n]$ with covering number $3$. In this paper, we essentially settle this problem, showing that the same family is extremal for any $k\ge 100$ and $n>2k$.
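
The definition of the covering number can be made concrete with a brute-force sketch, suitable only for tiny ground sets. The helper names are illustrative, and the Fano plane is used here merely as a familiar intersecting family; it is not claimed to be the extremal family from the paper.

```python
from itertools import combinations

def is_intersecting(family):
    """True if every two member sets share at least one element."""
    return all(a & b for a, b in combinations(family, 2))

def covering_number(family, ground):
    """Size of the smallest subset of ground meeting every set in family.

    Returns None if no cover exists (e.g. the family contains the empty set).
    """
    for size in range(len(ground) + 1):
        for candidate in combinations(sorted(ground), size):
            chosen = set(candidate)
            if all(chosen & member for member in family):
                return size
    return None

# The seven lines of the Fano plane: a 3-uniform intersecting family on [7].
fano = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
        {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
fano_cover = covering_number(fano, set(range(1, 8)))

# A star (all sets contain 1) has covering number 1.
star_cover = covering_number([{1, 2}, {1, 3}, {1, 4}], {1, 2, 3, 4})
```

Any two points of the Fano plane lie on only five of its seven lines, so no two-element set is a cover, while each line itself covers the family.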

[705]  arXiv:2405.02630 (cross-list from quant-ph) [pdf, other]
Title: cuTN-QSVM: cuTensorNet-accelerated Quantum Support Vector Machine with cuQuantum SDK
Comments: 10 pages, 14 figures
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

This paper investigates the application of Quantum Support Vector Machines (QSVMs) with an emphasis on the computational advancements enabled by NVIDIA's cuQuantum SDK, especially leveraging the cuTensorNet library. We present a simulation workflow that substantially diminishes computational overhead, as evidenced by our experiments, from exponential to quadratic cost. While state vector simulations become infeasible for qubit counts over 50, our evaluation demonstrates that cuTensorNet speeds up simulations to be completed within seconds on the NVIDIA A100 GPU, even for qubit counts approaching 784. By employing multi-GPU processing with Message Passing Interface (MPI), we document a marked decrease in computation times, effectively demonstrating the strong linear speedup of our approach for increasing data sizes. This enables QSVMs to operate efficiently on High-Performance Computing (HPC) systems, thereby opening a new window for researchers to explore complex quantum algorithms that have not yet been investigated. In accuracy assessments, our QSVM achieves up to 95% on challenging classifications within the MNIST dataset for training sets larger than 100 instances, surpassing the capabilities of classical SVMs. These advancements position cuTensorNet within the cuQuantum SDK as a pivotal tool for scaling quantum machine learning simulations and potentially signpost the seamless integration of such computational strategies as pivotal within the Quantum-HPC ecosystem.

[706]  arXiv:2405.02679 (cross-list from physics.ao-ph) [pdf, other]
Title: Prévisions météorologiques basées sur l'intelligence artificielle : une révolution peut en cacher une autre
Comments: 8 pages, in French
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)

Artificial intelligence (AI), based on deep-learning algorithms using high-quality reanalysis datasets, is showing enormous potential for weather forecasting. In this context, the European Centre for Medium-Range Weather Forecasts (ECMWF) is developing a new forecasting system based on AI. Verification results for its forecasts, which are deterministic for now, are promising. However, the realism of weather forecasts based on AI is often questioned. Here, different types of realism are identified and we discuss, in particular, the relationship between structural realism and predictability of weather events. Furthermore, a statistical analysis of deterministic forecasts based on AI points to a realism/performance dilemma that a probabilistic approach should help to solve.

[707]  arXiv:2405.02718 (cross-list from eess.SP) [pdf, other]
Title: Zak-OTFS: Pulse Shaping and the Tradeoff between Time/Bandwidth Expansion and Predictability
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

The Zak-OTFS input/output (I/O) relation is predictable and non-fading when the delay and Doppler periods are greater than the effective channel delay and Doppler spreads, a condition which we refer to as the crystallization condition. When the crystallization condition is satisfied, we describe how to integrate sensing and communication within a single Zak-OTFS subframe by transmitting a pilot in the center of the subframe and surrounding the pilot with a pilot region and guard band to mitigate interference between data symbols and pilot. At the receiver we first read off the effective channel taps within the pilot region, and then use the estimated channel taps to recover the data from the symbols received outside the pilot region. We introduce a framework for filter design in the delay-Doppler (DD) domain where the symplectic Fourier transform connects aliasing in the DD domain (predictability of the I/O relation) with time/bandwidth expansion. The choice of pulse shaping filter determines the fraction of pilot energy that lies outside the pilot region and the degradation in BER performance that results from the interference to data symbols. We demonstrate that Gaussian filters in the DD domain provide significant improvements in BER performance over the sinc and root raised cosine filters considered in previous work. We also demonstrate that, by limiting DD domain aliasing, Gaussian filters extend the region where the crystallization condition is satisfied. The Gaussian filters considered in this paper are a particular case of factorizable pulse shaping filters in the DD domain, and this family of filters may be of independent interest.

[708]  arXiv:2405.02753 (cross-list from math.OC) [pdf, ps, other]
Title: Unscented Trajectory Optimization
Comments: 21 pages, 11 figures. 2023 AAS/AIAA Astrodynamics Specialist Conference, Big Sky, MT, Aug 13-17, 2023
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Statistics Theory (math.ST); Computation (stat.CO)

In a nutshell, unscented trajectory optimization is the generation of optimal trajectories through the use of an unscented transform. Although unscented trajectory optimization was introduced by the authors about a decade ago, it is reintroduced in this paper as a special instantiation of tychastic optimal control theory. Tychastic optimal control theory (from Tyche, the Greek goddess of chance) avoids the use of a Brownian motion and the resulting Itô calculus even though it uses random variables across the entire spectrum of a problem formulation. This approach circumvents the enormous technical and numerical challenges associated with stochastic trajectory optimization. Furthermore, it is shown how a tychastic optimal control problem that involves nonlinear transformations of the expectation operator can be quickly instantiated using an unscented transform. These nonlinear transformations are particularly useful in managing trajectory dispersions, whether they are associated with path constraints or with targeted values of final-time conditions. This paper also presents a systematic and rapid process for formulating and computing the most desirable tychastic trajectory using an unscented transform. Numerical examples are used to illustrate how unscented trajectory optimization may be used for risk reduction and mission recovery caused by uncertainties and failures.

[709]  arXiv:2405.02783 (cross-list from stat.ML) [pdf, other]
Title: Linear Noise Approximation Assisted Bayesian Inference on Mechanistic Model of Partially Observed Stochastic Reaction Network
Authors: Wandi Xu, Wei Xie
Comments: 11 pages, 2 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

To support online mechanism learning and facilitate digital twin development for biomanufacturing processes, this paper develops an efficient Bayesian inference approach for a partially observed enzymatic stochastic reaction network (SRN), a fundamental building block of multi-scale bioprocess mechanistic models. To tackle the critical challenges brought by the nonlinear stochastic differential equation (SDE)-based mechanistic model with a partially observed state and measurement error, an interpretable Bayesian updating linear noise approximation (LNA) metamodel, incorporating the structure information of the mechanistic model, is proposed to approximate the likelihood of observations. Then, an efficient posterior sampling approach is developed by utilizing the gradients of the derived likelihood to speed up the convergence of MCMC. The empirical study demonstrates that the proposed approach has promising performance.

[710]  arXiv:2405.02784 (cross-list from eess.IV) [pdf, other]
Title: MR-Transformer: Vision Transformer for Total Knee Replacement Prediction Using Magnetic Resonance Imaging
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

A transformer-based deep learning model, MR-Transformer, was developed for total knee replacement (TKR) prediction using magnetic resonance imaging (MRI). The model incorporates the ImageNet pre-training and captures three-dimensional (3D) spatial correlation from the MR images. The performance of the proposed model was compared to existing state-of-the-art deep learning models for knee injury diagnosis using MRI. Knee MR scans of four different tissue contrasts from the Osteoarthritis Initiative and Multicenter Osteoarthritis Study databases were utilized in the study. Experimental results demonstrated the state-of-the-art performance of the proposed model on TKR prediction using MRI.

[711]  arXiv:2405.02849 (cross-list from q-fin.CP) [pdf, other]
Title: Modelling Opaque Bilateral Market Dynamics in Financial Trading: Insights from a Multi-Agent Simulation Study
Comments: 13 pages, 3 figures
Subjects: Computational Finance (q-fin.CP); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Exploring complex adaptive financial trading environments through multi-agent based simulation methods presents an innovative approach within the realm of quantitative finance. Despite the dominance of multi-agent reinforcement learning approaches in financial markets with observable data, there exists a set of systematically significant financial markets that pose challenges due to their partial or obscured data availability. We, therefore, devise a multi-agent simulation approach employing small-scale meta-heuristic methods. This approach aims to represent the opaque bilateral market for Australian government bond trading, capturing the bilateral nature of bank-to-bank trading, also referred to as "over-the-counter" (OTC) trading, and commonly occurring between "market makers". The uniqueness of the bilateral market, characterized by negotiated transactions and a limited number of agents, yields valuable insights for agent-based modelling and quantitative finance. The inherent rigidity of this market structure, which is at odds with the global proliferation of multilateral platforms and the decentralization of finance, underscores the unique insights offered by our agent-based model. We explore the implications of market rigidity on market structure and consider the element of stability in market design. This extends the ongoing discourse on complex financial trading environments, providing an enhanced understanding of their dynamics and implications.

[712]  arXiv:2405.02852 (cross-list from eess.IV) [pdf, other]
Title: On Enhancing Brain Tumor Segmentation Across Diverse Populations with Convolutional Neural Networks
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Brain tumor segmentation is a fundamental step in assessing a patient's cancer progression. However, manual segmentation demands significant expert time to identify tumors in 3D multimodal brain MRI scans accurately. This reliance on manual segmentation makes the process prone to intra- and inter-observer variability. This work proposes a brain tumor segmentation method as part of the BraTS-GoAT challenge. The task is to segment tumors in brain MRI scans automatically from various populations, such as adults, pediatrics, and underserved sub-Saharan Africa. We employ a recent CNN architecture for medical image segmentation, namely MedNeXt, as our baseline, and we implement extensive model ensembling and postprocessing for inference. Our experiments show that our method performs well on the unseen validation set with an average DSC of 85.54% and HD95 of 27.88. The code is available on https://github.com/BioMedIA-MBZUAI/BraTS2024_BioMedIAMBZ.
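
The Dice similarity coefficient (DSC) reported above can be sketched for flat binary masks as follows. The function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration; challenge evaluation code may handle empty masks differently.

```python
def dice_coefficient(pred, truth):
    """DSC = 2 * |pred AND truth| / (|pred| + |truth|) for 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total

# Toy masks (made up): one of two predicted voxels overlaps the reference.
half_overlap = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
perfect = dice_coefficient([0, 1, 1], [0, 1, 1])
```

A 3D segmentation would be flattened to such a sequence per tumor sub-region before the scores are averaged.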

[713]  arXiv:2405.02857 (cross-list from eess.IV) [pdf, other]
Title: I$^3$Net: Inter-Intra-slice Interpolation Network for Medical Slice Synthesis
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes, reconstructed with thicker slices, are anisotropic with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon: due to this nature of the data, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution from other views. Based on this observation, we propose an Inter-Intra-slice Interpolation Network (I$^3$Net), which fully explores information from high in-plane resolution and compensates for low through-plane resolution. The through-plane branch supplements the limited information contained in low through-plane resolution from high in-plane resolution and enables continual and diverse feature learning. The in-plane branch transforms features to the frequency domain and enforces an equal learning opportunity for all frequency bands in a global context learning paradigm. We further propose a cross-view block to take advantage of the information from all three views online. Extensive experiments on two public datasets demonstrate the effectiveness of I$^3$Net, which outperforms state-of-the-art super-resolution, video frame interpolation and slice interpolation methods by a large margin. We achieve 43.90dB in PSNR, with at least 1.14dB improvement under the upscale factor of $\times$2 on the MSD dataset, with faster inference. Code is available at https://github.com/DeepMed-Lab-ECNU/Medical-Image-Reconstruction.
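
For reference, PSNR is conventionally computed from the mean squared error as sketched below. The peak value of 255 is an assumption made for this example; the intensity range used in the paper's evaluation is not stated in the abstract.

```python
import math

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    if len(reference) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy pixel values (made up): one pixel out of four is off by 10.
example_db = psnr([100, 100, 100, 100], [100, 100, 100, 110])

# A gain of g dB corresponds to dividing the MSE by 10 ** (g / 10).
mse_factor_for_reported_gain = 10 ** (1.14 / 10)
```

Because the scale is logarithmic, a fixed dB gain always corresponds to the same multiplicative reduction in mean squared error, whatever the starting PSNR.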

[714]  arXiv:2405.02885 (cross-list from eess.SP) [pdf, other]
Title: Performance Analysis of Underwater Acoustic Channel Amid Jamming by Random Jammers
Subjects: Signal Processing (eess.SP); Emerging Technologies (cs.ET)

Underwater communication networks are increasingly popularized by various important maritime applications. However, this also leads to an increased threat landscape. This letter presents the first study that considers jamming attacks by random jammers present in the surroundings of legitimate transceivers in underwater acoustic communication systems. We investigate the impact of jamming attacks on various performance parameters of the legitimate underwater acoustic communication link. In particular, we investigate the legitimate link using stochastic geometry for important performance parameters, namely coverage probability, average rate, and energy efficiency of the link between two legitimate nodes, i.e., underwater and surface nodes. We then derive and present tractable expressions for these performance parameters. Finally, we perform a Monte Carlo simulation to validate our analysis. We plot the performance metrics against the transmit power and jamming power for different intensities of the jammers in shallow, mid, and deep water scenarios. Results reveal that, on average, jamming has a relatively higher impact on the performance of the legitimate link in deep water than in shallow water.
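A Monte Carlo estimate of coverage probability under random jammers can be sketched as follows. This is a toy model and not the letter's channel model. It assumes jammers form a homogeneous Poisson point process in a disc around the receiver, a power-law path loss with no frequency-dependent acoustic absorption, and exponentially distributed fading gains. All function names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def coverage_probability(tx_power, jam_power, jammer_density,
                         link_distance=100.0, region_radius=1000.0,
                         path_loss_exp=1.5, noise_power=1e-6,
                         sinr_threshold=1.0, trials=20000, seed=0):
    """Estimate P(SINR > threshold) for one link with Poisson-distributed jammers."""
    rng = random.Random(seed)
    mean_jammers = jammer_density * math.pi * region_radius ** 2
    covered = 0
    for _ in range(trials):
        # Received signal power: transmit power, fading gain, path loss.
        signal = tx_power * rng.expovariate(1.0) * link_distance ** -path_loss_exp
        interference = 0.0
        for _ in range(sample_poisson(rng, mean_jammers)):
            # Uniform point in the disc; clamp distance to avoid the singularity at 0.
            distance = max(region_radius * math.sqrt(rng.random()), 1.0)
            interference += jam_power * rng.expovariate(1.0) * distance ** -path_loss_exp
        if signal / (noise_power + interference) > sinr_threshold:
            covered += 1
    return covered / trials


for density in (1e-7, 1e-6, 5e-6):
    estimate = coverage_probability(1.0, 1.0, density, trials=5000)
    print(f"jammer density {density:.0e}: coverage ~ {estimate:.3f}")
```

In a validation workflow, such estimates would be compared against the closed-form expressions at matching parameter values.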

[715]  arXiv:2405.02942 (cross-list from physics.optics) [pdf, other]
Title: Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens
Comments: Accepted to Optics & Laser Technology
Subjects: Optics (physics.optics); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)

We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360°×(35°~110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 lenses. Moreover, we establish a physical structure model of PAL using the ray tracing method and study the influence of its physical parameters on compactness ratio. In addition, for the evaluation of local tolerances of annular surfaces, we propose a tolerance analysis method suitable for ASPAL. This analytical method can effectively analyze surface irregularities on annular surfaces and provide clear guidance on manufacturing tolerances for ASPAL. Benefiting from high-precision glass molding and injection molding aspheric lens manufacturing techniques, we finally manufactured 20 ASPALs in small batches. The weight of an ASPAL prototype is only 8.5 g. Our framework provides promising insights for the application of panoramic systems in space- and weight-constrained environmental sensing scenarios such as intelligent security, micro-UAVs, and micro-robots.

[716]  arXiv:2405.03008 (cross-list from eess.IV) [pdf, other]
Title: DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
Comments: 8 pages, 8 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are based on convolutional neural networks. Few attempts have been made with Mamba to harness its long-range modeling capability and efficient computational complexity, which have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight Image SR network that incorporates Vision Mamba and a distillation strategy. The network of DVMSR consists of three modules: feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several RSSBs, each of which has several Vision Mamba Modules (ViMM) together with a residual connection. To improve efficiency while maintaining comparable performance, we apply a distillation strategy to the Vision Mamba network. Specifically, we leverage the rich representation knowledge of the teacher network as additional supervision for the output of lightweight student networks. Extensive experiments have demonstrated that our proposed DVMSR can outperform state-of-the-art efficient SR methods in terms of model parameters while maintaining the performance of both PSNR and SSIM. The source code is available at https://github.com/nathan66666/DVMSR.git

[717]  arXiv:2405.03054 (cross-list from quant-ph) [pdf, other]
Title: A Greedy Quantum Route-Generation Algorithm
Authors: Jordan Makansi
Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)

Routing and scheduling problems with time windows have long been important optimization problems for logistics and planning. Many classical heuristics and exact methods exist for such problems. However, there are no satisfactory methods for generating routes using quantum computing (QC), mainly for two reasons: inequality constraints, and the trade-off between feasibility and solution quality. Inequality constraints are typically handled using slack variables, and feasible solutions are found by filtering samples. These challenges are amplified in the presence of noise inherent in QC. Here, we propose a greedy algorithm that generates routes by using information from all samples obtained from the quantum computer. By noticing the relationship between qubits in our formulation as a directed acyclic graph (DAG), we designed an algorithm that adaptively constructs a feasible solution. We prove its convergence to a feasible solution, and illustrate its efficacy by solving the Fleet Sizing Vehicle Routing Problem with Time Windows (FSVRPTW). Our computational results show that this method obtains a lower objective value than the current state-of-the-art annealing approaches, both classical and hybrid, for the same amount of time using D-Wave Hybrid Solvers. We also show its robustness to noise on D-Wave Advantage 4.1 through computational results as compared to the filtering approach on DWaveSampler, even when the filtering approach is given a longer annealing time and a larger sample size.

[718]  arXiv:2405.03063 (cross-list from math.ST) [pdf, other]
Title: Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection
Authors: Jingbo Liu
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

Suppose that we first apply the Lasso to a design matrix, and then update one of its columns. In general, the signs of the Lasso coefficients may change, and there is no closed-form expression for updating the Lasso solution exactly. In this work, we propose an approximate formula for updating a debiased Lasso coefficient. We provide general nonasymptotic error bounds in terms of the norms and correlations of a given design matrix's columns, and then prove asymptotic convergence results for the case of a random design matrix with i.i.d. sub-Gaussian row vectors and i.i.d. Gaussian noise. Notably, the approximate formula is asymptotically correct for most coordinates in the proportional growth regime, under the mild assumption that each row of the design matrix is sub-Gaussian with a covariance matrix having a bounded condition number. Our proof only requires certain concentration and anti-concentration properties to control various error terms and the number of sign changes. In contrast, rigorously establishing distributional limit properties (e.g., Gaussian limits for the debiased Lasso) under similarly general assumptions has been considered an open problem in the universality theory. As applications, we show that the approximate formula allows us to reduce the computational complexity of variable selection algorithms that require solving multiple Lasso problems, such as the conditional randomization test and a variant of the knockoff filter.

[719]  arXiv:2405.03069 (cross-list from math.LO) [pdf, ps, other]
Title: On Probabilistic and Causal Reasoning with Summation Operators
Subjects: Logic (math.LO); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Logic in Computer Science (cs.LO)

Ibeling et al. (2023) axiomatize increasingly expressive languages of causation and probability, and Mosse et al. (2024) show that reasoning (specifically the satisfiability problem) in each causal language is as difficult, from a computational complexity perspective, as reasoning in its merely probabilistic or "correlational" counterpart. Introducing a summation operator to capture common devices that appear in applications -- such as the $do$-calculus of Pearl (2009) for causal inference, which makes ample use of marginalization -- van der Zander et al. (2023) partially extend these earlier complexity results to causal and probabilistic languages with marginalization. We complete this extension, fully characterizing the complexity of probabilistic and causal reasoning with summation, demonstrating that these again remain equally difficult. Surprisingly, allowing free variables for random variable values results in a system that is undecidable, so long as the ranges of these random variables are unrestricted. We finally axiomatize these languages featuring marginalization (or more generally summation), resolving open questions posed by Ibeling et al. (2023).

[720]  arXiv:2405.03083 (cross-list from stat.ME) [pdf, other]
Title: Causal K-Means Clustering
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

Causal effects are often characterized with population summaries. These might provide an incomplete picture when there are heterogeneous treatment effects across subgroups. Since the subgroup structure is typically unknown, it is more challenging to identify and evaluate subgroup effects than population effects. We propose a new solution to this problem: Causal k-Means Clustering, which harnesses the widely-used k-means clustering algorithm to uncover the unknown subgroup structure. Our problem differs significantly from the conventional clustering setup since the variables to be clustered are unknown counterfactual functions. We present a plug-in estimator which is simple and readily implementable using off-the-shelf algorithms, and study its rate of convergence. We also develop a new bias-corrected estimator based on nonparametric efficiency theory and double machine learning, and show that this estimator achieves fast root-n rates and asymptotic normality in large nonparametric models. Our proposed methods are especially useful for modern outcome-wide studies with multiple treatment levels. Further, our framework is extensible to clustering with generic pseudo-outcomes, such as partially observed outcomes or otherwise unknown functions. Finally, we explore finite sample properties via simulation, and illustrate the proposed methods in a study of treatment programs for adolescent substance abuse.

[721]  arXiv:2405.03092 (cross-list from cond-mat.mtrl-sci) [pdf, other]
Title: Bayesian optimization for stable properties amid processing fluctuations in sputter deposition
Journal-ref: J. Vac. Sci. Technol. A 1 May 2024; 42 (3): 033408
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG); Optimization and Control (math.OC)

We introduce a Bayesian optimization approach to guide the sputter deposition of molybdenum thin films, aiming to achieve desired residual stress and sheet resistance while minimizing susceptibility to stochastic fluctuations during deposition. Thin films are pivotal in numerous technologies, including semiconductors and optical devices, where their properties are critical. Sputter deposition parameters, such as deposition power, vacuum chamber pressure, and working distance, influence physical properties like residual stress and resistance. Excessive stress and high resistance can impair device performance, necessitating the selection of optimal process parameters. Furthermore, these parameters should ensure the consistency and reliability of thin film properties, assisting in the reproducibility of the devices. However, exploring the multidimensional design space for process optimization is expensive. Bayesian optimization is ideal for optimizing inputs/parameters of general black-box functions without reliance on gradient information. We utilize Bayesian optimization to optimize deposition power and pressure using a custom-built objective function incorporating observed stress and resistance data. Additionally, we integrate prior knowledge of stress variation with pressure into the objective function to prioritize films least affected by stochastic variations. Our findings demonstrate that Bayesian optimization effectively explores the design space and identifies optimal parameter combinations meeting desired stress and resistance specifications.

[722]  arXiv:2405.03123 (cross-list from math.OC) [pdf, other]
Title: Revealing Decision Conservativeness Through Inverse Distributionally Robust Optimization
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper introduces Inverse Distributionally Robust Optimization (I-DRO) as a method to infer the conservativeness level of a decision-maker, represented by the size of a Wasserstein metric-based ambiguity set, from the optimal decisions made using Forward Distributionally Robust Optimization (F-DRO). By leveraging the Karush-Kuhn-Tucker (KKT) conditions of the convex F-DRO model, we formulate I-DRO as a bi-linear program, which can be solved using off-the-shelf optimization solvers. Additionally, this formulation exhibits several advantageous properties. We demonstrate that I-DRO not only guarantees the existence and uniqueness of an optimal solution but also establishes the necessary and sufficient conditions for this optimal solution to accurately match the actual conservativeness level in F-DRO. Furthermore, we identify three extreme scenarios that may impact I-DRO effectiveness. Our case study applies F-DRO for power system scheduling under uncertainty and employs I-DRO to recover the conservativeness level of system operators. Numerical experiments based on an IEEE 5-bus system and a realistic NYISO 11-zone system demonstrate I-DRO performance in both normal and extreme scenarios.

[723]  arXiv:2405.03129 (cross-list from eess.SP) [pdf, other]
Title: Active Sensing for Multiuser Beam Tracking with Reconfigurable Intelligent Surface
Authors: Han Han, Tao Jiang, Wei Yu
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

This paper studies a beam tracking problem in which an access point (AP), in collaboration with a reconfigurable intelligent surface (RIS), dynamically adjusts its downlink beamformers and the reflection pattern at the RIS in order to maintain reliable communications with multiple mobile user equipments (UEs). Specifically, the mobile UEs send uplink pilots to the AP periodically during the channel sensing intervals; the AP then adaptively configures the beamformers and the RIS reflection coefficients for subsequent data transmission based on the received pilots. This is an active sensing problem, because channel sensing involves configuring the RIS coefficients during the pilot stage and the optimal sensing strategy should exploit the trajectory of channel state information (CSI) from previously received pilots. Finding an analytical solution to such an active sensing problem is very challenging. In this paper, we propose a deep learning framework utilizing a recurrent neural network (RNN) to automatically summarize the time-varying CSI obtained from the periodically received pilots into state vectors. These state vectors are then mapped to the AP beamformers and RIS reflection coefficients for subsequent downlink data transmissions, as well as the RIS reflection coefficients for the next round of uplink channel sensing. The mappings from the state vectors to the downlink beamformers and the RIS reflection coefficients for both channel sensing and downlink data transmission are performed using graph neural networks (GNNs) to account for the interference among the UEs. Simulations demonstrate significant and interpretable performance improvement of the proposed approach over the existing data-driven methods with nonadaptive channel sensing schemes.

[724]  arXiv:2405.03130 (cross-list from stat.ML) [pdf, other]
Title: Deep Learning for Causal Inference: A Comparison of Architectures for Heterogeneous Treatment Effect Estimation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Causal inference has gained much popularity in recent years, with interests ranging from academic, to industrial, to educational, and all in between. Concurrently, the study and usage of neural networks have also grown profoundly (albeit at a far faster rate). What we aim to do in this blog write-up is demonstrate a neural network causal inference architecture. We develop a fully connected neural network implementation of the popular Bayesian Causal Forest algorithm, a state-of-the-art tree-based method for estimating heterogeneous treatment effects. We compare our implementation to existing neural network causal inference methodologies, showing improvements in performance in simulation settings. We apply our method to a dataset examining the effect of stress on sleep.

[725]  arXiv:2405.03141 (cross-list from eess.IV) [pdf, other]
Title: Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of measuring spinal curvature is still carried out manually. Consequently, there is a considerable demand for a fully automatic system that can locate bony landmarks and perform angle measurements. To this end, we introduce an estimation model for automatic ultrasound curve angle (UCA) measurement. The model employs a dual-branch network to detect candidate landmarks and perform vertebra segmentation on ultrasound coronal images. An affinity clustering strategy is utilized within the vertebral segmentation area to illustrate the affinity relationship between candidate landmarks. Subsequently, we can efficiently perform line delineation from a clustered affinity map for UCA measurement. As our method is specifically designed for UCA calculation, this method outperforms other state-of-the-art methods for landmark and line detection tasks. The high correlation between the automatic UCA and Cobb angle (R$^2$=0.858) suggests that our proposed method can potentially replace manual UCA measurement in ultrasound scoliosis assessment.

[726]  arXiv:2405.03180 (cross-list from stat.ML) [pdf, other]
Title: Braced Fourier Continuation and Regression for Anomaly Detection
Authors: Josef Sabuda
Comments: 16 pages, 9 figures, associated Github link: this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)

In this work, the concept of Braced Fourier Continuation and Regression (BFCR) is introduced. BFCR is a novel and computationally efficient means of finding nonlinear regressions or trend lines in arbitrary one-dimensional data sets. The Braced Fourier Continuation (BFC) and BFCR algorithms are first outlined, followed by a discussion of the properties of BFCR as well as demonstrations of how BFCR trend lines may be used effectively for anomaly detection both within and at the edges of arbitrary one-dimensional data sets. Finally, potential issues which may arise while using BFCR for anomaly detection as well as possible mitigation techniques are outlined and discussed. All source code and example data sets are either referenced or available via GitHub, and all associated code is written entirely in Python.

[727]  arXiv:2405.03198 (cross-list from stat.ML) [pdf, other]
Title: Stability Evaluation via Distributional Perturbation Analysis
Comments: Accepted by ICML 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

The performance of learning models often deteriorates when deployed in out-of-sample environments. To ensure reliable deployment, we propose a stability evaluation criterion based on distributional perturbations. Conceptually, our stability evaluation criterion is defined as the minimal perturbation required on our observed dataset to induce a prescribed deterioration in risk evaluation. In this paper, we utilize the optimal transport (OT) discrepancy with moment constraints on the (sample, density) space to quantify this perturbation. Therefore, our stability evaluation criterion can address both data corruptions and sub-population shifts -- the two most common types of distribution shifts in real-world scenarios. To further realize practical benefits, we present a series of tractable convex formulations and computational methods tailored to different classes of loss functions. The key technical tool to achieve this is the strong duality theorem provided in this paper. Empirically, we validate the practical utility of our stability evaluation criterion across a host of real-world applications. These empirical studies showcase the criterion's ability not only to compare the stability of different learning models and features but also to provide valuable guidelines and strategies to further improve models.

[728]  arXiv:2405.03230 (cross-list from eess.SP) [pdf, other]
Title: Slicing for Dense Smart Factory Network: Current State, Scenarios, Challenges and Expectations
Comments: Submitted to IEEE ETFA 2024 - IEEE International Conference on Emerging Technologies and Factory Automation, 8 Pages
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

In the era of Industry 4.0, smart factories have emerged as a paradigm shift, redefining manufacturing with the integration of advanced digital technologies. Central to this transformation is the deployment of 5G networks, offering unprecedented levels of connectivity, speed, reliability, and ultra-low latency. Among the revolutionary features of 5G is network slicing, a technology that offers enhanced capabilities through the customization of network resources by allowing multiple logical networks (or slices) to run on top of a shared physical infrastructure. This capability is particularly crucial in the densely packed and highly dynamic environment of smart factories, where diverse applications - from robotic automation to real-time analytics - demand varying network requirements. In this paper, we present a comprehensive overview of the integration of slicing in smart factory networks, emphasizing its critical role in enhancing operational efficiency and supporting the diverse requirements of future manufacturing processes. We elaborate on the recent advances, and technical scenarios, including indoor factory propagation conditions, traffic characteristics, system requirements, slice-aware radio resource management, network elements, enabling technologies and current standardisation efforts. Additionally, we identify open research challenges as well as key technical issues stifling deployments. Finally, we speculate on the future trajectory of slicing-enabled smart factories, emphasizing the need for continuous adaptation to emerging technologies.

[729]  arXiv:2405.03270 (cross-list from math.CO) [pdf, ps, other]
Title: Matroid-reachability-based decomposition into arborescences
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

The problem of matroid-reachability-based packing of arborescences was solved by Király. Here we solve the corresponding decomposition problem, which turns out to be more complicated. The result is obtained from the solution of the more general problem of matroid-reachability-based $(\ell,\ell')$-limited packing of arborescences where we are given a lower bound $\ell$ and an upper bound $\ell'$ on the total number of arborescences in the packing. The problem is considered for branchings and in directed hypergraphs as well.

[730]  arXiv:2405.03293 (cross-list from astro-ph.IM) [pdf, other]
Title: Deep Learning and genetic algorithms for cosmological Bayesian inference speed-up
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

In this paper, we present a novel approach to accelerate the Bayesian inference process, focusing specifically on the nested sampling algorithms. Bayesian inference plays a crucial role in cosmological parameter estimation, providing a robust framework for extracting theoretical insights from observational data. However, its computational demands can be substantial, primarily due to the need for numerous likelihood function evaluations. Our proposed method utilizes the power of deep learning, employing feedforward neural networks to approximate the likelihood function dynamically during the Bayesian inference process. Unlike traditional approaches, our method trains neural networks on-the-fly using the current set of live points as training data, without the need for pre-training. This flexibility enables adaptation to various theoretical models and datasets. We perform simple hyperparameter optimization using genetic algorithms to suggest initial neural network architectures for learning each likelihood function. Once sufficient accuracy is achieved, the neural network replaces the original likelihood function. The implementation integrates with nested sampling algorithms and has been thoroughly evaluated using both simple cosmological dark energy models and diverse observational datasets. Additionally, we explore the potential of genetic algorithms for generating initial live points within nested sampling inference, opening up new avenues for enhancing the efficiency and effectiveness of Bayesian inference methods.

[731]  arXiv:2405.03347 (cross-list from math.NT) [pdf, ps, other]
Title: Perfect codes over non-prime power alphabets: an approach based on Diophantine equations
Comments: 8 pages
Subjects: Number Theory (math.NT); Information Theory (cs.IT)

The classification of perfect codes over non-prime power alphabets has been an open problem for which there have been no new results in almost 50 years. In this paper, we show non-existence of perfect $2$-error-correcting codes over $q$-ary alphabets for more than 170 new values of $q$. Our methods rely on techniques from the resolution of generalised Ramanujan--Nagell equations and from computational number theory.
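For context, a perfect $e$-error-correcting code of length $n$ over a $q$-ary alphabet partitions the space into Hamming balls of radius $e$, so the ball volume $\sum_{i=0}^{e}\binom{n}{i}(q-1)^i$ must divide $q^n$. The sketch below checks only this necessary divisibility condition by brute force. It is not the Diophantine approach described in the abstract, and the function names and search bounds are illustrative.

```python
from math import comb


def ball_volume(n, q, radius=2):
    """Number of words within Hamming distance `radius` of a fixed word of length n."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(radius + 1))


def passes_sphere_packing(n, q, radius=2):
    """Necessary (not sufficient) condition for a perfect code: the ball volume divides q**n."""
    return q ** n % ball_volume(n, q, radius) == 0


def candidate_lengths(q, max_n=200, radius=2):
    """Lengths n in (radius, max_n] that survive the divisibility test."""
    return [n for n in range(radius + 1, max_n + 1)
            if passes_sphere_packing(n, q, radius)]


# The ternary Golay code (q = 3, n = 11) is perfect: the ball volume is 243 = 3**5.
print(ball_volume(11, 3))
# Lengths that survive the test for the non-prime-power alphabet size q = 6.
print(candidate_lengths(6, max_n=200))
```

Passing the test does not imply existence: the binary parameters n = 90, e = 2 give a ball volume of 4096 = 2**12, yet no perfect code with those parameters exists.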

[732]  arXiv:2405.03408 (cross-list from astro-ph.IM) [pdf, other]
Title: An Image Quality Evaluation and Masking Algorithm Based On Pre-trained Deep Neural Networks
Comments: Accepted by the AJ. The code could be downloaded from: this https URL with DOI of: 10.12149/101415
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Solar and Stellar Astrophysics (astro-ph.SR); Computer Vision and Pattern Recognition (cs.CV)

With the growing amount of astronomical data, there is an increasing need for automated data processing pipelines, which can extract scientific information from observation data without human intervention. A critical aspect of these pipelines is the image quality evaluation and masking algorithm, which evaluates image quality based on various factors such as cloud coverage, sky brightness, scattered light from the optical system, point spread function size and shape, and read-out noise. Occasionally, the algorithm requires masking of areas severely affected by noise. However, the algorithm often necessitates significant human intervention, reducing data processing efficiency. In this study, we present a deep learning based image quality evaluation algorithm that uses an autoencoder to learn features of high quality astronomical images. The trained autoencoder enables automatic evaluation of image quality and masking of noise affected areas. We have evaluated the performance of our algorithm using two test cases: images with point spread functions of varying full width at half maximum, and images with complex backgrounds. In the first scenario, our algorithm could effectively identify variations of the point spread functions, which can provide valuable reference information for photometry. In the second scenario, our method could successfully mask regions affected by complex backgrounds, which could significantly increase the photometry accuracy. Our algorithm can be employed to automatically evaluate the quality of images obtained by different sky surveying projects, further increasing the speed and robustness of data processing pipelines.

[733]  arXiv:2405.03435 (cross-list from cond-mat.dis-nn) [pdf, other]
Title: A method for quantifying the generalization capabilities of generative models for solving Ising models
Comments: 10 pages, 7 figures
Journal-ref: Mach. Learn.: Sci. Technol. 5 (2024) 025011
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

For Ising models with complex energy landscapes, whether the ground state can be found by neural networks depends heavily on the Hamming distance between the training datasets and the ground state. Despite the fact that various recently proposed generative models have shown good performance in solving Ising models, there is no adequate discussion on how to quantify their generalization capabilities. Here we design a Hamming distance regularizer in the framework of a class of generative models, variational autoregressive networks (VAN), to quantify the generalization capabilities of various network architectures combined with VAN. The regularizer can control the size of the overlaps between the ground state and the training datasets generated by networks, which, together with the success rates of finding the ground state, form a quantitative metric to quantify their generalization capabilities. We conduct numerical experiments on several prototypical network architectures combined with VAN, including feed-forward neural networks, recurrent neural networks, and graph neural networks, to quantify their generalization capabilities when solving Ising models. Moreover, considering the fact that the quantification of the generalization capabilities of networks on small-scale problems can be used to predict their relative performance on large-scale problems, our method is of great significance for assisting in the Neural Architecture Search field of searching for the optimal network architectures when solving large-scale Ising models.

[734]  arXiv:2405.03468 (cross-list from stat.ML) [pdf, other]
Title: Hierarchic Flows to Estimate and Sample High-dimensional Probabilities
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)

Finding low-dimensional interpretable models of complex physical fields such as turbulence remains an open question, 80 years after the pioneering work of Kolmogorov. Estimating high-dimensional probability distributions from data samples suffers from an optimization and an approximation curse of dimensionality. It may be avoided by following a hierarchic probability flow from coarse to fine scales. This inverse renormalization group is defined by conditional probabilities across scales, renormalized in a wavelet basis. For a $\varphi^4$ scalar potential, sampling these hierarchic models avoids the critical slowing down at the phase transition. An outstanding issue is to also approximate non-Gaussian fields having long-range interactions in space and across scales. We introduce low-dimensional models with robust multiscale approximations of high order polynomial energies. They are calculated with a second wavelet transform, which defines interactions over two hierarchies of scales. We estimate and sample these wavelet scattering models to generate 2D vorticity fields of turbulence, and images of dark matter densities.

[735]  arXiv:2405.03472 (cross-list from math.OC) [pdf, other]
Title: A Symplectic Analysis of Alternating Mirror Descent
Comments: 95 pages, 3 figures
Subjects: Optimization and Control (math.OC); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Dynamical Systems (math.DS); Numerical Analysis (math.NA)

Motivated by understanding the behavior of the Alternating Mirror Descent (AMD) algorithm for bilinear zero-sum games, we study the discretization of continuous-time Hamiltonian flow via the symplectic Euler method. We provide a framework for analysis using results from Hamiltonian dynamics, Lie algebra, and symplectic numerical integrators, with an emphasis on the existence and properties of a conserved quantity, the modified Hamiltonian (MH), for the symplectic Euler method. We compute the MH in closed-form when the original Hamiltonian is a quadratic function, and show that it generally differs from the other conserved quantity known previously in that case. We derive new error bounds on the MH when truncated at orders in the stepsize in terms of the number of iterations, $K$, and utilize this bound to show an improved $\mathcal{O}(K^{1/5})$ total regret bound and an $\mathcal{O}(K^{-4/5})$ duality gap of the average iterates for AMD. Finally, we propose a conjecture which, if true, would imply that the total regret for AMD goes as $\mathcal{O}\left(K^{\varepsilon}\right)$ and the duality gap of the average iterates as $\mathcal{O}\left(K^{-1+\varepsilon}\right)$ for any $\varepsilon>0$, and we can take $\varepsilon=0$ upon certain convergence conditions for the MH.
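
As a purely illustrative sketch (not taken from the paper), the simplest instance of the setting above is the scalar bilinear game f(x, y) = x * y with a Euclidean mirror map, where the alternating update reduces to alternating gradient descent-ascent and coincides with a symplectic Euler step for a harmonic oscillator. The function names, step size, and starting point are assumptions made for this example. The quantity x^2 + y^2 - eta * x * y is an elementary invariant of this particular map, which can be verified by direct expansion; it is not claimed to be the paper's modified Hamiltonian, which the abstract says generally differs from the previously known conserved quantity.

```python
def alternating_gda(x, y, eta, steps):
    """Alternating gradient descent-ascent on f(x, y) = x * y.

    With a Euclidean mirror map (an assumption of this sketch) this is
    a symplectic Euler step: y is updated using the already-updated x.
    """
    trajectory = [(x, y)]
    for _ in range(steps):
        x = x - eta * y  # min player: descent step, df/dx = y
        y = y + eta * x  # max player: ascent step using the new x
        trajectory.append((x, y))
    return trajectory


def simultaneous_gda(x, y, eta, steps):
    """Simultaneous updates (explicit Euler), shown for contrast."""
    trajectory = [(x, y)]
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x  # both players use old iterates
        trajectory.append((x, y))
    return trajectory


def quadratic_invariant(x, y, eta):
    """Elementary invariant of the alternating map above."""
    return x * x + y * y - eta * x * y


if __name__ == "__main__":
    eta, steps = 0.1, 1000  # hypothetical values chosen for illustration
    alt = alternating_gda(1.0, 0.0, eta, steps)
    sim = simultaneous_gda(1.0, 0.0, eta, steps)
    drift = max(abs(quadratic_invariant(x, y, eta)
                    - quadratic_invariant(1.0, 0.0, eta)) for x, y in alt)
    print("max drift of invariant (alternating):", drift)
    print("final squared norm (simultaneous):", sim[-1][0] ** 2 + sim[-1][1] ** 2)
```

In this toy case the alternating iterates stay on a bounded ellipse, while the simultaneous iterates spiral outward because each step multiplies x^2 + y^2 by (1 + eta^2).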

[736]  arXiv:2405.03517 (cross-list from math.CO) [pdf, ps, other]
Title: All $S_p$ notions of quantum expansion are equivalent
Comments: 5 pages
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Functional Analysis (math.FA); Quantum Physics (quant-ph)

In a recent work Li, Qiao, Wigderson, Wigderson and Zhang introduced notions of quantum expansion based on $S_p$ norms and posed as an open question if they were all equivalent. We give an affirmative answer to this question.

[737]  arXiv:2405.03521 (cross-list from cond-mat.supr-con) [pdf, other]
Title: Optimisation challenge for superconducting adiabatic neural network implementing XOR and OR boolean functions
Comments: 13 pages, 12 figures
Subjects: Superconductivity (cond-mat.supr-con); Artificial Intelligence (cs.AI)

In this article, we consider designs of simple analog artificial neural networks based on adiabatic Josephson cells with a sigmoid activation function. A new approach based on the gradient descent method is developed to adjust the circuit parameters, allowing efficient signal transmission between the network layers. The proposed solution is demonstrated on the example of the system implementing XOR and OR logical operations.

[738]  arXiv:2405.03542 (cross-list from eess.SP) [pdf, other]
Title: Enhancing Channel Estimation in Quantized Systems with a Generative Prior
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)

Channel estimation in quantized systems is challenging, particularly in low-resolution systems. In this work, we propose to leverage a Gaussian mixture model (GMM) as a generative prior, capturing the channel distribution of the propagation environment, to enhance a classical estimation technique based on the expectation-maximization (EM) algorithm for one-bit quantization. Thereby, a maximum a posteriori (MAP) estimate of the most responsible mixture component is inferred for a quantized received signal, which is subsequently utilized in the EM algorithm as side information. Numerical results demonstrate the significant performance improvement of our proposed approach over both a simplistic Gaussian prior and current state-of-the-art channel estimators. Furthermore, the proposed estimation framework exhibits adaptability to higher resolution systems and alternative generative priors.

[739]  arXiv:2405.03543 (cross-list from math.LO) [pdf, other]
Title: Axiomatizing the Logic of Ordinary Discourse
Subjects: Logic (math.LO); Logic in Computer Science (cs.LO)

Most non-classical logics are subclassical, that is, every inference/theorem they validate is also valid classically. A notable exception is the three-valued propositional Logic of Ordinary Discourse (OL) proposed and extensively motivated by W. S. Cooper as a more adequate candidate for formalizing everyday reasoning (in English). OL challenges classical logic not only by rejecting some theses, but also by accepting non-classically valid principles, such as so-called Aristotle's and Boethius' theses. Formally, OL shows a number of unusual features - it is non-structural, connexive, paraconsistent and contradictory - making it all the more interesting for the mathematical logician. We present our recent findings on OL and its structural companion (that we call sOL). We introduce Hilbert-style multiple-conclusion calculi for OL and sOL that are both modular and analytic, and easily allow us to obtain single-conclusion axiomatizations. We prove that sOL is algebraizable and single out its equivalent semantics, which turns out to be a discriminator variety generated by a three-element algebra. Having observed that sOL can express the connectives of other three-valued logics, we prove that it is definitionally equivalent to an expansion of the three-valued logic J3 of D'Ottaviano and da Costa, itself an axiomatic extension of paraconsistent Nelson logic.

[740]  arXiv:2405.03549 (cross-list from stat.ML) [pdf, other]
Title: Bridging discrete and continuous state spaces: Exploring the Ehrenfest process in time-continuous diffusion models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Dynamical Systems (math.DS); Probability (math.PR)

Generative modeling via stochastic processes has led to remarkable empirical results as well as to recent advances in their theoretical understanding. In principle, both space and time of the processes can be discrete or continuous. In this work, we study time-continuous Markov jump processes on discrete state spaces and investigate their correspondence to state-continuous diffusion processes given by SDEs. In particular, we revisit the $\textit{Ehrenfest process}$, which converges to an Ornstein-Uhlenbeck process in the infinite state space limit. Likewise, we can show that the time-reversal of the Ehrenfest process converges to the time-reversed Ornstein-Uhlenbeck process. This observation bridges discrete and continuous state spaces and allows methods to be carried over from one setting to the other. Additionally, we suggest an algorithm for training the time-reversal of Markov jump processes which relies on conditional expectations and can thus be directly related to denoising score matching. We demonstrate our methods in multiple convincing numerical experiments.
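
For orientation only, the following sketch builds the generator of a birth-death Ehrenfest process on {0, ..., S} under one common parametrization (jump up at rate (S - x) / 2, jump down at rate x / 2) and checks that Binomial(S, 1/2) is stationary. These rates, the function names, and the choice S = 20 are assumptions of this example; the paper's scaling may differ. Centered and rescaled, this binomial law approaches a Gaussian as S grows, which matches the stationary law of an Ornstein-Uhlenbeck process.

```python
from math import comb


def ehrenfest_generator(S):
    """Rate matrix Q of a birth-death Ehrenfest process on {0, ..., S}.

    Assumed rates (one common convention, not necessarily the paper's):
    x -> x + 1 at rate (S - x) / 2 and x -> x - 1 at rate x / 2.
    """
    Q = [[0.0] * (S + 1) for _ in range(S + 1)]
    for x in range(S + 1):
        up = (S - x) / 2
        down = x / 2
        if x < S:
            Q[x][x + 1] = up
        if x > 0:
            Q[x][x - 1] = down
        Q[x][x] = -(up + down)  # each row of a generator sums to zero
    return Q


def binomial_half(S):
    """Probability mass function of Binomial(S, 1/2)."""
    return [comb(S, x) / 2 ** S for x in range(S + 1)]


def left_multiply(pi, Q):
    """Row vector times matrix: (pi Q)_j = sum_i pi_i * Q[i][j]."""
    n = len(pi)
    return [sum(pi[i] * Q[i][j] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    S = 20  # hypothetical state-space size for illustration
    residual = left_multiply(binomial_half(S), ehrenfest_generator(S))
    print("max |pi Q| =", max(abs(r) for r in residual))
```

Stationarity follows from detailed balance, since comb(S, x) * (S - x) equals comb(S, x + 1) * (x + 1).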

[741]  arXiv:2405.03579 (cross-list from stat.AP) [pdf, other]
Title: Some Statistical and Data Challenges When Building Early-Stage Digital Experimentation and Measurement Capabilities
Authors: C. H. Bryan Liu
Comments: PhD thesis. Imperial College London. Official library version available on: this https URL
Subjects: Applications (stat.AP); Databases (cs.DB); Methodology (stat.ME)

Digital experimentation and measurement (DEM) capabilities -- the knowledge and tools necessary to run experiments with digital products, services, or experiences and measure their impact -- are fast becoming part of the standard toolkit of digital/data-driven organisations in guiding business decisions. Many large technology companies report having mature DEM capabilities, and several businesses have been established purely to manage experiments for others. Given the growing evidence that data-driven organisations tend to outperform their non-data-driven counterparts, there has never been a greater need for organisations to build/acquire DEM capabilities to thrive in the current digital era.
This thesis presents several novel approaches to statistical and data challenges for organisations building DEM capabilities. We focus on the fundamentals associated with building DEM capabilities, which lead to a richer understanding of the underlying assumptions and thus enable us to develop more appropriate capabilities. We address why one should engage in DEM by quantifying the benefits and risks of acquiring DEM capabilities. This is done using a ranking under lower uncertainty model, enabling one to construct a business case. We also examine what ingredients are necessary to run digital experiments. In addition to clarifying the existing literature around statistical tests, datasets, and methods in experimental design and causal inference, we construct an additional dataset and detailed case studies on applying state-of-the-art methods. Finally, we investigate when a digital experiment design would outperform another, leading to an evaluation framework that compares competing designs' data efficiency.

[742]  arXiv:2405.03587 (cross-list from math.CO) [pdf, other]
Title: Non-detectable patterns hidden within sequences of bits
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

In this paper we construct families of bit sequences using combinatorial methods. Each sequence is derived by converting a collection of numbers encoding certain combinatorial numerics from objects exhibiting symmetry in various dimensions. Using the algorithms first described in [1] we show that the NIST testing suite described in publication 800-22 does not detect these symmetries hidden within these sequences.

[743]  arXiv:2405.03667 (cross-list from eess.SP) [pdf, other]
Title: Fault Detection and Monitoring using an Information-Driven Strategy: Method, Theory, and Application
Comments: 28 pages, 11 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)

The ability to detect when a system undergoes an incipient fault is of paramount importance in preventing a critical failure. In this work, we propose an information-driven fault detection method based on a novel concept drift detector. The method is tailored to identifying drifts in input-output relationships of additive noise models (i.e., model drifts) and is based on a distribution-free mutual information (MI) estimator. Our scheme does not require prior faulty examples and can be applied distribution-free over a large class of system models. Our core contributions are twofold. First, we demonstrate the connection between fault detection, model drift detection, and testing independence between two random variables. Second, we prove several theoretical properties of the proposed MI-based fault detection scheme: (i) strong consistency, (ii) exponentially fast detection of the non-faulty case, and (iii) control of both significance levels and power of the test. To conclude, we validate our theory with synthetic data and the benchmark dataset N-CMAPSS of aircraft turbofan engines. These empirical results support the usefulness of our methodology in many practical and realistic settings, and the theoretical results show performance guarantees that other methods cannot offer.

Replacements for Tue, 7 May 24

[744]  arXiv:0901.1988 (replaced) [pdf, ps, other]
Title: Many-Help-One Problem for Gaussian Sources with a Tree Structure on their Correlation
Authors: Yasutada Oohama
Comments: 21 pages, 4 figures
Subjects: Information Theory (cs.IT)
[745]  arXiv:1704.04370 (replaced) [pdf, other]
Title: Fast Similarity Sketching
Comments: The original version was directly based on a conference paper of the same title from FOCS'17. This new version is substantially revised with some cleaner and stronger theorems, particularly concerning the high probability domain. Moreover, there is one more author, Jakob Houen. In addition, one of the old authors, Mathias, has changed surname from Knudsen to Langhede
Subjects: Data Structures and Algorithms (cs.DS)
[746]  arXiv:1902.01353 (replaced) [pdf, other]
Title: A Session Type System for Asynchronous Unreliable Broadcast Communication
Subjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL)
[747]  arXiv:1905.10951 (replaced) [pdf, other]
Title: On the Evaluation Metric for Hashing
Subjects: Information Retrieval (cs.IR)
[748]  arXiv:2007.10609 (replaced) [pdf, other]
Title: SUBPLEX: Towards a Better Understanding of Black Box Model Explanations at the Subpopulation Level
Subjects: Human-Computer Interaction (cs.HC)
[749]  arXiv:2008.01503 (replaced) [pdf, other]
Title: Multiple Code Hashing for Efficient Image Retrieval
Comments: 12 pages, 9 figures, 3 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[750]  arXiv:2009.09538 (replaced) [pdf, other]
Title: Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms
Comments: 40 pages, 8 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[751]  arXiv:2101.10867 (replaced) [pdf, other]
Title: On metrics robust to noise and deformations
Authors: William Leeb
Subjects: Functional Analysis (math.FA); Numerical Analysis (math.NA)
[752]  arXiv:2108.06009 (replaced) [pdf, ps, other]
Title: SAR image matching algorithm based on multi-class features
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[753]  arXiv:2109.11436 (replaced) [pdf, ps, other]
Title: Piecewise Padé-Chebyshev Reconstruction of Bivariate Piecewise Smooth Functions
Authors: Akansha Singh
Comments: 26 pages, 15 figures
Subjects: Numerical Analysis (math.NA)
[754]  arXiv:2109.11762 (replaced) [pdf, other]
Title: LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models
Comments: Contains 10 main pages, 21 figures, 3 tables
Journal-ref: Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS '24)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[755]  arXiv:2109.15037 (replaced) [pdf, other]
Title: A Group Key Establishment Scheme
Subjects: Cryptography and Security (cs.CR)
[756]  arXiv:2110.13452 (replaced) [pdf, other]
Title: On the Optimization Landscape of Maximum Mean Discrepancy
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[757]  arXiv:2112.06433 (replaced) [pdf, other]
Title: Generate Point Clouds with Multiscale Details from Graph-Represented Structures
Comments: 9 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[758]  arXiv:2112.10625 (replaced) [pdf, other]
Title: Privacy-Preserving Nonlinear Cloud-based Model Predictive Control via Affine Masking
Comments: 11 pages, 5 figures
Subjects: Systems and Control (eess.SY)
[759]  arXiv:2201.02797 (replaced) [pdf, other]
Title: A Unified Review of Deep Learning for Automated Medical Coding
Comments: ACM Computing Surveys
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[760]  arXiv:2201.06317 (replaced) [pdf, other]
Title: Language Model-Based Paired Variational Autoencoders for Robotic Language Learning
Comments: Published in: IEEE Transactions on Cognitive and Developmental Systems, 15:4, 3204452
Journal-ref: IEEE Transactions on Cognitive and Developmental Systems (Volume: 15, Issue: 4, December 2023)
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[761]  arXiv:2202.05420 (replaced) [pdf, other]
Title: A Characterization of Semi-Supervised Adversarially-Robust PAC Learnability
Comments: NeurIPS 2022 camera-ready
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[762]  arXiv:2203.07831 (replaced) [pdf, other]
Title: Graph Convolutional Neural Networks Sensitivity under Probabilistic Error Model
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[763]  arXiv:2203.11076 (replaced) [pdf, other]
Title: Collaborative Learning for Cyberattack Detection in Blockchain Networks
Journal-ref: IEEE Transactions on Systems, Man, and Cybernetics: Systems (2024)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[764]  arXiv:2203.15968 (replaced) [pdf, other]
Title: Light Clients for Lazy Blockchains
Comments: Financial Cryptography and Data Security 2024 (FC24)
Subjects: Cryptography and Security (cs.CR)
[765]  arXiv:2204.04510 (replaced) [pdf, other]
Title: Translating Subgraphs to Nodes Makes Simple GNNs Strong and Efficient for Subgraph Representation Learning
Comments: ICML 2024 (22 pages)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
[766]  arXiv:2204.07756 (replaced) [pdf, other]
Title: Visual Attention Methods in Deep Learning: An In-Depth Survey
Comments: Accepted in Information Fusion
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[767]  arXiv:2205.05505 (replaced) [pdf, other]
Title: Probability Distribution of Hypervolume Improvement in Bi-objective Bayesian Optimization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[768]  arXiv:2206.00251 (replaced) [pdf, other]
[769]  arXiv:2206.01393 (replaced) [pdf, ps, other]
Title: Simulation of Crowd Egress with Environmental Stressors
Comments: 20 pages, 14 figures
Subjects: Physics and Society (physics.soc-ph); Multiagent Systems (cs.MA); Adaptation and Self-Organizing Systems (nlin.AO)
[770]  arXiv:2206.05051 (replaced) [pdf, other]
Title: Temporal Inductive Logic Reasoning over Hypergraphs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
[771]  arXiv:2206.07705 (replaced) [pdf, other]
Title: LET-3D-AP: Longitudinal Error Tolerant 3D Average Precision for Camera-Only 3D Detection
Comments: Find the primary metrics for the 2022 Waymo Open Dataset 3D Camera-Only Detection Challenge at this https URL . Find the code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[772]  arXiv:2206.09418 (replaced) [pdf, other]
Title: LordNet: An Efficient Neural Network for Learning to Solve Parametric Partial Differential Equations without Simulated Data
Journal-ref: Neural Networks, 2024
Subjects: Machine Learning (cs.LG)
[773]  arXiv:2206.09821 (replaced) [pdf, other]
Title: Exceedance Probability Forecasting via Regression for Significant Wave Height Prediction
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[774]  arXiv:2206.10049 (replaced) [pdf, ps, other]
Title: The Capacity of 3 User Linear Computation Broadcast
Subjects: Information Theory (cs.IT)
[775]  arXiv:2206.12674 (replaced) [pdf, other]
Title: Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization
Authors: Igor Kuznetsov
Comments: Short version of this paper is accepted to AAMAS 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[776]  arXiv:2206.12977 (replaced) [pdf, ps, other]
Title: Adversarially Robust PAC Learnability of Real-Valued Functions
Comments: accepted to ICML2023
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[777]  arXiv:2207.02410 (replaced) [pdf, other]
Title: A Deep Model for Partial Multi-Label Image Classification with Curriculum Based Disambiguation
Comments: 12 pages, 5 figures
Journal-ref: Machine Intelligence Research, 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[778]  arXiv:2207.10170 (replaced) [pdf, other]
Title: Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks
Comments: ICLR 2024 Spotlight (top 5%)
Subjects: Artificial Intelligence (cs.AI)
[779]  arXiv:2207.13526 (replaced) [pdf, other]
Title: UltimateKalman: Flexible Kalman Filtering and Smoothing Using Orthogonal Transformations
Authors: Sivan Toledo
Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY)
[780]  arXiv:2208.00841 (replaced) [pdf, other]
Title: Spline-Shaped Microstrip Edge-Fed Antenna for 77 GHz Automotive Radar Systems
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
[781]  arXiv:2208.05716 (replaced) [pdf, other]
Title: Task Aligned Meta-learning based Augmented Graph for Cold-Start Recommendation
Subjects: Information Retrieval (cs.IR)
[782]  arXiv:2208.06348 (replaced) [pdf, other]
Title: Can Brain Signals Reveal Inner Alignment with Human Languages?
Comments: EMNLP 2023 Findings
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[783]  arXiv:2209.05428 (replaced) [pdf, other]
Title: Elastic Context: Encoding Elasticity for Data-driven Models of Textiles
Subjects: Robotics (cs.RO)
[784]  arXiv:2209.07163 (replaced) [pdf, other]
Title: Morphology-Aware Interactive Keypoint Estimation
Comments: MICCAI 2022. The first two authors contributed equally. The last two authors are the co-corresponding authors
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[785]  arXiv:2209.07577 (replaced) [pdf, other]
Title: Experimental verification of the quantum nature of a neural network
Comments: arXiv admin note: text overlap with arXiv:2206.00005
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Quantum Physics (quant-ph)
[786]  arXiv:2209.11112 (replaced) [pdf, other]
Title: CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
Comments: 17 pages, 11 figures, and 6 tables. arXiv admin note: text overlap with arXiv:2203.15149
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2477-2493, 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[787]  arXiv:2209.13816 (replaced) [pdf, other]
Title: Revisiting Few-Shot Learning from a Causal Perspective
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[788]  arXiv:2210.03327 (replaced) [pdf, other]
Title: Enumeration of spatial manipulators by using the concept of Adjacency Matrix
Subjects: Robotics (cs.RO)
[789]  arXiv:2210.09430 (replaced) [pdf, other]
Title: Evaluating Search System Explainability with Psychometrics and Crowdsourcing
Comments: 11 pages, 4 figures, accepted at SIGIR 2024 as full paper
Subjects: Information Retrieval (cs.IR)
[790]  arXiv:2210.14972 (replaced) [pdf, other]
Title: Environment Design for Inverse Reinforcement Learning
Comments: to appear at ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[791]  arXiv:2211.01365 (replaced) [pdf, other]
Title: QuACK: Accelerating Gradient-Based Quantum Optimization with Koopman Operator Learning
Comments: Advances in Neural Information Processing Systems 36 (NeurIPS 2023) spotlight
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC); Computational Physics (physics.comp-ph)
[792]  arXiv:2211.10344 (replaced) [pdf, other]
Title: Physics-informed neural networks for operator equations with stochastic data
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
[793]  arXiv:2211.11312 (replaced) [pdf, other]
Title: Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack
Comments: Accepted in Pattern Recognition. arXiv admin note: substantial text overlap with arXiv:2103.05266
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[794]  arXiv:2211.12827 (replaced) [pdf, other]
Title: Video Instance Shadow Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[795]  arXiv:2211.13726 (replaced) [pdf, other]
Title: Lightweight Event-based Optical Flow Estimation via Iterative Deblurring
Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA'24), Yokohama, Japan, May 13-17, 2024. arXiv revision includes additional ablation studies results
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[796]  arXiv:2211.14214 (replaced) [pdf, other]
Title: Complexity Framework for Forbidden Subgraphs II: Edge Subdivision and the "H"-graphs
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[797]  arXiv:2212.03932 (replaced) [pdf, other]
Title: Low Variance Off-policy Evaluation with State-based Importance Sampling
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[798]  arXiv:2212.04486 (replaced) [pdf, other]
Title: A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[799]  arXiv:2212.04497 (replaced) [pdf, other]
Title: UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation
Comments: Accepted at IEEE TMI-2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[800]  arXiv:2212.05728 (replaced) [pdf, other]
Title: Synergy and Redundancy Dominated Effects in Time Series via Transfer Entropy Decompositions
Comments: Accepted to be presented at the NeurIT: Information theory in neuroscience and neuroengineering workshop. In connection with ISIT 2024
Subjects: Information Theory (cs.IT)
[801]  arXiv:2301.02426 (replaced) [pdf, other]
Title: Reversibility of elliptical slice sampling revisited
Comments: 25 pages
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
[802]  arXiv:2301.10022 (replaced) [pdf, other]
Title: Koopman neural operator as a mesh-free solver of non-linear partial differential equations
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an); Fluid Dynamics (physics.flu-dyn)
[803]  arXiv:2302.00808 (replaced) [pdf, other]
Title: ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Comments: To appear in Proceedings of the $\mathit{41}^{st}$ International Conference on Machine Learning (ICML), Vienna, Austria. PMLR 235, 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[804]  arXiv:2302.06430 (replaced) [pdf, other]
Title: Deep Orthogonal Hypersphere Compression for Anomaly Detection
Comments: Published in ICLR 2024: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[805]  arXiv:2302.07751 (replaced) [pdf, ps, other]
Title: Fully Energy-Efficient Randomized Backoff: Slow Feedback Loops Yield Fast Contention Resolution
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[806]  arXiv:2302.08108 (replaced) [pdf, ps, other]
Title: User Response in Ad Auctions: An MDP Formulation of Long-Term Revenue Optimization
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
[807]  arXiv:2302.08434 (replaced) [pdf, other]
Title: On marginal feature attributions of tree-based models
Comments: Minor corrections. 30 pages+appendix (64 pages in total), 10 figures. To appear in Foundations of Data Science
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
[808]  arXiv:2302.10128 (replaced) [pdf, other]
Title: Sketch In, Sketch Out: Accelerating both Learning and Inference for Structured Prediction with Kernels
Journal-ref: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:109-117, 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[809]  arXiv:2302.10149 (replaced) [pdf, other]
Title: Poisoning Web-Scale Training Datasets is Practical
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[810]  arXiv:2302.10442 (replaced) [pdf, other]
Title: Data-based Adaptive Refinement of Finite Element Thin Plate Spline
Authors: L. Fang, L. Stals
Subjects: Numerical Analysis (math.NA)
[811]  arXiv:2302.12461 (replaced) [pdf, other]
Title: Analyzing And Editing Inner Mechanisms Of Backdoored Language Models
Comments: Final version accepted at FAccT 24
Journal-ref: The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT 24), June 3-6, 2024, Rio de Janeiro, Brazil
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[812]  arXiv:2303.00244 (replaced) [pdf, other]
Title: SUNY: A Visual Interpretation Framework for Convolutional Neural Networks from a Necessary and Sufficient Perspective
Comments: CVPRw 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[813]  arXiv:2303.02698 (replaced) [pdf, other]
Title: Robust affine point matching via quadratic assignment on Grassmannians
Comments: 8 pages, 23 figures; GitHub repository at (this https URL); Section IV: added comparison to GrassGraph (this https URL); notably, GrassGraph quickly loses accuracy on our test examples with noise and occlusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[814]  arXiv:2303.03090 (replaced) [pdf, other]
Title: Parallel Optimization with Hard Safety Constraints for Cooperative Planning of Connected Autonomous Vehicles
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
[815]  arXiv:2303.03151 (replaced) [pdf, other]
Title: Resource-aware Cyber Deception for Microservice-based Applications
Subjects: Cryptography and Security (cs.CR)
[816]  arXiv:2303.08269 (replaced) [pdf, other]
Title: Positive Unlabeled Learning Selected Not At Random (PULSNAR): class proportion estimation when the SCAR assumption does not hold
Subjects: Machine Learning (cs.LG)
[817]  arXiv:2303.11876 (replaced) [pdf, other]
Title: An implicit function theorem for the stream calculus
Subjects: Logic in Computer Science (cs.LO)
[818]  arXiv:2303.14241 (replaced) [pdf, other]
Title: Data Depth and Core-based Trend Detection on Blockchain Transaction Networks
Comments: 10 pages
Journal-ref: Frontiers in Blockchain 7 (2024)
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[819]  arXiv:2303.16621 (replaced) [pdf, other]
Title: AraSpot: Arabic Spoken Command Spotting
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[820]  arXiv:2303.17222 (replaced) [pdf, other]
Title: LatentForensics: Towards frugal deepfake detection in the StyleGAN latent space
Comments: 7 pages, 3 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[821]  arXiv:2304.00762 (replaced) [pdf, ps, other]
Title: The (r, δ)-Locality of Repeated-Root Cyclic Codes with Prime Power Lengths
Subjects: Information Theory (cs.IT)
[822]  arXiv:2304.00962 (replaced) [pdf, other]
Title: RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Comments: To appear in CVPR 2024. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[823]  arXiv:2304.03094 (replaced) [pdf, other]
Title: PopulAtion Parameter Averaging (PAPA)
Comments: Blog post: this https URL, Code: this https URL, TMLR journal publication: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[824]  arXiv:2304.06883 (replaced) [pdf, other]
Title: Intelligent Reflecting Surface Aided Wireless Communication Systems: Joint Location and Passive Beamforming Design
Comments: Following the publication of our work, we identified errors in our data analysis process. To uphold the standards of academic integrity and the accuracy of our findings, we feel it necessary to withdraw the current version of our paper. We plan to submit a revised version upon thorough review and correction of these errors
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[825]  arXiv:2304.12766 (replaced) [pdf, other]
Title: QuantProb: Generalizing Probabilities along with Predictions for a Pre-trained Classifier
Comments: Accepted at UAI 2024
Subjects: Machine Learning (cs.LG)
[826]  arXiv:2304.14724 (replaced) [pdf, other]
Title: Structural Parameterizations for Two Bounded Degree Problems Revisited
Comments: Abstract shortened to meet arXiv's requirements
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
[827]  arXiv:2305.01322 (replaced) [pdf, other]
Title: An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options Framework
Comments: IEEE IJCNN 2023
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[828]  arXiv:2305.01723 (replaced) [pdf, other]
Title: Stance Detection: A Practical Guide to Classifying Political Beliefs in Text
Authors: Michael Burnham
Subjects: Computation and Language (cs.CL)
[829]  arXiv:2305.03122 (replaced) [pdf, ps, other]
Title: The Capacity of Classical Summation over a Quantum MAC with Arbitrarily Distributed Inputs and Entanglements
Subjects: Information Theory (cs.IT)
[830]  arXiv:2305.03707 (replaced) [pdf, other]
Title: Hardware Honeypot: Setting Sequential Reverse Engineering on a Wrong Track
Journal-ref: 2024 27th International Symposium on Design & Diagnostics of Electronic Circuits & Systems (DDECS), Kielce, Poland, 2024, pp. 47-52
Subjects: Cryptography and Security (cs.CR)
[831]  arXiv:2305.08088 (replaced) [pdf, other]
Title: Make Prompt-based Black-Box Tuning Colorful: Boosting Model Generalization from Three Orthogonal Perspectives
Comments: Accepted by LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[832]  arXiv:2305.11731 (replaced) [pdf, ps, other]
Title: Persian Typographical Error Type Detection Using Deep Neural Networks on Algorithmically-Generated Misspellings
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[833]  arXiv:2305.12844 (replaced) [pdf, other]
Title: An Optimized Ensemble Deep Learning Model For Brain Tumor Classification
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[834]  arXiv:2305.14081 (replaced) [pdf, other]
Title: How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have
Comments: Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
[835]  arXiv:2305.14658 (replaced) [pdf, other]
Title: Evaluate What You Can't Evaluate: Unassessable Quality for Generated Response
Comments: preprint
Subjects: Computation and Language (cs.CL)
[836]  arXiv:2305.14685 (replaced) [pdf, other]
Title: Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval
Comments: COLING 2024
Subjects: Information Retrieval (cs.IR)
[837]  arXiv:2305.15957 (replaced) [pdf, other]
Title: DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[838]  arXiv:2305.18569 (replaced) [pdf, other]
Title: Fairness of ChatGPT
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
[839]  arXiv:2305.19872 (replaced) [pdf, other]
Title: Spectral Heterogeneous Graph Convolutions via Positive Noncommutative Polynomials
Comments: The Web Conference 2024 (12 pages)
Subjects: Machine Learning (cs.LG)
[840]  arXiv:2305.20076 (replaced) [pdf, other]
Title: Decision-Oriented Dialogue for Human-AI Collaboration
Comments: TACL 2024, pre-MIT Press publication version
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[841]  arXiv:2306.01603 (replaced) [pdf, other]
Title: Decentralized Federated Learning: A Survey and Perspective
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
[842]  arXiv:2306.02105 (replaced) [pdf, other]
Title: Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models
Comments: Accepted at SIGUL-LREC 2024
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[843]  arXiv:2306.02568 (replaced) [pdf, other]
Title: Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
Comments: Accepted by ICML 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[844]  arXiv:2306.02786 (replaced) [pdf, other]
Title: Navigating Explanatory Multiverse Through Counterfactual Path Geometry
Comments: Workshop on Counterfactuals in Minds and Machines at 2023 International Conference on Machine Learning (ICML)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[845]  arXiv:2306.03623 (replaced) [pdf, other]
Title: Spike-based computation using classical recurrent neural networks
Authors: Florent De Geeter (1), Damien Ernst (1 and 2), Guillaume Drion (1) ((1) Montefiore Institute, University of Liège, Liège, Belgium, (2) LTCI, Télécom Paris, Institut Polytechnique de Paris, France)
Comments: 17 pages, 8 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
[846]  arXiv:2306.04434 (replaced) [pdf, ps, other]
Title: Visions of augmented reality in popular culture: Power and (un)readable identities when the world becomes a screen
Journal-ref: Tidsskrift for Kjoennsforskning volume 45 2021 pages 89-104
Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)
[847]  arXiv:2306.05426 (replaced) [pdf, other]
Title: SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
Comments: Poster, ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[848]  arXiv:2306.05792 (replaced) [pdf, ps, other]
Title: Reinforcement Learning for Mutation Operator Selection in Automated Program Repair
Subjects: Software Engineering (cs.SE)
[849]  arXiv:2306.07209 (replaced) [pdf, other]
Title: Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
[850]  arXiv:2306.10614 (replaced) [pdf, other]
Title: Identifiable causal inference with noisy treatment and no side information
Comments: 18 pages, 10 figures. Changes consist of polishing the original version. The experiments and results remain the same
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
[851]  arXiv:2306.11695 (replaced) [pdf, other]
Title: A Simple and Effective Pruning Approach for Large Language Models
Comments: ICLR 2024. Website at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[852]  arXiv:2306.11879 (replaced) [pdf, other]
Title: Open-Domain Text Evaluation via Contrastive Distribution Modeling
Comments: Accepted to ICML 2024
Subjects: Computation and Language (cs.CL)
[853]  arXiv:2306.12420 (replaced) [pdf, other]
Title: LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
Comments: Published in NAACL 2024 Demo Track
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[854]  arXiv:2306.12422 (replaced) [pdf, other]
Title: DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
Comments: ICLR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[855]  arXiv:2306.14222 (replaced) [pdf, other]
Title: Unveiling the Potential of Sentiment: Can Large Language Models Predict Chinese Stock Price Movements?
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Statistical Finance (q-fin.ST)
[856]  arXiv:2306.15447 (replaced) [pdf, other]
Title: Are aligned neural networks adversarially aligned?
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[857]  arXiv:2307.00186 (replaced) [pdf, other]
Title: How far is Language Model from 100% Few-shot Named Entity Recognition in Medical Domain
Comments: Published as a journal paper at Journal of the American Medical Informatics Association (JAMIA). arXiv admin note: text overlap with arXiv:2305.18624
Subjects: Computation and Language (cs.CL)
[858]  arXiv:2307.02629 (replaced) [pdf, other]
Title: The landscape of compressibility measures for two-dimensional data
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
[859]  arXiv:2307.02891 (replaced) [pdf, other]
Title: BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[860]  arXiv:2307.03195 (replaced) [pdf, other]
Title: A Comprehensive Survey of Artificial Intelligence Techniques for Talent Analytics
Comments: 61 pages, 15 figures
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[861]  arXiv:2307.04085 (replaced) [pdf, other]
Title: Vector Commitments with Efficient Updates
Comments: In Advances in Financial Technologies - AFT 2023
Subjects: Cryptography and Security (cs.CR)
[862]  arXiv:2307.04599 (replaced) [pdf, other]
Title: Bridging MDE and AI: A Systematic Review of Domain-Specific Languages and Model-Driven Practices in AI Software Systems Engineering
Comments: 57 pages, 2 figures, 8 tables
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[863]  arXiv:2307.04760 (replaced) [pdf, other]
Title: Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[864]  arXiv:2307.11880 (replaced) [pdf, ps, other]
Title: Bans vs. Warning Labels: Examining Support for Community-wide Moderation Interventions
Authors: Shagun Jhaver
Comments: arXiv admin note: text overlap with arXiv:2301.02208
Subjects: Human-Computer Interaction (cs.HC)
[865]  arXiv:2307.13883 (replaced) [pdf, other]
Title: ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis
Comments: ICLR 2024
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL)
[866]  arXiv:2307.15034 (replaced) [pdf, other]
Title: Guaranteed Approximation Bounds for Mixed-Precision Neural Operators
Comments: ICLR 2024
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
[867]  arXiv:2307.16033 (replaced) [pdf, other]
Title: CoVid-19 Detection leveraging Vision Transformers and Explainable AI
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[868]  arXiv:2308.01654 (replaced) [pdf, other]
Title: Towards a Safe Real-Time Motion Planning Framework for Autonomous Driving Systems: An MPPI Approach
Subjects: Robotics (cs.RO)
[869]  arXiv:2308.01987 (replaced) [pdf, other]
Title: Bengali Fake Reviews: A Benchmark Dataset and Detection System
Subjects: Computation and Language (cs.CL)
[870]  arXiv:2308.02151 (replaced) [pdf, other]
Title: Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[871]  arXiv:2308.03314 (replaced) [pdf, other]
Title: GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis
Comments: Accepted by IEEE/ACM ICSE 2024
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
[872]  arXiv:2308.04259 (replaced) [pdf, other]
Title: Generalized Forgetting Recursive Least Squares: Stability and Robustness Guarantees
Comments: Accepted to the IEEE Transactions on Automatic Control. Scheduled to appear in the 2024 November issue
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
[873]  arXiv:2308.05381 (replaced) [pdf, other]
Title: An Exploratory Study of V-Model in Building ML-Enabled Software: A Systems Engineering Perspective
Authors: Jie JW Wu
Comments: 11 pages, 2 figures, 2 tables. Accepted at CAIN 2024 (3rd International Conference on AI Engineering - Software Engineering for AI)
Journal-ref: 2024 IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI (CAIN)
Subjects: Software Engineering (cs.SE)
[874]  arXiv:2308.05882 (replaced) [pdf, other]
Title: GPLaSDI: Gaussian Process-based Interpretable Latent Space Dynamics Identification through Deep Autoencoder
Journal-ref: Computer Methods in Applied Mechanics and Engineering, 418A, 116535, 2024
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Numerical Analysis (math.NA)
[875]  arXiv:2308.07847 (replaced) [pdf, other]
Title: Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models
Subjects: Cryptography and Security (cs.CR)
[876]  arXiv:2308.10638 (replaced) [pdf, other]
Title: SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
Comments: Updated to camera ready version of CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
[877]  arXiv:2308.11267 (replaced) [pdf, other]
Title: Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes
Authors: David M. Bossens
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[878]  arXiv:2308.11471 (replaced) [pdf, other]
Title: Dynamic Open Vocabulary Enhanced Safe-landing with Intelligence (DOVESEI)
Comments: IROS 2023 The Last-Mile Robotics Workshop
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[879]  arXiv:2308.11647 (replaced) [pdf, other]
Title: Optically-Transparent EM Skins for Outdoor-to-Indoor mm-Wave Wireless Communications
Journal-ref: IEEE Access, 2024
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY); Applied Physics (physics.app-ph)
[880]  arXiv:2308.12255 (replaced) [pdf, other]
Title: Absorbing boundary conditions for the Helmholtz equation using Gauss-Legendre quadrature reduced integrations
Authors: Koki Sagiyama
Comments: edit author list and tidy up script
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
[881]  arXiv:2308.12517 (replaced) [pdf, other]
Title: Not Only Rewards But Also Constraints: Applications on Legged Robot Locomotion
Comments: Accepted to IEEE Transactions on Robotics (T-RO) 2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[882]  arXiv:2308.14104 (replaced) [pdf, other]
Title: Towards Generalizable Neural Solvers for Vehicle Routing Problems via Ensemble with Transferrable Local Policy
Comments: Accepted by IJCAI 2024
Subjects: Machine Learning (cs.LG)
[883]  arXiv:2309.02050 (replaced) [pdf, other]
Title: Machine learning of network inference enhancement from noisy measurements
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
[884]  arXiv:2309.06837 (replaced) [pdf, other]
Title: Time-Optimal Gate-Traversing Planner for Autonomous Drone Racing
Subjects: Robotics (cs.RO)
[885]  arXiv:2309.07169 (replaced) [pdf, other]
Title: Spectral Convergence of Complexon Shift Operators
Comments: 9 pages, 2 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[886]  arXiv:2309.08464 (replaced) [pdf, ps, other]
Title: Differentially Private Average Consensus with Improved Accuracy-Privacy Trade-off
Subjects: Systems and Control (eess.SY)
[887]  arXiv:2309.09128 (replaced) [pdf, other]
Title: ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
Comments: 18 pages, 7 figures, published at CHI 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
[888]  arXiv:2309.10426 (replaced) [pdf, other]
Title: Multi-Object Graph Affordance Network: Goal-Oriented Planning through Learned Compound Object Affordances
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[889]  arXiv:2309.10908 (replaced) [pdf, other]
Title: Multicopy Reinforcement Learning Agents
Comments: Updates from earlier version: added a more basic "multiagent" algorithm to compare to and comparison graphs
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
[890]  arXiv:2309.11721 (replaced) [pdf, ps, other]
Title: Defining and Preventing Asymmetric Mempool DoS in Ethereum with saferAd
Subjects: Cryptography and Security (cs.CR)
[891]  arXiv:2309.13005 (replaced) [pdf, other]
Title: Towards Counterfactual Fairness-aware Domain Generalization in Changing Environments
Comments: IJCAI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[892]  arXiv:2309.13285 (replaced) [pdf, other]
Title: Collision Avoidance and Navigation for a Quadrotor Swarm Using End-to-end Deep Reinforcement Learning
Comments: Accepted to ICRA 2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
[893]  arXiv:2309.14091 (replaced) [pdf, other]
Title: On the Benefit of Optimal Transport for Curriculum Reinforcement Learning
Subjects: Machine Learning (cs.LG)
[894]  arXiv:2309.14726 (replaced) [pdf, other]
Title: PLMM: Personal Large Language Models on Mobile Devices
Authors: Yuanhao Gong
Comments: arXiv admin note: substantial text overlap with arXiv:2307.13221
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Machine Learning (cs.LG)
[895]  arXiv:2309.16482 (replaced) [pdf, ps, other]
Title: Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
Comments: Accepted at HSCMA Satellite Workshop at ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[896]  arXiv:2309.17444 (replaced) [pdf, other]
Title: LLM-grounded Video Diffusion Models
Comments: ICLR 2024. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[897]  arXiv:2310.00809 (replaced) [pdf, other]
Title: Towards Causal Foundation Model: on Duality between Causal Inference and Attention
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)
[898]  arXiv:2310.01055 (replaced) [pdf, other]
Title: Improved Crop and Weed Detection with Diverse Data Ensemble Learning in Agriculture
Comments: Accepted in CVPR Workshop as an Oral
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[899]  arXiv:2310.01288 (replaced) [pdf, other]
Title: Offline Tracking with Object Permanence
Comments: Accepted by IEEE Intelligent Vehicles Symposium (IV 2024). Camera ready version with supplementary material
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[900]  arXiv:2310.01352 (replaced) [pdf, other]
Title: RA-DIT: Retrieval-Augmented Dual Instruction Tuning
Comments: v4: ICLR 2024 camera-ready version
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[901]  arXiv:2310.01558 (replaced) [pdf, other]
Title: Making Retrieval-Augmented Language Models Robust to Irrelevant Context
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[902]  arXiv:2310.01668 (replaced) [pdf, other]
Title: Locality-Aware Graph-Rewiring in GNNs
Subjects: Machine Learning (cs.LG)
[903]  arXiv:2310.02043 (replaced) [pdf, other]
Title: View-Independent Adjoint Light Tracing for Lighting Design Optimization
Subjects: Graphics (cs.GR)
[904]  arXiv:2310.02401 (replaced) [pdf, other]
Title: FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
[905]  arXiv:2310.02671 (replaced) [pdf, other]
Title: Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods
Comments: 54 pages, 2 figures, ICLR 2024
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[906]  arXiv:2310.03789 (replaced) [pdf, other]
Title: Grokking as a First Order Phase Transition in Two Layer Networks
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
[907]  arXiv:2310.04966 (replaced) [pdf, other]
Title: Improved Active Learning via Dependent Leverage Score Sampling
Comments: To appear at ICLR 2024
Subjects: Machine Learning (cs.LG)
[908]  arXiv:2310.05175 (replaced) [pdf, other]
Title: Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Subjects: Machine Learning (cs.LG)
[909]  arXiv:2310.05440 (replaced) [pdf, other]
Title: Modeling and Simulation of Chemo-Elasto-Plastically Coupled Battery Active Particles
Subjects: Numerical Analysis (math.NA)
[910]  arXiv:2310.06387 (replaced) [pdf, other]
Title: Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[911]  arXiv:2310.07298 (replaced) [pdf, other]
Title: Beyond Memorization: Violating Privacy Via Inference with Large Language Models
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[912]  arXiv:2310.07626 (replaced) [pdf, other]
Title: Learning of Sea Surface Height Interpolation from Multi-variate Simulated Satellite Observations
Comments: Accepted to JAMES. 31 pages, minor revision
Subjects: Machine Learning (cs.LG)
[913]  arXiv:2310.08342 (replaced) [pdf, other]
Title: Discontinuous Galerkin approximations of the heterodimer model for protein-protein interaction
Subjects: Numerical Analysis (math.NA)
[914]  arXiv:2310.08744 (replaced) [pdf, other]
Title: Circuit Component Reuse Across Tasks in Transformer Language Models
Comments: Accepted at ICLR 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[915]  arXiv:2310.09453 (replaced) [src]
Title: Effects of Same-Race Mentorship Preferences on Academic Performance and Survival
Authors: Meijun Liu (1), Yi Bu (2), Daifeng Li (3), Ying Ding (4), Daniel E. Acuna (5) ((1) Institute for Global Public Policy, Fudan University, (2) Department of Information Management, Peking University, (3) School of Information Management, Sun Yat-sen University, (4) School of Information, University of Texas at Austin, (5) Department of Computer Science, University of Colorado at Boulder)
Comments: 1. After further evaluating the race prediction method, we observed unsatisfactory accuracy and F1 scores. The study's findings could be impacted by these subpar predictions. 2. Our study incorporates both US and non-US samples, revealing that non-US samples may introduce outliers and distort the results. We recognize that the study's findings and conclusions might be affected by data quality
Subjects: Social and Information Networks (cs.SI)
[916]  arXiv:2310.10690 (replaced) [pdf, other]
Title: Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming
Comments: Published in International Conference on Educational Data Mining (EDM) 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[917]  arXiv:2310.12952 (replaced) [pdf, other]
Title: Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning
Comments: Published in the proceedings of Artificial Intelligence and Statistics, AISTATS 2024. This paper is dedicated to Aline Sitoe Diatta. The code can be found on Vertaix's GitHub. Code for evaluating diversity using the Vendi scores can be found at this https URL. Code for using the scores within Vendi Sampling can be found at this https URL
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Populations and Evolution (q-bio.PE)
[918]  arXiv:2310.12973 (replaced) [pdf, other]
Title: Frozen Transformers in Language Models Are Effective Visual Encoder Layers
Comments: ICLR 2024 Spotlight. 23 pages, 13 figures. Code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[919]  arXiv:2310.13298 (replaced) [pdf, ps, other]
Title: Cache-Aided Communications in MISO Networks with Dynamic User Behavior
Comments: Accepted in IEEE Transactions on Wireless Communications, 2024. arXiv admin note: substantial text overlap with arXiv:2304.11623
Subjects: Information Theory (cs.IT)
[920]  arXiv:2310.14157 (replaced) [pdf, other]
Title: Genetic Algorithms with Neural Cost Predictor for Solving Hierarchical Vehicle Routing Problems
Subjects: Neural and Evolutionary Computing (cs.NE)
[921]  arXiv:2310.15526 (replaced) [pdf, other]
Title: Privacy Amplification for Matrix Mechanisms
Comments: Appearing in ICLR 2024. Changes made to match the conference version of the paper
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[922]  arXiv:2310.15872 (replaced) [pdf, other]
Title: KirchhoffNet: A Scalable Ultra Fast Analog Neural Network
Comments: 9 pages, 10 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
[923]  arXiv:2310.15937 (replaced) [pdf, other]
Title: A Behavioral Perspective on Models of Linear Dynamical Networks with Manifest Variables
Subjects: Systems and Control (eess.SY)
[924]  arXiv:2310.16802 (replaced) [pdf, other]
Title: From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction
Subjects: Machine Learning (cs.LG)
[925]  arXiv:2310.17191 (replaced) [pdf, other]
Title: How do Language Models Bind Entities in Context?
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[926]  arXiv:2310.18042 (replaced) [pdf, other]
Title: Sui Lutris: A Blockchain Combining Broadcast and Consensus
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
[927]  arXiv:2310.18373 (replaced) [pdf, ps, other]
Title: Can LLMs Grade Short-Answer Reading Comprehension Questions: An Empirical Study with a Novel Dataset
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[928]  arXiv:2310.18861 (replaced) [pdf, other]
Title: Peer-to-Peer Deep Learning for Beyond-5G IoT
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[929]  arXiv:2310.18936 (replaced) [pdf, ps, other]
Title: Adversarial Examples Are Not Real Features
Comments: NeurIPS 2023
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[930]  arXiv:2310.19041 (replaced) [pdf, other]
Title: On Linear Separation Capacity of Self-Supervised Representation Learning
Authors: Shulei Wang
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[931]  arXiv:2310.19567 (replaced) [pdf, other]
Title: CreoleVal: Multilingual Multitask Benchmarks for Creoles
Comments: Accepted to TACL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[932]  arXiv:2311.02198 (replaced) [pdf, other]
Title: Imitation Bootstrapped Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[933]  arXiv:2311.02495 (replaced) [pdf, ps, other]
Title: Uncertainty Quantification in Multivariable Regression for Material Property Prediction with Bayesian Neural Networks
Comments: 24 pages, 4 figures
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
[934]  arXiv:2311.02732 (replaced) [pdf, other]
Title: Solving High Dimensional Partial Differential Equations Using Tensor Neural Network and A Posteriori Error Estimators
Comments: 29 pages, 31 figures
Subjects: Numerical Analysis (math.NA)
[935]  arXiv:2311.02807 (replaced) [pdf, other]
Title: QualEval: Qualitative Evaluation for Model Improvement
Comments: NAACL 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[936]  arXiv:2311.03309 (replaced) [pdf, other]
Title: Neural Structure Learning with Stochastic Differential Equations
Comments: ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[937]  arXiv:2311.03486 (replaced) [pdf, other]
Title: Fostering Human Learning in Sequential Decision-Making: Understanding the Role of Evaluative Feedback
Subjects: Human-Computer Interaction (cs.HC)
[938]  arXiv:2311.04451 (replaced) [pdf, ps, other]
Title: Pseudo-Random and de Bruijn Array Codes
Authors: Tuvi Etzion
Subjects: Information Theory (cs.IT)
[939]  arXiv:2311.06141 (replaced) [pdf, other]
Title: Federated Learning Across Decentralized and Unshared Archives for Remote Sensing Image Classification
Comments: Submitted to the IEEE Geoscience and Remote Sensing Magazine
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[940]  arXiv:2311.08306 (replaced) [pdf, other]
Title: On-the-Fly Fusion of Large Language Models and Machine Translation
Subjects: Computation and Language (cs.CL)
[941]  arXiv:2311.09592 (replaced) [pdf, other]
Title: Scalable and Adaptively Secure Any-Trust Distributed Key Generation and All-hands Checkpointing
Comments: 21 pages, 5 figures
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[942]  arXiv:2311.09677 (replaced) [pdf, other]
Title: R-Tuning: Instructing Large Language Models to Say `I Don't Know'
Comments: NAACL 2024
Subjects: Computation and Language (cs.CL)
[943]  arXiv:2311.10047 (replaced) [pdf, ps, other]
Title: Frozen Set Design for Precoded Polar Codes
Comments: 16 pages, 13 figures, submitted to IEEE Transactions on Communications
Subjects: Information Theory (cs.IT)
[944]  arXiv:2311.10093 (replaced) [pdf, other]
Title: The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Comments: Accepted to SIGGRAPH 2024. Project page is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[945]  arXiv:2311.10361 (replaced) [pdf, other]
Title: Video-based Sequential Bayesian Homography Estimation for Soccer Field Registration
Comments: Accepted to Expert Systems with Applications
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[946]  arXiv:2311.10781 (replaced) [pdf, other]
Title: Can Language Model Moderators Improve the Health of Online Discourse?
Comments: 9 pages, NAACL 2024 Main
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[947]  arXiv:2311.11385 (replaced) [pdf, other]
Title: Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts
Comments: Accepted at the Twelfth International Conference on Learning Representations (ICLR 2024)
Subjects: Machine Learning (cs.LG)
[948]  arXiv:2311.12015 (replaced) [pdf, other]
Title: GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
Comments: 9 pages, 12 figures, 2 tables. Last updated on May 6th, 2024
Subjects: Robotics (cs.RO); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[949]  arXiv:2311.12052 (replaced) [pdf, other]
Title: MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
Comments: Accepted by ICML 2024. MagicPose and MagicDance are the same project. Website: this https URL Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[950]  arXiv:2311.12609 (replaced) [pdf, other]
Title: Reinforcement Learning for Near-Optimal Design of Zero-Delay Codes for Markov Sources
Comments: 25 pages, 3 figures; several presentational changes
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)
[951]  arXiv:2311.14213 (replaced) [pdf, other]
Title: Learning to Solve Inverse Problems for Perceptual Sound Matching
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[952]  arXiv:2311.14902 (replaced) [pdf, other]
Title: Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[953]  arXiv:2311.17210 (replaced) [pdf, other]
Title: Ordinals and recursively defined functions on the reals
Subjects: Logic (math.LO); Discrete Mathematics (cs.DM)
[954]  arXiv:2311.17834 (replaced) [pdf, other]
Title: Spice-E: Structural Priors in 3D Diffusion using Cross-Entity Attention
Comments: Project webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[955]  arXiv:2312.00032 (replaced) [pdf, other]
Title: An algorithm for forensic toolmark comparisons
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Applications (stat.AP)
[956]  arXiv:2312.01185 (replaced) [pdf, other]
Title: A ripple in time: a discontinuity in American history
Comments: 7 pages, 8 figures; GitHub repository (this https URL); Section 3: added comparison to (this https URL); comments on a misleading accuracy claim in (this https URL)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[957]  arXiv:2312.01239 (replaced) [pdf, other]
Title: Motion Informed Needle Segmentation in Ultrasound Images
Comments: 7 pages, 4 figures, accepted at ISBI 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[958]  arXiv:2312.01606 (replaced) [pdf, ps, other]
Title: Investigating the ability of deep learning to predict Welding Depth and Pore Volume in Hairpin Welding
Subjects: Machine Learning (cs.LG)
[959]  arXiv:2312.02052 (replaced) [pdf, other]
Title: DUCK: Distance-based Unlearning via Centroid Kinematics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[960]  arXiv:2312.02445 (replaced) [pdf, other]
Title: LLaRA: Large Language-Recommendation Assistant
Comments: 11 pages, 5 figures
Subjects: Information Retrieval (cs.IR)
[961]  arXiv:2312.02959 (replaced) [pdf, other]
Title: Detecting algorithmic bias in medical-AI models using trees
Comments: 26 pages, 9 figures
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP)
[962]  arXiv:2312.05586 (replaced) [pdf, other]
Title: Deeper Understanding of Black-box Predictions via Generalized Influence Functions
Comments: 16 pages, 6 figures, and 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[963]  arXiv:2312.09950 (replaced) [pdf, other]
Title: Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations
Comments: 9 pages, 7 figures, AAAI-24
Journal-ref: AAAI, vol. 38, no. 10, pp. 11766-11774, Mar. 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
[964]  arXiv:2312.10136 (replaced) [pdf, other]
Title: Gradient-based Parameter Selection for Efficient Fine-Tuning
Journal-ref: CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[965]  arXiv:2312.10561 (replaced) [pdf, other]
Title: Enabling Accelerators for Graph Computing
Comments: Northeastern University Doctoral Dissertation
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[966]  arXiv:2312.11387 (replaced) [pdf, other]
Title: Data-Driven Continuous-Time Framework for Frequency-Constrained Unit Commitment
Subjects: Systems and Control (eess.SY)
[967]  arXiv:2312.11831 (replaced) [pdf, ps, other]
Title: Locally-Minimal Probabilistic Explanations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[968]  arXiv:2312.13034 (replaced) [pdf, other]
Title: Modelling reliability of reversible circuits with 2D second-order cellular automata
Comments: 34+1 pages, 21 figures, accepted in Physica D
Subjects: Cellular Automata and Lattice Gases (nlin.CG); Discrete Mathematics (cs.DM); Mathematical Physics (math-ph)
[969]  arXiv:2312.14385 (replaced) [pdf, other]
Title: Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
Comments: Published at 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Multimedia (cs.MM)
[970]  arXiv:2401.00595 (replaced) [pdf, other]
Title: State of What Art? A Call for Multi-Prompt LLM Evaluation
Comments: Accepted at TACL; pre-MIT Press publication version
Subjects: Computation and Language (cs.CL)
[971]  arXiv:2401.00813 (replaced) [pdf, other]
Title: Ultraspherical/Gegenbauer polynomials to unify 2D/3D Ambisonic directivity designs
Authors: Franz Zotter
Comments: 56 pages, 9 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[972]  arXiv:2401.00847 (replaced) [pdf, other]
Title: Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
Authors: Jiye Lee, Hanbyul Joo
Comments: Accepted to CVPR 2024; Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[973]  arXiv:2401.01823 (replaced) [pdf, other]
Title: Detours for Navigating Instructional Videos
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[974]  arXiv:2401.01970 (replaced) [pdf, other]
Title: FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[975]  arXiv:2401.03470 (replaced) [pdf, other]
Title: FurniScene: A Large-scale 3D Room Dataset with Intricate Furnishing Scenes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[976]  arXiv:2401.05235 (replaced) [pdf, other]
Title: A Survey on Optimization Studies of Group Centrality Metrics
Subjects: Social and Information Networks (cs.SI); Optimization and Control (math.OC)
[977]  arXiv:2401.05377 (replaced) [pdf, ps, other]
[978]  arXiv:2401.05609 (replaced) [pdf, ps, other]
Title: A cable finite element formulation based on exact tension field for static nonlinear analysis of cable structures
Subjects: Computational Engineering, Finance, and Science (cs.CE)
[979]  arXiv:2401.06462 (replaced) [pdf, other]
Title: AttributionScanner: A Visual Analytics System for Model Validation with Metadata-Free Slice Finding
Comments: 12 pages, 12 figures, 3 tables. This manuscript is under review by the IEEE Transactions on Visualization and Computer Graphics (TVCG)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[980]  arXiv:2401.06874 (replaced) [pdf, ps, other]
Title: A Joint Code and Belief Propagation Decoder Design for Quantum LDPC Codes
Comments: ISIT 2024 accepted version
Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)
[981]  arXiv:2401.07085 (replaced) [pdf, other]
Title: Three Mechanisms of Feature Learning in the Exact Solution of a Latent Variable Model
Authors: Yizhou Xu, Liu Ziyin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[982]  arXiv:2401.07644 (replaced) [pdf, other]
Title: Resource Allocation in STAR-RIS-Aided SWIPT with RSMA via Meta-Learning
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[983]  arXiv:2401.08019 (replaced) [pdf, ps, other]
Title: Centrality of shortest paths: Algorithms and complexity results
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Optimization and Control (math.OC)
[984]  arXiv:2401.08392 (replaced) [pdf, other]
Title: DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[985]  arXiv:2401.08925 (replaced) [pdf, other]
Title: RandOhm: Mitigating Impedance Side-channel Attacks using Randomized Circuit Configurations
Subjects: Cryptography and Security (cs.CR)
[986]  arXiv:2401.09241 (replaced) [pdf, other]
Title: Biased-MPPI: Informing Sampling-Based Model Predictive Control by Fusing Ancillary Controllers
Comments: Accepted for Robotics and Automation Letters, April 2024
Subjects: Robotics (cs.RO)
[987]  arXiv:2401.09243 (replaced) [pdf, other]
Title: DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning
Comments: NeurIPS 2023 Train Offline Test Online Workshop and Competition (Best Paper Oral Presentation Winning Competition Submission)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[988]  arXiv:2401.09919 (replaced) [pdf, ps, other]
Title: Tractability of linear ill-posed problems in Hilbert space
Comments: 19 pages, final version, accepted for publication
Subjects: Numerical Analysis (math.NA)
[989]  arXiv:2401.10539 (replaced) [pdf, other]
Title: Quality-Diversity Algorithms Can Provably Be Helpful for Optimization
Comments: The conference version of this paper has appeared at IJCAI'24. This version contains all the proof details
Subjects: Neural and Evolutionary Computing (cs.NE)
[990]  arXiv:2401.10566 (replaced) [pdf, other]
Title: Robust Multi-Modal Density Estimation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[991]  arXiv:2401.10731 (replaced) [pdf, other]
Title: Removal and Selection: Improving RGB-Infrared Object Detection via Coarse-to-Fine Fusion
Comments: 11 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[992]  arXiv:2401.11458 (replaced) [pdf, other]
Title: Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
Comments: Accepted by ICML2024
Subjects: Computation and Language (cs.CL)
[993]  arXiv:2401.11708 (replaced) [pdf, other]
Title: Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Comments: ICML 2024. Project: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[994]  arXiv:2401.13516 (replaced) [pdf, other]
Title: Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces
Comments: arXiv admin note: substantial text overlap with arXiv:2308.09921, arXiv:2305.05943
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
[995]  arXiv:2401.13539 (replaced) [pdf, ps, other]
Title: Dynamic Risk Management in Cyber Physical Systems
Subjects: Software Engineering (cs.SE)
[996]  arXiv:2401.13605 (replaced) [pdf, ps, other]
Title: Regulating AI-Based Remote Biometric Identification. Investigating the Public Demand for Bans, Audits, and Public Database Registrations
Subjects: Computers and Society (cs.CY)
[997]  arXiv:2401.14423 (replaced) [pdf, other]
Title: Prompt Design and Engineering: Introduction and Advanced Methods
Authors: Xavier Amatriain
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
[998]  arXiv:2401.15647 (replaced) [pdf, other]
Title: UP-CrackNet: Unsupervised Pixel-Wise Road Crack Detection via Adversarial Image Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
[999]  arXiv:2401.16250 (replaced) [pdf, ps, other]
Title: Efficient solution of ill-posed integral equations through averaging
Comments: 38 pages
Subjects: Numerical Analysis (math.NA)
[1000]  arXiv:2401.16417 (replaced) [pdf, ps, other]
Title: Channel Coding with Mean and Variance Cost Constraints
Subjects: Information Theory (cs.IT)
[1001]  arXiv:2401.16579 (replaced) [pdf, other]
Title: On Channel Simulation with Causal Rejection Samplers
Comments: Accepted to IEEE ISIT 2024, camera-ready version. 11 pages, 1 figure
Subjects: Information Theory (cs.IT)
[1002]  arXiv:2401.16663 (replaced) [pdf, other]
Title: VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
[1003]  arXiv:2401.16977 (replaced) [pdf, other]
Title: Performance Analysis of Generalized Product Codes with Irregular Degree Distribution
Comments: ISIT 2024 accepted version
Subjects: Information Theory (cs.IT)
[1004]  arXiv:2401.17791 (replaced) [pdf, other]
Title: Graph Transformers without Positional Encodings
Authors: Ayush Garg
Comments: Independent Research
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[1005]  arXiv:2402.00253 (replaced) [pdf, other]
Title: A Survey on Hallucination in Large Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[1006]  arXiv:2402.01327 (replaced) [pdf, other]
Title: Supervised Algorithmic Fairness in Distribution Shifts: A Survey
Comments: IJCAI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[1007]  arXiv:2402.02338 (replaced) [pdf, other]
Title: NetLLM: Adapting Large Language Models for Networking
Comments: This paper has been accepted by ACM SIGCOMM 2024
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
[1008]  arXiv:2402.02619 (replaced) [pdf, other]
Title: Increasing Trust in Language Models through the Reuse of Verified Circuits
Comments: 8 pages, 10 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[1009]  arXiv:2402.02771 (replaced) [pdf, other]
Title: TensoSDF: Roughness-aware Tensorial Representation for Robust Geometry and Material Reconstruction
Comments: Accepted by SIGGRAPH 2024
Subjects: Graphics (cs.GR)
[1010]  arXiv:2402.03162 (replaced) [pdf, other]
Title: Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1011]  arXiv:2402.03383 (replaced) [pdf, ps, other]
Title: A Collaborative Model-driven Network for MRI Reconstruction
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[1012]  arXiv:2402.03554 (replaced) [pdf, ps, other]
Title: Explicit Formula for Partial Information Decomposition
Subjects: Information Theory (cs.IT); Probability (math.PR)
[1013]  arXiv:2402.04400 (replaced) [pdf, other]
Title: CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[1014]  arXiv:2402.05957 (replaced) [pdf, other]
Title: Accelerating PDE Data Generation via Differential Operator Action in Solution Space
Subjects: Machine Learning (cs.LG)
[1015]  arXiv:2402.06357 (replaced) [pdf, other]
Title: The SkipSponge Attack: Sponge Weight Poisoning of Deep Neural Networks
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[1016]  arXiv:2402.07138 (replaced) [pdf, other]
Title: Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example
Comments: This paper is accepted to Proceedings of the 32nd ACM Symposium on the Foundations of Software Engineering (FSE - 2024), This is an author copy
Subjects: Software Engineering (cs.SE)
[1017]  arXiv:2402.07270 (replaced) [pdf, other]
Title: Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
Comments: Accepted as Spotlight Paper for ICLR 2024. The first two authors contributed equally to this work
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[1018]  arXiv:2402.07739 (replaced) [pdf, other]
Title: Task-conditioned adaptation of visual features in multi-task policy learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[1019]  arXiv:2402.08193 (replaced) [pdf, other]
Title: Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional Systems
Comments: Under conference submission
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[1020]  arXiv:2402.08318 (replaced) [pdf, other]
Title: Values That Are Explicitly Present in Fairy Tales: Comparing Samples from German, Italian and Portuguese Traditions
Comments: In Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
[1021]  arXiv:2402.08907 (replaced) [pdf, other]
Title: Subgraph Pooling: Tackling Negative Transfer on Graphs
Comments: Accepted by IJCAI 24
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
[1022]  arXiv:2402.09809 (replaced) [pdf, ps, other]
Title: Effective and Scalable Math Support: Evidence on the Impact of an AI-Tutor on Math Achievement in Ghana
Subjects: Human-Computer Interaction (cs.HC)
[1023]  arXiv:2402.10481 (replaced) [pdf, ps, other]
Title: Emoji Driven Crypto Assets Market Reactions
Subjects: Computational Finance (q-fin.CP); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Statistical Finance (q-fin.ST)
[1024]  arXiv:2402.10517 (replaced) [pdf, other]
Title: Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Comments: To appear at ICML 2024. Code is available at this https URL
Subjects: Machine Learning (cs.LG)
[1025]  arXiv:2402.13008 (replaced) [pdf, other]
Title: Efficient Enumeration of Large Maximal k-Plexes
Comments: 18 pages
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
[1026]  arXiv:2402.14693 (replaced) [pdf, ps, other]
Title: Joint AP-UE Association and Power Factor Optimization for Distributed Massive MIMO
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
[1027]  arXiv:2402.16310 (replaced) [pdf, other]
Title: REPLAY: Modeling Time-Varying Temporal Regularities of Human Mobility for Location Prediction over Sparse Trajectories
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[1028]  arXiv:2402.17152 (replaced) [pdf, other]
Title: Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
Comments: 26 pages, 13 figures. ICML'24. Code available at this https URL
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
[1029]  arXiv:2402.17220 (replaced) [pdf, ps, other]
Title: On the probability of a Pareto record
Authors: James Allen Fill (1), Ao Sun (1) ((1) The Johns Hopkins University)
Comments: 16 pages, 1 figure; this revision responds to three anonymous reviews; paper accepted to Probability in the Engineering and Informational Sciences
Subjects: Probability (math.PR); Data Structures and Algorithms (cs.DS)
[1030]  arXiv:2402.17493 (replaced) [pdf, ps, other]
Title: Predicting postoperative risks using large language models
Comments: Supplemental file available at: this https URL; models publicly available at: this https URL and this https URL
Subjects: Computation and Language (cs.CL)
[1031]  arXiv:2402.17613 (replaced) [pdf, other]
Title: Neural Automated Writing Evaluation with Corrective Feedback
Comments: Supported by the SoTL Seed Program at UBC
Subjects: Computation and Language (cs.CL)
[1032]  arXiv:2402.18312 (replaced) [pdf, other]
Title: How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[1033]  arXiv:2402.19146 (replaced) [pdf, other]
Title: Computing Longest Common Subsequence under Cartesian-Tree Matching Model
Subjects: Data Structures and Algorithms (cs.DS)
[1034]  arXiv:2402.19379 (replaced) [pdf, other]
Title: Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy
Comments: 20 pages; 13 visualizations (nine figures, four tables)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[1035]  arXiv:2402.19404 (replaced) [pdf, other]
Title: EAMA: Entity-Aware Multimodal Alignment Based Approach for News Image Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[1036]  arXiv:2402.19422 (replaced) [pdf, other]
Title: PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Comments: CVPR 2024. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[1037]  arXiv:2403.00170 (replaced) [pdf, other]
Title: AlloyASG: Alloy Predicate Code Representation as a Compact Structurally Balanced Graph
Comments: 12 pages
Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)
[1038]  arXiv:2403.00540 (replaced) [pdf, other]
Title: Epsilon-Greedy Thompson Sampling to Bayesian Optimization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[1039]  arXiv:2403.01106 (replaced) [pdf, other]
Title: Distilling Text Style Transfer With Self-Explanation From LLMs
Comments: Accepted by NAACL Student Research Workshop 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[1040]  arXiv:2403.01150 (replaced) [pdf, other]
Title: Singularity and Error Analysis of a Simple Quaternion Estimator
Subjects: Methodology (stat.ME); Systems and Control (eess.SY)
[1041]  arXiv:2403.01296 (replaced) [pdf, ps, other]
Title: Rate-limited Shuffling for Distributed Computing
Comments: 6 pages and 2 figures
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)
[1042]  arXiv:2403.01353 (replaced) [pdf, other]
Title: Spatially parallel decoding for multi-qubit lattice surgery
Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET)
[1043]  arXiv:2403.01373 (replaced) [pdf, other]
Title: Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models
Comments: 10 pages
Subjects: Computation and Language (cs.CL)
[1044]  arXiv:2403.01384 (replaced) [pdf, other]
Title: On the Compressibility of Quantized Large Language Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[1045]  arXiv:2403.02622 (replaced) [pdf, other]
Title: World Models for Autonomous Driving: An Initial Survey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[1046]  arXiv:2403.02795 (replaced) [pdf, other]
Title: Evaluating and Optimizing Educational Content with Large Language Model Judgments
Comments: 11 pages
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[1047]  arXiv:2403.03134 (replaced) [pdf, other]
Title: Simplicity in Complexity: Explaining Visual Complexity using Deep Segmentation Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
[1048]  arXiv:2403.03205 (replaced) [pdf, other]
Title: Finding Super-spreaders in Network Cascades
Comments: 32 pages, 3 figures. Main updates are (1) a relaxation of graph assumptions and (2) a slight sharpening of previous techniques that allows us to estimate the infection time of high-degree vertices from a single cascade
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Social and Information Networks (cs.SI); Probability (math.PR)
[1049]  arXiv:2403.03655 (replaced) [pdf, other]
Title: Kronos: A Secure and Generic Sharding Blockchain Consensus with Optimized Overhead
Subjects: Cryptography and Security (cs.CR)
[1050]  arXiv:2403.04161 (replaced) [pdf, other]
Title: SWAP-NAS: sample-wise activation patterns for ultra-fast NAS
Comments: ICLR2024 Spotlight
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
[1051]  arXiv:2403.04306 (replaced) [pdf, other]
Title: Effectiveness Assessment of Recent Large Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[1052]  arXiv:2403.04307 (replaced) [pdf, other]
Title: HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild
Subjects: Computation and Language (cs.CL)
[1053]  arXiv:2403.05370 (replaced) [pdf, other]
Title: On the Certification of the Kinematics of 3-DOF Spherical Parallel Manipulators
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[1054]  arXiv:2403.05771 (replaced) [pdf, other]
Title: Providing Safety Assurances for Systems with Unknown Dynamics
Comments: Submitted to L-CSS/CDC
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[1055]  arXiv:2403.05890 (replaced) [src]
Title: Towards Efficient Replay in Federated Incremental Learning
Comments: We are a collaborative article with a company, and the company needs to revise and verify the relevant information of the article before publishing it, including acknowledgments, authors, and some details
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[1056]  arXiv:2403.06172 (replaced) [pdf, other]
Title: Understanding Parents' Perceptions and Practices Toward Children's Security and Privacy in Virtual Reality
Comments: To appear in the IEEE Symposium on Security & Privacy (S&P), May 2024
Subjects: Human-Computer Interaction (cs.HC)
[1057]  arXiv:2403.06321 (replaced) [pdf, other]
Title: Vertex Block Descent
Subjects: Graphics (cs.GR)
[1058]  arXiv:2403.06659 (replaced) [pdf, other]
Title: Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
Comments: Accepted by ICML2024
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[1059]  arXiv:2403.08002 (replaced) [pdf, other]
Title: Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[1060]  arXiv:2403.08056 (replaced) [pdf, other]
Title: Improving Memory Dependence Prediction with Static Analysis
Comments: 15 pages
Subjects: Programming Languages (cs.PL); Hardware Architecture (cs.AR)
[1061]  arXiv:2403.08063 (replaced) [pdf, other]
Title: Towards Code Generation for Octree-Based Multigrid Solvers
Subjects: Computational Engineering, Finance, and Science (cs.CE)
[1062]  arXiv:2403.08384 (replaced) [src]
Title: AADNet: Attention aware Demoiréing Network
Comments: Due to unauthorized access and upload, this paper has been withdrawn. It does not reflect the contributions or approval
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1063]  arXiv:2403.08694 (replaced) [pdf, other]
Title: TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning
Subjects: Computation and Language (cs.CL)
[1064]  arXiv:2403.09053 (replaced) [pdf, other]
Title: Towards a theory of model distillation
Comments: 46 pages, 5 figures. Please reach out with comments! Feedback is welcome
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[1065]  arXiv:2403.09858 (replaced) [pdf, other]
Title: FakeWatch: A Framework for Detecting Fake News to Ensure Credible Elections
Comments: arXiv admin note: substantial text overlap with arXiv:2312.03730
Subjects: Computation and Language (cs.CL)
[1066]  arXiv:2403.10318 (replaced) [pdf, other]
Title: Anytime Neural Architecture Search on Tabular Data
Subjects: Machine Learning (cs.LG)
[1067]  arXiv:2403.10522 (replaced) [pdf, other]
Title: Ordinal Classification with Distance Regularization for Robust Brain Age Prediction
Comments: Accepted in WACV 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[1068]  arXiv:2403.12421 (replaced) [pdf, other]
Title: Dexterous Functional Pre-Grasp Manipulation with Diffusion Policy
Subjects: Robotics (cs.RO)
[1069]  arXiv:2403.14020 (replaced) [pdf, other]
Title: Zero-Knowledge Proof of Distinct Identity: a Standard-compatible Sybil-resistant Pseudonym Extension for C-ITS
Comments: Accepted for publication at IEEE IV 2024
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
[1070]  arXiv:2403.14176 (replaced) [pdf, other]
Title: ReFeree: Radar-based efficient global descriptor using a Feature and Free space for Place Recognition
Comments: 5 pages, 4 figures
Subjects: Robotics (cs.RO)
[1071]  arXiv:2403.15064 (replaced) [pdf, other]
Title: Recent Trends in 3D Reconstruction of General Non-Rigid Scenes
Comments: 42 pages, 18 figures, 5 tables; State-of-the-Art Report at EUROGRAPHICS 2024. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[1072]  arXiv:2403.15246 (replaced) [pdf, other]
Title: FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
[1073]  arXiv:2403.15400 (replaced) [pdf, other]
Title: Efficient Weighting Schemes for Auditing Instant-Runoff Voting Elections
Comments: 15 pages, 4 figures, presented at Voting'24. The current version includes some improved wording and fixes a few errors
Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT); Applications (stat.AP)
[1074]  arXiv:2403.15676 (replaced) [pdf, other]
Title: AC4: Algebraic Computation Checker for Circuit Constraints in ZKPs
Comments: 20 pages, 4 figures
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[1075]  arXiv:2403.16167 (replaced) [pdf, other]
Title: Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[1076]  arXiv:2403.17141 (replaced) [pdf, other]
Title: MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[1077]  arXiv:2403.18104 (replaced) [pdf, other]
Title: Mathematical Foundation and Corrections for Full Range Head Pose Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[1078]  arXiv:2403.18415 (replaced) [src]
Title: The Topos of Transformer Networks
Comments: Requires major revision
Subjects: Machine Learning (cs.LG); Category Theory (math.CT)
[1079]  arXiv:2403.18453 (replaced) [pdf, other]
Title: Annotating Slack Directly on Your Verilog: Fine-Grained RTL Timing Evaluation for Early Optimization
Comments: Published as a conference paper at Design Automation Conference (DAC) 2024
Subjects: Hardware Architecture (cs.AR)
[1080]  arXiv:2403.18604 (replaced) [pdf, other]
Title: Modeling Sustainable City Trips: Integrating CO2e Emissions, Popularity, and Seasonality into Tourism Recommender Systems
Subjects: Information Retrieval (cs.IR)
[1081]  arXiv:2403.19680 (replaced) [pdf, other]
Title: A (1.999999)-approximation ratio for vertex cover problem
Subjects: Computational Complexity (cs.CC)
[1082]  arXiv:2403.19902 (replaced) [pdf, other]
Title: Heterogeneous Network Based Contrastive Learning Method for PolSAR Land Cover Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1083]  arXiv:2403.19924 (replaced) [pdf, other]
Title: SceneTracker: Long-term Scene Flow Estimation Network
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1084]  arXiv:2403.20288 (replaced) [pdf, other]
Title: Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain
Comments: Accepted for oral presentation at NAACL 2024, The 6th Clinical Natural Language Processing Workshop
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[1085]  arXiv:2404.00897 (replaced) [pdf, ps, other]
Title: Machine Learning Robustness: A Primer
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
[1086]  arXiv:2404.01078 (replaced) [pdf, other]
Title: Energy-based Model for Accurate Shapley Value Estimation in Interpretable Deep Learning Predictive Modeling
Subjects: Machine Learning (cs.LG)
[1087]  arXiv:2404.01568 (replaced) [pdf, other]
Title: A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM)
Comments: ICML2024 Conference Paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG)
[1088]  arXiv:2404.03191 (replaced) [pdf, other]
Title: CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1089]  arXiv:2404.03208 (replaced) [pdf, ps, other]
Title: HiMAL: A Multimodal Hierarchical Multi-task Auxiliary Learning framework for predicting and explaining Alzheimer disease progression
Comments: Currently under review at the Journal of the American Medical Informatics Association (JAMIA). 6 figures, 3 tables
Subjects: Machine Learning (cs.LG)
[1090]  arXiv:2404.03295 (replaced) [pdf, other]
Title: The power of a single Haar random state: constructing and separating quantum pseudorandomness
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
[1091]  arXiv:2404.03304 (replaced) [pdf, other]
Title: Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
Comments: 33 pages, 18 tables, and 10 figures. Our code is available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[1092]  arXiv:2404.04069 (replaced) [pdf, other]
Title: Bidirectional Human Interactive AI Framework for Social Robot Navigation
Comments: Accepted by Robot Trust for Symbiotic Societies (RTSS) Workshop at ICRA 2024
Subjects: Robotics (cs.RO)
[1093]  arXiv:2404.04244 (replaced) [pdf, other]
Title: Fast Diffeomorphic Image Registration using Patch based Fully Convolutional Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1094]  arXiv:2404.04346 (replaced) [pdf, other]
Title: Koala: Key frame-conditioned long video-LLM
Comments: Accepted at CVPR 2024 as a poster highlight
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1095]  arXiv:2404.04653 (replaced) [pdf, other]
Title: HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene
Comments: Accepted by IEEE IV 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[1096]  arXiv:2404.04956 (replaced) [pdf, other]
Title: Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models
Comments: 17 pages, 11 figures, accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
[1097]  arXiv:2404.05184 (replaced) [pdf, other]
Title: Predicting the Geothermal Gradient in Colombia: a Machine Learning Approach
Comments: This is the version we re-submitted to the journal after addressing all the peer review requirements
Subjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG)
[1098]  arXiv:2404.05468 (replaced) [pdf, other]
Title: Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI
Comments: Pre-print to be updated. Work in progress
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[1099]  arXiv:2404.05696 (replaced) [pdf, ps, other]
Title: BOLD v4: A Centralized Bioinformatics Platform for DNA-based Biodiversity Data
Subjects: Databases (cs.DB); Quantitative Methods (q-bio.QM)
[1100]  arXiv:2404.05717 (replaced) [pdf, other]
Title: SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
Comments: 18 pages, 16 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[1101]  arXiv:2404.06728 (replaced) [pdf, other]
Title: A Data Efficient Framework for Learning Local Heuristics
Comments: Accepted in the 17th International Symposium on Combinatorial Search (SoCS 2024)
Subjects: Robotics (cs.RO)
[1102]  arXiv:2404.06884 (replaced) [pdf, ps, other]
Title: Demand Private Coded Caching: the Two-File Case
Subjects: Information Theory (cs.IT)
[1103]  arXiv:2404.07831 (replaced) [pdf, ps, other]
Title: Protected QR Code-based Anti-counterfeit System for Pharmaceutical Manufacturing
Subjects: Cryptography and Security (cs.CR)
[1104]  arXiv:2404.08132 (replaced) [pdf, ps, other]
Title: Goppa Codes: Key to High Efficiency and Reliability in Communications
Comments: Isn't complete
Subjects: Information Theory (cs.IT); Algebraic Geometry (math.AG)
[1105]  arXiv:2404.08472 (replaced) [pdf, other]
Title: TSLANet: Rethinking Transformers for Time Series Representation Learning
Comments: Accepted in ICML 2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[1106]  arXiv:2404.09491 (replaced) [pdf, other]
Title: Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning
Comments: Accepted to ICML, 2024
Subjects: Machine Learning (cs.LG)
[1107]  arXiv:2404.10346 (replaced) [pdf, other]
Title: Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards
Comments: Preprint Under Review
Subjects: Computation and Language (cs.CL)
[1108]  arXiv:2404.10490 (replaced) [pdf, other]
Title: Enhancing Sign Language Teaching: A Mixed Reality Approach for Immersive Learning and Multi-Dimensional Feedback
Comments: 8 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1109]  arXiv:2404.10616 (replaced) [pdf, ps, other]
Title: One is all you need: Second-order Unification without First-order Variables
Comments: Under review
Subjects: Logic in Computer Science (cs.LO)
[1110]  arXiv:2404.10861 (replaced) [pdf, other]
Title: Trackable Agent-based Evolution Models at Wafer Scale
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC)
[1111]  arXiv:2404.11054 (replaced) [pdf, other]
Title: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1112]  arXiv:2404.11565 (replaced) [pdf, other]
Title: MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
Comments: Project Website: this https URL, Same as previous version, only updated metadata because bib was missing an author name
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[1113]  arXiv:2404.11982 (replaced) [pdf, other]
Title: SIGformer: Sign-aware Graph Transformer for Recommendation
Comments: Accepted by SIGIR2024
Subjects: Information Retrieval (cs.IR)
[1114]  arXiv:2404.12135 (replaced) [pdf, other]
Title: mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture
Subjects: Multiagent Systems (cs.MA); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[1115]  arXiv:2404.12390 (replaced) [pdf, other]
Title: BLINK: Multimodal Large Language Models Can See but Not Perceive
Comments: Multimodal Benchmark, Project Url: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[1116]  arXiv:2404.12678 (replaced) [src]
Title: Exploring Interactive Semantic Alignment for Efficient HOI Detection with Vision-language Model
Comments: There are issues with the experimental results
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1117]  arXiv:2404.12725 (replaced) [pdf, other]
Title: Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction
Authors: Zhaoxi Mu, Xinyu Yang
Comments: Accepted by IJCAI 2024
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[1118]  arXiv:2404.13179 (replaced) [pdf, other]
Title: When Computing follows Vehicles: Decentralized Mobility-Aware Resource Allocation for Edge-to-Cloud Continuum
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)
[1119]  arXiv:2404.13288 (replaced) [pdf, other]
Title: PoseINN: Realtime Visual-based Pose Regression and Localization with Invertible Neural Networks
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[1120]  arXiv:2404.13624 (replaced) [pdf, ps, other]
Title: Necessary and Sufficient Conditions for Capacity-Achieving Private Information Retrieval with Non-Colluding and Colluding Servers
Comments: 16 pages
Subjects: Information Theory (cs.IT)
[1121]  arXiv:2404.13733 (replaced) [pdf, other]
Title: Elucidating the Design Space of Dataset Condensation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[1122]  arXiv:2404.13872 (replaced) [pdf, other]
Title: FreqBlender: Enhancing DeepFake Detection by Blending Frequency Knowledge
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1123]  arXiv:2404.13873 (replaced) [pdf, other]
Title: Texture-aware and Shape-guided Transformer for Sequential DeepFake Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1124]  arXiv:2404.13904 (replaced) [pdf, other]
Title: Deep Regression Representation Learning with Topology
Comments: ICML2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[1125]  arXiv:2404.13945 (replaced) [pdf, other]
Title: How do LLMs Support Deep Learning Testing? A Comprehensive Study Through the Lens of Image Mutation
Subjects: Software Engineering (cs.SE)
[1126]  arXiv:2404.14066 (replaced) [pdf, other]
Title: SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
[1127]  arXiv:2404.14146 (replaced) [pdf, ps, other]
Title: Physics-based reward driven image analysis in microscopy
Comments: 12 pages, 4 figures
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
[1128]  arXiv:2404.14162 (replaced) [pdf, other]
Title: FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on
Comments: Accepted by IJCAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1129]  arXiv:2404.14665 (replaced) [pdf, other]
Title: Illuminating the Unseen: Investigating the Context-induced Harms in Behavioral Sensing
Comments: 26 pages, 8 tables, and 1 figure (excluding appendix)
Subjects: Human-Computer Interaction (cs.HC)
[1130]  arXiv:2404.15001 (replaced) [pdf, other]
Title: Unknown Object Grasping for Assistive Robotics
Comments: 7 pages, 9 figures
Subjects: Robotics (cs.RO)
[1131]  arXiv:2404.15135 (replaced) [pdf, ps, other]
Title: Linear-Function Correcting Codes
Comments: Some minor errors corrected. 11 pages and 3 figures
Subjects: Information Theory (cs.IT)
[1132]  arXiv:2404.15777 (replaced) [pdf, ps, other]
Title: A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry
Comments: 28 pages
Subjects: Computation and Language (cs.CL)
[1133]  arXiv:2404.15855 (replaced) [pdf, ps, other]
Title: Taking Bi-Intuitionistic Logic First-Order: A Proof-Theoretic Investigation via Polytree Sequents
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)
[1134]  arXiv:2404.16471 (replaced) [pdf, other]
Title: COBRA - COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1135]  arXiv:2404.16663 (replaced) [pdf, other]
Title: Formal Specification, Assessment, and Enforcement of Fairness for Generative AIs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
[1136]  arXiv:2404.16706 (replaced) [pdf, other]
Title: Efficient and Near-Optimal Noise Generation for Streaming Differential Privacy
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[1137]  arXiv:2404.16895 (replaced) [pdf, other]
Title: QuERLoc: Towards Next-Generation Localization with Quantum-Enhanced Ranging
Subjects: Emerging Technologies (cs.ET)
[1138]  arXiv:2404.17465 (replaced) [pdf, ps, other]
Title: Fast Abstracts and Student Forum Proceedings -- EDCC 2024 -- 19th European Dependable Computing Conference
Subjects: Software Engineering (cs.SE); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Robotics (cs.RO)
[1139]  arXiv:2404.17723 (replaced) [pdf, other]
Title: Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[1140]  arXiv:2404.17781 (replaced) [pdf, ps, other]
Title: A Value-Oriented Investigation of Photoshop's Generative Fill
Subjects: Human-Computer Interaction (cs.HC)
[1141]  arXiv:2404.18030 (replaced) [pdf, other]
Title: Parallel Adaptive Anisotropic Meshing on cc-NUMA Machines
Subjects: Computational Geometry (cs.CG)
[1142]  arXiv:2404.18074 (replaced) [pdf, other]
Title: MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot
Comments: In processing
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[1143]  arXiv:2404.18253 (replaced) [pdf, other]
Title: Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment
Authors: Tengjun Huang
Comments: Accepted by the Twelfth International Conference on Learning Representations (ICLR) Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[1144]  arXiv:2404.18255 (replaced) [pdf, other]
Title: PatentGPT: A Large Language Model for Intellectual Property
Comments: 19 pages, 9 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[1145]  arXiv:2404.18311 (replaced) [pdf, ps, other]
Title: Towards Incremental Learning in Large Language Models: A Critical Review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[1146]  arXiv:2404.18316 (replaced) [pdf, other]
Title: Position paper: Do not explain (vision models) without context
Comments: Accepted for ICML 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[1147]  arXiv:2404.18321 (replaced) [pdf, other]
Title: Riemannian Optimization for Active Mapping with Robot Teams
Subjects: Robotics (cs.RO)
[1148]  arXiv:2404.18405 (replaced) [pdf, ps, other]
Title: Understanding and Shaping Human-Technology Assemblages in the Age of Generative AI
Subjects: Human-Computer Interaction (cs.HC)
[1149]  arXiv:2404.18423 (replaced) [pdf, other]
Title: Unsupervised Dynamics Prediction with Object-Centric Kinematics
Comments: 15 pages, 6 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[1150]  arXiv:2404.18530 (replaced) [pdf, other]
Title: Predicting PDEs Fast and Efficiently with Equivariant Extreme Learning Machines
Subjects: Machine Learning (cs.LG)
[1151]  arXiv:2404.18654 (replaced) [pdf, other]
Title: A Scoping Review on Simulation-based Design Optimization in Marine Engineering: Trends, Best Practices, and Gaps
Subjects: Optimization and Control (math.OC); Computational Engineering, Finance, and Science (cs.CE)
[1152]  arXiv:2404.18909 (replaced) [pdf, other]
Title: Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty
Comments: Accepted by Conference on Neural Information Processing Systems (NeurIPS), 2024
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[1153]  arXiv:2404.18947 (replaced) [pdf, other]
Title: Multimodal Fusion on Low-quality Data: A Comprehensive Survey
Comments: Feel free to comment on our manuscript: qingyangzhang@tju.edu.cn
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[1154]  arXiv:2404.19028 (replaced) [pdf, other]
Title: Adaptive Regulated Sparsity Promoting Approach for Data-Driven Modeling and Control of Grid-Connected Solar Photovoltaic Generation
Subjects: Systems and Control (eess.SY)
[1155]  arXiv:2404.19500 (replaced) [pdf, other]
Title: Towards Real-world Video Face Restoration: A New Benchmark
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[1156]  arXiv:2404.19556 (replaced) [pdf, ps, other]
Title: A logarithmic approximation of linearly-ordered colourings
Comments: This paper is a merger of independent work by Håstad and Martinsson, and by Nakajima and Živný respectively
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
[1157]  arXiv:2404.19652 (replaced) [pdf, other]
Title: VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[1158]  arXiv:2404.19713 (replaced) [pdf, ps, other]
Title: Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models
Authors: Scott Sumpter
Comments: 22 pages but 12 are appendices which are examples of the main text. 3 figures, 4 tables
Subjects: Computation and Language (cs.CL)
[1159]  arXiv:2405.00136 (replaced) [pdf, other]
Title: Data-Driven Permissible Safe Control with Barrier Certificates
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
[1160]  arXiv:2405.00181 (replaced) [pdf, other]
Title: Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Comments: Accepted in CVPR2024, Codebase: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[1161]  arXiv:2405.00316 (replaced) [pdf, other]
Title: Enhance Planning with Physics-informed Safety Controller for End-to-end Autonomous Driving
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[1162]  arXiv:2405.00344 (replaced) [pdf, other]
Title: Expert Insight-Enhanced Follow-up Chest X-Ray Summary Generation
Comments: Accepted by the 22nd International Conference on Artificial Intelligence in Medicine (AIME 2024)
Subjects: Multimedia (cs.MM)
[1163]  arXiv:2405.00349 (replaced) [pdf, other]
Title: A Self-explaining Neural Architecture for Generalizable Concept Learning
Comments: IJCAI 2024. 16 pages (7 main content, 2 references, 7 appendix). Code available at this https URL
Subjects: Machine Learning (cs.LG)
[1164]  arXiv:2405.00712 (replaced) [pdf, other]
Title: SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[1165]  arXiv:2405.00727 (replaced) [pdf, other]
Title: Generalised envelope spectrum-based signal-to-noise objectives: Formulation, optimisation and application for gear fault detection under time-varying speed conditions
Comments: 27 pages, 15 figures, 1 table, submitted to MSSP for review
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Methodology (stat.ME)
[1166]  arXiv:2405.00740 (replaced) [pdf, other]
Title: Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Comments: 14 pages, 8 figures, 7 tables, to be published at ICML2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[1167]  arXiv:2405.00801 (replaced) [pdf, ps, other]
Title: "Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time
Subjects: Computation and Language (cs.CL)
[1168]  arXiv:2405.00900 (replaced) [pdf, other]
Title: LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes
Comments: CVPR2024 Highlights
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1169]  arXiv:2405.00914 (replaced) [pdf, other]
Title: Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization
Authors: Chris Junchi Li
Comments: Minor typographical updates. arXiv admin note: text overlap with arXiv:2307.00126
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[1170]  arXiv:2405.01029 (replaced) [pdf, other]
Title: MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts
Comments: Accepted at ICML 2024
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[1171]  arXiv:2405.01066 (replaced) [pdf, other]
Title: HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images
Comments: 12 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[1172]  arXiv:2405.01103 (replaced) [pdf, other]
Title: LLM Security Guard for Code
Comments: SECUTE, EASE 2024
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
[1173]  arXiv:2405.01111 (replaced) [pdf, other]
Title: Mining REST APIs for Potential Mass Assignment Vulnerabilities
Comments: EASE 2024
Subjects: Cryptography and Security (cs.CR)
[1174]  arXiv:2405.01196 (replaced) [pdf, other]
Title: Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
Comments: Proceedings of the 41st International Conference on Machine Learning (ICML) 2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[1175]  arXiv:2405.01242 (replaced) [pdf, other]
Title: TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[1176]  arXiv:2405.01460 (replaced) [pdf, other]
Title: Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders
Comments: Accepted by ICML 2024
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[1177]  arXiv:2405.01491 (replaced) [pdf, other]
Title: FeNNol: an Efficient and Flexible Library for Building Force-field-enhanced Neural Network Potentials
Subjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG)
[1178]  arXiv:2405.01510 (replaced) [pdf, other]
Title: Reverse Influential Community Search Over Social Networks (Technical Report)
Subjects: Social and Information Networks (cs.SI); Databases (cs.DB)
[1179]  arXiv:2405.01555 (replaced) [pdf, ps, other]
Title: Digital Twin-Empowered Task Assignment in Aerial MEC Network: A Resource Coalition Cooperation Approach with Generative Model
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
[1180]  arXiv:2405.01558 (replaced) [pdf, other]
Title: Configurable Learned Holography
Comments: 14 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Optics (physics.optics)
[1181]  arXiv:2405.01673 (replaced) [pdf, other]
Title: ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness
Comments: 21 pages, 13 figures
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[1182]  arXiv:2405.01699 (replaced) [pdf, other]
Title: SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients
Comments: 7 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[1183]  arXiv:2405.01719 (replaced) [pdf, other]
Title: Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
Comments: To be published in ICML 2024
Subjects: Machine Learning (cs.LG)
[1184]  arXiv:2405.01726 (replaced) [pdf, ps, other]
Title: SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[1185]  arXiv:2405.01765 (replaced) [pdf, other]
Title: Early years of Biased Random-Key Genetic Algorithms: A systematic review
Comments: 24 pages, 9 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
[1186]  arXiv:2405.01775 (replaced) [pdf, other]
Title: Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Comments: Accepted for publication at MLSys 2024
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
[1187]  arXiv:2405.01798 (replaced) [pdf, other]
Title: The Economy and Public Diplomacy: An Analysis of RT's Economic Content and Context on Facebook
Comments: 14 pages, 6 figures
Subjects: Information Theory (cs.IT); General Economics (econ.GN)
[1188]  arXiv:2405.01803 (replaced) [pdf, other]
Title: How to Gain Commit Rights in Modern Top Open Source Communities?
Comments: 23 pages, 5 figures, FSE 2024
Journal-ref: Proceedings of the ACM on Software Engineering (PACMSE) Issue FSE 2024
Subjects: Software Engineering (cs.SE)
[1189]  arXiv:2405.01843 (replaced) [pdf, ps, other]
Title: Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization
Comments: Accepted at ICML 2024. This is a revised version of arXiv:2306.10486, where we have gone from finite action space to continuous action space, from average iterate convergence to last iterate convergence and from $\epsilon^{-4}$ to $\epsilon^{-3}$ sample complexity
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[1190]  arXiv:2405.01881 (replaced) [pdf, ps, other]
Title: Explainable Risk Classification in Financial Reports
Comments: ICIS 2023 Proceedings. 3. this https URL
Subjects: Risk Management (q-fin.RM); Machine Learning (cs.LG)
[1191]  arXiv:2405.01972 (replaced) [pdf, other]
Title: A quantitative and typological study of Early Slavic participle clauses and their competition
Authors: Nilo Pedrazzini
Comments: 259 pages, 138 figures. DPhil Thesis in Linguistics submitted and defended at the University of Oxford (December 2023). This manuscript is a version formatted for improved readability and broader dissemination
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[1192]  arXiv:2405.02023 (replaced) [pdf, other]
Title: IFNet: Deep Imaging and Focusing for Handheld SAR with Millimeter-wave Signals
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[1193]  arXiv:2405.02106 (replaced) [pdf, ps, other]
Title: Got Root? A Linux Priv-Esc Benchmark
Comments: arXiv admin note: text overlap with arXiv:2310.11409
Subjects: Cryptography and Security (cs.CR)
[1194]  arXiv:2405.02132 (replaced) [pdf, other]
Title: Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[1195]  arXiv:2405.02179 (replaced) [pdf, other]
Title: Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)