
Electrical Engineering and Systems Science

New submissions

[ total of 149 entries: 1-149 ]

New submissions for Tue, 7 May 24

[1]  arXiv:2405.02316 [pdf, ps, other]
Title: A Cloud-Edge Framework for Energy-Efficient Event-Driven Control: An Integration of Online Supervised Learning, Spiking Neural Networks and Local Plasticity Rules
Comments: 13 pages, 19 figures
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

This paper presents a novel cloud-edge framework for addressing computational and energy constraints in complex control systems. Our approach centers on a learning-based controller using Spiking Neural Networks (SNN) on physical plants. By integrating a biologically plausible learning method with local plasticity rules, we harness the efficiency, scalability, and low latency of SNNs. This design replicates control signals from a cloud-based controller directly on the plant, reducing the need for constant plant-cloud communication. The plant updates weights only when errors surpass predefined thresholds, ensuring efficiency and robustness in various conditions. Applied to linear workbench systems and satellite rendezvous scenarios, including obstacle avoidance, our architecture reduces the normalized tracking error by 96% as the network size increases. The event-driven nature of SNNs minimizes energy consumption, utilizing only about 111 nJ (0.3% of conventional computing requirements). The results demonstrate the system's adaptability to changing work environments and its efficient use of computational and energy resources, with a moderate increase in energy consumption of 27.2% and 37% for static and dynamic obstacles, respectively, compared to non-obstacle scenarios.
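
The event-triggered rule described above (the plant adjusts its local weights only when its error against the cloud controller exceeds a threshold) can be illustrated with a minimal sketch. As an assumption for illustration only, a linear model trained with the least-mean-squares (delta) rule stands in for the paper's spiking network and plasticity rules; `event_triggered_imitation` and the `teacher` callable (standing in for the cloud-based controller) are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple


def event_triggered_imitation(
    samples: Sequence[Sequence[float]],
    teacher: Callable[[Sequence[float]], float],
    threshold: float = 0.05,
    lr: float = 0.1,
) -> Tuple[List[float], int]:
    """Replicate teacher(x) locally with a linear model, learning only on large errors.

    Hypothetical sketch: a linear LMS learner replaces the paper's SNN.
    Returns the learned weights and the number of update events.
    """
    w = [0.0] * len(samples[0])
    updates = 0
    for x in samples:
        u_local = sum(wi * xi for wi, xi in zip(w, x))  # local (edge) control signal
        err = teacher(x) - u_local                       # mismatch vs. cloud controller
        if abs(err) > threshold:                         # event: error above threshold
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            updates += 1
    return w, updates


# Usage: imitate a hypothetical teacher law u = 2*x0 - x1 on a repeating set of states.
states = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)] * 200
weights, n_updates = event_triggered_imitation(states, lambda x: 2 * x[0] - x[1])
```

Once every error stays below the threshold, no further updates are triggered, so `n_updates` ends up far smaller than `len(states)`; in the paper's setting, this is what reduces plant-cloud traffic.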

[2]  arXiv:2405.02361 [pdf, other]
Title: Technical report on target classification in SAR track
Comments: arXiv admin note: text overlap with arXiv:2310.06221, arXiv:2111.12797 by other authors
Subjects: Image and Video Processing (eess.IV)

This report proposes a robust method for classifying oceanic and atmospheric phenomena using synthetic aperture radar (SAR) imagery. Our proposed method uses the powerful pre-trained Swin Transformer v2 Large model as the backbone and employs carefully designed data augmentation and an exponential moving average during training to enhance the model's generalization capability and stability. In the testing stage, ReAct is used to rectify activation values, and the Energy Score is used to measure model uncertainty more accurately, significantly improving out-of-distribution detection performance. Furthermore, test-time augmentation is employed to enhance classification accuracy and prediction stability. Comprehensive experimental results demonstrate that each additional technique significantly improves classification accuracy, confirming their effectiveness in classifying maritime and atmospheric phenomena in SAR imagery.
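
The training-time averaging and the two test-time scoring steps named above can be sketched in a few lines. This is a minimal illustration, not the report's code: the clip value `c`, the temperature, and the EMA decay are arbitrary assumptions, and the functions act on plain lists rather than on a Swin Transformer's tensors. Following the usual energy-based convention, lower energy indicates an input that looks more in-distribution.

```python
import math
from typing import List, Sequence


def ema_update(shadow: Sequence[float], params: Sequence[float], decay: float = 0.999) -> List[float]:
    """Exponential moving average of model parameters, updated alongside training."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


def react_clip(activations: Sequence[float], c: float) -> List[float]:
    """ReAct-style rectification: cap penultimate-layer activations at c."""
    return [min(a, c) for a in activations]


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free energy E(x) = -T * log(sum_k exp(f_k(x) / T)), computed stably.

    Lower energy suggests an in-distribution input; higher energy suggests OOD.
    """
    m = max(logits)
    s = sum(math.exp((z - m) / temperature) for z in logits)
    return -(m + temperature * math.log(s))


confident = energy_score([10.0, 0.0, 0.0])  # strongly peaked logits
uncertain = energy_score([1.0, 1.0, 1.0])   # flat logits: -(1 + ln 3)
```

In a full pipeline, the clipped activations would feed the classifier head, and the energy of the resulting logits would be thresholded to flag out-of-distribution images.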

[3]  arXiv:2405.02362 [pdf, other]
Title: Solution for Authenticity Identification of Typical Target Remote Sensing Images
Subjects: Image and Video Processing (eess.IV)

In this paper, we propose a basic RGB single-modality model based on weakly supervised training with pseudo-labels, which performs high-precision authenticity identification on multi-scene remote sensing images of typical targets. Because mask generation is imprecise, we divide the task into two sub-tasks: generating pseudo-masks and fine-tuning a model on the generated masks. To generate pseudo-masks, we use MM-Fusion as the base model to produce masks for large objects such as planes and ships. By manually calibrating the masks of small objects such as cars, highly accurate pseudo-masks are obtained. To fine-tune the model on the generated masks, we use the WSCL model as the base model. Notably, because the generated pseudo-masks differ from the real masks, we discard the image feature extractors in WSCL, such as SRM and Noiseprint++, and train on the unscaled original images alone, which helps ensure that each image matches its original label. The final trained model achieved a score of 90.7702 on the test set.

[4]  arXiv:2405.02422 [pdf, other]
Title: Precision Enhancement in Sustained Visual Attention Training Platforms: Offline EEG Signal Analysis for Classifier Fine-Tuning
Comments: 5 pages, 3 figures, 18 references, EMBC conference
Subjects: Signal Processing (eess.SP)

In this study, a novel open-source brain-computer interface (BCI) platform was developed to decode scalp electroencephalography (EEG) signals associated with sustained attention. EEG signals were collected with a wireless headset during a sustained visual attention task, in which participants were instructed to discriminate between composite images of superimposed scenes and faces, responding only to the relevant subcategory while ignoring the irrelevant ones. Seven volunteers participated in this experiment. The collected data were analyzed using event-related potentials (ERP), the Hilbert Transform, and the Wavelet Transform to extract temporal and spectral features. For each participant, personalized Support Vector Machine (SVM) and Random Forest (RF) models with tuned hyperparameters were developed from that participant's extracted features. The models aimed to decode the participant's attentional state towards the face and scene stimuli. The SVM models achieved a higher average accuracy of 80% and an Area Under the Curve (AUC) of 0.86, while the RF models showed an average accuracy of 78% and an AUC of 0.8. This work suggests potential applications for the evaluation of visual attention and the development of closed-loop brainwave regulation systems in the future.
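
The Hilbert Transform step can be illustrated by computing the amplitude envelope of the analytic signal, a common temporal feature for band-limited EEG. This is a minimal sketch, not the study's pipeline: it uses a direct O(N^2) DFT for clarity (an FFT routine would be used in practice), the function names are hypothetical, and any band-pass filtering that would normally precede this step is omitted.

```python
import cmath
import math
from typing import List, Sequence


def _dft(x: Sequence[complex], sign: int) -> List[complex]:
    """Direct O(N^2) DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def hilbert_envelope(x: Sequence[float]) -> List[float]:
    """Amplitude envelope |x_a(t)| of the analytic signal of a real sequence."""
    n = len(x)
    spectrum = _dft(x, -1)
    # Keep DC (and Nyquist for even n), double positive frequencies, zero negative ones.
    h = [0.0] * n
    h[0] = 1.0
    for k in range(1, (n + 1) // 2):
        h[k] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    analytic = [v / n for v in _dft([s * w for s, w in zip(spectrum, h)], +1)]
    return [abs(z) for z in analytic]


# Usage: a pure cosine has a constant unit envelope.
tone = [math.cos(2 * math.pi * 5 * t / 64) for t in range(64)]
envelope = hilbert_envelope(tone)
```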

[5]  arXiv:2405.02487 [pdf, other]
Title: Stable Distributed Online Feedback Optimization for Distribution System Voltage Regulation
Subjects: Systems and Control (eess.SY)

We investigate the distributed voltage regulation problem in distribution systems employing online feedback optimization and short-range communication between physical neighbours. We show that a two-metric approach can be unstable. As a remedy, we propose a nested feedback optimization strategy. Simulation results reveal that while the two-metric approach fails to regulate voltages, the proposed approach achieves even fewer voltage limit violations than its centralized counterpart.

[6]  arXiv:2405.02504 [pdf, other]
Title: Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Magnetic resonance imaging (MRI) and positron emission tomography (PET) are increasingly used in multimodal analysis of neurodegenerative disorders. While MRI is broadly utilized in clinical settings, PET is less accessible. Many studies have attempted to use deep generative models to synthesize PET from MRI scans. However, they often suffer from unstable training and inadequately preserve brain functional information conveyed by PET. To this end, we propose a functional imaging constrained diffusion (FICD) framework for 3D brain PET image synthesis with paired structural MRI as input condition, through a new constrained diffusion model (CDM). The FICD introduces noise to PET and then progressively removes it with CDM, ensuring high output fidelity throughout a stable training phase. The CDM learns to predict denoised PET with a functional imaging constraint introduced to ensure voxel-wise alignment between each denoised PET and its ground truth. Quantitative and qualitative analyses conducted on 293 subjects with paired T1-weighted MRI and 18F-fluorodeoxyglucose (FDG)-PET scans suggest that FICD achieves superior performance in generating FDG-PET data compared to state-of-the-art methods. We further validate the effectiveness of the proposed FICD on data from a total of 1,262 subjects through three downstream tasks, with experimental results suggesting its utility and generalizability.

[7]  arXiv:2405.02510 [pdf, other]
Title: Low-cost sensors and circuits for plasma education: characterizing power and illuminance
Subjects: Systems and Control (eess.SY); Plasma Physics (physics.plasm-ph)

Industrial applications of plasma have significantly increased beyond semiconductor manufacturing in recent years. This necessitates training a skilled workforce in plasma science and technology. However, an essential challenge to this end stems from the high cost of plasma devices and diagnostics. The limited access to plasma devices has hindered plasma education, particularly in the least developed countries. To this end, this paper demonstrates how low-cost sensors and circuits can be developed to enable inexpensive plasma experiments in laboratory environments. In particular, we show how to measure high voltage, current, and power from a cold-atmospheric plasma discharge. Additionally, we develop a low-cost illuminance sensor and demonstrate how it can be used to estimate the corresponding plasma power. The low-cost sensors and electronics presented in this paper can aid educators in characterizing plasma power versus plasma illuminance.

[8]  arXiv:2405.02511 [pdf, other]
Title: Incremental Volt/Var Control for Distribution Networks via Chance-Constrained Optimization
Subjects: Systems and Control (eess.SY)

This paper considers an incremental Volt/Var control scheme for distribution systems with high integration of inverter-interfaced distributed generation (such as photovoltaic systems). The incremental Volt/Var controller is implemented with the objective of minimizing reactive power usage while maintaining voltages within safe limits sufficiently often. To this end, the parameters of the incremental Volt/Var controller are obtained by solving a chance-constrained optimization problem, where constraints are designed to ensure that voltage violations do not occur more often than a pre-specified probability. This approach leads to cost savings in a controlled, predictable way, while still avoiding significant over- or under-voltage issues. The proposed chance-constrained problem is solved using a successive convex approximation method. Once the gains are broadcast to the inverters, no additional communication is required since the controller is implemented locally at the inverters. The proposed method is successfully tested on a low-voltage 42-node network.
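
The chance constraint can be made concrete under an assumption the abstract does not make: if a node voltage is Gaussian with standard deviation sigma, then requiring P(v > v_max) <= epsilon is equivalent to the deterministic bound mean <= v_max - sigma * z, with z = Phi^{-1}(1 - epsilon). The sketch below only shows this tightening; it does not reproduce the paper's successive convex approximation, and `tightened_voltage_band` is a hypothetical helper working in per-unit values.

```python
from statistics import NormalDist
from typing import Tuple


def tightened_voltage_band(v_min: float, v_max: float, sigma: float, epsilon: float) -> Tuple[float, float]:
    """Band for the *mean* voltage so each limit is violated with probability <= epsilon.

    Assumes the voltage is Gaussian with standard deviation sigma:
    P(v > v_max) <= eps  <=>  mean <= v_max - sigma * z,  z = Phi^{-1}(1 - eps).
    """
    z = NormalDist().inv_cdf(1.0 - epsilon)
    return v_min + sigma * z, v_max - sigma * z


# Usage: limits of 0.95-1.05 p.u., sigma = 0.01 p.u., 5% violation allowed per limit.
lo, hi = tightened_voltage_band(0.95, 1.05, sigma=0.01, epsilon=0.05)
```

Each one-sided limit holds with probability at least 1 - epsilon, so by the union bound the two-sided violation probability is at most 2 * epsilon.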

[9]  arXiv:2405.02537 [pdf, other]
Title: A Robust Data-Driven Iterative Control Method for Linear Systems with Bounded Disturbances
Authors: Kaijian Hu, Tao Liu
Subjects: Systems and Control (eess.SY)

This paper proposes a new robust data-driven control method for linear systems with bounded disturbances, where the system model and disturbances are unknown. Because of the disturbances, it is difficult to determine the true system accurately from the collected dataset. Therefore, instead of designing controllers directly for the unknown true system, a viable approach is to design controllers for all systems compatible with the dataset. To overcome the limitations of using a single dataset and benefit from collecting more data, multiple datasets are employed in this paper. Furthermore, a new iterative method is developed to address the challenges of using multiple datasets. Based on this method, this paper develops an offline and an online robust data-driven iterative control method. Compared to the existing robust data-driven control method, both proposed methods iteratively utilize multiple datasets in the controller design process. This allows for the incorporation of numerous datasets, potentially reducing the conservativeness of the designed controller. In particular, the online controller is iteratively designed by continuously incorporating online collected data into the historical data to construct new datasets. Lastly, the effectiveness of the proposed methods is demonstrated using a batch reactor.

[10]  arXiv:2405.02563 [pdf, other]
Title: Deep Representation Learning-Based Dynamic Trajectory Phenotyping for Acute Respiratory Failure in Medical Intensive Care Units
Comments: 9 pages
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Sepsis-induced acute respiratory failure (ARF) is a serious complication with a poor prognosis. This paper presents a deep representation learning-based phenotyping method to identify distinct groups of clinical trajectories of septic patients with ARF. For this retrospective study, we created a dataset from electronic medical records (EMR) consisting of data from sepsis patients admitted to medical intensive care units who required at least 24 hours of invasive mechanical ventilation at a quaternary care academic hospital in the southeastern USA for the years 2016-2021. A total of N=3349 patient encounters were included in this study. The Clustering Representation Learning on Incomplete Time Series Data (CRLI) algorithm was applied to a parsimonious set of EMR variables in this dataset. To validate the optimal number of clusters, the K-means algorithm was used in conjunction with dynamic time warping. Our model yielded four distinct patient phenotypes that a critical care expert characterized as liver dysfunction/heterogeneous, hypercapnia, hypoxemia, and multiple organ dysfunction syndrome. A Kaplan-Meier analysis comparing 28-day mortality trends showed significant differences (p < 0.005) between the four phenotypes. The study demonstrates the utility of our deep representation learning-based approach in uncovering phenotypes that reflect the heterogeneity of sepsis-induced ARF in terms of mortality outcomes and severity. These phenotypes might reveal important clinical insights into an effective prognosis and tailored treatment strategies.
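
Dynamic time warping, which the study pairs with K-means, aligns two trajectories that may progress at different speeds before comparing them. A textbook dynamic-programming version for univariate sequences with an absolute-difference cost is sketched below; the study's multivariate EMR trajectories, and any warping-window constraint it may use, are not modeled, and `dtw_distance` is a hypothetical name.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW: minimum cumulative |a_i - b_j| cost over monotone alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = best cost of aligning a[:i] with b[:j].
    D: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend from a match, an insertion, or a deletion.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

A time-shifted copy of a trajectory (for example, `[1, 2, 3]` versus `[1, 2, 2, 3]`) has zero DTW distance, whereas a pointwise Euclidean comparison would not even be defined for unequal lengths.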

[11]  arXiv:2405.02567 [pdf, other]
Title: TiRE-GAN: Task-Incentivized Generative Learning Models for Radiomap Estimation with Radio Propagation Model
Subjects: Signal Processing (eess.SP)

By providing rich geometric information about the radio frequency (RF) signal power distribution in wireless communication systems, the radiomap has become an essential tool for resource allocation and network management. Usually, a dense radiomap is reconstructed from sparse observations collected by deployed sensors or mobile devices, which makes radiomap estimation a pressing challenge. To leverage both the physical principles of radio propagation models and the statistics of sparse observations, this work introduces a novel task-incentivized generative learning model, namely TiRE-GAN, for radiomap estimation. Specifically, we first introduce a radio depth map as input to capture the overall pattern of radio propagation and shadowing effects, and then propose a task-driven incentive network that provides feedback for radiomap compensation depending on downstream tasks. Our experimental results demonstrate the ability of the radio depth map to capture radio propagation information, together with the efficiency of the proposed TiRE-GAN for radiomap estimation.

[12]  arXiv:2405.02633 [pdf, other]
Title: Risk Assessment for Nonlinear Cyber-Physical Systems under Stealth Attacks
Comments: 12 pages and 9 figures
Subjects: Systems and Control (eess.SY)

Stealth attacks pose potential risks to cyber-physical systems because they are difficult to detect. Assessing the risk of systems under stealth attacks remains an open challenge, especially in nonlinear systems. To comprehensively quantify these risks, we propose a framework that considers both the reachability of a system and the risk distribution of a scenario. We propose an algorithm to approximate the reachability of a nonlinear system under stealth attacks with a union of standard sets. Meanwhile, we present a method to construct a risk field to formally describe the risk distribution in a given scenario. The intersection relationships of system reachability and risk regions in the risk field indicate that attackers can cause corresponding risks without being detected. Based on this, we introduce a metric to dynamically quantify the risk. Compared to traditional methods, our framework predicts the risk value in an explainable way and provides early warnings for safety control. We demonstrate the effectiveness of our framework through a case study of an automated warehouse.

[13]  arXiv:2405.02643 [pdf, other]
Title: EM-based Algorithm for Unsupervised Clustering of Measurements from a Radar Sensor Network
Comments: 12 pages, 14 figures
Subjects: Signal Processing (eess.SP)

This paper deals with the problem of clustering data returned by a radar sensor network that monitors a region where multiple moving targets are present. The network is formed by nodes with limited functionalities that transmit the estimates of target positions (after a detection) to a fusion center without any association between measurements and targets. To solve the problem at hand, we resort to model-based learning algorithms; instead of applying the plain maximum likelihood approach, which is computationally demanding, we exploit the latent variable model coupled with the expectation-maximization algorithm. The devised estimation procedure returns posterior probabilities that are used to cluster the huge amount of data collected by the fusion center. Remarkably, we also consider challenging scenarios with an unknown number of targets and estimate it by means of model order selection rules. The clustering performance of the proposed strategy is compared to that of conventional data-driven methods over synthetic data. The numerical examples show that the proposed solutions can provide reliable clustering performance, outperforming the considered competitors.
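
The latent-variable/EM idea above can be illustrated with a two-dimensional Gaussian mixture in which each component models the position reports of one target; the E-step posteriors are exactly the soft cluster assignments the abstract refers to. This is a minimal sketch under stated assumptions: isotropic per-target covariance, a known number of targets, and user-supplied initial means. It is not the paper's measurement model, and it omits model order selection; `em_isotropic_gmm` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def em_isotropic_gmm(points: Sequence[Point], init_means: Sequence[Point], n_iter: int = 50):
    """EM for a 2-D Gaussian mixture with one isotropic component per target.

    Returns (means, weights, posteriors); posteriors[i][j] is the probability
    that measurement i was generated by target j.
    """
    k, n = len(init_means), len(points)
    means: List[Point] = [tuple(m) for m in init_means]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    post: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each target for each measurement.
        post = []
        for x, y in points:
            dens = [
                weights[j]
                * math.exp(-((x - means[j][0]) ** 2 + (y - means[j][1]) ** 2) / (2 * variances[j]))
                / (2 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            post.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in post) + 1e-12
            mx = sum(r[j] * p[0] for r, p in zip(post, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(post, points)) / nj
            sq = sum(r[j] * ((p[0] - mx) ** 2 + (p[1] - my) ** 2) for r, p in zip(post, points))
            means[j] = (mx, my)
            variances[j] = max(sq / (2 * nj), 1e-6)  # isotropic: average over 2 dims
            weights[j] = nj / n
    return means, weights, post


# Usage: hypothetical position reports around two targets.
reports = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (10, 10), (10.5, 10), (10, 10.5), (9.5, 10)]
means, weights, post = em_isotropic_gmm(reports, [(1.0, 1.0), (9.0, 9.0)])
labels = [max(range(len(means)), key=lambda j: r[j]) for r in post]
```

When the number of targets is unknown, one would fit several values of k and compare penalized log-likelihoods; that step is not shown here.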

[14]  arXiv:2405.02718 [pdf, other]
Title: Zak-OTFS: Pulse Shaping and the Tradeoff between Time/Bandwidth Expansion and Predictability
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

The Zak-OTFS input/output (I/O) relation is predictable and non-fading when the delay and Doppler periods are greater than the effective channel delay and Doppler spreads, a condition which we refer to as the crystallization condition. When the crystallization condition is satisfied, we describe how to integrate sensing and communication within a single Zak-OTFS subframe by transmitting a pilot in the center of the subframe and surrounding the pilot with a pilot region and guard band to mitigate interference between data symbols and pilot. At the receiver we first read off the effective channel taps within the pilot region, and then use the estimated channel taps to recover the data from the symbols received outside the pilot region. We introduce a framework for filter design in the delay-Doppler (DD) domain where the symplectic Fourier transform connects aliasing in the DD domain (predictability of the I/O relation) with time/bandwidth expansion. The choice of pulse shaping filter determines the fraction of pilot energy that lies outside the pilot region and the degradation in BER performance that results from the interference to data symbols. We demonstrate that Gaussian filters in the DD domain provide significant improvements in BER performance over the sinc and root raised cosine filters considered in previous work. We also demonstrate that, by limiting DD domain aliasing, Gaussian filters extend the region where the crystallization condition is satisfied. The Gaussian filters considered in this paper are a particular case of factorizable pulse shaping filters in the DD domain, and this family of filters may be of independent interest.

[15]  arXiv:2405.02741 [pdf, other]
Title: Activity Detection for Massive Random Access using Covariance-based Matching Pursuit
Comments: submitted to IEEE IoT journal
Subjects: Signal Processing (eess.SP)

The Internet of Things paradigm heavily relies on a network of a massive number of machine-type devices (MTDs) that monitor changes in various phenomena. Consequently, MTDs are randomly activated at different times whenever a change occurs. This essentially results in relatively few MTDs being active simultaneously compared to the entire network, resembling targeted sampling in compressed sensing. Therefore, signal recovery in machine-type communications is addressed through joint user activity detection and channel estimation algorithms built using compressed sensing theory. However, most of these algorithms follow a two-stage procedure in which a channel is first estimated and later mapped to find active users. This approach is inefficient because the estimated channel information is subsequently discarded. To overcome this limitation, we introduce a novel covariance-learning matching pursuit algorithm that bypasses explicit channel estimation. Instead, it focuses on estimating the indices of the active users greedily. Simulation results presented in terms of probability of miss detection, exact recovery rate, and computational complexity validate the proposed technique's superior performance and efficiency.
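
For readers new to greedy support recovery, the sketch below shows plain matching pursuit applied to a single real-valued received vector: at each step it selects the pilot most correlated with the residual and then removes that pilot's projection. It is only a point of reference and not the proposed covariance-learning method, which works on second-order statistics and avoids explicit channel estimation; the per-step coefficient computed here is a crude channel estimate. The pilot matrix, the known number of active users, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def greedy_activity_detection(
    y: Sequence[float], pilots: Sequence[Sequence[float]], n_active: int
) -> List[int]:
    """Greedy (matching-pursuit) estimate of which pilot sequences are present in y.

    Each step picks the unselected pilot most correlated with the current
    residual and subtracts its projection from the residual.
    """
    residual = list(y)
    support: List[int] = []
    for _ in range(n_active):
        best, best_score = -1, -1.0
        for k, a in enumerate(pilots):
            if k in support:
                continue
            score = abs(_dot(residual, a)) / math.sqrt(_dot(a, a))  # normalized correlation
            if score > best_score:
                best, best_score = k, score
        support.append(best)
        a = pilots[best]
        coef = _dot(residual, a) / _dot(a, a)  # least-squares gain for this pilot alone
        residual = [r - coef * ai for r, ai in zip(residual, a)]
    return sorted(support)


# Hypothetical real-valued pilots in R^4; the last two overlap the unit pilots.
pilots = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (1, 1, 0, 0), (0, 0, 1, 1)]
y = [1.0, 1.0, 0.0, 1.0]  # users 4 and 3 active with unit gains
active = greedy_activity_detection(y, pilots, n_active=2)
```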

[16]  arXiv:2405.02784 [pdf, other]
Title: MR-Transformer: Vision Transformer for Total Knee Replacement Prediction Using Magnetic Resonance Imaging
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

A transformer-based deep learning model, MR-Transformer, was developed for total knee replacement (TKR) prediction using magnetic resonance imaging (MRI). The model incorporates ImageNet pre-training and captures three-dimensional (3D) spatial correlation from the MR images. The performance of the proposed model was compared to that of existing state-of-the-art deep learning models for knee injury diagnosis using MRI. Knee MR scans of four different tissue contrasts from the Osteoarthritis Initiative and Multicenter Osteoarthritis Study databases were utilized in the study. Experimental results demonstrated the state-of-the-art performance of the proposed model on TKR prediction using MRI.

[17]  arXiv:2405.02788 [pdf, other]
Title: Antenna Failure Resilience: Deep Learning-Enabled Robust DOA Estimation with Single Snapshot Sparse Arrays
Comments: Invited paper for IEEE Asilomar conference 2024
Subjects: Signal Processing (eess.SP)

Recent advancements in Deep Learning (DL) for Direction of Arrival (DOA) estimation have highlighted its superiority over traditional methods, offering faster inference, enhanced super-resolution, and robust performance in low Signal-to-Noise Ratio (SNR) environments. Despite these advancements, existing research predominantly focuses on multi-snapshot scenarios, a limitation in the context of automotive radar systems, which demand high angular resolution and often rely on few snapshots, sometimes only a single snapshot. Furthermore, the increasing interest in sparse arrays for automotive radar, owing to their cost-effectiveness and reduced antenna element coupling, presents additional challenges, including susceptibility to random sensor failures. This paper introduces a pioneering DL framework featuring a sparse signal augmentation layer, designed to improve single-snapshot DOA estimation across diverse sparse array setups and in the presence of antenna failures. To the best of our knowledge, this is the first work to tackle this issue. Our approach improves the adaptability of deep learning techniques to the unique difficulties posed by sparse arrays with a single snapshot. We conduct thorough evaluations of our network's performance using simulated and real-world data, showcasing the efficacy and real-world viability of our proposed solution. The code and real-world dataset employed in this study are available at https://github.com/ruxinzh/Deep_RSA_DOA.
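
For context, the classical single-snapshot baseline is a delay-and-sum (Bartlett) beamformer scanned over an angle grid. It is shown here as a reference point and is not the paper's DL framework. The sketch assumes a noise-free snapshot, a single source, and element positions given in half-wavelength units, and it models a failed antenna by simply dropping that element. The array layout and function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def steering(positions: Sequence[int], theta_deg: float) -> List[complex]:
    """Steering vector for elements at integer multiples of half a wavelength."""
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * p * s) for p in positions]


def bartlett_doa(snapshot: Sequence[complex], positions: Sequence[int], step: float = 0.5) -> float:
    """Return the grid angle in [-90, 90] degrees maximizing |a(theta)^H y|^2."""
    best_theta, best_power = -90.0, -1.0
    for k in range(int(round(180 / step)) + 1):
        theta = -90.0 + k * step
        a = steering(positions, theta)
        power = abs(sum(ai.conjugate() * yi for ai, yi in zip(a, snapshot))) ** 2
        if power > best_power:
            best_theta, best_power = theta, power
    return best_theta


# Hypothetical sparse array; the element at position 1 has failed and is dropped.
full = [0, 1, 2, 4, 6, 9]
working = [p for p in full if p != 1]
y = steering(working, 20.0)  # noise-free single snapshot from 20 degrees
estimate = bartlett_doa(y, working)
```

With noise, multiple sources, or additional element failures, the sidelobes of such a thinned array become a limiting factor for this baseline, which is the difficulty the abstract highlights.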

[18]  arXiv:2405.02809 [pdf, other]
Title: Does Optimal Control Always Benefit from Better Prediction? An Analysis Framework for Predictive Optimal Control
Subjects: Systems and Control (eess.SY)

The "prediction + optimal control" scheme has shown good performance in many automotive, traffic, robotics, and building control applications. In practice, the prediction results are simply assumed to be correct in the optimal control design process. However, in reality, these predictions may never be perfect. Under a conventional stochastic optimal control formulation, it is difficult to answer questions like "what if the predictions are wrong?". This paper presents an analysis framework for predictive optimal control in which the subjective belief about the future is no longer assumed to be perfect. A novel concept called the hidden prediction state is proposed to establish connections among the predictors, the subjective beliefs, the control policies, and the objective control performance. Based on this framework, the predictor evaluation problem is analyzed. Three commonly used predictor evaluation measures are considered: the mean squared error, the regret, and the log-likelihood. It is shown that neither the mean squared error nor the likelihood can guarantee a monotonic relationship between predictor error and optimal control cost. To guarantee control cost improvement, we suggest that predictors be evaluated by control performance, e.g., using the optimal control cost or the regret. Numerical examples and examples from automotive applications with real-world driving data are provided to illustrate the ideas and the results.

[19]  arXiv:2405.02813 [pdf, ps, other]
Title: Model Predictive Control for Joint Ramping and Regulation-Type Service from Distributed Energy Resource Aggregations
Comments: 10 pages, 3 figures, to be presented at IEEE PES GM 2024
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Distributed energy resources (DERs) such as grid-responsive loads and batteries can be harnessed to provide ramping and regulation services across the grid. This paper concerns the problem of optimal allocation of different classes of DERs, where each class is an aggregation of similar DERs, to balance net-demand forecasts. The resulting resource allocation problem is solved using model-predictive control (MPC) that utilizes a rolling sequence of finite time-horizon constrained optimizations. This is based on the concept that we have more accurate estimates of the load forecast in the short term, so each optimization in the rolling sequence of optimization problems uses more accurate short term load forecasts while ensuring satisfaction of capacity and dynamical constraints. Simulations demonstrate that the MPC solution can indeed reduce the ramping required from bulk generation, while mitigating near-real time grid disturbances.

[20]  arXiv:2405.02825 [pdf, other]
Title: An Enhanced Dynamic Ray Tracing Architecture for Channel Prediction Based on Multipath Bidirectional Geometry and Field Extrapolation
Subjects: Signal Processing (eess.SP)

With the development of sixth-generation (6G) networks toward the digitalization and intelligentization of communications, rapid and precise channel prediction is crucial for realizing the network's potential. Notably, a dynamic ray tracing (DRT) approach for channel prediction has recently been proposed, which uses the results of traditional ray tracing (RT) to extrapolate the evolution of multipath geometry. However, both a priori environmental data and the regularity of multipath evolution can be exploited further. In this work, an enhanced dynamic ray tracing (E-DRT) algorithm architecture based on multipath bidirectional extrapolation is proposed. In terms of accuracy, all available environment information is used to predict the birth and death processes of multipath components (MPCs) through bidirectional geometry extrapolation. In terms of efficiency, bidirectional electric field extrapolation is employed based on the evolution regularity of the MPCs' electric field. The results in a Vehicle-to-Vehicle (V2V) scenario show that E-DRT improves the accuracy of channel prediction from 68.3% to 94.8% while reducing the runtime by 7.2% compared to DRT.

[21]  arXiv:2405.02827 [pdf, other]
Title: Probabilistic tube-based control synthesis of stochastic multi-agent systems under signal temporal logic
Comments: Submitted to CDC24
Subjects: Systems and Control (eess.SY)

We consider the control design of stochastic discrete-time linear multi-agent systems (MASs) under a global signal temporal logic (STL) specification to be satisfied at a predefined probability. By decomposing the dynamics into deterministic and error components, we construct a probabilistic reachable tube (PRT) as the Cartesian product of reachable sets of the individual error systems driven by disturbances lying in confidence regions (CRs) with a fixed probability. By bounding the PRT probability with the specification probability, we tighten all state constraints induced by the STL specification by solving tractable optimization problems over segments of the PRT, and convert the underlying stochastic problem into a deterministic one. This approach reduces conservatism compared to tightening guided by the STL structure. Additionally, we propose a recursively feasible algorithm to attack the resulting problem by decomposing it into agent-level subproblems, which are solved iteratively according to a scheduling policy. We demonstrate our method on a ten-agent system, where existing approaches are impractical.

[22]  arXiv:2405.02852 [pdf, other]
Title: On Enhancing Brain Tumor Segmentation Across Diverse Populations with Convolutional Neural Networks
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Brain tumor segmentation is a fundamental step in assessing a patient's cancer progression. However, accurately identifying tumors in 3D multimodal brain MRI scans by manual segmentation demands significant expert time, and this reliance on manual segmentation makes the process prone to intra- and inter-observer variability. This work proposes a brain tumor segmentation method as part of the BraTS-GoAT challenge. The task is to automatically segment tumors in brain MRI scans from various populations, such as adults, pediatric patients, and underserved sub-Saharan African populations. We employ a recent CNN architecture for medical image segmentation, namely MedNeXt, as our baseline, and we implement extensive model ensembling and postprocessing for inference. Our experiments show that our method performs well on the unseen validation set, with an average DSC of 85.54% and an HD95 of 27.88. The code is available at https://github.com/BioMedIA-MBZUAI/BraTS2024_BioMedIAMBZ.
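
The DSC reported above is the Dice similarity coefficient between a predicted mask P and a reference mask T, DSC = 2|P ∩ T| / (|P| + |T|), usually quoted as a percentage. The minimal sketch below computes it on flattened binary masks. Returning 1.0 when both masks are empty is an assumed convention (challenges differ on this edge case), and HD95 is not covered; `dice_coefficient` is a hypothetical name.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """DSC = 2|P ∩ T| / (|P| + |T|) for flattened binary masks (1 = tumor voxel)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * inter / total
```

For example, masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]` share one voxel out of two foreground voxels each, giving a DSC of 0.5 (50%).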

[23]  arXiv:2405.02857 [pdf, other]
Title: I$^3$Net: Inter-Intra-slice Interpolation Network for Medical Slice Synthesis
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes reconstructed with thicker slices are anisotropic, with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon: owing to this nature of the data, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution from other views. Based on this observation, we propose an Inter-Intra-slice Interpolation Network (I$^3$Net), which fully exploits information from the high in-plane resolution and compensates for the low through-plane resolution. The through-plane branch supplements the limited information available at low through-plane resolution with information from the high in-plane resolution and enables continual and diverse feature learning. The in-plane branch transforms features to the frequency domain and enforces an equal learning opportunity for all frequency bands in a global context learning paradigm. We further propose a cross-view block to take advantage of the information from all three views online. Extensive experiments on two public datasets demonstrate the effectiveness of I$^3$Net, which outperforms state-of-the-art super-resolution, video frame interpolation, and slice interpolation methods by a large margin. We achieve 43.90 dB in PSNR, an improvement of at least 1.14 dB, under an upscale factor of ×2 on the MSD dataset, with faster inference. Code is available at https://github.com/DeepMed-Lab-ECNU/Medical-Image-Reconstruction.
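
PSNR, the metric quoted above, is 10 * log10(MAX^2 / MSE) in dB, where MAX is the peak intensity value. Because it is logarithmic, a 1.14 dB gain corresponds to an MSE that is 10^0.114 ≈ 1.30 times lower. The minimal sketch below assumes intensities normalized so that MAX = 1; the paper's exact normalization is not stated in the abstract, and `psnr` is a hypothetical helper.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


# Usage: MSE = 0.005 on a [0, 1] intensity scale gives 10 * log10(200), about 23.01 dB.
score = psnr([1.0, 0.5], [0.9, 0.5])
```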

[24]  arXiv:2405.02873 [pdf, other]
Title: Target Localization with Macro and Micro Base Stations Cooperative Sensing
Comments: 7 pages, 6 figures, submitted to 2024 IEEE GLOBECOM
Subjects: Signal Processing (eess.SP)

To address the communication and sensing demands of sixth-generation (6G) mobile communication systems, integrated sensing and communication (ISAC) has gained traction in academia and industry. Given the sensing limitations of a single base station (BS), multi-BS cooperative sensing is regarded as a promising solution. The coexistence and overlapping coverage of macro BSs (MBSs) and micro BSs (MiBSs) are common in 6G deployments, making cooperative sensing between MBSs and MiBSs feasible. Since MBSs and MiBSs operate in low and high frequency bands, respectively, the main challenge of MBS-MiBS cooperative sensing lies in fusing the sensing information from the high- and low-frequency bands. To this end, this paper introduces a symbol-level fusion method and a grid-based three-dimensional discrete Fourier transform (3D-GDFT) algorithm to achieve precise localization of multiple targets with limited resources. Simulation results demonstrate that the proposed MBS-MiBS cooperative sensing scheme outperforms the traditional single-BS (MBS or MiBS) sensing scheme, showing superior sensing performance.

[25]  arXiv:2405.02885 [pdf, other]
Title: Performance Analysis of Underwater Acoustic Channel Amid Jamming by Random Jammers
Subjects: Signal Processing (eess.SP); Emerging Technologies (cs.ET)

Underwater communication networks are becoming increasingly popular owing to various important maritime applications. However, this also expands the threat landscape. This letter presents the first study that considers jamming attacks by random jammers located in the surroundings of legitimate transceivers in underwater acoustic communication systems. We investigate the impact of jamming attacks on various performance parameters of the legitimate underwater acoustic communication link. In particular, we use stochastic geometry to analyze key performance parameters of the legitimate link, namely the coverage probability, average rate, and energy efficiency of the link between two legitimate nodes, i.e., an underwater node and a surface node. We then derive tractable expressions for these performance parameters. Finally, we perform Monte Carlo simulations to validate our analysis. We plot the performance metrics against the transmit power and the jamming power for different jammer intensities in shallow, mid, and deep water scenarios. The results reveal that, on average, jamming in deep water has a greater impact on the performance of the legitimate link than jamming in shallow water.

[26]  arXiv:2405.02891 [pdf, other]
Title: Sparse Matrix Coding for URLLC
Authors: Yifei Yang
Subjects: Signal Processing (eess.SP)

Sparse Vector Coding (SVC) has long been considered an encoding method that meets the QoS requirements of URLLC. It has been widely studied and applied because of its low encoding and decoding complexity, absence of pilot transmission, resistance to inter-carrier interference, and low power consumption. However, because SVC conveys information through position indexing, it essentially lowers the signal-to-noise ratio requirement at the cost of increased communication bandwidth, which also leads to low encoding efficiency and rigid decoding. Building on the sparse representation characteristics of SVC, we propose a joint sparse representation encoding, namely Sparse Matrix Coding (SMC). This method jointly encodes the information of multiple users, with the sparsity and sparse locations shared among users.
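Position indexing in SVC can be illustrated with a toy encoder: a message index m selects K active positions out of N resources via the combinatorial number system, and the receiver recovers m from the K strongest entries. This is a simplified sketch under stated assumptions, not the SMC scheme proposed here: it omits the spreading codebook and compressed-sensing detector that SVC schemes commonly use, and all function names are hypothetical. With N = 16 and K = 2 there are C(16, 2) = 120 supports, so only floor(log2 120) = 6 bits ride on 16 resources, which illustrates the bandwidth cost described above.

```python
from math import comb


def index_to_support(m, n, k):
    """Map an integer m in [0, C(n, k)) to a sorted k-subset of range(n)
    using the combinatorial number system."""
    if not 0 <= m < comb(n, k):
        raise ValueError("message index out of range")
    support = []
    for i in range(k, 0, -1):
        # Largest c with comb(c, i) <= m; comb(i - 1, i) == 0, so c = i - 1 is always valid.
        c = i - 1
        while comb(c + 1, i) <= m:
            c += 1
        support.append(c)
        m -= comb(c, i)
    return sorted(support)


def support_to_index(support):
    """Inverse mapping: sum of comb(c_j, j) over the ascending support."""
    return sum(comb(c, j + 1) for j, c in enumerate(sorted(support)))


def svc_encode(m, n, k):
    """Sparse vector with unit entries on the positions selected by m."""
    support = set(index_to_support(m, n, k))
    return [1.0 if pos in support else 0.0 for pos in range(n)]


def svc_decode(received, k):
    """Toy detector: keep the k largest-magnitude entries and invert the index map."""
    strongest = sorted(range(len(received)), key=lambda pos: abs(received[pos]),
                       reverse=True)[:k]
    return support_to_index(strongest)


if __name__ == "__main__":
    n, k = 16, 2
    x = svc_encode(37, n, k)
    # Deterministic perturbation standing in for channel noise.
    noisy = [v + 0.1 * (-1) ** pos for pos, v in enumerate(x)]
    print(svc_decode(noisy, k))
```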

[27]  arXiv:2405.02953 [pdf, other]
Title: Analysis of the Identifying Regulation with Adversarial Surrogates Algorithm
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Given a time series of noisy measured outputs of a dynamical system, z[k], k=1...N, the Identifying Regulation with Adversarial Surrogates (IRAS) algorithm aims to find a non-trivial first integral of the system, namely, a scalar function g() such that g(z[i]) = g(z[j]) for all i,j. IRAS was proposed recently and has been used successfully in several learning tasks in models from biology and physics. Here, we give the first rigorous analysis of this algorithm in a specific setting. We assume that the observations admit a linear first integral and that they are contaminated by Gaussian noise. We show that in this case the IRAS iterations are closely related to the self-consistent-field (SCF) iterations for solving a generalized Rayleigh quotient minimization problem. Using this approach, we derive several sufficient conditions guaranteeing local convergence of IRAS to the correct first integral.

[28]  arXiv:2405.02986 [pdf, other]
Title: Harvesting Energy from Soil-Air Temperature Differences for Batteryless IoT Devices: A Case Study
Comments: This paper is under review at IEEE Access. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Systems and Control (eess.SY)

The temperature difference between soil and air has the potential to generate energy to power many low-power IoT devices. However, few studies in the literature explore the nuances of soil-air thermal energy harvesting. This paper offers a comprehensive discussion of soil-air thermal energy harvesting. We engineer a custom Soil-air Thermoelectric Generator (SoTEG) that incorporates an off-the-shelf TEG and an efficient heat transfer network. A detailed discussion of the design and analysis of SoTEG is presented, along with a versatile simulation model that can be used to simulate the performance of the harvester under different ambient conditions. Investigations using the model and results gathered from experiments demonstrate that the SoTEG has a heat transfer efficiency of 34.5%, with room for improvement, and can power a load from temperature differences as low as 3 °C between soil and air, or 1 °C across the TEG. The power generated by SoTEG at a 3 °C difference amounts to 110 μW, or a power density of 11.58 mW/m². When connected to a Power Management Unit (PMU), the combined system generates around 30 μW at 3 °C. During a 14-day outdoor deployment in a winter month, the maximum power generated by the combined system is 337 μW, when the temperature difference across the TEG is 2.75 °C. Additionally, the model analysis reveals that weather conditions affect the harvester: while solar radiation enhances power generation, wind can either improve or diminish the harvested energy depending on whether it is day or night.

[29]  arXiv:2405.02994 [pdf, other]
Title: Extended State Observer for Mismatch Disturbances Using Taylor Approximation of the Integral
Authors: Cuong Duc Nguyen
Subjects: Systems and Control (eess.SY)

The development of disturbance estimators using extended state observers (ESOs) typically assumes that the system is observable. This paper introduces an improved method for systems that are initially unobservable, leveraging Taylor expansion to approximate the integral of disturbance dynamics. A new extended system is formulated based on this approximation, enabling the design of an observer that achieves exponential stability of the error dynamics. The proposed method's efficacy is demonstrated through a practical example, highlighting its potential for robust disturbance estimation in dynamic systems.

[30]  arXiv:2405.03008 [pdf, other]
Title: DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
Comments: 8 pages, 8 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art efficient image SR methods are based on convolutional neural networks. Few attempts have been made with Mamba, whose long-range modeling capability and efficient computational complexity have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight image SR network that incorporates Vision Mamba and a distillation strategy. DVMSR consists of three modules: feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several RSSBs, each of which has several Vision Mamba Modules (ViMMs) together with a residual connection. To improve efficiency while maintaining comparable performance, we apply a distillation strategy to the Vision Mamba network. Specifically, we leverage the rich representation knowledge of a teacher network as additional supervision for the output of the lightweight student network. Extensive experiments demonstrate that DVMSR outperforms state-of-the-art efficient SR methods in terms of model parameters while maintaining comparable PSNR and SSIM. The source code is available at https://github.com/nathan66666/DVMSR.git

[31]  arXiv:2405.03022 [pdf, other]
Title: Joint Discrete Precoding and RIS Optimization for RIS-Assisted MU-MIMO Communication Systems
Comments: Submitted to IEEE Transactions
Subjects: Signal Processing (eess.SP)

This paper considers a multi-user multiple-input multiple-output (MU-MIMO) system in which the downlink communication between a base station (BS) and multiple user equipments (UEs) is aided by a reconfigurable intelligent surface (RIS). We study the sum-rate maximization problem with the objective of finding the optimal precoding vectors and RIS configuration. Due to fronthaul limitations, each entry of the precoding vectors must be picked from a finite set of quantization labels. Furthermore, two scenarios for the RIS are investigated: one with continuous, infinite-resolution reflection coefficients and another with discrete, finite-resolution reflection coefficients. A novel framework is developed that, in contrast to the common literature, which offers only sub-optimal solutions for optimizing discrete variables, is able to find the optimal solution to problems involving discrete constraints. Based on the classical weighted minimum mean square error (WMMSE) approach, we transform the original problem into an equivalent weighted sum mean square error (MSE) minimization problem and solve it iteratively. We compute the optimal precoding vectors via an efficient algorithm inspired by sphere decoding (SD). For optimizing the discrete RIS configuration, two SD-based solutions are developed: an optimal SD-based algorithm and a low-complexity heuristic method that can efficiently obtain the RIS configuration without much loss in optimality. The effectiveness of the presented algorithms is corroborated via numerical simulations, which show that the proposed designs are remarkably superior to commonly used benchmarks.

[32]  arXiv:2405.03040 [pdf, other]
Title: Virtual Extended-Range Tomography (VERT): Contact-free realistic ultrasonic bone imaging
Comments: 14 pages, 9 figures including graphical abstract. Submitted to IEEE T-UFFC
Subjects: Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)

Ultrasound tomography generally struggles to reconstruct high-contrast and/or extended-range problems. A prime example is site-specific in-vivo bone imaging, which is crucial for accurately assessing the risk of life-threatening fractures; such fractures are preventable given accurate diagnosis and treatment. In this type of problem, two main obstacles arise: (a) an external region prohibits access to the region of interest (ROI), and (b) high contrast exists between the two regions. These challenges impede existing algorithms, including bent-ray tomography (BRT), which is known for its robustness, speed, and reasonable short-range resolution. We propose Virtual Extended-Range Tomography (VERT), which tackles these challenges through (a) placement of virtual transducers directly on the ROI, facilitating (b) rapid initialisation before BRT inversion. In-silico validation against BRT with and without a priori information shows superior resolution and robustness, while maintaining or even improving speed. These improvements are drastic where the external region is much larger than the ROI. Additional validation against the practically impossible case of BRT applied directly to the ROI demonstrates that VERT approaches the resolution limit. The capability to solve high-contrast extended-range tomography problems without prior knowledge of the ROI's interior has many implications. VERT has the potential to unlock site-specific in-vivo bone imaging for assessing fracture risk, potentially saving millions of lives globally. In other applications, VERT may replace classical BRT to yield improvements in resolution, robustness, and speed, especially where the ROI does not cover the entire imaging array. For even higher resolution, VERT offers a reliable starting background to complement algorithms that are less robust and computationally more expensive.

[33]  arXiv:2405.03047 [pdf, ps, other]
Title: A Model-Free Kullback-Leibler Divergence Filter for Anomaly Detection in Noisy Data Series
Comments: 10 pages, 40 references
Journal-ref: Journal of Dynamic Systems, Measurement, and Control. February 2023; 145(2)
Subjects: Signal Processing (eess.SP)

We propose a Kullback-Leibler Divergence (KLD) filter to extract anomalies, along with their locations and relative sizes, from data series generated by a broad class of proximity sensors. The technique applies to devices commonly used in engineering practice, such as those mounted on mobile robots for non-destructive inspection of hazardous or other environments that may not be directly accessible to humans. The raw data generated by this class of sensors can be challenging to analyze because noise dominates the signal content. The proposed filter is built to detect the difference in information content between data series collected by the sensor and baseline data series. It is applicable in a model-based or model-free context. The performance of the KLD filter is validated in an industrial-norm setup and benchmarked against an industrially adopted peer algorithm.
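One way to read "difference in information content" is a sliding-window KL divergence between the empirical distribution of the sensor data and that of a baseline series: windows containing an anomaly yield a large divergence, and the window position gives the anomaly location. The sketch below is a model-free, histogram-based illustration under assumed choices (bin edges, smoothing constant, window length, and the direction D_KL(window || baseline)); it is not the authors' filter.

```python
import math


def normalized_histogram(values, edges, eps=1e-6):
    """Histogram over the bins defined by `edges`, normalized to a probability
    vector. Additive smoothing `eps` keeps every bin strictly positive so the
    KL divergence stays finite. Values outside [edges[0], edges[-1]] are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for b in range(len(counts)):
            is_last = b == len(counts) - 1
            if edges[b] <= v < edges[b + 1] or (is_last and v == edges[-1]):
                counts[b] += 1
                break
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """D_KL(p || q) in nats for strictly positive probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kld_filter(signal, baseline, window, edges):
    """Slide a window over `signal` and score each start position by the KL
    divergence between the window's histogram and the baseline histogram."""
    q = normalized_histogram(baseline, edges)
    return [
        kl_divergence(normalized_histogram(signal[i:i + window], edges), q)
        for i in range(len(signal) - window + 1)
    ]
```

A large score at start index i flags an anomaly within `signal[i:i + window]`; the score's magnitude gives a rough, relative measure of how strongly that segment departs from the baseline.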

[34]  arXiv:2405.03106 [pdf, other]
Title: Compression-based Privacy Preservation for Distributed Nash Equilibrium Seeking in Aggregative Games
Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)

This paper explores distributed aggregative games in multi-agent systems. Current methods for distributed Nash equilibrium seeking require players to send their original messages to their neighbors, which leads to a communication burden and privacy issues. To jointly address these issues, we propose an algorithm that uses stochastic compression to save communication resources and to conceal information through the random errors induced by compression. Our theoretical analysis shows that the algorithm guarantees convergence accuracy, even with the aggressive compression errors used to protect privacy. We prove that the algorithm achieves differential privacy through a stochastic quantization scheme. Simulation results for energy consumption games support the effectiveness of our approach.
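Stochastic compression of this kind is often built on unbiased stochastic rounding: each transmitted value is randomly rounded to one of its two neighboring grid points so that the quantization error has zero mean. The sketch below shows only that property; the grid step, seed, and function names are assumptions, and it does not reproduce the paper's quantization scheme or its differential-privacy guarantee.

```python
import math
import random


def stochastic_quantize(x, step, rng):
    """Round x to a neighboring point of the grid {k * step}, choosing the upper
    neighbor with probability equal to the fractional part, so E[q] = x."""
    scaled = x / step
    low = math.floor(scaled)
    frac = scaled - low  # in [0, 1)
    upper = rng.random() < frac
    return (low + 1) * step if upper else low * step


def compress_vector(values, step, seed=0):
    """Quantize every entry independently (e.g., a player's local estimate)."""
    rng = random.Random(seed)
    return [stochastic_quantize(v, step, rng) for v in values]
```

Values already on the grid pass through unchanged, while off-grid values acquire zero-mean random errors whose size is bounded by the grid step.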

[35]  arXiv:2405.03126 [pdf, ps, other]
Title: Infrared Polarization Imaging-based Non-destructive Thermography Inspection
Subjects: Image and Video Processing (eess.IV); Signal Processing (eess.SP)

The infrared pulse thermography non-destructive testing (NDT) method relies on the difference in infrared radiation intensity emitted by defective and non-defective areas of an object. However, when the radiation intensity of the defective target is similar to that of the non-defective area, detection results are poor. To address this issue, this study investigated the polarization characteristics of the infrared radiation of different materials. Simulation results showed that the degree of infrared polarization of the object surface changed regularly with changes in thermal environment radiation. An infrared polarization imaging-based NDT method was proposed and demonstrated using specimens with four different simulated defective areas, which were designed and fabricated from four different materials. The experimental results were consistent with the simulation results, confirming the effectiveness of the proposed method. Compared with the infrared-radiation-intensity-based NDT method, the proposed method improved image detail presentation and detection accuracy.
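Polarization imaging commonly summarizes the linear polarization state through Stokes parameters computed from images taken behind a polarizer at 0°, 45°, 90°, and 135°; the degree of linear polarization is DoLP = sqrt(S1² + S2²) / S0. The per-pixel sketch below uses that standard four-angle formulation as an assumption; the abstract does not state which polarizer angles or polarization metric the authors used.

```python
import math


def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-angle intensities."""
    s0 = i0 + i90      # total intensity
    s1 = i0 - i90      # horizontal vs. vertical preference
    s2 = i45 - i135    # +45 deg vs. -45 deg preference
    return s0, s1, s2


def degree_of_linear_polarization(i0, i45, i90, i135):
    """DoLP in [0, 1]: 0 for unpolarized light, 1 for fully linearly polarized."""
    s0, s1, s2 = stokes_from_intensities(i0, i45, i90, i135)
    if s0 <= 0:
        raise ValueError("total intensity must be positive")
    return math.hypot(s1, s2) / s0


def angle_of_polarization_deg(i0, i45, i90, i135):
    """Orientation of the polarization plane, in degrees."""
    _, s1, s2 = stokes_from_intensities(i0, i45, i90, i135)
    return 0.5 * math.degrees(math.atan2(s2, s1))
```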

[36]  arXiv:2405.03129 [pdf, other]
Title: Active Sensing for Multiuser Beam Tracking with Reconfigurable Intelligent Surface
Authors: Han Han, Tao Jiang, Wei Yu
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

This paper studies a beam tracking problem in which an access point (AP), in collaboration with a reconfigurable intelligent surface (RIS), dynamically adjusts its downlink beamformers and the reflection pattern at the RIS in order to maintain reliable communications with multiple mobile user equipments (UEs). Specifically, the mobile UEs periodically send uplink pilots to the AP during the channel sensing intervals; the AP then adaptively configures the beamformers and the RIS reflection coefficients for subsequent data transmission based on the received pilots. This is an active sensing problem, because channel sensing involves configuring the RIS coefficients during the pilot stage, and the optimal sensing strategy should exploit the trajectory of channel state information (CSI) from previously received pilots. Solving such an active sensing problem analytically is very challenging. In this paper, we propose a deep learning framework that uses a recurrent neural network (RNN) to automatically summarize the time-varying CSI obtained from the periodically received pilots into state vectors. These state vectors are then mapped to the AP beamformers and RIS reflection coefficients for subsequent downlink data transmissions, as well as to the RIS reflection coefficients for the next round of uplink channel sensing. The mappings from the state vectors to the downlink beamformers and the RIS reflection coefficients for both channel sensing and downlink data transmission are performed using graph neural networks (GNNs) to account for the interference among the UEs. Simulations demonstrate significant and interpretable performance improvements of the proposed approach over existing data-driven methods with nonadaptive channel sensing schemes.

[37]  arXiv:2405.03141 [pdf, other]
Title: Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, frequent monitoring of AIS progression using X-rays poses a challenge due to cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, spinal curvature is still measured manually. Consequently, there is considerable demand for a fully automatic system that can locate bony landmarks and perform angle measurements. To this end, we introduce an estimation model for automatic ultrasound curve angle (UCA) measurement. The model employs a dual-branch network to detect candidate landmarks and perform vertebra segmentation on ultrasound coronal images. An affinity clustering strategy is applied within the vertebral segmentation area to capture the affinity relationship between candidate landmarks. Subsequently, lines can be efficiently delineated from the clustered affinity map for UCA measurement. Because it is specifically designed for UCA calculation, our method outperforms other state-of-the-art methods on landmark and line detection tasks. The high correlation between the automatic UCA and the Cobb angle (R$^2$=0.858) suggests that our proposed method can potentially replace manual UCA measurement in ultrasound scoliosis assessment.
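Once lines have been delineated, the angle itself is simple geometry. The sketch below assumes each delineated line is given by two landmark points and takes the curve angle as the largest difference in inclination between any two lines, mirroring the usual Cobb-style convention; the paper's exact line-selection rule may differ, and the function names are hypothetical.

```python
import math


def inclination_deg(p, q):
    """Inclination of the line through points p and q, folded into (-90, 90]
    degrees so that the order of the two points does not matter."""
    (x1, y1), (x2, y2) = p, q
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle


def curve_angle_deg(lines):
    """Largest inclination difference between any two delineated lines.
    Assumes roughly horizontal lines, so there is no wrap-around near +/-90 degrees."""
    if len(lines) < 2:
        raise ValueError("need at least two lines")
    angles = [inclination_deg(p, q) for p, q in lines]
    return max(angles) - min(angles)
```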

[38]  arXiv:2405.03152 [pdf, other]
Title: MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Subjects: Audio and Speech Processing (eess.AS)

Despite notable advancements in automatic speech recognition (ASR), performance tends to degrade under adverse conditions. Generative error correction (GER) leverages the exceptional text comprehension capabilities of large language models (LLMs) and delivers impressive performance in ASR error correction, where N-best hypotheses provide valuable information for transcription prediction. However, GER faces challenges such as fixed N-best hypotheses, insufficient use of acoustic information, and limited specificity to multi-accent scenarios. In this paper, we explore the application of GER in multi-accent scenarios. Accents represent deviations from standard pronunciation norms, and the multi-task learning framework for simultaneous ASR and accent recognition (AR) has effectively addressed multi-accent scenarios, making it a prominent solution. In this work, we propose a unified ASR-AR GER model, named MMGER, which leverages multi-modal correction and multi-granularity correction. Multi-task ASR-AR learning is employed to provide dynamic 1-best hypotheses and accent embeddings. Multi-modal correction accomplishes fine-grained frame-level correction by force-aligning the acoustic features of speech with the corresponding character-level 1-best hypothesis sequence. Multi-granularity correction supplements global linguistic information by incorporating regular 1-best hypotheses atop the fine-grained multi-modal correction to achieve coarse-grained utterance-level correction. MMGER effectively mitigates the limitations of GER and tailors LLM-based ASR error correction to multi-accent scenarios. Experiments on the multi-accent Mandarin KeSpeech dataset demonstrate the efficacy of MMGER, which achieves a 26.72% relative improvement in AR accuracy and a 27.55% relative reduction in ASR character error rate compared with a well-established standard baseline.
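Character error rate (CER), the ASR metric reported above, is the Levenshtein edit distance between the reference and hypothesis character sequences divided by the reference length, and a relative reduction is (baseline - new) / baseline. The sketch below is a generic implementation, not the authors' scoring script; the example strings and error rates are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, deletions, insertions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def character_error_rate(ref, hyp):
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, new):
    """Relative reduction of an error rate, e.g. 0.2755 for a 27.55% reduction."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # One substitution (气 -> 汽) and one deletion (很): 2 edits over 6 characters.
    print(character_error_rate("今天天气很好", "今天天汽好"))
```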

[39]  arXiv:2405.03201 [pdf, other]
Title: Experimental Investigation of Repurposed Kaplan Turbines as Variable-Speed Propellers for Maximizing Frequency Containment Reserve
Comments: Submitted to PSCC 2024
Subjects: Systems and Control (eess.SY)

This study explores the practical viability of repurposing aging Kaplan turbines into variable-speed propellers by employing full-size frequency converters. The motivation behind this approach is to improve the provision of Frequency Containment Reserve (FCR) while reducing fatigue in the Kaplan blade servomechanism. We evaluate the performance of these modified Kaplan turbines against that of another hydro asset composed of the same Kaplan turbine hybridized with a Battery Energy Storage System (BESS). Experiments are conducted on a one-of-its-kind reduced-scale model testing platform. Our findings reveal that Kaplan turbines repurposed as variable-speed propellers exhibit dynamic response characteristics similar to those of standalone Kaplan operation, with the added benefit of effectively eliminating blade movements. Furthermore, the ability to control the speed increases hydraulic efficiency at certain operating points. In summary, investment in variable-speed technology emerges as a viable alternative to BESS-based hydropower hybridization.

[40]  arXiv:2405.03222 [pdf, other]
Title: Computational Efficient Width-Wise Early Exits in Modulation Classification
Subjects: Signal Processing (eess.SP)

Deep learning (DL) techniques are increasingly pervasive across various domains, including wireless communication, where they extract insights from raw radio signals. However, the computational demands of DL pose significant challenges, particularly in distributed wireless networks such as cell-free networks, where deploying DL models on edge devices becomes difficult due to the heightened computational load. This load escalates with larger input sizes, which often correlate with improved model performance. To mitigate this challenge, Early Exiting (EE) techniques have been introduced in DL, primarily targeting the depth of the model. This approach allows models to exit during inference based on specified criteria, leveraging entropy measures at intermediate exits, so that less complex samples exit early, reducing computational load and inference time. In this work, we propose a novel width-wise exiting strategy for Convolutional Neural Network (CNN)-based architectures. By selectively adjusting the input size, we aim to regulate computational demands effectively. Our approach aims to decrease the average computational load during inference while maintaining performance comparable to that of conventional models. We specifically investigate Modulation Classification, a well-established application of DL in wireless communication. Our experimental results show substantial reductions in computational load, with an average decrease of 28% and particularly notable reductions of 65% in high-SNR scenarios. Through this work, we present a practical solution for reducing computational demands in deep learning applications, particularly within the domain of wireless communication.
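The entropy-based exit rule summarized above can be applied width-wise by running a classifier on progressively larger portions of the input and stopping once the prediction is confident enough. In this sketch the input is a sequence of signal samples, a wider exit simply sees a longer prefix, and the classifier, widths, and threshold are hypothetical placeholders; the paper's CNN, input-size schedule, and exit criterion may differ.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def width_wise_early_exit(signal, classifier, widths, threshold):
    """Classify progressively wider input slices and stop as soon as the
    prediction entropy drops to `threshold` or below (or the widest exit is reached).

    `classifier` is a hypothetical callable mapping an input slice to class logits.
    Returns (predicted_class, width_used)."""
    for idx, width in enumerate(widths):
        probs = softmax(classifier(signal[:width]))
        if entropy(probs) <= threshold or idx == len(widths) - 1:
            predicted = max(range(len(probs)), key=probs.__getitem__)
            return predicted, width
    raise ValueError("widths must be non-empty")
```

The average compute saving then depends on how often samples exit at the narrow widths, which is why easy (e.g., high-SNR) inputs benefit most.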

[41]  arXiv:2405.03230 [pdf, other]
Title: Slicing for Dense Smart Factory Network: Current State, Scenarios, Challenges and Expectations
Comments: Submitted to IEEE ETFA 2024 - IEEE International Conference on Emerging Technologies and Factory Automation, 8 Pages
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

In the era of Industry 4.0, smart factories have emerged as a paradigm shift, redefining manufacturing through the integration of advanced digital technologies. Central to this transformation is the deployment of 5G networks, offering unprecedented levels of connectivity, speed, reliability, and ultra-low latency. Among the revolutionary features of 5G is network slicing, a technology that enhances capabilities through the customization of network resources by allowing multiple logical networks (or slices) to run on top of a shared physical infrastructure. This capability is particularly crucial in the densely packed and highly dynamic environment of smart factories, where diverse applications, from robotic automation to real-time analytics, have varying network requirements. In this paper, we present a comprehensive overview of the integration of slicing in smart factory networks, emphasizing its critical role in enhancing operational efficiency and supporting the diverse requirements of future manufacturing processes. We elaborate on recent advances and technical scenarios, including indoor factory propagation conditions, traffic characteristics, system requirements, slice-aware radio resource management, network elements, enabling technologies, and current standardisation efforts. Additionally, we identify open research challenges as well as key technical issues hindering deployment. Finally, we speculate on the future trajectory of slicing-enabled smart factories, emphasizing the need for continuous adaptation to emerging technologies.

[42]  arXiv:2405.03245 [pdf, ps, other]
Title: How improving performance may imply losing consistency in event-triggered consensus
Subjects: Systems and Control (eess.SY)

Event-triggered control is often argued to lower the average triggering rate compared with time-triggered control while still achieving a desired control goal, e.g., the same performance level. However, this property, often called consistency, cannot be taken for granted and can be hard to analyze in many settings. In particular, although numerous decentralized event-triggered control schemes have been proposed in recent years, their performance properties with respect to time-triggered control remain mostly unexplored. In this paper, we therefore examine the performance properties of event-triggered control (relative to time-triggered control) for a single-integrator consensus problem with a level-triggering rule. We consider the long-term average quadratic deviation from consensus as a performance measure. For this setting, we show that enriching the information used by the local controllers improves the performance of the consensus algorithm but renders a previously consistent event-triggered control scheme inconsistent. Moreover, we show this while deploying optimal control inputs, which we derive for both information cases and all triggering schemes. With this insight, we can also explain the relationship between two contrasting consistency results from the literature on decentralized event-triggered control. We support our theoretical findings with simulation results.

[43]  arXiv:2405.03254 [pdf, ps, other]
Title: Automatic Assessment of Dysarthria Using Audio-visual Vowel Graph Attention Network
Comments: 10 pages, 6 figures, 7 tables
Subjects: Audio and Speech Processing (eess.AS)

Automatic assessment of dysarthria remains a highly challenging task due to high variability in acoustic signals and limited data. Currently, research on the automatic assessment of dysarthria primarily focuses on two approaches: one that combines expert features with machine learning, and another that employs data-driven deep learning methods to extract representations. Research has demonstrated that expert features are effective in representing pathological characteristics, while deep learning methods excel at uncovering latent features. Therefore, integrating the advantages of expert features and deep learning to construct a neural network architecture based on expert knowledge may benefit both interpretability and assessment performance. In this context, the present paper proposes a vowel graph attention network based on audio-visual information, which effectively integrates the strengths of expert knowledge and deep learning. First, various features were combined as inputs, including knowledge-based acoustic features and deep-learning-based pre-trained representations. Second, a graph network structure based on vowel space theory was designed, allowing a deep exploration of spatial correlations among vowels. Finally, visual information was incorporated into the model to further enhance its robustness and generalizability. The method exhibited superior performance compared with existing approaches in regression experiments targeting Frenchay scores.

[44]  arXiv:2405.03277 [pdf, other]
Title: Distributed Adaptive Spatial Filtering with Inexact Local Solvers
Subjects: Signal Processing (eess.SP)

The Distributed Adaptive Signal Fusion (DASF) framework is a meta-algorithm for computing data-driven spatial filters in a distributed sensing platform with limited bandwidth and computational resources, such as a wireless sensor network. The convergence and optimality of the DASF algorithm have been extensively studied under the assumption that an exact, but possibly impractical, solver for the local optimization problem at each updating node is available. In this work, we provide convergence and optimality results for the DASF framework when used with an inexact, finite-time solver such as (proximal) gradient descent or Newton's method. We provide sufficient conditions that the solver should satisfy to guarantee convergence of the resulting algorithm, as well as a lower bound on the convergence rate. We also provide numerical simulations to validate these theoretical results.

[45]  arXiv:2405.03334 [pdf, other]
Title: On the constrained feedback linearization control based on the MILP representation of a ReLU-ANN
Subjects: Systems and Control (eess.SY)

In this work, we explore the efficacy of rectified linear unit (ReLU) artificial neural networks in addressing the intricate constraints that arise from the feedback linearization mapping. Our approach first approximates these constraints through a regression process. We then transform the approximated constraints into an equivalent set of mixed-integer linear constraints, which can be integrated seamlessly into other stabilizing control architectures. The advantages are compatibility with linear control design and constraint satisfaction in the model predictive control setup, even for forecasted trajectories. Simulations are provided to validate the proposed constraint reformulation.
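A ReLU network can be rewritten as mixed-integer linear constraints because each unit y = max(0, x) has an exact encoding with one binary variable z once x is known to lie in a bounded interval [L, U] with L < 0 < U. The sketch below checks the widely used big-M encoding (y >= x, y >= 0, y <= x - L(1 - z), y <= U z) by enumerating z; it assumes known pre-activation bounds, and the paper may use a different or tighter formulation.

```python
def bigm_relu_feasible(x, lower, upper):
    """For a fixed pre-activation x, return the (z, y_min, y_max) pieces allowed by
    the big-M constraints y >= x, y >= 0, y <= x - lower*(1 - z), y <= upper*z."""
    if not (lower < 0 < upper and lower <= x <= upper):
        raise ValueError("need lower < 0 < upper and x in [lower, upper]")
    pieces = []
    for z in (0, 1):                                  # z = 1 encodes the active phase
        y_min = max(x, 0.0)                           # y >= x and y >= 0
        y_max = min(x - lower * (1 - z), upper * z)   # big-M upper bounds
        if y_min <= y_max:                            # keep only nonempty pieces
            pieces.append((z, y_min, y_max))
    return pieces

if __name__ == "__main__":
    for x in (-1.5, 0.0, 2.5):
        print(x, bigm_relu_feasible(x, lower=-2.0, upper=3.0))
```

For every x in [L, U], each feasible piece collapses to the single point y = max(0, x), which is why the encoding is exact rather than a relaxation.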

[46]  arXiv:2405.03482 [pdf, ps, other]
Title: Managing Renewable Energy Resources Using Equity-Market Risk Tools - the Efficient Frontiers
Comments: 9 pages, 3 figures, 10 ref
Subjects: Systems and Control (eess.SY)

The energy market, and specifically the renewable sector, carries volatility and risks similar to those of the financial market. Here, we leverage a well-established return-risk approach commonly used by equity portfolio managers and apply it to energy resources. We visualize the relationship between the resources' costs and their risks in terms of efficient frontiers. We apply this analysis to publicly available data for various US regions: the Central US and the Eastern and Western coasts. Since risk management is contingent on costs, this approach sheds useful light on dynamic pricing in modern electrical grids. By integrating geographical and temporal dimensions into our research, we aim to provide more nuanced and context-specific recommendations for energy resource allocation. This approach may help decision-makers in the renewable energy sector make informed choices that account for regional variations, climatic conditions, and long-term performance trends.
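The efficient-frontier idea can be sketched for two resources using the standard two-asset mean-variance formulas. All numbers below are hypothetical, not data from the paper, and because the quantities are costs, lower is better on both axes; a mix is on the frontier if no other mix is at least as good in cost and risk and strictly better in one.

```python
import math

def mix_stats(w, mean_cost, std_cost, rho):
    """Expected cost and cost standard deviation of a two-resource mix.

    w is the share of resource 0 and (1 - w) is the share of resource 1.
    """
    mean = w * mean_cost[0] + (1 - w) * mean_cost[1]
    var = ((w * std_cost[0]) ** 2 + ((1 - w) * std_cost[1]) ** 2
           + 2 * rho * w * (1 - w) * std_cost[0] * std_cost[1])
    return mean, math.sqrt(var)

def efficient_frontier(mean_cost, std_cost, rho, steps=100):
    """Return (w, mean, std) for grid mixes not dominated in both cost and risk."""
    points = [(k / steps, *mix_stats(k / steps, mean_cost, std_cost, rho))
              for k in range(steps + 1)]
    frontier = []
    for w, m, s in points:
        # Dominated: another mix is no worse on both axes and strictly better on one.
        dominated = any(m2 <= m and s2 <= s and (m2 < m or s2 < s)
                        for _, m2, s2 in points)
        if not dominated:
            frontier.append((w, m, s))
    return frontier

if __name__ == "__main__":
    # Hypothetical resources: 0 is cheaper but volatile, 1 is pricier but stable.
    for w, m, s in efficient_frontier((40.0, 60.0), (15.0, 5.0), rho=0.2)[::19]:
        print(f"w={w:.2f}  expected cost={m:.2f}  std={s:.2f}")
```

With these numbers, the analytic minimum-variance share of resource 0 is (5^2 - 0.2*15*5) / (15^2 + 5^2 - 2*0.2*15*5) = 10/220, about 0.045, so on a 0.01 grid the frontier starts at w = 0.05; mixes with less of resource 0 are both costlier and riskier.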

[47]  arXiv:2405.03502 [pdf, ps, other]
Title: Human-Variability-Respecting Optimal Control for Physical Human-Machine Interaction
Subjects: Systems and Control (eess.SY)

Physical Human-Machine Interaction plays a pivotal role in facilitating collaboration across various domains. When designing appropriate model-based controllers to assist a human in the interaction, the accuracy of the human model is crucial for the resulting overall behavior of the coupled system. Most state-of-the-art control approaches rely either on a deterministic model of human behavior or on no model at all. This leaves a gap relative to the current neuroscientific standard for human movement modeling, which uses stochastic optimal control models that include signal-dependent noise processes and therefore describe human behavior much more accurately than their deterministic counterparts. To close this gap by including these stochastic human models in the control design, we introduce a novel design methodology resulting in a Human-Variability-Respecting Optimal Control that explicitly incorporates the human noise processes and their influence on the mean and variability behavior of a physically coupled human-machine system. Our approach improves overall system performance, i.e., higher accuracy and lower variability in target point reaching, while allowing the joint variability to be shaped, for example to preserve natural human variability patterns.

[48]  arXiv:2405.03542 [pdf, other]
Title: Enhancing Channel Estimation in Quantized Systems with a Generative Prior
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)

Channel estimation in quantized systems is challenging, particularly in low-resolution systems. In this work, we propose to leverage a Gaussian mixture model (GMM) as a generative prior, capturing the channel distribution of the propagation environment, to enhance a classical estimation technique based on the expectation-maximization (EM) algorithm for one-bit quantization. A maximum a posteriori (MAP) estimate of the most responsible mixture component is inferred for a quantized received signal and subsequently used in the EM algorithm as side information. Numerical results demonstrate a significant performance improvement of our proposed approach over both a simplistic Gaussian prior and current state-of-the-art channel estimators. Furthermore, the proposed estimation framework can be adapted to higher-resolution systems and alternative generative priors.

[49]  arXiv:2405.03575 [pdf, other]
Title: A Valuation Framework for Customers Impacted by Extreme Temperature-Related Outages
Subjects: Systems and Control (eess.SY)

Outages during extreme temperatures can lead not only to economic losses but also to various non-energy impacts (NEI) due to significant degradation of indoor operating conditions caused by service disruptions. However, existing resilience assessment approaches lack specificity for extreme temperature conditions. They often overlook temperature-related mortality and neglect customer characteristics and grid response in the calculation, despite the significant influence of these factors on NEI-related economic losses. This paper aims to address these gaps by introducing a comprehensive framework to estimate the impact of resilience enhancement not only on the direct economic losses incurred by customers but also on potential NEI, including mortality and the value of statistical life during extreme temperature-related outages. The proposed resilience valuation integrates customer characteristics and grid response variables based on a scalable grid simulation environment. This study adopts a holistic approach to quantify customer-oriented economic impacts, utilizing probabilistic loss scenarios that incorporate health-related factors and damage/loss models as a function of exposure for valuation. The proposed methodology is demonstrated through comparative resilient outage planning, using grid response models emulating a Texas weather zone during the 2021 winter storm Uri. The case study results show that enhanced outage planning with hardened infrastructure can improve system resilience, thereby reducing the relative risk of mortality by 16% and the total costs related to non-energy impacts by 74%. These findings underscore the efficacy of the framework in assessing the financial implications of each case, providing valuable insights for decision-makers and stakeholders involved in extreme-weather-related resilience planning for risk management and mitigation strategies.

[50]  arXiv:2405.03597 [pdf, other]
Title: Improving the Ranging Performance of Random ISAC Signals Through Pulse Shaping Design
Subjects: Signal Processing (eess.SP)

In this paper, we propose a novel pulse shaping design for single-carrier integrated sensing and communication (ISAC) transmission. Due to the communication information embedded in the ISAC signal, the resulting auto-correlation function (ACF) is determined by both the information-conveying random symbol sequence and the signaling pulse, where the former leads to random fluctuations in the sidelobes of the ACF, impairing the range estimation performance. To overcome this challenge, we first analyze the statistical characteristics of the random ACF under the symbol-wise pulse shaping (SWPS) regime. Going a step further, we formulate an optimization problem for designing ISAC pulse shaping filters, which minimizes the average integrated sidelobe level ratio (ISLR) while meeting the Nyquist criterion, subject to power and bandwidth constraints. We then show that, by expressing it in the frequency domain, the problem can be recast as a convex quadratic program that can be readily solved with standard tools. Numerical results demonstrate that the proposed pulse shaping design achieves substantial ranging sidelobe reduction compared to the celebrated root-raised cosine (RRC) pulse shaping, while keeping the communication throughput unchanged.
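Why random symbols raise ranging sidelobes can be seen with a minimal sketch that computes the ISLR of symbol-rate sequences without any pulse shaping, treating the zero-lag peak as the mainlobe. This is a simplification, not the paper's SWPS model: the paper's oversampled pulses and mainlobe definition differ, and the function names are illustrative. For i.i.d. BPSK symbols, E|R[k]|^2 = N - |k| for k != 0, so the expected ISLR is (N - 1)/N, roughly 0 dB, whereas a deterministic Barker-13 code reaches 12/169.

```python
import random

def aperiodic_acf(seq):
    """Aperiodic autocorrelation R[k] = sum_n s[n + k] * conj(s[n]) for lags k = 0..N-1."""
    n = len(seq)
    return [sum(seq[i + k] * seq[i].conjugate() for i in range(n - k)) for k in range(n)]

def islr(seq):
    """Integrated sidelobe level ratio with the zero-lag peak as mainlobe.

    Negative lags mirror positive ones (|R[-k]| = |R[k]|), hence the factor 2.
    """
    r = aperiodic_acf(seq)
    sidelobe_energy = 2 * sum(abs(v) ** 2 for v in r[1:])
    return sidelobe_energy / abs(r[0]) ** 2

def mean_islr_random_bpsk(length, trials, seed=0):
    """Average ISLR over random equiprobable +/-1 symbol sequences."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += islr([rng.choice((-1.0, 1.0)) for _ in range(length)])
    return total / trials

if __name__ == "__main__":
    barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
    print("Barker-13 ISLR:", islr(barker13))
    print("random BPSK, N=32:", mean_islr_random_bpsk(32, 500))
```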

[51]  arXiv:2405.03608 [pdf, other]
Title: Energy-Based Optimization of Physical-Layer Challenge-Response Authentication with Drones
Subjects: Signal Processing (eess.SP)

Drones are expected to be used for many tasks in the future and require secure communication protocols. In this work, we propose a novel physical layer authentication (PLA)-based challenge-response (CR) protocol in which a drone, Bob, authenticates the sender (either on the ground or in the air) by exploiting his prior knowledge of the wireless channel statistics (fading, path loss, and shadowing). In particular, Bob moves to a set of positions in space and authenticates the sender by estimating the attenuations of the received signals. We take energy consumption into account in the design and provide three solutions: a purely greedy solution (PG), an optimal Bellman iterative solution (BI), and a heuristic solution based on the standard deviation of the attenuations in space. Finally, we demonstrate the effectiveness of our approach through numerical simulations.

[52]  arXiv:2405.03629 [pdf, other]
Title: Configuration-Constrained Tube MPC for Tracking
Comments: 14 pages, paper presented to the L-CSS/CDC combined submission
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper proposes a novel tube-based Model Predictive Control (MPC) framework for tracking varying setpoint references with linear systems subject to additive and multiplicative uncertainties. MPC controllers designed using this framework are recursively feasible for changing references and robustly asymptotically stable for piecewise constant references. The framework leverages configuration-constrained polytopes to parameterize the tubes, offering flexibility to optimize their shape. The efficacy of the approach is demonstrated through two numerical examples: the first illustrates the theoretical results, and the second uses the framework to design a lane-change controller for an autonomous vehicle.

[53]  arXiv:2405.03665 [pdf, other]
Title: Distributed Estimation in Blockchain-aided Internet of Things in the Presence of Attacks
Comments: 11 pages, 4 figures
Subjects: Signal Processing (eess.SP)

Distributed estimation in a blockchain-aided Internet of Things (BIoT) is considered, where the integrated blockchain secures data exchanges across the BIoT and the storage of data at BIoT agents. This paper focuses on developing a performance guarantee for distributed estimation in a BIoT in the presence of malicious attacks that jointly exploit vulnerabilities in both the IoT devices and the blockchain employed within the BIoT. To achieve this, we adopt the Cramér-Rao bound (CRB) as the performance metric and maximize the CRB for estimating the parameter of interest over the attack domain. However, the maximization problem is inherently non-convex, so the globally optimal solution cannot be obtained in general. To address this issue, we develop a relaxation method that transforms the original non-convex optimization problem into a convex one. Moreover, we derive an analytical expression for the optimal solution of the relaxed problem. The optimal value of the relaxed problem can be used to provide a valid estimation performance guarantee for the BIoT in the presence of attacks.
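As a textbook illustration of the CRB as a performance metric (a deliberately simplified model, not the paper's BIoT setting), suppose N agents observe a scalar parameter in independent Gaussian noise, and a hypothetical attack adds extra noise variance delta_i >= 0 at agent i. For any unbiased estimator, the bound grows with every delta_i, so maximizing it over an admissible set of delta_i is a toy analogue of the worst-case analysis described above.

```latex
\begin{aligned}
y_i &= \theta + n_i, \quad n_i \sim \mathcal{N}(0, \sigma_i^2)\ \text{independent}, \quad i = 1, \dots, N, \\
\operatorname{Var}(\hat{\theta}) &\ge \mathrm{CRB} = J(\theta)^{-1}, \quad J(\theta) = \sum_{i=1}^{N} \frac{1}{\sigma_i^2}, \\
\mathrm{CRB}(\boldsymbol{\delta}) &= \Bigl( \sum_{i=1}^{N} \frac{1}{\sigma_i^2 + \delta_i} \Bigr)^{-1}, \quad \delta_i \ge 0 .
\end{aligned}
```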

[54]  arXiv:2405.03667 [pdf, other]
Title: Fault Detection and Monitoring using an Information-Driven Strategy: Method, Theory, and Application
Comments: 28 pages, 11 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)

The ability to detect when a system undergoes an incipient fault is of paramount importance in preventing a critical failure. In this work, we propose an information-driven fault detection method based on a novel concept drift detector. The method is tailored to identifying drifts in input-output relationships of additive noise models (i.e., model drifts) and is based on a distribution-free mutual information (MI) estimator. Our scheme does not require prior faulty examples and can be applied distribution-free over a large class of system models. Our core contributions are twofold. First, we demonstrate the connection between fault detection, model drift detection, and testing independence between two random variables. Second, we prove several theoretical properties of the proposed MI-based fault detection scheme: (i) strong consistency, (ii) exponentially fast detection of the non-faulty case, and (iii) control of both significance levels and power of the test. To conclude, we validate our theory with synthetic data and the benchmark dataset N-CMAPSS of aircraft turbofan engines. These empirical results support the usefulness of our methodology in many practical and realistic settings, and the theoretical results show performance guarantees that other methods cannot offer.
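The link between drift detection and independence testing can be illustrated with a plug-in mutual information estimate on discrete data and a permutation test, which is distribution-free under the null hypothesis of independence. This is a generic sketch, not the paper's MI estimator or detector: the discretized inputs, the number of permutations, and the function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def plugin_mi(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))) with empirical frequencies
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def permutation_independence_test(xs, ys, num_perm=199, seed=0):
    """Return (observed MI, permutation p-value) for H0: xs and ys independent."""
    rng = random.Random(seed)
    observed = plugin_mi(xs, ys)
    shuffled = list(ys)
    exceed = 0
    for _ in range(num_perm):
        rng.shuffle(shuffled)            # shuffling breaks any x-y dependence
        if plugin_mi(xs, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (num_perm + 1)
```

A small p-value indicates that the output still depends on the input; monitoring how such a dependence statistic changes over time is one way to frame a model-drift check.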

[55]  arXiv:2405.03684 [pdf, other]
Title: All-in-One Deep Learning Framework for MR Image Reconstruction
Comments: 22 pages, 9 figures
Subjects: Image and Video Processing (eess.IV)

We introduce a novel, all-in-one deep learning framework for MR image reconstruction, enabling a single model to enhance image quality across multiple aspects of k-space sampling and to be effective across a wide range of clinical and technical scenarios. This DICOM-based algorithm serves as the core of SwiftMR (AIRS Medical, Seoul, Korea), which is FDA-cleared, CE-certified, and commercially available. We first detail the comprehensive development process of the model, including data collection, training pair preparation, model architecture design, and DICOM inference. We then assess the model's capability to enhance image quality in a multi-dimensional manner, specifically across various aspects of k-space sampling. Subsequently, we evaluate several features of the multi-dimensional enhancement: the accuracy of tunable denoising, the effectiveness of super-resolution in each encoding direction, and the reduction of artifacts that become more prominent at lower spatial resolutions. Additionally, we assess its compatibility with various scan parameter sets and its generalizability across scanner vendors not seen during training. Finally, we present specific cases demonstrating the model's utility in reducing scan time across anatomical regions in conjunction with protocol optimization. The proposed model is compatible with a broad spectrum of scenarios, including various vendors, pulse sequences, scan parameters, and anatomical regions. Its DICOM-based operation particularly enhances its applicability for real-world applications. Given its demonstrated effectiveness and versatility, we expect its use to expand in the field of clinical MRI.

Cross-lists for Tue, 7 May 24

[56]  arXiv:2405.02303 (cross-list from cs.CE) [pdf, other]
Title: Enhanced Thermal Management in High-Temperature Applications: Design and Optimization of a Water-Cooled Forced Convection System in a Hollow Cuboid Vapour Chamber Using COMSOL and MATLAB
Comments: 18 pages, 52 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)

This report details the design and optimisation of a water-cooled forced convection heat dissipation system for use in high-temperature applications (700 K to 1000 K). A hollow cuboid vapour chamber model was investigated, with the space within the hollow cuboid used as the design space. COMSOL, a finite element method (FEM) software package, was used to solve for the physical parameters of each geometry in the design space of the heat dissipation system. COMSOL in conjunction with MATLAB was used for the parametric and density-based topology optimisation of the geometric design in the design space. The goal of the optimisation is to minimise the temperature gradient over the design space, which spreads the heat evenly throughout the designed mesh and allows for more effective cooling. To reduce the computational time needed to solve and optimise each geometry in 3D, a 2D representation was created for the front and rear faces of the hollow cuboid setup. These 2D face designs were then extrapolated into 3D over the length of the hollow cuboid, and COMSOL was used to find a solution for each model. This report also proposes a use case for this system in which it would be used in conjunction with MGA and thermometric technology within coal-fired power stations for the extraction and storage of waste heat for later use.

[57]  arXiv:2405.02305 (cross-list from cs.CV) [pdf, ps, other]
Title: Inserting Faces inside Captions: Image Captioning with Attention Guided Merging
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Image and Video Processing (eess.IV)

Image captioning models are widely used to describe recent and archived pictures with the objective of improving their accessibility and retrieval. Yet, these approaches tend to be inefficient and biased at retrieving people's names. In this work, we introduce AstroCaptions, a dataset for the image captioning task. This dataset specifically contains thousands of public figures that are difficult for a traditional model to identify. We also propose a novel post-processing method to insert identified people's names into the caption using explainable AI tools and the grounding capabilities of vision-language models. The results obtained with this method show significant improvements in caption quality and a potential for reducing hallucinations. Up to 93.2% of the persons detected can be inserted into the image captions, leading to improvements in the BLEU, ROUGE, CIDEr, and METEOR scores of each captioning model.

[58]  arXiv:2405.02312 (cross-list from cs.CV) [pdf, ps, other]
Title: YOLOv5 vs. YOLOv8 in Marine Fisheries: Balancing Class Detection and Instance Count
Comments: 12 pages, 25 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

This paper presents a comparative study of object detection using YOLOv5 and YOLOv8 for three distinct classes: artemia, cyst, and excrement. In this comparative study, we analyze the performance of these models in terms of accuracy, precision, recall, and related metrics. YOLOv5 often performed better in detecting artemia and cysts, with excellent precision and accuracy. However, when it came to detecting excrement, YOLOv5 faced notable challenges and limitations. This suggests that YOLOv8 offers greater versatility and adaptability in detection tasks, while YOLOv5 may struggle in difficult situations and may need further fine-tuning or specialized training to enhance its performance. The results provide insights into the suitability of YOLOv5 and YOLOv8 for detecting objects in challenging marine environments, with implications for applications such as ecological research.

[59]  arXiv:2405.02317 (cross-list from cs.CV) [pdf, other]
Title: Long-term Human Participation Assessment In Collaborative Learning Environments Using Dynamic Scene Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

The paper develops datasets and methods to assess student participation in real-life collaborative learning environments. In collaborative learning environments, students are organized into small groups where they are free to interact within their group. Thus, students can move around freely, which causes strong pose variation; they may also move out of and re-enter the camera scene, or face away from the camera. We divide the problem of assessing student participation into two subproblems: (i) student group detection against strong background interference from other groups, and (ii) dynamic participant tracking within the group. A massive independent testing dataset of 12,518,250 student label instances, with a total duration of 21 hours and 22 minutes of real-life videos, is used to evaluate the performance of our proposed method for student group detection. The proposed method, which uses multiple image representations, is shown to perform as well as or better than YOLO on all video instances. Over the entire dataset, the proposed method achieved an F1 score of 0.85 compared to 0.80 for YOLO. Following student group detection, the paper presents the development of a dynamic participant tracking system for assessing student group participation through long video sessions. The proposed dynamic participant tracking system is shown to perform exceptionally well, missing a student in just one out of 35 testing videos. In comparison, a state-of-the-art method fails to track students in 14 out of the 35 testing videos. The proposed method achieves 82.3% accuracy on an independent set of long, real-life collaborative videos.

[60]  arXiv:2405.02323 (cross-list from cs.AR) [pdf, ps, other]
Title: CNN-Based Equalization for Communications: Achieving Gigabit Throughput with a Flexible FPGA Hardware Architecture
Comments: The article was submitted to the International Journal of Parallel Programming (IJPP) and is currently under review
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG); Signal Processing (eess.SP)

To satisfy the growing throughput demand of data-intensive applications, the performance of optical communication systems has increased dramatically in recent years. With higher throughput, more advanced equalizers are crucial to compensate for impairments caused by inter-symbol interference (ISI). The latest research shows that artificial neural network (ANN)-based equalizers are promising candidates to replace traditional algorithms for high-throughput communications. On the other hand, flexibility, not only throughput, is a main objective of beyond-5G and 6G communication systems. Field programmable gate arrays (FPGAs) are a platform able to satisfy the strict throughput and flexibility requirements of modern communication systems. Thus, in this work, we present a high-performance FPGA implementation of an ANN-based equalizer, which meets the throughput requirements of modern optical communication systems. Further, our architecture is highly flexible since it includes a variable degree of parallelism (DOP) and can therefore also be applied to low-cost or low-power applications, as demonstrated for a magnetic recording channel. The implementation is based on a cross-layer design approach featuring optimizations from the algorithm down to the hardware architecture, including a detailed quantization analysis. Moreover, we present a framework to reduce the latency of the ANN-based equalizer under given throughput constraints. As a result, the bit error ratio (BER) of our equalizer for the optical fiber channel is around four times lower than that of a conventional one, while the corresponding FPGA implementation achieves a throughput of more than 40 GBd, outperforming a high-performance graphics processing unit (GPU) by three orders of magnitude for a similar batch size.

[61]  arXiv:2405.02333 (cross-list from cs.SD) [pdf, ps, other]
Title: Speech Technology Services for Oral History Research
Comments: 5 pages plus references, 3 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Oral history is concerned with oral accounts by witnesses of, and commentators on, historical events. Speech technology is an important instrument for processing such recordings to obtain transcriptions and further enhancements that structure the oral account. In this contribution, we address the transcription portal and the web services associated with speech processing at BAS, speech solutions developed at LINDAT, how to do it yourself with Whisper, remaining challenges, and future developments.

[62]  arXiv:2405.02339 (cross-list from astro-ph.IM) [pdf, ps, other]
Title: Noise Models in the LISA Mission
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Systems and Control (eess.SY)

This document briefly describes the noise models and shapes used for the synthesis of the Drag-Free and Attitude Control System in the LISA space mission. LISA (Laser Interferometer Space Antenna) is one of the next large-class missions of the European Space Agency (ESA), expected to be launched in 2034. The main goal of the mission is to detect gravitational waves, which are undulatory perturbations of the fabric of space-time and are extremely important for collecting experimental evidence for the theory of General Relativity. In the 1990s, different international collaborations of institutes laid the foundations for the first ground-based interferometers (e.g., LIGO and Virgo). However, ground-based interferometers have a limited bandwidth due to the Earth's environmental noise and their short arm lengths of a few kilometers. Therefore, they cannot observe gravitational waves in the portion of the spectrum below 1 Hz. This issue can be overcome by space-based interferometers, which can have arm lengths of up to millions of kilometers and exploit a quieter environment than the Earth's surface. The LISA system is affected by actuation, sensing, and environmental disturbances and noises. The actuation noises include those from the Micro Propulsion System (MPS), the Gravitational Reference Sensor (GRS), and the Optical Assembly (OA) motor. The sensing noises considered are those of the interferometer, the Differential Wavefront Sensor (DWS), and the GRS. The environmental disturbances are the solar radiation pressure, the test-mass stiffness and self-gravity, and the environmental noises acting directly on the test mass.

[63]  arXiv:2405.02346 (cross-list from cs.CR) [pdf, other]
Title: Temporal assessment of malicious behaviors: application to turnout field data monitoring
Comments: To be published in the International Conference on Control, Automation and Diagnosis (ICCAD24)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Systems and Control (eess.SY)

Monitored data collected from railway turnouts are vulnerable to cyberattacks: attackers may either conceal failures or trigger unnecessary maintenance actions. To address this issue, a cyberattack investigation method is proposed based on predictions made from the temporal evolution of the turnout behavior. These predictions are then compared to the field-acquired data to detect any discrepancy. This method is illustrated on a collection of real-life data.
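The predict-then-compare step can be sketched as follows, assuming (hypothetically) that a scalar health feature recorded per turnout maneuver drifts linearly over time and that a fixed discrepancy threshold is acceptable. The abstract does not specify the paper's prediction model or thresholds, so all names and numbers here are illustrative.

```python
def fit_linear_trend(times, values):
    """Ordinary least-squares fit of values ~ intercept + slope * time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return mean_v - slope * mean_t, slope

def flag_discrepancies(history, new_points, threshold):
    """Flag new (time, value) points that deviate from the extrapolated trend."""
    times, values = zip(*history)
    intercept, slope = fit_linear_trend(times, values)
    return [abs(value - (intercept + slope * t)) > threshold
            for t, value in new_points]

if __name__ == "__main__":
    history = [(0, 10.0), (1, 10.5), (2, 11.0), (3, 11.5)]
    print(flag_discrepancies(history, [(4, 12.0), (5, 9.0)], threshold=1.0))
```

A flagged point only signals a mismatch between prediction and field data; deciding whether it reflects an attack or a genuine fault is the investigation step the method is meant to support.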

[64]  arXiv:2405.02370 (cross-list from cs.AI) [pdf, other]
Title: Neuromorphic Correlates of Artificial Consciousness
Authors: Anwaar Ulhaq
Comments: 13 Pages, 8 Figures
Subjects: Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

The concept of neural correlates of consciousness (NCC), which suggests that specific neural activities are linked to conscious experiences, has gained widespread acceptance. This acceptance is based on a wealth of evidence from experimental studies, brain imaging techniques such as fMRI and EEG, and theoretical frameworks like integrated information theory (IIT) within neuroscience and the philosophy of mind. This paper explores the potential for artificial consciousness by merging neuromorphic design and architecture with brain simulations. It proposes the Neuromorphic Correlates of Artificial Consciousness (NCAC) as a theoretical framework. While the debate on artificial consciousness remains contentious due to our incomplete grasp of consciousness, this work may raise eyebrows and invite criticism. Nevertheless, this optimistic and forward-thinking approach is fueled by insights from the Human Brain Project, advancements in brain imaging like EEG and fMRI, and recent strides in AI and computing, including quantum and neuromorphic designs. Additionally, this paper outlines how machine learning can play a role in crafting artificial consciousness, aiming to realise machine consciousness and awareness in the future.

[65]  arXiv:2405.02478 (cross-list from cs.LG) [pdf, other]
Title: Continuous Learned Primal Dual
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Neural ordinary differential equations (Neural ODEs) are based on the idea that a sequence of layers in a neural network is just a discretisation of an ODE, and can thus instead be modelled directly by a parameterised ODE. This idea has had resounding success in the deep learning literature, with direct or indirect influence on many state-of-the-art ideas, such as diffusion models or time-dependent models. Recently, a continuous version of the U-net architecture has been proposed, showing increased performance over its discrete counterpart in many imaging applications and backed by theoretical guarantees on its performance and robustness. In this work, we explore the use of Neural ODEs for learned inverse problems, in particular with the well-known Learned Primal Dual algorithm, and apply it to computed tomography (CT) reconstruction.
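The layers-as-discretisation idea fits in a few lines: a residual update x + h f(x) is exactly one explicit Euler step of dx/dt = f(x). In this sketch f is a fixed toy function, whereas in a Neural ODE it would be a learned network; the function name is illustrative, and this is not the Learned Primal Dual method.

```python
import math

def euler_resnet(x0, f, t_end, num_layers):
    """Apply num_layers residual updates x <- x + h * f(x) with h = t_end / num_layers.

    Each update is one explicit Euler step of dx/dt = f(x), so a deeper stack of
    residual layers approximates the ODE solution at time t_end more closely.
    """
    h = t_end / num_layers
    x = x0
    for _ in range(num_layers):
        x = x + h * f(x)
    return x

if __name__ == "__main__":
    # Toy vector field f(x) = -x, whose exact flow from x0 = 1 is exp(-t).
    for layers in (1, 10, 100, 1000):
        print(layers, euler_resnet(1.0, lambda x: -x, 1.0, layers), math.exp(-1.0))
```

With f(x) = -x, ten layers give 0.9^10, about 0.349, against the exact exp(-1), about 0.368, and the gap shrinks as layers are added. This refinement behaviour is what lets an ODE view replace a fixed stack of layers.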

[66]  arXiv:2405.02604 (cross-list from cs.IT) [pdf, ps, other]
Title: Interleave Frequency Division Multiplexing
Comments: Accepted by IEEE Wireless Communications Letters
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this letter, we study interleave frequency division multiplexing (IFDM) for multicarrier modulation in static multipath and mobile time-varying channels; with practical advanced detectors, IFDM outperforms orthogonal frequency division multiplexing (OFDM), orthogonal time frequency space (OTFS), and affine frequency division multiplexing (AFDM). The fundamental principle underlying existing modulation techniques is to establish sparse equivalent channel matrices in order to facilitate the design of low-complexity detection algorithms for signal recovery, trading off performance against implementation complexity. In contrast, the proposed IFDM establishes an equivalent fully dense and right-unitarily invariant channel matrix with the goal of achieving channel capacity, ensuring that the signals undergo sufficient statistical channel fading. Meanwhile, a low-complexity and replica maximum a posteriori (MAP)-optimal cross-domain memory approximate message passing (CD-MAMP) detector is proposed for IFDM by exploiting the sparsity of the time-domain channel and the unitary invariance of the interleave-frequency-domain channel. Numerical results show that IFDM with extremely low-complexity CD-MAMP outperforms OFDM, OTFS, and AFDM with state-of-the-art orthogonal approximate message passing detectors, particularly at low velocities.

[67]  arXiv:2405.02631 (cross-list from cs.LG) [pdf, other]
Title: Unsupervised machine learning for data-driven classification of rock mass using drilling data: How can a data-driven system handle limitations in existing rock mass classification systems?
Comments: 38 pages, 11 figures. Includes ancillary interactive versions of some figures
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Systems and Control (eess.SY)

Rock mass classification systems are crucial for assessing stability and risk in underground construction worldwide and for guiding support and excavation design. However, systems developed primarily in the 1970s lack access to modern high-resolution data and advanced statistical techniques, limiting their effectiveness as decision-support systems. We first outline the limitations observed in this context and then describe how a data-driven system based on drilling data, as detailed in this study, can overcome these limitations. Using statistical information extracted from thousands of measurement-while-drilling (MWD) data values in one-meter sections of a full tunnel profile, which thus act as a signature of the rock mass, we have demonstrated that it is possible to form well-defined clusters that can serve as a foundation for various rock mass classification systems. We reduced the dimensionality of 48-value vectors using nonlinear manifold learning (UMAP) and linear principal component analysis (PCA) to enhance clustering. Unsupervised machine learning methods (HDBSCAN, agglomerative clustering, K-means) were employed to cluster the data, with hyperparameters optimised through multi-objective Bayesian optimisation for effective clustering. Using domain knowledge, we found that adding extra features to core clusters of MWD data improved clustering and offered opportunities for system tuning. We structured and correlated these clusters with physical rock mass properties, including labels of rock type and rock quality, and analysed cumulative distributions of key MWD parameters for rock mass assessment to determine whether the clusters meaningfully differentiate rock masses. The ability of MWD data to form distinct rock mass clusters suggests substantial potential for future classification systems grounded in this objective, data-driven methodology, free from human bias.

[68]  arXiv:2405.02660 (cross-list from cs.IT) [pdf, other]
Title: AFDM Channel Estimation in Multi-Scale Multi-Lag Channels
Comments: 6 pages, 6 figures. Investigates AFDM under underwater multi-scale multi-lag channels. Derives a new input-output formula that includes the impact of Doppler time scaling. Proposes two new channel estimation methods to tackle different levels of Doppler factors. Performs diversity analysis based on CFR overlap probability (COP) and mutual incoherent property (MIP)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Affine Frequency Division Multiplexing (AFDM) is a new chirp-based multi-carrier (MC) waveform for high-mobility communications, with promising advantages over Orthogonal Frequency Division Multiplexing (OFDM) and other MC waveforms. Existing AFDM research focuses on wireless communication at high carrier frequency (CF), which typically considers only the Doppler frequency shift (DFS) caused by mobility, while ignoring the accompanying Doppler time scaling (DTS) of the waveform. However, for underwater acoustic (UWA) communication at much lower CF and propagating at the speed of sound, the DTS effect cannot be ignored and poses significant challenges for channel estimation. This paper analyzes the channel frequency response (CFR) of AFDM under multi-scale multi-lag (MSML) channels, where each propagation path can have a different delay and DFS/DTS. Based on the newly derived input-output formula and its characteristics, two new channel estimation methods are proposed, i.e., AFDM with iterative multi-index (AFDM-IMI) estimation under low to moderate DTS, and AFDM with orthogonal matching pursuit (AFDM-OMP) estimation under high DTS. Numerical results confirm the effectiveness of the proposed methods compared with the original AFDM channel estimation method. Moreover, the resulting AFDM system outperforms OFDM as well as Orthogonal Chirp Division Multiplexing (OCDM) in terms of channel estimation accuracy and bit error rate (BER), which is consistent with our theoretical analysis based on CFR overlap probability (COP), mutual incoherent property (MIP) and channel diversity gain under MSML channels.

[69]  arXiv:2405.02675 (cross-list from cs.SD) [pdf, other]
Title: Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers. We explore the possibility of crowdsourcing a carefully annotated Quranic dataset, on top of which AI models can be built to simplify the learning process. In particular, we use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets. We integrated the API into an existing mobile application called NamazApp to collect audio recitations. We developed a crowdsourcing platform called Quran Voice for annotating the gathered audio assets. As a result, we have collected around 7000 Quranic recitations from a pool of 1287 participants across more than 11 non-Arabic countries, and we have annotated 1166 recitations from the dataset in six categories. We have achieved a crowd accuracy of 0.77, an inter-rater agreement of 0.63 between the annotators, and 0.89 between the labels assigned by the algorithm and the expert judgments.

[70]  arXiv:2405.02681 (cross-list from cs.IT) [pdf, other]
Title: Spider RIS: Mobilizing Intelligent Surfaces for Enhanced Wireless Communications
Comments: Accepted in 2024 EuCNC and 6G Summit, Antwerp, Belgium, 3-6 June 2024
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this study, we introduce Spider RIS technology, which offers an innovative solution to the challenges encountered in movable antennas (MAs) and unmanned aerial vehicle (UAV)-enabled communication systems. By combining the dynamic adaptation capability of MAs and the flexible location advantages of UAVs, this technology offers a dynamic and movable RIS, which can flexibly optimize physical locations within the two-dimensional movement platform. Spider RIS aims to enhance the communication efficiency and reliability of wireless networks, particularly in obstructive environments, by elevating the signal quality and achievable rate. The motivation of Spider RIS is based on the ability to fully exploit the spatial variability of wireless channels and maximize channel capacity even with a limited number of reflecting elements by overcoming the limitations of traditional fixed RIS and energy-intensive UAV systems. Considering the geometry-based millimeter wave channel model, we present the design of a three-stage angular-based hybrid beamforming system empowered by Spider RIS: First, analog beamformers are designed using angular information, followed by the generation of digital precoder/combiner based on the effective channel observed from baseband stage. Subsequently, the joint dynamic positioning with phase shift design of the Spider RIS is optimized using particle swarm optimization, maximizing the achievable rate of the systems.
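The abstract optimizes the joint Spider RIS position and phase shifts with particle swarm optimization (PSO). The following is a minimal, generic global-best PSO sketch in Python. The inertia and acceleration values are common textbook choices assumed here, and the shifted quadratic objective is a hypothetical stand-in; in the paper's setting the objective would be something like the negative achievable rate as a function of RIS position and phases, which is not modeled here.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(f: Callable[[List[float]], float], dim: int, bounds: Tuple[float, float],
                 n_particles: int = 30, iters: int = 200, w: float = 0.7,
                 c1: float = 1.5, c2: float = 1.5, seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO over a box [lo, hi]^dim; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the feasible box (e.g., the movement platform).
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in objective with its minimum at (1, 1).
best, val = pso_minimize(lambda p: sum((v - 1.0) ** 2 for v in p), dim=2, bounds=(-5.0, 5.0))
print(best, val)
```

PSO needs only objective evaluations, not gradients, which suits non-convex, jointly discrete/continuous placement problems like the one described.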

[71]  arXiv:2405.02693 (cross-list from cs.NI) [pdf, ps, other]
Title: TV White Space and LTE Network Optimization towards Energy Efficiency in Suburban and Rural Scenarios
Journal-ref: IEEE Transactions on Broadcasting, vol. 64, no. 1, pp. 164-171, 2018
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

The radio spectrum is a limited resource. Demand for wireless communication services is increasing exponentially, stressing the availability of radio spectrum to accommodate new services. TV White Space (TVWS) technologies allow a dynamic usage of the spectrum. These technologies provide wireless connectivity, in the channels of the Very High Frequency (VHF) and Ultra High Frequency (UHF) television broadcasting bands. In this paper, we investigate and compare the coverage range, network capacity, and network energy efficiency for TVWS technologies and LTE. We consider Ghent, Belgium and Boyeros, Havana, Cuba to evaluate a realistic outdoor suburban and rural area, respectively. The comparison shows that TVWS networks have an energy efficiency 9-12 times higher than LTE networks.

[72]  arXiv:2405.02726 (cross-list from cs.LG) [pdf, other]
Title: A Mathematical Model of the Hidden Feedback Loop Effect in Machine Learning Systems
Comments: 21 pages, 15 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Widespread deployment of societal-scale machine learning systems necessitates a thorough understanding of the resulting long-term effects these systems have on their environment, including loss of trustworthiness, bias amplification, and violation of AI safety requirements. We introduce a repeated learning process to jointly describe several phenomena attributed to unintended hidden feedback loops, such as error amplification, induced concept drift, echo chambers and others. The process comprises the entire cycle of obtaining the data, training the predictive model, and delivering predictions to end-users within a single mathematical model. A distinctive feature of such repeated learning setting is that the state of the environment becomes causally dependent on the learner itself over time, thus violating the usual assumptions about the data distribution. We present a novel dynamical systems model of the repeated learning process and prove the limiting set of probability distributions for positive and negative feedback loop modes of the system operation. We conduct a series of computational experiments using an exemplary supervised learning problem on two synthetic data sets. The results of the experiments correspond to the theoretical predictions derived from the dynamical model. Our results demonstrate the feasibility of the proposed approach for studying the repeated learning processes in machine learning systems and open a range of opportunities for further research in the area.

[73]  arXiv:2405.02753 (cross-list from math.OC) [pdf, ps, other]
Title: Unscented Trajectory Optimization
Comments: 21 pages, 11 figures; 2023 AAS/AIAA Astrodynamics Specialist Conference, Big Sky, MT, Aug 13-17, 2023
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Statistics Theory (math.ST); Computation (stat.CO)

In a nutshell, unscented trajectory optimization is the generation of optimal trajectories through the use of an unscented transform. Although unscented trajectory optimization was introduced by the authors about a decade ago, it is reintroduced in this paper as a special instantiation of tychastic optimal control theory. Tychastic optimal control theory (from Tyche, the Greek goddess of chance) avoids the use of a Brownian motion and the resulting Itô calculus even though it uses random variables across the entire spectrum of a problem formulation. This approach circumvents the enormous technical and numerical challenges associated with stochastic trajectory optimization. Furthermore, it is shown how a tychastic optimal control problem that involves nonlinear transformations of the expectation operator can be quickly instantiated using an unscented transform. These nonlinear transformations are particularly useful in managing trajectory dispersions, whether associated with path constraints or with targeted values of final-time conditions. This paper also presents a systematic and rapid process for formulating and computing the most desirable tychastic trajectory using an unscented transform. Numerical examples are used to illustrate how unscented trajectory optimization may be used for risk reduction and mission recovery caused by uncertainties and failures.
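To make "approximating an expectation with an unscented transform" concrete, here is a minimal sketch of the basic unscented transform for a scalar random variable, using three sigma points and the classic spread parameter `kappa`. It is only the textbook building block, not the authors' tychastic trajectory formulation, which works with vector states and full optimal control problems. With n + kappa = 3 (kappa = 2 for n = 1), the three points reproduce the mean, variance and Gaussian fourth moment exactly.

```python
import math
from typing import Callable


def unscented_expectation(f: Callable[[float], float], mean: float, var: float,
                          kappa: float = 2.0) -> float:
    """Approximate E[f(X)] for scalar X with the given mean and variance,
    using the three sigma points of the basic unscented transform (n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    sigma_points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 1.0 / (2 * (n + kappa)), 1.0 / (2 * (n + kappa))]
    # Deterministic weighted sum replaces sampling or stochastic calculus.
    return sum(w * f(x) for w, x in zip(weights, sigma_points))


# E[X^2] = mean^2 + var = 1 + 4 = 5 is recovered exactly.
print(unscented_expectation(lambda x: x ** 2, mean=1.0, var=4.0))
```

Because only three deterministic function evaluations are needed, the approximated expectation can be embedded directly as a cost or constraint term in an ordinary deterministic optimizer.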

[74]  arXiv:2405.02801 (cross-list from cs.SD) [pdf, other]
Title: Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
Comments: 7 pages, 2 figures, submitted to ACM MM 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements, facilitating the generation of music, images, and other forms of artistic expression across various industries. However, research on general multi-modal music generation models remains scarce. To fill this gap, we propose a multi-modal music generation framework, Mozart's Touch. It generates music aligned with cross-modality inputs, such as images, videos and text. Mozart's Touch is composed of three main components: Multi-modal Captioning Module, Large Language Model (LLM) Understanding & Bridging Module, and Music Generation Module. Unlike traditional approaches, Mozart's Touch requires no training or fine-tuning of pre-trained models, offering efficiency and transparency through clear, interpretable prompts. We also introduce an "LLM-Bridge" method to resolve the heterogeneous representation problems between descriptive texts of different modalities. We conduct a series of objective and subjective evaluations on the proposed model, and results indicate that our model surpasses the performance of current state-of-the-art models. Our code and examples are available at: https://github.com/WangTooNaive/MozartsTouch

[75]  arXiv:2405.02821 (cross-list from cs.SD) [pdf, other]
Title: Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)

Sim2real transfer has received increasing attention lately due to the success of learning robotic tasks end-to-end in simulation. While there has been a lot of progress in transferring vision-based navigation policies, the existing sim2real strategy for audio-visual navigation performs data augmentation empirically without measuring the acoustic gap. Sound differs from light in that it spans a much wider frequency range and thus requires a different sim2real solution. We propose the first treatment of sim2real for audio-visual navigation by disentangling it into acoustic field prediction (AFP) and waypoint navigation. We first validate our design choice in the SoundSpaces simulator and show improvement on the Continuous AudioGoal navigation benchmark. We then collect real-world data to measure the spectral difference between the simulation and the real world by training AFP models that only take a specific frequency subband as input. We further propose a frequency-adaptive strategy that intelligently selects the best frequency band for prediction based on both the measured spectral difference and the energy distribution of the received audio, which improves the performance on the real data. Lastly, we build a real robot platform and show that the transferred policy can successfully navigate to sounding objects. This work demonstrates the potential of building intelligent agents that can see, hear, and act entirely from simulation, and transferring them to the real world.

[76]  arXiv:2405.02823 (cross-list from cs.IT) [pdf, other]
Title: Reconfigurable Massive MIMO: Precoding Design and Channel Estimation in the Electromagnetic Domain
Comments: This work is being submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Reconfigurable massive multiple-input multiple-output (RmMIMO) technology offers increased flexibility for future communication systems by exploiting previously untapped degrees of freedom in the electromagnetic (EM) domain. The representation of the traditional spatial domain channel state information (sCSI) limits the insights into the potential of EM domain channel properties, constraining the base station's (BS) utmost capability for precoding design. This paper leverages the EM domain channel state information (eCSI) for radiation pattern design at the BS. We develop an orthogonal decomposition method based on spherical harmonic functions to decompose the radiation pattern into a linear combination of orthogonal bases. By formulating the radiation pattern design as an optimization problem for the projection coefficients over these bases, we develop a manifold optimization-based method for iterative radiation pattern and digital precoder design. To address the eCSI estimation problem, we capitalize on the inherent structure of the channel. Specifically, we propose a subspace-based scheme to reduce the pilot overhead for wideband sCSI estimation. Given the estimated full-band sCSI, we further employ parameterized methods for angle of arrival estimation. Subsequently, the complete eCSI can be reconstructed after estimating the equivalent channel gain via the least squares method. Simulation results demonstrate that, in comparison to traditional mMIMO systems with fixed antenna radiation patterns, the proposed RmMIMO architecture offers significant throughput gains for multi-user transmission at a low channel estimation overhead.

[77]  arXiv:2405.02931 (cross-list from cs.IT) [pdf, other]
Title: Optimal Signals and Detectors Based on Correlation and Energy
Comments: 27 pages, 5 figures; submitted for publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In continuation of an earlier study, we explore a Neyman-Pearson hypothesis testing scenario where, under the null hypothesis ($\cal{H}_0$), the received signal is a white noise process $N_t$, which is not Gaussian in general, and under the alternative hypothesis ($\cal{H}_1$), the received signal comprises a deterministic transmitted signal $s_t$ corrupted by additive white noise, the sum of $N_t$ and another noise process originating from the transmitter, denoted as $Z_t$, which is not necessarily Gaussian either. Our approach focuses on detectors that are based on the correlation and energy of the received signal, which are motivated by implementation simplicity. We optimize the detector parameters to achieve the best trade-off between missed-detection and false-alarm error exponents. First, we optimize the detectors for a given signal, resulting in a non-linear relation between the signal and correlator weights to be optimized. Subsequently, we optimize the transmitted signal and the detector parameters jointly, revealing that the optimal signal is a balanced ternary signal and the correlator has at most three different coefficients, thus facilitating a computationally feasible solution.
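A detector "based on the correlation and energy of the received signal" can be read as thresholding a linear combination of a weighted correlation and the received energy. The sketch below shows that generic structure; the parameter names `weights`, `energy_coeff` and `threshold` are hypothetical, and their optimal values, which the paper derives from error-exponent trade-offs, are not computed here. The example weights take only the three values -1, 0 and +1, echoing the abstract's finding that the optimal correlator has at most three distinct coefficients.

```python
from typing import Sequence


def correlation_energy_statistic(y: Sequence[float], weights: Sequence[float],
                                 energy_coeff: float) -> float:
    """Weighted correlation sum_t w_t y_t plus a scaled energy term sum_t y_t^2."""
    correlation = sum(w * v for w, v in zip(weights, y))
    energy = sum(v * v for v in y)
    return correlation + energy_coeff * energy


def correlation_energy_detector(y: Sequence[float], weights: Sequence[float],
                                energy_coeff: float, threshold: float) -> bool:
    """Decide H1 (signal present) when the statistic reaches the threshold, else H0."""
    return correlation_energy_statistic(y, weights, energy_coeff) >= threshold


# Hypothetical ternary correlator weights and received samples.
w = [1.0, -1.0, 0.0, 1.0]
print(correlation_energy_detector([1.0, -1.0, 0.0, 1.0], w, energy_coeff=0.0, threshold=1.5))
```

Both terms need only multiply-accumulate operations over the received samples, which is the implementation simplicity the abstract refers to.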

[78]  arXiv:2405.02942 (cross-list from physics.optics) [pdf, other]
Title: Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens
Comments: Accepted to Optics & Laser Technology
Subjects: Optics (physics.optics); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)

We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to overcome several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and system complexity. The field of view (FoV) of the ASPAL is 360° × (35°–110°) and the imaging quality is close to the diffraction limit. This large-FoV ASPAL is composed of only 4 lenses. Moreover, we establish a physical structure model of PAL using the ray tracing method and study the influence of its physical parameters on the compactness ratio. In addition, for the evaluation of local tolerances of annular surfaces, we propose a tolerance analysis method suitable for ASPAL. This analytical method can effectively analyze surface irregularities on annular surfaces and provide clear guidance on manufacturing tolerances for ASPAL. Benefiting from high-precision glass molding and injection molding aspheric lens manufacturing techniques, we finally manufactured 20 ASPALs in small batches. The weight of an ASPAL prototype is only 8.5 g. Our framework provides promising insights for the application of panoramic systems in space- and weight-constrained environmental sensing scenarios such as intelligent security, micro-UAVs, and micro-robots.

[79]  arXiv:2405.02961 (cross-list from cs.CV) [pdf, other]
Title: JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos
Comments: Submitted to the International Journal of Computer Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Due to the ever-increasing availability of video surveillance cameras and the growing need for crime prevention, the violence detection task is attracting greater attention from the research community. Compared with other action recognition tasks, violence detection in surveillance videos poses additional challenges, such as the significant variety of real fight scenes. Unfortunately, available datasets seem to be very small compared with other action recognition datasets. Moreover, in surveillance applications, the people in the scenes differ for each video and the background of the footage differs for each camera. Also, violent actions in real-life surveillance videos must be detected quickly to prevent unwanted consequences, so models would benefit from reduced memory usage and computational cost. Such problems make classical action recognition methods difficult to adopt. To tackle all these issues, we introduce JOSENet, a novel self-supervised framework that provides outstanding performance for violence detection in surveillance videos. The proposed model receives two spatiotemporal video streams, i.e., RGB frames and optical flows, and involves a new regularized self-supervised learning approach for videos. JOSENet provides improved performance compared to self-supervised state-of-the-art methods, while requiring one-fourth of the number of frames per video segment and a reduced frame rate. The source code and the instructions to reproduce our experiments are available at https://github.com/ispamm/JOSENet.

[80]  arXiv:2405.02963 (cross-list from cs.CR) [pdf, ps, other]
Title: Preventive Audits for Data Applications Before Data Sharing in the Power IoT
Comments: 19 pages, 18 figures
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

With the increase in data volume, more types of data are being used and shared, especially in the power Internet of Things (IoT). However, the processes of data sharing may lead to unexpected information leakage because of the ubiquitous relevance among the different data, thus it is necessary for data owners to conduct preventive audits for data applications before data sharing to avoid the risk of key information leakage. Considering that the same data may play completely different roles in different application scenarios, data owners should know the expected data applications of the data buyers in advance and provide modified data that are less relevant to the private information of the data owners and more relevant to the nonprivate information that the data buyers need. In this paper, data sharing in the power IoT is regarded as the background, and the mutual information of the data and their implicit information is selected as the data feature parameter to indicate the relevance between the data and their implicit information or the ability to infer the implicit information from the data. Therefore, preventive audits should be conducted based on changes in the data feature parameters before and after data sharing. The probability exchange adjustment method is proposed as the theoretical basis of preventive audits under simplified consumption, and the corresponding optimization models are constructed and extended to more practical scenarios with multivariate characteristics. Finally, case studies are used to validate the effectiveness of the proposed preventive audits.
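The abstract uses the mutual information between shared data and their implicit (private) information as the data feature parameter. For discrete variables this is I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))]. The sketch below computes it from a joint probability table; the example tables are hypothetical, and the paper's probability exchange adjustment method and optimization models are not reproduced.

```python
import math
from typing import Sequence


def mutual_information(joint: Sequence[Sequence[float]]) -> float:
    """I(X;Y) in bits from a joint probability table joint[x][y] (entries sum to 1)."""
    px = [sum(row) for row in joint]           # marginal of X (e.g., shared data)
    py = [sum(col) for col in zip(*joint)]     # marginal of Y (e.g., private information)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Hypothetical tables: shared data fully reveals a private bit vs. reveals nothing.
revealing = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]
print(mutual_information(revealing), mutual_information(independent))
```

In the spirit of the preventive audit described above, one would compare this quantity before and after modifying the data to be shared: a drop toward zero indicates that the private variable can no longer be inferred from the shared data.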

[81]  arXiv:2405.02979 (cross-list from cs.RO) [pdf, other]
Title: A Long-Short-Term Mixed-Integer Formulation for Highway Lane Change Planning
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This work considers the problem of optimal lane changing in a structured multi-agent road environment. A novel motion planning algorithm that can capture long-horizon dependencies as well as short-horizon dynamics is presented. Pivotal to our approach is a geometric approximation of the long-horizon combinatorial transition problem, which we formulate in the continuous time-space domain. Moreover, a short-horizon optimal motion planning problem is formulated in discrete time and combined with the long-horizon planner. Both individual problems, as well as their combination, are formulated as mixed-integer quadratic programs (MIQPs) and solved in real time using state-of-the-art solvers. We show that the presented algorithm outperforms two other state-of-the-art motion planning algorithms in closed-loop performance and computation time in lane changing problems. Evaluations are performed using the traffic simulator SUMO, a custom low-level tracking model predictive controller, and high-fidelity vehicle models and scenarios provided by the CommonRoad environment.

[82]  arXiv:2405.02981 (cross-list from cs.IT) [pdf, other]
Title: Over-the-Air Majority Vote Computation with Modulation on Conjugate-Reciprocal Zeros
Authors: Alphan Sahin
Comments: This work is being submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In this study, we propose a new approach to compute the majority vote (MV) function based on modulation on conjugate-reciprocal zeros (MOCZ) and introduce three different methods. The proposed methods rely on the fact that when a linear combination of polynomials is evaluated at one of the roots of a polynomial in the combination, that polynomial does not contribute to the evaluation. To utilize this property, each transmitter maps the votes to the zeros of a Huffman polynomial, and the corresponding polynomial coefficients are transmitted. The receiver evaluates the polynomial constructed by the elements of the superposed sequence at conjugate-reciprocal zero pairs and detects the MV with a direct zero-testing (DiZeT) decoder. With differential and index-based encoders, we eliminate the need for power-delay information at the receiver while improving the computation error rate (CER) performance. The proposed methods do not use instantaneous channel state information at the transmitters and receiver. Thus, they provide robustness against phase and time synchronization errors. We theoretically analyze the CERs of the proposed methods. Finally, we demonstrate their efficacy in a distributed median computation scenario in a fading channel.
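The root-cancellation idea behind the MOCZ-based vote can be seen in a deliberately simplified toy: each transmitter sends the coefficients of a degree-1 polynomial whose zero is r (vote +1) or its conjugate-reciprocal 1/r (vote -1), and the channel adds the coefficient sequences. This sketch assumes real r > 1, unit channel gains and no noise, and it is not the paper's Huffman-polynomial encoding or DiZeT decoder. Evaluating the superposed polynomial at r silences every +1 voter, so what remains measures the -1 votes, and vice versa.

```python
from typing import List


def toy_majority_vote(votes: List[int], r: float = 2.0) -> int:
    """Noiseless toy of zero-based over-the-air majority voting.

    Transmitter k sends the coefficients of p_k(z) = z - z_k, with z_k = r for
    a +1 vote and z_k = 1/r (conjugate-reciprocal of a real r) for a -1 vote.
    Unit channel gains are assumed, so the receiver sees P(z) = sum_k p_k(z).
    Ties are resolved to -1.
    """
    assert r > 1.0
    # Superposed coefficients of P(z) = a1 * z + a0.
    a1 = float(len(votes))
    a0 = -sum(r if v == 1 else 1.0 / r for v in votes)

    def P(z: float) -> float:
        return a1 * z + a0

    # At z = r every +1 voter's polynomial vanishes: |P(r)| = N_minus * (r - 1/r).
    minus_evidence = abs(P(r))
    # At z = 1/r every -1 voter's polynomial vanishes: |P(1/r)| = N_plus * (r - 1/r).
    plus_evidence = abs(P(1.0 / r))
    return 1 if plus_evidence > minus_evidence else -1


print(toy_majority_vote([1, 1, -1]))  # → 1
```

With fading and unknown channel phases, the evidence terms are no longer exact vote counts, which is where the paper's differential and index-based encoders and its error-rate analysis come in.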

[83]  arXiv:2405.02989 (cross-list from cs.CR) [pdf, other]
Title: Defense against Joint Poison and Evasion Attacks: A Case Study of DERMS
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

There is an upward trend of deploying distributed energy resource management systems (DERMS) to control modern power grids. However, DERMS controller communication lines are vulnerable to cyberattacks that could potentially impact operational reliability. While a data-driven intrusion detection system (IDS) can potentially thwart attacks during deployment, also known as the evasion attack, the training of the detection algorithm may be corrupted by adversarial data injected into the database, also known as the poisoning attack. In this paper, we propose the first framework of IDS that is robust against joint poisoning and evasion attacks. We formulate the defense mechanism as a bilevel optimization, where the inner and outer levels deal with attacks that occur during training time and testing time, respectively. We verify the robustness of our method on the IEEE-13 bus feeder model against a diverse set of poisoning and evasion attack scenarios. The results indicate that our proposed method outperforms the baseline technique in terms of accuracy, precision, and recall for intrusion detection.

[84]  arXiv:2405.02991 (cross-list from cs.SD) [pdf, other]
Title: Steered Response Power for Sound Source Localization: A Tutorial Review
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

In the last three decades, the Steered Response Power (SRP) method has been widely used for the task of Sound Source Localization (SSL), due to its satisfactory localization performance in moderately reverberant and noisy scenarios. Many works have analyzed and extended the original SRP method to reduce its computational cost, to allow it to locate multiple sources, or to improve its performance in adverse environments. In this work, we review over 200 papers on the SRP method and its variants, with emphasis on the SRP-PHAT method. We also present eXtensible-SRP, or X-SRP, a generalized and modularized version of the SRP algorithm which allows the reviewed extensions to be implemented. We provide a Python implementation of the algorithm which includes selected extensions from the literature.
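For a single microphone pair, the SRP-PHAT functional reduces to the PHAT-weighted generalized cross-correlation (GCC-PHAT) evaluated at each candidate time difference of arrival; with more microphones, SRP-PHAT sums these pairwise terms at the delays implied by each candidate source position. The sketch below shows only the two-microphone case over integer sample delays, with circular correlation and a naive O(N²) DFT to stay dependency-free. It is an illustration, not the authors' X-SRP implementation.

```python
import cmath
import math
import random
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def gcc_phat_delay(x1: List[float], x2: List[float], max_lag: int) -> int:
    """Return the lag tau in [-max_lag, max_lag] that maximises the PHAT-weighted
    circular cross-correlation sum_n x1[n] * x2[n + tau]."""
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    cross = [a.conjugate() * b for a, b in zip(X1, X2)]
    # PHAT weighting: keep only the phase of the cross-spectrum.
    phat = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]

    def steered_power(tau: int) -> float:
        # Inverse-DFT value of the weighted cross-spectrum at lag tau.
        return sum(g * cmath.exp(2j * math.pi * k * tau / n) for k, g in enumerate(phat)).real

    return max(range(-max_lag, max_lag + 1), key=steered_power)


# Second microphone receives the same noise burst 5 samples later (circularly).
rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(64)]
x2 = [x1[(t - 5) % 64] for t in range(64)]
print(gcc_phat_delay(x1, x2, max_lag=10))  # → 5
```

PHAT whitening flattens the cross-spectrum magnitude, which sharpens the correlation peak and is the main reason SRP-PHAT copes well with moderate reverberation.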

[85]  arXiv:2405.02996 (cross-list from cs.SD) [pdf, other]
Title: RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
Comments: Accepted EMBC 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, as human-originated sounds, intuitively would share closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and to bridge this gap, data augmentation is essential. However, the most widely used augmentation technique for audio and speech, SpecAugment, requires a 2-dimensional spectrogram format and cannot be applied to models pretrained on speech waveforms. To address this, we propose RepAugment, an input-agnostic representation-level augmentation technique that not only outperforms SpecAugment but is also suitable for respiratory sound classification with waveform-pretrained models. Experimental results show that our approach outperforms SpecAugment, with a substantial improvement in the accuracy of minority disease classes of up to 7.14%.

[86]  arXiv:2405.03039 (cross-list from cs.CV) [pdf, ps, other]
Title: Performance Evaluation of Real-Time Object Detection for Electric Scooters
Comments: 10 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)

Electric scooters (e-scooters) have rapidly emerged as a popular mode of transportation in urban areas, yet they pose significant safety challenges. In the United States, the rise of e-scooters has been marked by a concerning increase in related injuries and fatalities. While deep-learning object detection is of paramount importance for collision avoidance in autonomous vehicles, its application in the context of e-scooters remains relatively unexplored. This paper addresses this gap by assessing the effectiveness and efficiency of cutting-edge object detectors for e-scooters. To achieve this, the first comprehensive benchmark involving 22 state-of-the-art YOLO object detectors, including five versions (YOLOv3, YOLOv5, YOLOv6, YOLOv7, and YOLOv8), has been established for real-time traffic object detection using a self-collected dataset featuring e-scooters. The detection accuracy, measured in terms of mAP@0.5, ranges from 27.4% (YOLOv7-E6E) to 86.8% (YOLOv5s). All YOLO models, particularly YOLOv3-tiny, have displayed promising potential for real-time object detection in the context of e-scooters. Both the traffic scene dataset (https://zenodo.org/records/10578641) and software program codes (https://github.com/DongChen06/ScooterDet) for model benchmarking in this study are publicly available, which will not only improve e-scooter safety with advanced object detection but also lay the groundwork for tailored solutions, promising a safer and more sustainable urban micromobility landscape.
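The "@0.5" in mAP@0.5 is the intersection-over-union (IoU) threshold: a predicted box counts as a true positive only if it overlaps a ground-truth box of the same class with IoU of at least 0.5. The sketch below shows just that matching criterion for axis-aligned boxes given as (x_min, y_min, x_max, y_max). Full mAP additionally ranks detections by confidence, matches each ground truth at most once, and averages the area under the precision-recall curve over classes, none of which is shown.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Matching criterion used by mAP@0.5 (class agreement assumed checked elsewhere)."""
    return iou(pred, gt) >= threshold


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.142857...
```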

[87]  arXiv:2405.03054 (cross-list from quant-ph) [pdf, other]
Title: A Greedy Quantum Route-Generation Algorithm
Authors: Jordan Makansi
Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)

Routing and scheduling problems with time windows have long been important optimization problems for logistics and planning. Many classical heuristics and exact methods exist for such problems. However, there are no satisfactory methods for generating routes using quantum computing (QC), for two main reasons: inequality constraints, and the trade-off between feasibility and solution quality. Inequality constraints are typically handled using slack variables, and feasible solutions are found by filtering samples. These challenges are amplified in the presence of noise inherent in QC. Here, we propose a greedy algorithm that generates routes by using information from all samples obtained from the quantum computer. By noticing the relationship between qubits in our formulation as a directed acyclic graph (DAG), we designed an algorithm that adaptively constructs a feasible solution.
We prove its convergence to a feasible solution, and illustrate its efficacy by solving the Fleet Sizing Vehicle Routing Problem with Time Windows (FSVRPTW). Our computational results show that this method obtains a lower objective value than the current state-of-the-art annealing approaches, both classical and hybrid, for the same amount of time using D-Wave Hybrid Solvers. We also show its robustness to noise on D-Wave Advantage 4.1 through computational results as compared to the filtering approach on DWaveSampler, even when the filtering approach is given a longer annealing time, and a larger sample size.

[88]  arXiv:2405.03118 (cross-list from cs.SD) [pdf, other]
Title: Determined Multichannel Blind Source Separation with Clustered Source Model
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The independent low-rank matrix analysis (ILRMA) method stands out as a prominent technique for multichannel blind audio source separation. It leverages nonnegative matrix factorization (NMF) and nonnegative canonical polyadic decomposition (NCPD) to model source parameters. While it effectively captures the low-rank structure of sources, the NMF model overlooks inter-channel dependencies. On the other hand, NCPD preserves intrinsic structure but lacks interpretable latent factors, making it challenging to incorporate prior information as constraints. To address these limitations, we introduce a clustered source model based on nonnegative block-term decomposition (NBTD). This model defines blocks as outer products of vectors (clusters) and matrices (for spectral structure modeling), offering interpretable latent vectors. Moreover, it enables straightforward integration of orthogonality constraints to ensure independence among source images. Experimental results demonstrate that our proposed method outperforms ILRMA and its extensions in anechoic conditions and surpasses the original ILRMA in simulated reverberant environments.
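To make the phrase "low-rank structure of sources" concrete, here is a minimal sketch of plain nonnegative matrix factorization V ≈ WH using the classic Lee-Seung multiplicative updates for the squared Euclidean error. It is an illustration only: ILRMA itself applies NMF to per-source power spectrograms under an Itakura-Saito-type cost coupled with demixing-matrix updates, none of which is reproduced here, and the function names and the small `eps` guard are assumptions of this sketch.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frob_err(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-12):
    """Euclidean NMF via multiplicative updates; W and H stay nonnegative."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(n)] for r in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(m)]
    return w, h
```

The multiplicative form keeps every entry nonnegative without explicit projection, which is why it is a common starting point for NMF-based source models.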

[89]  arXiv:2405.03119 (cross-list from cs.IT) [pdf, ps, other]
Title: DAFT-Spread Affine Frequency Division Multiple Access for Downlink Transmission
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Affine frequency division multiplexing (AFDM) and orthogonal AFDM access (O-AFDMA) are promising chirp-based techniques that can suppress the performance degradation caused by Doppler shifts in high-mobility scenarios. However, the high peak-to-average power ratio (PAPR) of AFDM and O-AFDMA remains a crucial problem that severely limits their practical application. In this paper, we exploit the properties of AFDM systems to propose a discrete affine Fourier transform (DAFT)-spread AFDMA scheme, named DAFT-s-AFDMA, which significantly reduces the PAPR by means of the DAFT. We formulate the transmitted time-domain signals of the proposed DAFT-s-AFDMA schemes with localized and interleaved chirp subcarrier allocation strategies. Accordingly, we derive guidelines for setting the DAFT parameters, which provide insight into the PAPR reduction. Finally, simulated PAPR comparisons in terms of the complementary cumulative distribution function (CCDF) show that the proposed DAFT-s-AFDMA schemes with both localized and interleaved strategies attain lower PAPR than the conventional O-AFDMA scheme.
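The comparison in the abstract is made in terms of the PAPR and its CCDF, Pr(PAPR > threshold). The sketch below shows how those two quantities are typically computed from simulated blocks. It is hedged on several assumptions: the modulator is a plain unitary inverse DFT of random QPSK symbols (an OFDM-style baseline), not the DAFT or the DAFT-s-AFDMA scheme, and the block length (64) and trial count (100) are arbitrary.

```python
import cmath
import math
import random


def idft(symbols):
    """Unitary N-point inverse DFT (OFDM-style baseline, not the DAFT)."""
    n = len(symbols)
    return [sum(s * cmath.exp(2j * math.pi * k * t / n) for k, s in enumerate(symbols))
            / math.sqrt(n) for t in range(n)]


def papr_db(signal):
    """Peak-to-average power ratio of a discrete-time block, in dB."""
    powers = [abs(x) ** 2 for x in signal]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def ccdf(paprs, threshold_db):
    """Empirical CCDF: fraction of blocks whose PAPR exceeds threshold_db."""
    return sum(p > threshold_db for p in paprs) / len(paprs)


def random_qpsk(n, rng):
    """n unit-energy QPSK symbols."""
    return [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2) for _ in range(n)]


rng = random.Random(0)
paprs = [papr_db(idft(random_qpsk(64, rng))) for _ in range(100)]
for thr in (4, 6, 8, 10):
    print(f"Pr(PAPR > {thr} dB) ~ {ccdf(paprs, thr):.2f}")
```

Plotting `ccdf` against the threshold for two modulators gives the kind of curve used to claim one scheme has lower PAPR than another: the curve that drops off at a smaller threshold is better.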

[90]  arXiv:2405.03123 (cross-list from math.OC) [pdf, other]
Title: Revealing Decision Conservativeness Through Inverse Distributionally Robust Optimization
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper introduces Inverse Distributionally Robust Optimization (I-DRO), a method to infer the conservativeness level of a decision-maker, represented by the size of a Wasserstein metric-based ambiguity set, from the optimal decisions made using Forward Distributionally Robust Optimization (F-DRO). By leveraging the Karush-Kuhn-Tucker (KKT) conditions of the convex F-DRO model, we formulate I-DRO as a bilinear program, which can be solved using off-the-shelf optimization solvers. This formulation also exhibits several advantageous properties. We demonstrate that I-DRO not only guarantees the existence and uniqueness of an optimal solution but also establishes necessary and sufficient conditions for this optimal solution to accurately match the actual conservativeness level in F-DRO. Furthermore, we identify three extreme scenarios that may affect the effectiveness of I-DRO. Our case study applies F-DRO to power system scheduling under uncertainty and employs I-DRO to recover the conservativeness level of system operators. Numerical experiments based on an IEEE 5-bus system and a realistic NYISO 11-zone system demonstrate the performance of I-DRO in both normal and extreme scenarios.

[91]  arXiv:2405.03134 (cross-list from cs.HC) [pdf, other]
Title: Transhuman Ansambl - Voice Beyond Language
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In this paper we present the design and development of the Transhuman Ansambl, a novel interactive singing-voice interface that senses its environment and responds to vocal input with human-voice vocalisations. Designed both for live performance with a human performer and as a standalone sound installation, the ansambl consists of sixteen bespoke virtual singers arranged in a circle. In live performance, the virtual singers listen to the human performer and respond to their singing by reading pitch, intonation and volume cues. In standalone installation mode, the singers use ultrasonic distance sensors to sense audience presence. Developed as part of the first author's practice-based PhD and artistic practice as a live performer, this work employs the singing voice to explore voice interactions in HCI beyond language, as well as innovative ways of performing live. How does technology support the effect of intimacy produced through the voice? Does surrounding the audience with responsive virtual singers challenge the traditional roles of performer and listener? To answer these questions, we draw upon the first author's experience with the system and on the interdisciplinary field of voice studies, which considers the voice as a sound medium independent of language, capable of enacting a reciprocal connection between bodies.

[92]  arXiv:2405.03178 (cross-list from cs.SD) [pdf, other]
Title: POPDG: Popular 3D Dance Generation with PopDanceSet
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Generating dances that are both lifelike and well aligned with music remains a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. It surpasses the AIST++ dataset in music genre diversity and in the intricacy and depth of its dance movements. Moreover, the proposed POPDG model, built within the iDDPM framework, enhances dance diversity and, through the Space Augmentation Algorithm, strengthens spatial physical connections between human body joints, ensuring that increased diversity does not compromise generation quality. A streamlined Alignment Module is also designed to improve the temporal alignment between dance and music. Extensive experiments show that POPDG achieves state-of-the-art (SOTA) results on two datasets. The paper also expands on current evaluation metrics. The dataset and code are available at https://github.com/Luke-Luo1/POPDG.

[93]  arXiv:2405.03300 (cross-list from cs.IT) [pdf, other]
Title: Active RIS-Aided Massive MIMO With Imperfect CSI and Phase Noise
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Active reconfigurable intelligent surfaces (RISs) have attracted significant attention as a recently proposed RIS architecture. Owing to their capability to amplify incident signals, active RISs can mitigate the multiplicative fading effect inherent in passive RIS-aided systems. In this paper, we consider an active RIS-aided uplink multi-user massive multiple-input multiple-output (MIMO) system in the presence of phase noise at the active RIS. Specifically, we employ a two-timescale scheme in which the beamforming at the base station (BS) is adjusted based on the instantaneous aggregated channel state information (CSI), while the phase shifts at the active RIS are designed based on statistical CSI, so that the feedback overhead and computational complexity can be significantly reduced. The aggregated channel, composed of the cascaded and direct channels, is estimated using the linear minimum mean square error (LMMSE) technique. Based on the estimated channel, we derive an analytical closed-form expression for a lower bound on the achievable rate. The power scaling laws of the active RIS-aided system are investigated based on these theoretical expressions. When the transmit power of each user is scaled down by the number of BS antennas M or reflecting elements N, we find that the thermal noise causes the lower bound on the achievable rate to approach zero as M or N grows to infinity. Moreover, a genetic algorithm (GA)-based optimization approach is introduced to tackle the phase shift optimization problem. Numerical results reveal that the active RIS can greatly enhance the performance of the considered system under various settings.
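The abstract estimates the aggregated channel with LMMSE. As a minimal illustration of the estimator itself, the sketch below treats the simplest case, a single scalar channel coefficient observed as y = h + n with independent h ~ CN(0, beta) and n ~ CN(0, sigma^2). This scalar, unit-pilot model and the chosen variances are assumptions of the sketch; the paper's estimator acts on the full aggregated (cascaded plus direct) channel. The Monte Carlo loop checks the empirical error against the closed-form MSE beta*sigma^2/(beta + sigma^2).

```python
import math
import random


def lmmse_scalar(y, beta, noise_var):
    """LMMSE estimate of h from y = h + n, h ~ CN(0, beta), n ~ CN(0, noise_var)."""
    return beta / (beta + noise_var) * y


def lmmse_mse(beta, noise_var):
    """Theoretical mean squared error of the scalar LMMSE estimate."""
    return beta * noise_var / (beta + noise_var)


def cn(var, rng):
    """Circularly symmetric complex Gaussian sample with E|z|^2 = var."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))


rng = random.Random(1)
beta, noise_var, trials = 1.0, 0.5, 20000
err = 0.0
for _ in range(trials):
    h = cn(beta, rng)
    y = h + cn(noise_var, rng)
    err += abs(h - lmmse_scalar(y, beta, noise_var)) ** 2
empirical_mse = err / trials
print(empirical_mse, lmmse_mse(beta, noise_var))
```

The estimator shrinks the observation toward zero by beta/(beta + sigma^2), so it trusts the pilot observation more as the SNR grows.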

[94]  arXiv:2405.03320 (cross-list from cs.LG) [pdf, other]
Title: Denoising of Geodetic Time Series Using Spatiotemporal Graph Neural Networks: Application to Slow Slip Event Extraction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Geophysics (physics.geo-ph)

Geospatial data has been transformative for monitoring the Earth, yet, as in the case of (geo)physical monitoring, the measurements can have variable spatial and temporal sampling and may be affected by significant perturbations that degrade signal quality. Denoising geospatial data is therefore essential, yet often challenging, because the observations may comprise noise from different origins, including both environmental signals and instrumental artifacts, which are spatially and temporally correlated and thus hard to disentangle. This study addresses the denoising of multivariate time series acquired by irregularly distributed sensor networks, which requires specific methods to handle the spatiotemporal correlation of both the noise and the signal of interest. Specifically, our method focuses on denoising geodetic position time series, which are used to monitor ground displacement worldwide with centimeter-to-millimeter precision. Among the signals affecting GNSS data, slow slip events (SSEs) are of particular interest to seismologists. These are deformation transients that emerge only weakly compared with other signals. Here, we design SSEdenoiser, a multi-station spatiotemporal graph-based attentive denoiser that learns latent characteristics of GNSS noise to reveal SSE-related displacement with sub-millimeter precision. It is based on the key combination of graph recurrent networks and spatiotemporal Transformers. The proposed method is applied to the Cascadia subduction zone, where SSEs occur along with bursts of tectonic tremors, a seismic rumbling identified from independent seismic recordings. The extracted events match the spatiotemporal evolution of the tremors. This close space-time correlation between the denoised GNSS signals and the tremors validates the proposed denoising procedure.

[95]  arXiv:2405.03322 (cross-list from cs.SD) [pdf, other]
Title: Enhancing Aeroacoustic Wind Tunnel Studies through Massive Channel Upscaling with MEMS Microphones
Comments: 30th AIAA/CEAS Aeroacoustics Conference
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Instrumentation and Detectors (physics.ins-det)

This paper presents a large MEMS microphone array with 7200 microphones and a 6 m x 3 m aperture. The array is designed so that sub-arrays with optimized point spread functions can be used for beamforming, thus enabling the study of source directivity in wind tunnel facilities. The array is built from modular 800-microphone panels, each consisting of four unique PCB designs. This modular architecture allows time-synchronized measurement with an arbitrary number of panels, and thus an arbitrary aperture size and total number of sensors. The panels can be installed without gaps so that the array's microphone pattern avoids high sidelobes in the point spread function. The array's capabilities are evaluated on a 1:9.5 airframe half model in an open wind tunnel at DNW-NWB. The total source emission is quantified, and the directivity is evaluated with beamforming. Additional far-field microphones are employed to validate the results.
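The abstract ties array layout to the sidelobes of the point spread function. A one-dimensional, far-field stand-in for that idea is the delay-and-sum beam pattern: the normalized output power when the array is steered to one angle and a unit plane wave arrives from another. The sketch below is illustrative only; the 16-microphone line with 1 cm pitch, the 8 kHz frequency, and the speed of sound are hypothetical values, and the real array is a two-dimensional, near-field layout evaluated with its own beamforming pipeline.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def beam_pattern(positions_m, freq_hz, steer_deg, arrival_deg, c=SPEED_OF_SOUND):
    """Normalized delay-and-sum beam power of a 1-D array (far-field plane wave).

    positions_m: microphone coordinates along the array axis.
    The array is steered to steer_deg; a unit plane wave arrives from arrival_deg.
    Returns a value in [0, 1], equal to 1 when arrival_deg == steer_deg.
    """
    k = 2 * math.pi * freq_hz / c
    delta = math.sin(math.radians(arrival_deg)) - math.sin(math.radians(steer_deg))
    total = sum(cmath.exp(1j * k * x * delta) for x in positions_m)
    return abs(total / len(positions_m)) ** 2


# Hypothetical 16-microphone line with 1 cm pitch, steered broadside at 8 kHz.
mics = [i * 0.01 for i in range(16)]
for angle in (0, 10, 30, 60):
    print(angle, round(beam_pattern(mics, 8000.0, 0.0, angle), 3))
```

Removing microphones from the middle of `mics` (mimicking a gap between panels) and re-evaluating the pattern is a quick way to see how the layout changes the response away from the steering direction.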

[96]  arXiv:2405.03356 (cross-list from cs.NI) [pdf, other]
Title: An Overview of Intelligent Meta-surfaces for 6G and Beyond: Opportunities, Trends, and Challenges
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

With the impending arrival of the sixth generation (6G) of wireless communication technology, the telecommunications landscape is poised for another revolutionary transformation. At the forefront of this evolution are intelligent meta-surfaces (IS), emerging as a disruptive physical layer technology with the potential to redefine the capabilities and performance metrics of future wireless networks. As 6G evolves from concept to reality, industry stakeholders, standards organizations, and regulatory bodies are collaborating to define the specifications, protocols, and interoperability standards governing IS deployment. Against this background, this article delves into the ongoing standardization efforts, emerging trends, potential opportunities, and prevailing challenges surrounding the integration of IS into the framework of 6G and beyond networks. Specifically, it provides a tutorial-style overview of recent advancements in IS and explores their potential applications within future networks beyond 6G. Additionally, the article identifies key challenges in the design and implementation of various types of intelligent surfaces, along with considerations for their practical standardization. Finally, it highlights potential future prospects in this evolving field.

[97]  arXiv:2405.03385 (cross-list from cs.SD) [pdf, other]
Title: Fully Reversing the Shoebox Image Source Method: From Impulse Responses to Room Parameters
Authors: Tom Sprunck (IRMA), Antoine Deleforge (IRMA), Yannick Privat (IECL, SPHINX, IUF), Cédric Foy (UMRAE, Cerema Direction Est)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Classical Physics (physics.class-ph)

We present an algorithm that fully reverses the shoebox image source method (ISM), a widely used room impulse response (RIR) simulator for cuboid rooms introduced by Allen and Berkley in 1979. More precisely, given a discrete multichannel RIR generated by the shoebox ISM for a microphone array of known geometry, the algorithm reliably recovers the 18 input parameters: the 3D source position, the 3 dimensions of the room, the 6-degrees-of-freedom room translation and orientation, and an absorption coefficient for each of the 6 room boundaries. The approach builds on a recently proposed gridless image source localization technique combined with new procedures for room axes recovery and first-order-reflection identification. Extensive simulated experiments reveal that near-exact recovery of all parameters is achieved for a 32-element, 8.4-cm-wide spherical microphone array and a sampling rate of 16 kHz, using fully randomized input parameters within rooms ranging from 2 x 2 x 2 m to 10 x 10 x 5 m. Estimation errors decay towards zero as the array size and sampling rate increase. The method is also shown to strongly outperform a known baseline, and its ability to extrapolate RIRs at new positions is demonstrated. Crucially, the approach is strictly limited to low-passed discrete RIRs simulated using the vanilla shoebox ISM. Nonetheless, it represents, to our knowledge, the first algorithmic demonstration that this difficult inverse problem is, in principle, fully solvable over a wide range of configurations.
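For readers unfamiliar with the forward model being inverted, the sketch below computes the direct path and the six first-order image sources of an axis-aligned shoebox room, with the arrival delay and amplitude of each path at one microphone. It is a simplified forward sketch under stated assumptions: the room is not rotated or translated (the abstract's 6-DoF pose is omitted), only first-order reflections are generated, the speed of sound is fixed at 343 m/s, and each wall's pressure reflection factor is taken as sqrt(1 - alpha). It does not implement the paper's inversion procedure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def first_order_images(src, room):
    """Direct source plus the six first-order image sources of a shoebox room.

    The room occupies [0, Lx] x [0, Ly] x [0, Lz]; mirroring the source across the
    wall at 0 gives -s, and across the wall at L gives 2L - s, on that axis.
    """
    images = [(tuple(src), "direct")]
    for axis, (name, length) in enumerate(zip("xyz", room)):
        low, high = list(src), list(src)
        low[axis] = -src[axis]
        high[axis] = 2 * length - src[axis]
        images.append((tuple(low), name + "0"))
        images.append((tuple(high), name + "L"))
    return images


def arrivals(src, mic, room, absorption, c=SPEED_OF_SOUND):
    """Sorted (delay_s, amplitude) pairs for the direct path and first-order echoes.

    absorption maps a wall label ("x0", "xL", ...) to its absorption coefficient
    alpha; the reflection factor is sqrt(1 - alpha), and spherical spreading
    contributes 1 / (4 * pi * distance).
    """
    paths = []
    for pos, wall in first_order_images(src, room):
        dist = math.dist(pos, mic)
        gain = 1.0 if wall == "direct" else math.sqrt(1.0 - absorption[wall])
        paths.append((dist / c, gain / (4 * math.pi * dist)))
    return sorted(paths)


room = (4.0, 5.0, 3.0)  # hypothetical room dimensions in meters
alphas = {wall: 0.2 for wall in ("x0", "xL", "y0", "yL", "z0", "zL")}
for delay, amp in arrivals((1.0, 2.0, 1.5), (3.0, 2.0, 1.5), room, alphas):
    print(f"{delay * 1000:.2f} ms  {amp:.4f}")
```

Sampling these delays at 16 kHz and low-pass filtering the resulting impulses gives a discrete RIR of the kind the abstract says the inverse algorithm is restricted to.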

[98]  arXiv:2405.03393 (cross-list from cs.RO) [pdf, other]
Title: On-site scale factor linearity calibration of MEMS triaxial gyroscopes
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Calibration of MEMS triaxial gyroscopes is crucial for achieving precise attitude estimation in various wearable health monitoring applications. However, gyroscope calibration is more challenging than accelerometer or magnetometer calibration. This paper introduces an efficient method for calibrating MEMS triaxial gyroscopes using only a servo motor, making it well suited for field environments. The core strategy of the method exploits the fact that the dot product of the measured gravity and the rotational speed in a fixed frame remains constant. To eliminate the influence of the centrifugal force caused by rotation on the accelerometer, the accelerometer data are measured while the device is stationary. The proposed calibration scheme allows gyroscope measurements to be taken while each axis is operated at a specific rotation speed, making it easier to evaluate linearity over the speed range spanned by a series of rotation speeds. Moreover, the classical least-squares algorithm alone proves adequate for estimating the scale factor, notably streamlining the analysis of the calibration process. Extensive numerical simulations were conducted to analyze the proposed method's performance in calibrating a triaxial gyroscope model. Experimental validation was also carried out using a commercially available MEMS inertial measurement unit (the LSM9DS1 on an Arduino Nano 33 BLE Sense) and a servo motor capable of precise speed control. The experimental results demonstrate the efficacy of the proposed calibration approach.
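The abstract states that classical least squares suffices for the scale factor and that linearity is evaluated over a series of rotation speeds. The sketch below shows one generic way to do both for a single axis: an ordinary least-squares fit of measured = k * reference + b, followed by a nonlinearity figure defined as the largest fit residual relative to the measured full-scale span. Assumptions: reference rates are taken as known (for example, commanded servo speeds), the gravity dot-product step of the paper is not reproduced, and this particular nonlinearity definition is one common convention, not necessarily the authors'.

```python
def fit_scale_factor(ref_rates, measured_rates):
    """Ordinary least-squares fit of measured = k * ref + b. Returns (k, b)."""
    n = len(ref_rates)
    mx = sum(ref_rates) / n
    my = sum(measured_rates) / n
    sxx = sum((x - mx) ** 2 for x in ref_rates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ref_rates, measured_rates))
    k = sxy / sxx
    return k, my - k * mx


def nonlinearity_percent_fs(ref_rates, measured_rates, k, b):
    """Largest absolute fit residual as a percentage of the measured span."""
    residuals = [y - (k * x + b) for x, y in zip(ref_rates, measured_rates)]
    span = max(measured_rates) - min(measured_rates)
    return 100.0 * max(abs(r) for r in residuals) / span


# Hypothetical reference speeds (deg/s) and synthetic readings with a 2% scale error.
ref = [-200.0, -100.0, -50.0, 0.0, 50.0, 100.0, 200.0]
meas = [1.02 * x + 0.5 + 1e-6 * x ** 3 for x in ref]
k, b = fit_scale_factor(ref, meas)
print(k, b, nonlinearity_percent_fs(ref, meas, k, b))
```

A perfectly linear sensor would yield a nonlinearity of zero; the small cubic term in the synthetic data is there only to make the figure nonzero.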

[99]  arXiv:2405.03470 (cross-list from cs.RO) [pdf, other]
Title: Motion Planning under Uncertainty: Integrating Learning-Based Multi-Modal Predictors into Branch Model Predictive Control
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

In complex traffic environments, autonomous vehicles face multi-modal uncertainty about other agents' future behavior. To address this, recent learning-based motion predictors output multi-modal predictions. We present a novel framework that leverages Branch Model Predictive Control (BMPC) to account for these predictions. The framework includes an online scenario-selection process guided by topology and collision risk criteria. This process efficiently selects a minimal set of predictions, making the BMPC real-time capable. Additionally, we introduce an adaptive decision-postponing strategy that delays the planner's commitment to a single scenario until the uncertainty is resolved. Our comprehensive evaluations in traffic intersection and random highway merging scenarios demonstrate that our method improves comfort and safety.

[100]  arXiv:2405.03484 (cross-list from cs.SD) [pdf, other]
Title: Whispy: Adapting STT Whisper Models to Real-Time Environments
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Large general-purpose transformer models have recently become the mainstay of speech analysis. In particular, Whisper achieves state-of-the-art results in relevant tasks such as speech recognition, translation, language identification, and voice activity detection. However, Whisper models are not designed for real-time use, which makes them unsuitable for a wide range of practical applications. In this paper, we introduce Whispy, a system intended to bring live capabilities to the pretrained Whisper models. As a result of a number of architectural optimisations, Whispy is able to consume live audio streams and generate high-level, coherent voice transcriptions while maintaining a low computational cost. We evaluate the performance of our system on a large collection of publicly available speech datasets, investigating how the transcription mechanism introduced by Whispy affects the Whisper output. Experimental results show that Whispy excels in robustness, promptness, and accuracy.

[101]  arXiv:2405.03561 (cross-list from cs.RO) [pdf, other]
Title: Model- and Data-Based Control of Self-Balancing Robots: Practical Educational Approach with LabVIEW and Arduino
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

A two-wheeled self-balancing robot (TWSBR) is a nonlinear, unstable system. This study compares the performance of model-based and data-based control strategies for TWSBRs, with an explicitly practical, educational approach. Model-based control (MBC) algorithms such as Lead-Lag and PID control require proficient dynamic modeling and mathematical manipulation to derive the linearized equations of motion and develop an appropriate controller. Data-based control (DBC) methods, such as fuzzy control, on the other hand, provide a simpler and quicker approach to designing effective controllers without requiring an in-depth understanding of the system model. In this paper, the advantages and disadvantages of both MBC and DBC are illustrated using a TWSBR. All controllers were implemented and tested on the OSOYOO self-balancing kit, which includes an Arduino microcontroller, an MPU-6050 sensor, and DC motors. The control law and the user interface are built with the LabVIEW-LINX toolkit. A real-time hardware-in-the-loop experiment validates the results, highlighting controllers that can be implemented on a cost-effective platform.
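PID control is one of the model-based strategies the abstract compares. The sketch below is a textbook discrete PID update of the kind that could run once per sample on a microcontroller; it is not the authors' LabVIEW-LINX implementation. The gains, the 100 Hz loop rate, the output limit, and the sign convention between tilt and motor command are all hypothetical, and the controller has no anti-windup beyond clamping its output.

```python
class PID:
    """Textbook discrete PID controller (rectangular integration, backward-difference
    derivative, optional symmetric output clamp; no integrator anti-windup)."""

    def __init__(self, kp, ki, kd, dt, out_limit=None):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        # No derivative on the first call, since there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        if self.out_limit is not None:
            u = max(-self.out_limit, min(self.out_limit, u))
        return u


# Hypothetical gains and a 100 Hz loop; the tilt angle (degrees) would come from
# the MPU-6050 after filtering, and the output would drive the DC motors.
balance = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01, out_limit=255)
motor_cmd = balance.update(setpoint=0.0, measurement=1.0)
print(motor_cmd)
```

Tuning kp, ki and kd for a real robot requires either the linearized model (the MBC route described above) or experimentation on the hardware.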

[102]  arXiv:2405.03567 (cross-list from cs.SD) [pdf, other]
Title: Deep Space Separable Distillation for Lightweight Acoustic Scene Classification
Authors: ShuQi Ye, Yuan Tian
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Acoustic scene classification (ASC) is of considerable importance in real-world applications. Recently, deep learning-based methods have been widely employed for ASC. However, current methods are not lightweight enough, and their performance is not yet satisfactory. To solve these problems, we propose a deep space separable distillation network. First, the network performs a high-low frequency decomposition of the log-mel spectrogram, significantly reducing computational complexity while maintaining model performance. Second, we design three lightweight operators specifically for ASC: Separable Convolution (SC), Orthonormal Separable Convolution (OSC), and Separable Partial Convolution (SPC). These operators exhibit highly efficient feature extraction in ASC tasks. Experimental results demonstrate that the proposed method achieves a performance gain of 9.8% over currently popular deep learning methods, while also having fewer parameters and lower computational complexity.
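The savings that separable operators offer can be seen from a parameter count. The sketch below compares a standard k x k convolution with a generic depthwise-separable one (a per-channel k x k depthwise filter followed by a 1 x 1 pointwise convolution). This is the common textbook formulation, assumed here for illustration; the paper's SC, OSC and SPC operators may be defined differently, and the 3 x 3, 64-to-128-channel layer is hypothetical.

```python
def standard_conv_params(k, c_in, c_out, bias=False):
    """Weights of a k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def separable_conv_params(k, c_in, c_out, bias=False):
    """Depthwise k x k convolution (one filter per input channel) followed by a
    1 x 1 pointwise convolution mapping c_in to c_out channels."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, no bias.
std = standard_conv_params(3, 64, 128)
sep = separable_conv_params(3, 64, 128)
print(std, sep, round(std / sep, 2))  # prints 73728 8768 8.41
```

For a stride-1 layer, multiply-accumulate counts scale the same way (each count times the output height and width), so the parameter ratio also approximates the compute reduction.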

Replacements for Tue, 7 May 24

[103]  arXiv:2006.11360 (replaced) [pdf, other]
Title: CDM Based Virtual FMCW MIMO Radar Imaging at 79GHz
Subjects: Signal Processing (eess.SP)
[104]  arXiv:2108.06009 (replaced) [pdf, ps, other]
Title: SAR image matching algorithm based on multi-class features
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[105]  arXiv:2112.10625 (replaced) [pdf, other]
Title: Privacy-Preserving Nonlinear Cloud-based Model Predictive Control via Affine Masking
Comments: 11 pages, 5 figures
Subjects: Systems and Control (eess.SY)
[106]  arXiv:2204.07756 (replaced) [pdf, other]
Title: Visual Attention Methods in Deep Learning: An In-Depth Survey
Comments: Accepted in Information Fusion
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[107]  arXiv:2207.13526 (replaced) [pdf, other]
Title: UltimateKalman: Flexible Kalman Filtering and Smoothing Using Orthogonal Transformations
Authors: Sivan Toledo
Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY)
[108]  arXiv:2208.00841 (replaced) [pdf, other]
Title: Spline-Shaped Microstrip Edge-Fed Antenna for 77 GHz Automotive Radar Systems
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
[109]  arXiv:2208.14328 (replaced) [pdf, other]
Title: 3D Near-Field Virtual MIMO-SAR Imaging using FMCW Radar Systems at 77 GHz
Authors: Shahrokh Hamidi
Subjects: Signal Processing (eess.SP)
[110]  arXiv:2209.11112 (replaced) [pdf, other]
Title: CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
Comments: 17 pages, 11 figures, and 6 tables. arXiv admin note: text overlap with arXiv:2203.15149
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2477-2493, 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[111]  arXiv:2303.03090 (replaced) [pdf, other]
Title: Parallel Optimization with Hard Safety Constraints for Cooperative Planning of Connected Autonomous Vehicles
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
[112]  arXiv:2304.06883 (replaced) [pdf, other]
Title: Intelligent Reflecting Surface Aided Wireless Communication Systems: Joint Location and Passive Beamforming Design
Comments: Following the publication of our work, we identified errors in our data analysis process. To uphold the standards of academic integrity and the accuracy of our findings, we feel it necessary to withdraw the current version of our paper. We plan to submit a revised version upon thorough review and correction of these errors.
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[113]  arXiv:2305.12844 (replaced) [pdf, other]
Title: An Optimized Ensemble Deep Learning Model For Brain Tumor Classification
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[114]  arXiv:2306.02105 (replaced) [pdf, other]
Title: Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models
Comments: Accepted at SIGUL-LREC 2024
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115]  arXiv:2307.04760 (replaced) [pdf, other]
Title: Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116]  arXiv:2307.10321 (replaced) [pdf, other]
Title: Terahertz Communications and Sensing for 6G and Beyond: A Comprehensive Review
Comments: 56 pages, 9 figures, 11 tables, IEEE Communications Surveys & Tutorials
Subjects: Signal Processing (eess.SP)
[117]  arXiv:2307.16033 (replaced) [pdf, other]
Title: CoVid-19 Detection leveraging Vision Transformers and Explainable AI
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[118]  arXiv:2308.04259 (replaced) [pdf, other]
Title: Generalized Forgetting Recursive Least Squares: Stability and Robustness Guarantees
Comments: Accepted to the IEEE Transactions on Automatic Control. Scheduled to appear in the November 2024 issue
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
[119]  arXiv:2308.11647 (replaced) [pdf, other]
Title: Optically-Transparent EM Skins for Outdoor-to-Indoor mm-Wave Wireless Communications
Journal-ref: IEEE Access, 2024
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY); Applied Physics (physics.app-ph)
[120]  arXiv:2309.07169 (replaced) [pdf, other]
Title: Spectral Convergence of Complexon Shift Operators
Comments: 9 pages, 2 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[121]  arXiv:2309.08464 (replaced) [pdf, ps, other]
Title: Differentially Private Average Consensus with Improved Accuracy-Privacy Trade-off
Subjects: Systems and Control (eess.SY)
[122]  arXiv:2309.16482 (replaced) [pdf, ps, other]
Title: Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
Comments: Accepted at HSCMA Satellite Workshop at ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[123]  arXiv:2310.15937 (replaced) [pdf, other]
Title: A Behavioral Perspective on Models of Linear Dynamical Networks with Manifest Variables
Subjects: Systems and Control (eess.SY)
[124]  arXiv:2311.14213 (replaced) [pdf, other]
Title: Learning to Solve Inverse Problems for Perceptual Sound Matching
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125]  arXiv:2312.01239 (replaced) [pdf, other]
Title: Motion Informed Needle Segmentation in Ultrasound Images
Comments: 7 pages, 4 figures, accepted at ISBI 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[126]  arXiv:2312.11387 (replaced) [pdf, other]
Title: Data-Driven Continuous-Time Framework for Frequency-Constrained Unit Commitment
Subjects: Systems and Control (eess.SY)
[127]  arXiv:2401.00813 (replaced) [pdf, other]
Title: Ultraspherical/Gegenbauer polynomials to unify 2D/3D Ambisonic directivity designs
Authors: Franz Zotter
Comments: 56 pages, 9 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[128]  arXiv:2401.07644 (replaced) [pdf, other]
Title: Resource Allocation in STAR-RIS-Aided SWIPT with RSMA via Meta-Learning
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[129]  arXiv:2401.15647 (replaced) [pdf, other]
Title: UP-CrackNet: Unsupervised Pixel-Wise Road Crack Detection via Adversarial Image Restoration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
[130]  arXiv:2402.03383 (replaced) [pdf, ps, other]
Title: A Collaborative Model-driven Network for MRI Reconstruction
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[131]  arXiv:2402.09907 (replaced) [pdf, other]
Title: On the convergence of Block Majorization-Minimization algorithms on the Grassmann Manifold
Comments: This work has been accepted to the IEEE Signal Processing letters. The IEEE is the copyright holder of this manuscript
Journal-ref: 10.1109/LSP.2024.3396660
Subjects: Optimization and Control (math.OC); Signal Processing (eess.SP)
[132]  arXiv:2403.01150 (replaced) [pdf, other]
Title: Singularity and Error Analysis of a Simple Quaternion Estimator
Subjects: Methodology (stat.ME); Systems and Control (eess.SY)
[133]  arXiv:2403.05370 (replaced) [pdf, other]
Title: On the Certification of the Kinematics of 3-DOF Spherical Parallel Manipulators
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[134]  arXiv:2403.05771 (replaced) [pdf, other]
Title: Providing Safety Assurances for Systems with Unknown Dynamics
Comments: Submitted to L-CSS/CDC
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[135]  arXiv:2403.06659 (replaced) [pdf, other]
Title: Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
Comments: Accepted by ICML 2024
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[136]  arXiv:2403.10522 (replaced) [pdf, other]
Title: Ordinal Classification with Distance Regularization for Robust Brain Age Prediction
Comments: Accepted in WACV 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[137]  arXiv:2404.00896 (replaced) [pdf, other]
Title: A Novel Algorithm for Digital Lithological Mapping-Case Studies in Sri Lanka's Mineral Exploration
Subjects: Image and Video Processing (eess.IV)
[138]  arXiv:2404.12725 (replaced) [pdf, other]
Title: Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction
Authors: Zhaoxi Mu, Xinyu Yang
Comments: Accepted by IJCAI 2024
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[139]  arXiv:2404.19028 (replaced) [pdf, other]
Title: Adaptive Regulated Sparsity Promoting Approach for Data-Driven Modeling and Control of Grid-Connected Solar Photovoltaic Generation
Subjects: Systems and Control (eess.SY)
[140]  arXiv:2404.19500 (replaced) [pdf, other]
Title: Towards Real-world Video Face Restoration: A New Benchmark
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[141]  arXiv:2405.00136 (replaced) [pdf, other]
Title: Data-Driven Permissible Safe Control with Barrier Certificates
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
[142]  arXiv:2405.00316 (replaced) [pdf, other]
Title: Enhance Planning with Physics-informed Safety Controller for End-to-end Autonomous Driving
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[143]  arXiv:2405.00712 (replaced) [pdf, other]
Title: SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[144]  arXiv:2405.00727 (replaced) [pdf, other]
Title: Generalised envelope spectrum-based signal-to-noise objectives: Formulation, optimisation and application for gear fault detection under time-varying speed conditions
Comments: 27 pages, 15 figures, 1 table; submitted to MSSP for review
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Methodology (stat.ME)
[145]  arXiv:2405.01242 (replaced) [pdf, other]
Title: TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[146]  arXiv:2405.01558 (replaced) [pdf, other]
Title: Configurable Learned Holography
Comments: 14 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Optics (physics.optics)
[147]  arXiv:2405.01726 (replaced) [pdf, ps, other]
Title: SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[148]  arXiv:2405.02132 (replaced) [pdf, other]
Title: Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[149]  arXiv:2405.02179 (replaced) [pdf, other]
Title: Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)