Statistics
- [1] arXiv:2405.03778 [pdf, ps, html, other]
-
Title: An Autoregressive Model for Time Series of Random ObjectsSubjects: Methodology (stat.ME)
Random variables in metric spaces indexed by time and observed at equally spaced time points are receiving increased attention due to their broad applicability. However, the absence of inherent structure in metric spaces has resulted in a literature that is predominantly non-parametric and model-free. To address this gap in models for time series of random objects, we introduce an adaptation of the classical linear autoregressive model tailored for data lying in a Hadamard space. The parameters of interest in this model are the Fréchet mean and a concentration parameter, both of which we prove can be consistently estimated from data. Additionally, we propose a test statistic and establish its asymptotic normality, thereby enabling hypothesis testing for the absence of serial dependence. Finally, we introduce a bootstrap procedure to obtain critical values for the test statistic under the null hypothesis. Theoretical results of our method, including the convergence of the estimators as well as the size and power of the test, are illustrated through simulations, and the utility of the model is demonstrated by an analysis of a time series of consumer inflation expectations.
- [2] arXiv:2405.03814 [pdf, ps, html, other]
-
Title: Stochastic behavior of an n-node blockchain under cyber attacks from multiple hackers with random re-setting timesSubjects: Applications (stat.AP)
This paper investigates the stochastic behavior of an n-node blockchain which is continuously monitored and faces non-stop cyber attacks from multiple hackers. The blockchain will start being re-set once hacking is detected, forfeiting previous efforts of all hackers. It is assumed the re-setting process takes a random amount of time. Multiple independent hackers will keep attempting to hack into the blockchain until one of them succeeds. For arbitrary distributions of the hacking times, detecting times, and re-setting times, we derive the instantaneous functional probability, the limiting functional probability, and the mean functional time of the blockchain. Moreover, we establish that these quantities are increasing functions of the number of nodes, formalizing the intuition that the more nodes a blockchain has the more secure it is.
- [3] arXiv:2405.03815 [pdf, ps, html, other]
-
Title: Statistical inference for a stochastic generalized logistic differential equationComments: 22 pages, 3 figuresSubjects: Methodology (stat.ME); Probability (math.PR); Statistics Theory (math.ST)
This research aims to estimate three parameters in a stochastic generalized logistic differential equation. We assume the intrinsic growth rate and shape parameters are constant but unknown. To estimate these two parameters, we use the maximum likelihood method and establish that the estimators for these two parameters are strongly consistent. We estimate the diffusion parameter by using the quadratic variation processes. To test our results, we evaluate two data scenarios, complete and incomplete, with fixed values assigned to the three parameters. In the incomplete data scenario, we apply an Expectation Maximization algorithm.
- [4] arXiv:2405.03834 [pdf, ps, html, other]
-
Title: Covariance-free Multifidelity Control Variates Importance Sampling for Reliability Analysis of Rare EventsPromit Chakroborty (Dept. of Civil and Systems Engg, Johns Hopkins University), Somayajulu L. N. Dhulipala (Idaho National Laboratory), Michael D. Shields (Dept. of Civil and Systems Engg, Johns Hopkins University)Comments: 32 pages, 6 figuresSubjects: Methodology (stat.ME)
Multifidelity modeling has been steadily gaining attention as a tool to address the problem of exorbitant model evaluation costs that makes the estimation of failure probabilities a significant computational challenge for complex real-world problems, particularly when failure is a rare event. To implement multifidelity modeling, estimators that efficiently combine information from multiple models/sources are necessary. In past works, the variance reduction techniques of Control Variates (CV) and Importance Sampling (IS) have been leveraged for this task. In this paper, we present the CVIS framework; a creative take on a coupled Control Variates and Importance Sampling estimator for bifidelity reliability analysis. The framework addresses some of the practical challenges of the CV method by using an estimator for the control variate mean and side-stepping the need to estimate the covariance between the original estimator and the control variate through a clever choice for the tuning constant. The task of selecting an efficient IS distribution is also considered, with a view towards maximally leveraging the bifidelity structure and maintaining expressivity. Additionally, a diagnostic is provided that indicates both the efficiency of the algorithm as well as the relative predictive quality of the models utilized. Finally, the behavior and performance of the framework is explored through analytical and numerical examples.
- [5] arXiv:2405.03874 [pdf, ps, other]
-
Title: Non-locality and Spillover Effects of Residential Flood Damage on Community Recovery: Insights from High-resolution Flood Claim and Mobility DataSubjects: Applications (stat.AP)
Examining the relationship between vulnerability of the built environment and community recovery is crucial for understanding disaster resilience. Yet, this relationship is rather neglected in the existing literature due to previous limitations in the availability of empirical datasets needed for such analysis. In this study, we combine fine-resolution flood damage claims data (composed of both insured and uninsured losses) and human mobility data (composed of millions of movement trajectories) during the 2017 Hurricane Harvey in Harris County, Texas, to specify the extent to which vulnerability of the built environment (i.e., flood property damage) affects community recovery (based on the speed of human mobility recovery) locally and regionally. We examine this relationship using a spatial lag, spatial reach, and spatial decay models to measure the extent of spillover effects of residential damage on community recovery. The findings show that: first, the severity of residential damage significantly affects the speed of community recovery. A greater extent of residential damage suppresses community recovery not only locally but also in the surrounding areas. Second, the spatial spillover effect of residential damage on community recovery speed decays with distance from the highly damaged areas. Third, spatial areas display heterogeneous spatial decay coefficients, which are associated with urban structure features such as the density of points-of-interest facilities and roads. These findings provide a novel data-driven characterization of the spatial diffusion of residential flood damage effects on community recovery and move us closer to a better understanding of complex spatial processes that shape community resilience to hazards. This study also provides valuable insights for emergency managers and public officials seeking to mitigate the non-local effects of residential damage.
- [6] arXiv:2405.03879 [pdf, ps, html, other]
-
Title: Scalable Amortized GPLVMs for Single Cell Transcriptomics DataSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Genomics (q-bio.GN); Applications (stat.AP)
Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models lack effectiveness in clustering cell types. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs. This model matches the performance of the leading single-cell variational inference (scVI) approach on synthetic and real-world COVID datasets and effectively incorporates cell-cycle and batch information to reveal more interpretable latent structures as we demonstrate on an innate immunity dataset.
- [7] arXiv:2405.03985 [pdf, ps, html, other]
-
Title: Bayesian Multilevel Compositional Data Analysis: Introduction, Evaluation, and ApplicationComments: 36 pages and 7 figures (6 pdf files)Subjects: Methodology (stat.ME); Computation (stat.CO)
Multilevel compositional data commonly occur in various fields, particularly in intensive, longitudinal studies using ecological momentary assessments. Examples include data repeatedly measured over time that are non-negative and sum to a constant value, such as sleep-wake movement behaviours in a 24-hour day. This article presents a novel methodology for analysing multilevel compositional data using a Bayesian inference approach. This method can be used to investigate how reallocation of time between sleep-wake movement behaviours may be associated with other phenomena (e.g., emotions, cognitions) at a daily level. We explain the theoretical details of the data and the models, and outline the steps necessary to implement this method. We introduce the R package multilevelcoda to facilitate the application of this method and illustrate using a real data example. An extensive parameter recovery simulation study verified the robust performance of the method. Across all simulation conditions investigated in the simulation study, the model had minimal convergence issues (convergence rate > 99%) and achieved excellent quality of parameter estimates and inference, with an average bias of 0.00 (range -0.09, 0.05) and coverage of 0.95 (range 0.93, 0.97). We conclude the article with recommendations on the use of the Bayesian compositional multilevel modelling approach, and hope to promote wider application of this method to answer robust questions using the increasingly available data from intensive, longitudinal studies.
- [8] arXiv:2405.04026 [pdf, ps, html, other]
-
Title: Federated Control in Markov Decision ProcessesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We study problems of federated control in Markov Decision Processes. To solve an MDP with large state space, multiple learning agents are introduced to collaboratively learn its optimal policy without communication of locally collected experience. In our settings, these agents have limited capabilities, which means they are restricted within different regions of the overall state space during the training process. In face of the difference among restricted regions, we firstly introduce concepts of leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol that we call Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and accordingly modifies their learning problems for further training. In terms of theoretical analysis, we justify the correctness of FedQ as a communication protocol, then give a general result on sample complexity of derived algorithms FedQ-X with the RL oracle , and finally conduct a thorough study on the sample complexity of FedQ-SynQ. Specifically, FedQ-X has been shown to enjoy linear speedup in terms of sample complexity when workload is uniformly distributed among agents. Moreover, we carry out experiments in various environments to justify the efficiency of our methods.
- [9] arXiv:2405.04043 [pdf, ps, other]
-
Title: Scalable Vertical Federated Learning via Data Augmentation and Amortized InferenceComments: 30 pages, 5 figures, 3 tablesSubjects: Computation (stat.CO); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Vertical federated learning (VFL) has emerged as a paradigm for collaborative model estimation across multiple clients, each holding a distinct set of covariates. This paper introduces the first comprehensive framework for fitting Bayesian models in the VFL setting. We propose a novel approach that leverages data augmentation techniques to transform VFL problems into a form compatible with existing Bayesian federated learning algorithms. We present an innovative model formulation for specific VFL scenarios where the joint likelihood factorizes into a product of client-specific likelihoods. To mitigate the dimensionality challenge posed by data augmentation, which scales with the number of observations and clients, we develop a factorized amortized variational approximation that achieves scalability independent of the number of observations. We showcase the efficacy of our framework through extensive numerical experiments on logistic regression, multilevel regression, and a novel hierarchical Bayesian split neural net model. Our work paves the way for privacy-preserving, decentralized Bayesian inference in vertically partitioned data scenarios, opening up new avenues for research and applications in various domains.
- [10] arXiv:2405.04147 [pdf, ps, other]
-
Title: Multiparameter regularization and aggregation in the context of polynomial functional regressionElke R. Gizewski, Markus Holzleitner, Lukas Mayer-Suess, Sergiy Pereverzyev Jr., Sergei V. PereverzyevComments: 18 pagesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Statistics Theory (math.ST)
Most of the recent results in polynomial functional regression have been focused on an in-depth exploration of single-parameter regularization schemes. In contrast, in this study we go beyond that framework by introducing an algorithm for multiple parameter regularization and presenting a theoretically grounded method for dealing with the associated parameters. This method facilitates the aggregation of models with varying regularization parameters. The efficacy of the proposed approach is assessed through evaluations on both synthetic and some real-world medical data, revealing promising results.
- [11] arXiv:2405.04193 [pdf, ps, other]
-
Title: A generalized ordinal quasi-symmetry model and its separability for analyzing multi-way tablesSubjects: Methodology (stat.ME)
This paper addresses the challenge of modeling multi-way contingency tables for matched set data with ordinal categories. Although the complete symmetry and marginal homogeneity models are well established, they may not always provide a satisfactory fit to the data. To address this issue, we propose a generalized ordinal quasi-symmetry model that offers increased flexibility when the complete symmetry model fails to capture the underlying structure. We investigate the properties of this new model and provide an information-theoretic interpretation, elucidating its relationship to the ordinal quasi-symmetry model. Moreover, we revisit Agresti's findings and present a new necessary and sufficient condition for the complete symmetry model, proving that the proposed model and the marginal moment equality model are separable hypotheses. The separability of the proposed model and marginal moment equality model is a significant development in the analysis of multi-way contingency tables. It enables researchers to examine the symmetry structure in the data with greater precision, providing a more thorough understanding of the underlying patterns. This powerful framework equips researchers with the necessary tools to explore the complexities of ordinal variable relationships in matched set data, paving the way for new discoveries and insights.
- [12] arXiv:2405.04226 [pdf, ps, html, other]
-
Title: NEST: Neural Estimation by Sequential TestingSubjects: Methodology (stat.ME)
Adaptive psychophysical procedures aim to increase the efficiency and reliability of measurements. With increasing stimulus and experiment complexity in the last decade, estimating multi-dimensional psychometric functions has become a challenging task for adaptive procedures. If the experimenter has limited information about the underlying psychometric function, it is not possible to use parametric techniques developed for the multi-dimensional stimulus space. Although there are non-parametric approaches that use Gaussian process methods and specific hand-crafted acquisition functions, their performance is sensitive to proper selection of the kernel function, which is not always straightforward. In this work, we use a neural network as the psychometric function estimator and introduce a novel acquisition function for stimulus selection. We thoroughly benchmark our technique both using simulations and by conducting psychovisual experiments under realistic conditions. We show that our method outperforms the state of the art without the need to select a kernel function and significantly reduces the experiment duration.
- [13] arXiv:2405.04238 [pdf, ps, other]
-
Title: Homogeneity of multinomial populations when data are classified into a large number of groupsComments: 37 pages, 1 figureSubjects: Methodology (stat.ME)
Suppose that we are interested in the comparison of two independent categorical variables. Suppose also that the population is divided into subpopulations or groups. Notice that the distribution of the target variable may vary across subpopulations, moreover, it may happen that the two independent variables have the same distribution in the whole population, but their distributions could differ in some groups. So, instead of testing the homogeneity of the two categorical variables, one may be interested in simultaneously testing the homogeneity in all groups. A novel procedure is proposed for carrying out such a testing problem. The test statistic is shown to be asymptotically normal, avoiding the use of complicated resampling methods to get $p$-values. Here by asymptotic we mean when the number of groups increases; the sample sizes of the data from each group can either stay bounded or grow with the number of groups. The finite sample performance of the proposal is empirically evaluated through an extensive simulation study. The usefulness of the proposal is illustrated by three data sets coming from diverse experimental fields such as education, the COVID-19 pandemic and digital elevation models.
- [14] arXiv:2405.04254 [pdf, ps, other]
-
Title: Distributed variable screening for generalized linear modelsSubjects: Methodology (stat.ME)
In this article, we develop a distributed variable screening method for generalized linear models. This method is designed to handle situations where both the sample size and the number of covariates are large. Specifically, the proposed method selects relevant covariates by using a sparsity-restricted surrogate likelihood estimator. It takes into account the joint effects of the covariates rather than just the marginal effect, and this characteristic enhances the reliability of the screening results. We establish the sure screening property of the proposed method, which ensures that with a high probability, the true model is included in the selected model. Simulation studies are conducted to evaluate the finite sample performance of the proposed method, and an application to a real dataset showcases its practical utility.
- [15] arXiv:2405.04269 [pdf, ps, other]
-
Title: An Analysis of Sea Level Spatial Variability by Topological Indicators and $k$-means Clustering AlgorithmSubjects: Applications (stat.AP)
The time-series data of sea level rise and fall contains crucial information on the variability of sea level patterns. Traditional $k$-means clustering is commonly used for categorizing regional variability of sea level, however, its results are not robust against a number of factors. This study analyzed fourteen datasets of monthly sea level in fourteen shoreline regions of Peninsular Malaysia. We applied a hybridization of clustering technique to analyze data categorization and topological data analysis method to enhance the performance of our clustering analysis. Specifically, our approach utilized the persistent homology and $k$-means/$k$-means++ clustering. The fourteen data sets from fourteen tide gauge stations were categorized in classes based on a prior categorization that was determined by topological information, and the probability of data points that belong to certain groups that is yielded by $k$-means/$k$-means++ clustering. Our results demonstrated that our method significantly improves the performance of traditional clustering techniques.
- [16] arXiv:2405.04393 [pdf, ps, other]
-
Title: Efficient Online Set-valued Classification with Bandit FeedbackSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Conformal prediction is a distribution-free method that wraps a given machine learning model and returns a set of plausible labels that contain the true label with a prescribed coverage rate. In practice, the empirical coverage achieved highly relies on fully observed label information from data both in the training phase for model fitting and the calibration phase for quantile estimation. This dependency poses a challenge in the context of online learning with bandit feedback, where a learner only has access to the correctness of actions (i.e., pulled an arm) but not the full information of the true label. In particular, when the pulled arm is incorrect, the learner only knows that the pulled one is not the true class label, but does not know which label is true. Additionally, bandit feedback further results in a smaller labeled dataset for calibration, limited to instances with correct actions, thereby affecting the accuracy of quantile estimation. To address these limitations, we propose Bandit Class-specific Conformal Prediction (BCCP), offering coverage guarantees on a class-specific granularity. Using an unbiased estimation of an estimand involving the true label, BCCP trains the model and makes set-valued inferences through stochastic gradient descent. Our approach overcomes the challenges of sparsely labeled data in each iteration and generalizes the reliability and applicability of conformal prediction to online decision-making environments.
- [17] arXiv:2405.04419 [pdf, ps, other]
-
Title: Transportability of Principal Causal EffectsSubjects: Methodology (stat.ME)
Recent research in causal inference has made important progress in addressing challenges to the external validity of trial findings. Such methods weight trial participant data to more closely resemble the distribution of effect-modifying covariates in a well-defined target population. In the presence of participant non-adherence to study medication, these methods effectively transport an intention-to-treat effect that averages over heterogeneous compliance behaviors. In this paper, we develop a principal stratification framework to identify causal effects conditioning on both on compliance behavior and membership in the target population. We also develop non-parametric efficiency theory for and construct efficient estimators of such "transported" principal causal effects and characterize their finite-sample performance in simulation experiments. While this work focuses on treatment non-adherence, the framework is applicable to a broad class of estimands that target effects in clinically-relevant, possibly latent subsets of a target population.
- [18] arXiv:2405.04446 [pdf, ps, other]
-
Title: Causal Inference in the Multiverse of HazardSubjects: Methodology (stat.ME)
Hazard serves as a pivotal estimand in both practical applications and methodological frameworks. However, its causal interpretation poses notable challenges, including inherent selection biases and ill-defined populations to be compared between different treatment groups. In response, we propose a novel definition of counterfactual hazard within the framework of possible worlds. Instead of conditioning on prior survival status as a conditional probability, our new definition involves intervening in the prior status, treating it as a marginal probability. Using single-world intervention graphs, we demonstrate that the proposed counterfactual hazard is a type of controlled direct effect. Conceptually, intervening in survival status at each time point generates a new possible world, where the proposed hazards across time points represent risks in these hypothetical scenarios, forming a "multiverse of hazard." The cumulative and average counterfactual hazards correspond to the sum and average of risks across this multiverse, respectively, with the actual world's risk lying between the two. This conceptual shift reframes hazards in the actual world as a collection of risks across possible worlds, marking a significant advancement in the causal interpretation of hazards.
- [19] arXiv:2405.04475 [pdf, ps, html, other]
-
Title: Bayesian Copula Density Estimation Using Bernstein Yett-Uniform PriorsComments: arXiv admin note: text overlap with arXiv:2109.03768Subjects: Methodology (stat.ME)
Probability density estimation is a central task in statistics. Copula-based models provide a great deal of flexibility in modelling multivariate distributions, allowing for the specifications of models for the marginal distributions separately from the dependence structure (copula) that links them to form a joint distribution. Choosing a class of copula models is not a trivial task and its misspecification can lead to wrong conclusions. We introduce a novel class of random Bernstein copula functions, and studied its support and the behavior of its posterior distribution. The proposal is based on a particular class of random grid-uniform copulas, referred to as yett-uniform copulas. Alternative Markov chain Monte Carlo algorithms for exploring the posterior distribution under the proposed model are also studied. The methodology is illustrated by means of simulated and real data.
- [20] arXiv:2405.04487 [pdf, ps, html, other]
-
Title: UQ state-dependent framework for seismic fragility assessment of industrial componentsSubjects: Computation (stat.CO); Applications (stat.AP)
In this study, we propose a novel surrogate modelling approach to efficiently and accurately approximate the response of complex dynamical systems driven by time-varying Recently, there has been increased interest in assessing the seismic fragility of industrial plants and process equipment. This is reflected in the growing number of studies, community-funded research projects and experimental campaigns on the matter.Nonetheless, the complexity of the problem and its inherent modelling, coupled with a general scarcity of available data on process equipment, has limited the development of risk assessment methods. In fact, these limitations have led to the creation of simplified and quick-to-run models. In this context, we propose an innovative framework for developing state-dependent fragility functions. This new methodology combines limited data with the power of metamodelling and statistical techniques, namely polynomial chaos expansions (PCE) and bootstrapping. Therefore, we validated the framework on a simplified and inexpensive-to-run MDoF system endowed with Bouc-Wen hysteresis.Then, we tested it on a real nonstructural industrial process component. Specifically, we applied the state-dependent fragility framework to a critical vertical tank of a multicomponent full-scale 3D steel braced frame (BF). The seismic performance of the BF endowed with process components was captured by means of shake table campaign within the European SPIF project. Finally, we derived state-dependent fragility functions based on the combination of PCE and bootstrap at a greatly reduced computational cost.
- [21] arXiv:2405.04507 [pdf, ps, other]
-
Title: New allometric models for the USA create a step-change in forest carbon estimation, modeling, and mappingLucas K. Johnson (1), Michael J. Mahoney (1), Grant Domke (2), Colin M. Beier (1) ((1) State University of New York College of Environmental Science and Forestry, (2) USDA Forest Service)Comments: Manuscript: 16 pages, 7 figures; Supplements: 3 pages, 2 figures; Submitted to: Remote Sensing of EnvironmentSubjects: Applications (stat.AP)
The United States national forest inventory (NFI) serves as the foundation for forest aboveground biomass (AGB) and carbon accounting across the nation. These data enable design-based estimates of forest carbon stocks and stock-changes at state and regional levels, but also serve as inputs to model-based approaches for characterizing forest carbon stocks and stock-changes at finer resolutions. Although NFI tree and plot-level data are often treated as truth in these models, they are in fact estimates based on regional species-group models known collectively as the Component Ratio Method (CRM). In late 2023 the Forest Inventory and Analysis (FIA) program introduced a new National Scale Volume and Biomass Estimators (NSVB) system to replace CRM nationwide and offer more precise and accurate representations of forest AGB and carbon. Given the prevalence of model-based AGB studies relying on FIA, there is concern about the transferability of methods from CRM to NSVB models, as well as the comparability of existing CRM AGB products (e.g. maps) to new and forthcoming NSVB AGB products. To begin addressing these concerns we compared previously published CRM AGB maps to new maps produced using identical methods with NSVB AGB reference data. Our results suggest that models relying on passive satellite imagery (e.g. Landsat) provide acceptable estimates of point-in-time NSVB AGB and carbon stocks, but fail to accurately quantify growth in mature closed-canopy forests. We highlight that existing estimates, models, and maps based on FIA reference data are no longer compatible with NSVB, and recommend new methods as well as updated models and maps for accommodating this step-change. Our collective ability to adopt NSVB in our modeling and mapping workflows will help us provide the most accurate spatial forest carbon data possible in order to better inform local management and decision making.
- [22] arXiv:2405.04531 [pdf, ps, other]
-
Title: Stochastic Gradient MCMC for Massive Geostatistical DataSubjects: Methodology (stat.ME); Computation (stat.CO)
Gaussian processes (GPs) are commonly used for prediction and inference for spatial data analyses. However, since estimation and prediction tasks have cubic time and quadratic memory complexity in number of locations, GPs are difficult to scale to large spatial datasets. The Vecchia approximation induces sparsity in the dependence structure and is one of several methods proposed to scale GP inference. Our work adds to the substantial research in this area by developing a stochastic gradient Markov chain Monte Carlo (SGMCMC) framework for efficient computation in GPs. At each step, the algorithm subsamples a minibatch of locations and subsequently updates process parameters through a Vecchia-approximated GP likelihood. Since the Vecchia-approximated GP has a time complexity that is linear in the number of locations, this results in scalable estimation in GPs. Through simulation studies, we demonstrate that SGMCMC is competitive with state-of-the-art scalable GP algorithms in terms of computational time and parameter estimation. An application of our method is also provided using the Argo dataset of ocean temperature measurements.
New submissions for Wednesday, 8 May 2024 (showing 22 of 22 entries )
- [23] arXiv:2209.02685 (cross-list from astro-ph.IM) [pdf, ps, other]
-
Title: Neural Networks Optimized by Genetic Algorithms in CosmologyComments: 15 pages, 4 figures; matches the version published in Physical Review DJournal-ref: Phys. Rev. D 107, 043509 (2023)Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Machine Learning (stat.ML)
The applications of artificial neural networks in the cosmological field have shone successfully during the past decade, this is due to their great ability of modeling large amounts of datasets and complex nonlinear functions. However, in some cases, their use still remains controversial because their ease of producing inaccurate results when the hyperparameters are not carefully selected. In this paper, to find the optimal combination of hyperparameters to artificial neural networks, we propose to take advantage of the genetic algorithms. As a proof of the concept, we analyze three different cosmological cases to test the performance of the architectures achieved with the genetic algorithms and compare them with the standard process, consisting of a grid with all possible configurations. First, we carry out a model-independent reconstruction of the distance modulus using a type Ia supernovae compilation. Second, the neural networks learn to infer the equation of state for the quintessence model, and finally with the data from a combined redshift catalog the neural networks predict the photometric redshift given six photometric bands (urgizy). We found that the genetic algorithms improve considerably the generation of the neural network architectures, which can ensure more confidence in their physical results because of the better performance in the metrics with respect to the grid method.
- [24] arXiv:2405.03720 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: Spatial Transfer Learning with Simple MLPSubjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
First step to investigate the potential of transfer learning applied to the field of spatial statistics
- [25] arXiv:2405.03723 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: Generative adversarial learning with optimal input dimension and its adaptive generator architectureSubjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
We investigate the impact of the input dimension on the generalization error in generative adversarial networks (GANs). In particular, we first provide both theoretical and practical evidence to validate the existence of an optimal input dimension (OID) that minimizes the generalization error. Then, to identify the OID, we introduce a novel framework called generalized GANs (G-GANs), which includes existing GANs as a special case. By incorporating the group penalty and the architecture penalty developed in the paper, G-GANs have several intriguing features. First, our framework offers adaptive dimensionality reduction from the initial dimension to a dimension necessary for generating the target distribution. Second, this reduction in dimensionality also shrinks the required size of the generator network architecture, which is automatically identified by the proposed architecture penalty. Both reductions in dimensionality and the generator network significantly improve the stability and the accuracy of the estimation and prediction. Theoretical support for the consistent selection of the input dimension and the generator network is provided. Third, the proposed algorithm involves an end-to-end training process, and the algorithm allows for dynamic adjustments between the input dimension and the generator network during training, further enhancing the overall performance of G-GANs. Extensive experiments conducted with simulated and benchmark data demonstrate the superior performance of G-GANs. In particular, compared to that of off-the-shelf methods, G-GANs achieves an average improvement of 45.68% in the CT slice dataset, 43.22% in the MNIST dataset and 46.94% in the FashionMNIST dataset in terms of the maximum mean discrepancy or Frechet inception distance. Moreover, the features generated based on the input dimensions identified by G-GANs align with visually significant features.
- [26] arXiv:2405.03734 (cross-list from cs.HC) [pdf, ps, html, other]
-
Title: FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt EngineeringSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Applications (stat.AP)
Integrating large language models (LLMs) and knowledge graphs (KGs) holds great promise for revolutionizing intelligent education, but challenges remain in achieving personalization, interactivity, and explainability. We propose FOKE, a Forest Of Knowledge and Education framework that synergizes foundation models, knowledge graphs, and prompt engineering to address these challenges. FOKE introduces three key innovations: (1) a hierarchical knowledge forest for structured domain knowledge representation; (2) a multi-dimensional user profiling mechanism for comprehensive learner modeling; and (3) an interactive prompt engineering scheme for generating precise and tailored learning guidance.
We showcase FOKE's application in programming education, homework assessment, and learning path planning, demonstrating its effectiveness and practicality. Additionally, we implement Scholar Hero, a real-world instantiation of FOKE. Our research highlights the potential of integrating foundation models, knowledge graphs, and prompt engineering to revolutionize intelligent education practices, ultimately benefiting learners worldwide. FOKE provides a principled and unified approach to harnessing cutting-edge AI technologies for personalized, interactive, and explainable educational services, paving the way for further research and development in this critical direction. - [27] arXiv:2405.03875 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: Rethinking Data Shapley for Data Selection Tasks: Misleads and MeritsComments: ICML 2024Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis testing framework and show that Data Shapley's performance can be no better than random selection without specific constraints on utility functions. We identify a class of utility functions, monotonically transformed modular functions, within which Data Shapley optimally selects data. Based on this insight, we propose a heuristic for predicting Data Shapley's effectiveness in data selection tasks. Our experiments corroborate these findings, adding new insights into when Data Shapley may or may not succeed.
- [28] arXiv:2405.03910 (cross-list from econ.EM) [pdf, ps, html, other]
-
Title: A Primer on the Analysis of Randomized Experiments and a Survey of some Recent AdvancesSubjects: Econometrics (econ.EM); Methodology (stat.ME)
The past two decades have witnessed a surge of new research in the analysis of randomized experiments. The emergence of this literature may seem surprising given the widespread use and long history of experiments as the "gold standard" in program evaluation, but this body of work has revealed many subtle aspects of randomized experiments that may have been previously unappreciated. This article provides an overview of some of these topics, primarily focused on stratification, regression adjustment, and cluster randomization.
- [29] arXiv:2405.03913 (cross-list from q-bio.QM) [pdf, ps, html, other]
-
Title: Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing ProcessComments: 12 pages, 5 figuresSubjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Machine Learning (stat.ML)
Biomanufacturing innovation relies on an efficient design of experiments (DoE) to optimize processes and product quality. Traditional DoE methods, ignoring the underlying bioprocessing mechanisms, often suffer from a lack of interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach that can guide a sequential DoEs for digital twin model calibration. In this study, we consider a multi-scale mechanistic model for cell culture process, also known as Biological Systems-of-Systems (Bio-SoS), as our digital twin. This model with modular design, composed of sub-models, allows us to integrate data across various production processes. To calibrate the Bio-SoS digital twin, we evaluate the mean squared error of model prediction and develop a computational approach to quantify the impact of parameter estimation error of individual sub-models on the prediction accuracy of digital twin, which can guide sample-efficient and interpretable DoEs.
- [30] arXiv:2405.04011 (cross-list from q-bio.MN) [pdf, ps, html, other]
-
Title: Adjoint Sensitivity Analysis on Multi-Scale Bioprocess Stochastic Reaction NetworkComments: 11 pages, 2 figuresSubjects: Molecular Networks (q-bio.MN); Machine Learning (stat.ML)
Motivated by the pressing challenges in the digital twin development for biomanufacturing process, we introduce an adjoint sensitivity analysis (SA) approach to expedite the learning of mechanistic model parameters. In this paper, we consider enzymatic stochastic reaction networks representing a multi-scale bioprocess mechanistic model that allows us to integrate disparate data from diverse production processes and leverage the information from existing macro-kinetic and genome-scale models. To support forward prediction and backward reasoning, we develop a convergent adjoint SA algorithm studying how the perturbations of model parameters and inputs (e.g., initial state) propagate through enzymatic reaction networks and impact on output trajectory predictions. This SA can provide a sample efficient and interpretable way to assess the sensitivities between inputs and outputs accounting for their causal dependencies. Our empirical study underscores the resilience of these sensitivities and illuminates a deeper comprehension of the regulatory mechanisms behind bioprocess through sensitivities.
- [31] arXiv:2405.04346 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Revisiting character-level adversarial attacksComments: Accepted in ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Adversarial attacks in Natural Language Processing apply perturbations in the character or token levels. Token-level attacks, gaining prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adversarial examples. While character-level attacks easily maintain semantics, they have received less attention as they cannot easily adopt popular gradient-based methods, and are thought to be easy to defend. Challenging these beliefs, we introduce Charmer, an efficient query-based adversarial attack capable of achieving high attack success rate (ASR) while generating highly similar adversarial examples. Our method successfully targets both small (BERT) and large (Llama 2) models. Specifically, on BERT with SST-2, Charmer improves the ASR in 4.84% points and the USE similarity in 8% points with respect to the previous art. Our implementation is available in this https URL.
- [32] arXiv:2405.04352 (cross-list from econ.GN) [pdf, ps, other]
-
Title: Return to Office and the Tenure DistributionComments: 6 figures, 3 tables, 18 pagesSubjects: General Economics (econ.GN); Applications (stat.AP)
With the official end of the COVID-19 pandemic, debates about the return to office have taken center stage among companies and employees. Despite their ubiquity, the economic implications of return to office policies are not fully understood. Using 260 million resumes matched to company data, we analyze the causal effects of such policies on employees' tenure and seniority levels at three of the largest US tech companies: Microsoft, SpaceX, and Apple. Our estimation procedure is nonparametric and captures the full heterogeneity of tenure and seniority of employees in a distributional synthetic controls framework. We estimate a reduction in counterfactual tenure that increases for employees with longer tenure. Similarly, we document a leftward shift in the seniority distribution towards positions below the senior level. These shifts appear to be driven by employees leaving to larger firms that are direct competitors. Our results suggest that return to office policies can lead to an outflow of senior employees, posing a potential threat to the productivity, innovation, and competitiveness of the wider firm.
- [33] arXiv:2405.04517 (cross-list from cs.LG) [pdf, ps, other]
-
Title: xLSTM: Extended Long Short-Term MemoryMaximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp HochreiterSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories, in particular they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: How far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
Cross submissions for Wednesday, 8 May 2024 (showing 11 of 11 entries )
- [34] arXiv:2008.08718 (replaced) [pdf, ps, other]
-
Title: Minimum discrepancy principle strategy for choosing $k$ in $k$-NN regressionSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We present a novel data-driven strategy to choose the hyperparameter $k$ in the $k$-NN regression estimator without using any hold-out data. We treat the problem of choosing the hyperparameter as an iterative procedure (over $k$) and propose using an easily implemented in practice strategy based on the idea of early stopping and the minimum discrepancy principle. This model selection strategy is proven to be minimax-optimal, under the fixed-design assumption on covariates, over some smoothness function classes, for instance, the Lipschitz functions class on a bounded domain. The novel method often improves statistical performance on artificial and real-world data sets in comparison to other model selection strategies, such as the Hold-out method, 5-fold cross-validation, and AIC criterion. The novelty of the strategy comes from reducing the computational time of the model selection procedure while preserving the statistical (minimax) optimality of the resulting estimator. More precisely, given a sample of size $n$, if one should choose $k$ among $\left\{ 1, \ldots, n \right\}$, and $\left\{ f^1, \ldots, f^n \right\}$ are the estimators of the regression function, the minimum discrepancy principle requires calculation of a fraction of the estimators, while this is not the case for the generalized cross-validation, Akaike's AIC criteria or Lepskii principle.
- [35] arXiv:2012.00180 (replaced) [pdf, ps, other]
-
Title: Anisotropic local constant smoothing for change-point regression function estimationComments: 30 pages, 12 figures. Parts of this Original Manuscript are in an article published by Taylor & Francis in the Journal of Applied Statistics on April 20th 2024, available at this https URLSubjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)
Understanding forest fire spread in any region of Canada is critical to promoting forest health, and protecting human life and infrastructure. Quantifying fire spread from noisy images, where regions of a fire are separated by change-point boundaries, is critical to faithfully estimating fire spread rates. In this research, we develop a statistically consistent smooth estimator that allows us to denoise fire spread imagery from micro-fire experiments. We develop an anisotropic smoothing method for change-point data that uses estimates of the underlying data generating process to inform smoothing. We show that the anisotropic local constant regression estimator is consistent with convergence rate $O\left(n^{-1/{(q+2)}}\right)$. We demonstrate its effectiveness on simulated one- and two-dimensional change-point data and fire spread imagery from micro-fire experiments.
- [36] arXiv:2205.07739 (replaced) [pdf, ps, other]
-
Title: The Role of Pseudo-labels in Self-training Linear Classifiers on High-dimensional Gaussian Mixture DataComments: 65 pages, 13 figuresSubjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Statistics Theory (math.ST)
Self-training (ST) is a simple yet effective semi-supervised learning method. However, why and how ST improves generalization performance by using potentially erroneous pseudo-labels is still not well understood. To deepen the understanding of ST, we derive and analyze a sharp characterization of the behavior of iterative ST when training a linear classifier by minimizing the ridge-regularized convex loss on binary Gaussian mixtures, in the asymptotic limit where input dimension and data size diverge proportionally. The results show that ST improves generalization in different ways depending on the number of iterations. When the number of iterations is small, ST improves generalization performance by fitting the model to relatively reliable pseudo-labels and updating the model parameters by a large amount at each iteration. This suggests that ST works intuitively. On the other hand, with many iterations, ST can gradually improve the direction of the classification plane by updating the model parameters incrementally, using soft labels and small regularization. It is argued that this is because the small update of ST can extract information from the data in an almost noiseless way. However, in the presence of label imbalance, the generalization performance of ST underperforms supervised learning with true labels. To overcome this, two heuristics are proposed to enable ST to achieve nearly compatible performance with supervised learning even with significant label imbalance.
- [37] arXiv:2207.09098 (replaced) [pdf, ps, html, other]
-
Title: ReBoot: Distributed statistical learning via refitting bootstrap samplesSubjects: Methodology (stat.ME); Statistics Theory (math.ST)
In this paper, we propose a one-shot distributed learning algorithm via refitting bootstrap samples, which we refer to as ReBoot. ReBoot refits a new model to mini-batches of bootstrap samples that are continuously drawn from each of the locally fitted models. It requires only one round of communication of model parameters without much memory. Theoretically, we analyze the statistical error rate of ReBoot for generalized linear models (GLM) and noisy phase retrieval, which represent convex and non-convex problems, respectively. In both cases, ReBoot provably achieves the full-sample statistical rate. In particular, we show that the systematic bias of ReBoot, the error that is independent of the number of subsamples (i.e., the number of sites), is $O(n ^ {-2})$ in GLM, where $n$ is the subsample size (the sample size of each local site). This rate is sharper than that of model parameter averaging and its variants, implying the higher tolerance of ReBoot with respect to data splits to maintain the full-sample rate. Our simulation study demonstrates the statistical advantage of ReBoot over competing methods. Finally, we propose FedReBoot, an iterative version of ReBoot, to aggregate convolutional neural networks for image classification. FedReBoot exhibits substantial superiority over Federated Averaging (FedAvg) within early rounds of communication.
- [38] arXiv:2209.01328 (replaced) [pdf, ps, html, other]
-
Title: Optimal empirical Bayes estimation for the Poisson model via minimum-distance methodsComments: 28 pages, 7 figures, 3 tables. Added an extension to a multivariate setup and a comparison with the worst case priorSubjects: Statistics Theory (math.ST); Methodology (stat.ME)
The Robbins estimator is the most iconic and widely used procedure in the empirical Bayes literature for the Poisson model. On one hand, this method has been recently shown to be minimax optimal in terms of the regret (excess risk over the Bayesian oracle that knows the true prior) for various nonparametric classes of priors. On the other hand, it has been long recognized in practice that Robbins estimator lacks the desired smoothness and monotonicity of Bayes estimators and can be easily derailed by those data points that were rarely observed before. Based on the minimum-distance distance method, we propose a suite of empirical Bayes estimators, including the classical nonparametric maximum likelihood, that outperform the Robbins method in a variety of synthetic and real data sets and retain its optimality in terms of minimax regret.
- [39] arXiv:2211.02039 (replaced) [pdf, ps, other]
-
Title: The Projected Covariance Measure for assumption-lean variable significance testingComments: 97 pages, 5 figuresSubjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power, for example when $X$ is involved in complex interactions, or lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
- [40] arXiv:2304.02563 (replaced) [pdf, ps, html, other]
-
Title: The transcoding sampler for stick-breaking inferences on Dirichlet process mixturesComments: Some sections have been moved to the Appendix, and Appendix A has been updated with a more relevant exampleSubjects: Methodology (stat.ME)
Dirichlet process mixture models suffer from slow mixing of the MCMC posterior chain produced by stick-breaking Gibbs samplers, as opposed to collapsed Gibbs samplers based on the Polya urn representation which have shorter integrated autocorrelation time (IAT).
We study how cluster membership information is encoded under the two aforementioned samplers, and we introduce the transcoding algorithm to switch between encodings. We also develop the transcoding sampler, which consists of undertaking posterior partition inference with any high-efficiency sampler, such as collapsed Gibbs, and to subsequently transcode it to the stick-breaking representation via the transcoding algorithm, thereby allowing inference on all stick-breaking parameters of interest while retaining the shorter IAT of the high-efficiency sampler.
The transcoding sampler is substantially simpler to implement than the slice sampler, it can inherit the shorter IAT of collapsed Gibbs samplers and it can also achieve zero IAT when paired with a posterior partition sampler that is i.i.d., such as the sequential importance sampler. - [41] arXiv:2311.02043 (replaced) [pdf, ps, html, other]
-
Title: Bayesian Quantile Regression with Subset Selection: A Posterior Summarization PerspectiveSubjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)
Quantile regression is a powerful tool for inferring how covariates affect specific percentiles of the response distribution. Existing methods either estimate conditional quantiles separately for each quantile of interest or estimate the entire conditional distribution using semi- or non-parametric models. The former often produce inadequate models for real data and do not share information across quantiles, while the latter are characterized by complex and constrained models that can be difficult to interpret and computationally inefficient. Further, neither approach is well-suited for quantile-specific subset selection. Instead, we pose the fundamental problems of linear quantile estimation, uncertainty quantification, and subset selection from a Bayesian decision analysis perspective. For any Bayesian regression model, we derive optimal and interpretable linear estimates and uncertainty quantification for each model-based conditional quantile. Our approach introduces a quantile-focused squared error loss, which enables efficient, closed-form computing and maintains a close relationship with Wasserstein-based density estimation. In an extensive simulation study, our methods demonstrate substantial gains in quantile estimation accuracy, variable selection, and inference over frequentist and Bayesian competitors. We apply these tools to identify the quantile-specific impacts of social and environmental stressors on educational outcomes for a large cohort of children in North Carolina.
- [42] arXiv:2312.07186 (replaced) [pdf, ps, other]
-
Title: Towards Optimal Sobolev Norm Rates for the Vector-Valued Regularized Least-Squares AlgorithmComments: arXiv admin note: text overlap with arXiv:2208.01711Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We present the first optimal rates for infinite-dimensional vector-valued ridge regression on a continuous scale of norms that interpolate between $L_2$ and the hypothesis space, which we consider as a vector-valued reproducing kernel Hilbert space. These rates allow to treat the misspecified case in which the true regression function is not contained in the hypothesis space. We combine standard assumptions on the capacity of the hypothesis space with a novel tensor product construction of vector-valued interpolation spaces in order to characterize the smoothness of the regression function. Our upper bound not only attains the same rate as real-valued kernel ridge regression, but also removes the assumption that the target regression function is bounded. For the lower bound, we reduce the problem to the scalar setting using a projection argument. We show that these rates are optimal in most cases and independent of the dimension of the output space. We illustrate our results for the special case of vector-valued Sobolev spaces.
- [43] arXiv:2401.00097 (replaced) [pdf, ps, html, other]
-
Title: Recursive identification with regularization and on-line hyperparameters estimationComments: https://hal.science/hal-04337419Subjects: Methodology (stat.ME); Systems and Control (eess.SY); Optimization and Control (math.OC)
This paper presents a regularized recursive identification algorithm with simultaneous on-line estimation of both the model parameters and the algorithms hyperparameters. A new kernel is proposed to facilitate the algorithm development. The performance of this novel scheme is compared with that of the recursive least squares algorithm in simulation.
- [44] arXiv:2401.02048 (replaced) [pdf, ps, html, other]
-
Title: Random Effect Restricted Mean Survival Time ModelSubjects: Methodology (stat.ME); Applications (stat.AP)
The restricted mean survival time (RMST) model has been garnering attention as a way to provide a clinically intuitive measure: the mean survival time. RMST models, which use methods based on pseudo time-to-event values and inverse probability censoring weighting, can adjust covariates. However, no approach has yet been introduced that considers random effects for clusters. In this paper, we propose a new random-effect RMST. We present two methods of analysis that consider variable effects by i) using a generalized mixed model with pseudo-values and ii) integrating the estimated results from the inverse probability censoring weighting estimating equations for each cluster. We evaluate our proposed methods through computer simulations. In addition, we analyze the effect of a mother's age at birth on under-five deaths in India using states as clusters.
- [45] arXiv:2402.01810 (replaced) [pdf, ps, html, other]
-
Title: Parameter uncertainties for imperfect surrogate models in the low-noise regimeSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, the loss ignores misspecification, where models are imperfect. Parameter uncertainties from Bayesian regression are thus significantly underestimated and vanish in the large data limit. This is particularly problematic when building models of low-noise, or near-deterministic, calculations, as the main source of uncertainty is neglected. We analyze the generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show posterior distributions must cover every training point to avoid a divergent generalization error and design an ansatz that respects this constraint, which for linear models incurs minimal overhead. This is demonstrated on model problems before application to thousand dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors where existing schemes fail, allowing this important source of uncertainty to be incorporated in computational workflows.
- [46] arXiv:2404.01469 (replaced) [pdf, ps, html, other]
-
Title: A group testing based exploration of age-varying factors in chlamydia infections among Iowa residentsSubjects: Applications (stat.AP); Methodology (stat.ME)
Group testing, a method that screens subjects in pooled samples rather than individually, has been employed as a cost-effective strategy for chlamydia screening among Iowa residents. In efforts to deepen our understanding of chlamydia epidemiology in Iowa, several group testing regression models have been proposed. Different than previous approaches, we expand upon the varying coefficient model to capture potential age-varying associations with chlamydia infection risk. In general, our model operates within a Bayesian framework, allowing regression associations to vary with a covariate of key interest. We employ a stochastic search variable selection process for regularization in estimation. Additionally, our model can integrate random effects to consider potential geographical factors and estimate unknown assay accuracy probabilities. The performance of our model is assessed through comprehensive simulation studies. Upon application to the Iowa group testing dataset, we reveal a significant age-varying racial disparity in chlamydia infections. We believe this discovery has the potential to inform the enhancement of interventions and prevention strategies, leading to more effective chlamydia control and management, thereby promoting health equity across all populations.
- [47] arXiv:2405.01015 (replaced) [pdf, ps, other]
-
Title: Network reconstruction via the minimum description length principleComments: 17 pages, 10 figures. Code and documentation are available at this https URLSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Populations and Evolution (q-bio.PE)
A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting, and produces an inferred network with a statistically justifiable number of edges. The status quo in this context is based on $L_{1}$ regularization combined with cross-validation. However, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity with weight "shrinkage". This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster to employ, as it requires a single fit to the complete data. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of edges to be known in advance. We also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving in the order of $10^{4}$ to $10^{5}$ species, and demonstrate how the inferred model can be used to predict the outcome of interventions in the system.
- [48] arXiv:2202.08081 (replaced) [pdf, ps, other]
-
Title: Reasoning with fuzzy and uncertain evidence using epistemic random fuzzy sets: general framework and practical modelsJournal-ref: Fuzzy Sets and Systems, Vol. 453, Pages 1-36, 2023Subjects: Artificial Intelligence (cs.AI); Methodology (stat.ME)
We introduce a general theory of epistemic random fuzzy sets for reasoning with fuzzy or crisp evidence. This framework generalizes both the Dempster-Shafer theory of belief functions, and possibility theory. Independent epistemic random fuzzy sets are combined by the generalized product-intersection rule, which extends both Dempster's rule for combining belief functions, and the product conjunctive combination of possibility distributions. We introduce Gaussian random fuzzy numbers and their multi-dimensional extensions, Gaussian random fuzzy vectors, as practical models for quantifying uncertainty about scalar or vector quantities. Closed-form expressions for the combination, projection and vacuous extension of Gaussian random fuzzy numbers and vectors are derived.
- [49] arXiv:2205.00171 (replaced) [pdf, ps, other]
-
Title: A Heteroskedasticity-Robust Overidentifying Restriction Test with High-Dimensional CovariatesSubjects: Econometrics (econ.EM); Statistics Theory (math.ST); Machine Learning (stat.ML)
This paper proposes an overidentifying restriction test for high-dimensional linear instrumental variable models. The novelty of the proposed test is that it allows the number of covariates and instruments to be larger than the sample size. The test is scale-invariant and is robust to heteroskedastic errors. To construct the final test statistic, we first introduce a test based on the maximum norm of multiple parameters that could be high-dimensional. The theoretical power based on the maximum norm is higher than that in the modified Cragg-Donald test (Kolesár, 2018), the only existing test allowing for large-dimensional covariates. Second, following the principle of power enhancement (Fan et al., 2015), we introduce the power-enhanced test, with an asymptotically zero component used to enhance the power to detect some extreme alternatives with many locally invalid instruments. Finally, an empirical example of the trade and economic growth nexus demonstrates the usefulness of the proposed test.
- [50] arXiv:2305.19076 (replaced) [pdf, ps, other]
-
Title: Approximate Bayesian Class-Conditional Models under Continuous Representation ShiftComments: Published at AISTATS 2024, 9 pagesSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
For models consisting of a classifier in some representation space, learning online from a non-stationary data stream often necessitates changes in the representation. So, the question arises of what is the best way to adapt the classifier to shifts in representation. Current methods only slowly change the classifier to representation shift, introducing noise into learning as the classifier is misaligned to the representation. We propose DeepCCG, an empirical Bayesian approach to solve this problem. DeepCCG works by updating the posterior of a class conditional Gaussian classifier such that the classifier adapts in one step to representation shift. The use of a class conditional Gaussian classifier also enables DeepCCG to use a log conditional marginal likelihood loss to update the representation. To perform the update to the classifier and representation, DeepCCG maintains a fixed number of examples in memory and so a key part of DeepCCG is selecting what examples to store, choosing the subset that minimises the KL divergence between the true posterior and the posterior induced by the subset. We explore the behaviour of DeepCCG in online continual learning (CL), demonstrating that it performs well against a spectrum of online CL methods and that it reduces the change in performance due to representation shift.
- [51] arXiv:2306.01470 (replaced) [pdf, ps, html, other]
-
Title: Understanding MLP-Mixer as a Wide and Sparse MLPComments: Accepted in ICML 2024Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Multi-layer perceptron (MLP) is a fundamental component of deep learning, and recent MLP-based architectures, especially the MLP-Mixer, have achieved significant empirical success. Nevertheless, our understanding of why and how the MLP-Mixer outperforms conventional MLPs remains largely unexplored. In this work, we reveal that sparseness is a key mechanism underlying the MLP-Mixers. First, the Mixers have an effective expression as a wider MLP with Kronecker-product weights, clarifying that the Mixers efficiently embody several sparseness properties explored in deep learning. In the case of linear layers, the effective expression elucidates an implicit sparse regularization caused by the model architecture and a hidden relation to Monarch matrices, which is also known as another form of sparse parameterization. Next, for general cases, we empirically demonstrate quantitative similarities between the Mixer and the unstructured sparse-weight MLPs. Following a guiding principle proposed by Golubeva, Neyshabur and Gur-Ari (2021), which fixes the number of connections and increases the width and sparsity, the Mixers can demonstrate improved performance.
- [52] arXiv:2307.08200 (replaced) [pdf, ps, other]
-
Title: Ternary Stochastic Geometry Theory for Performance Analysis of RIS-Assisted UDNComments: 13 pages, 10 figuresSubjects: Information Theory (cs.IT); Signal Processing (eess.SP); Statistics Theory (math.ST)
Currently, network topology becomes increasingly complex with the increased number of various network nodes, bringing in the challenge of network design and analysis. Most of the current studies are deduced based on the binary system stochastic geometry, overlooking the coupling and collaboration among nodes. This limitation makes it difficult to accurately analyze network systems, such as reconfigurable intelligent surface (RIS) assisted ultra-dense network (UDN). To address this issue, we propose a dual coordinate system analysis method, by using dual observation points and their established coordinates. The concept of a typical triangle that consists of a base station (BS), a RIS, and a user equipment (UE) is defined as the fundamental unit of analysis for ternary stochastic geometry. This triangle comprises the base station, the RIS, and the user equipment (UE). Furthermore, we extend Campbell's theorem and propose an approximate probability generating function for ternary stochastic geometry. Utilizing the theoretical framework of ternary stochastic geometry, we derive and analyze performance metrics of a RIS-assisted UDN system, such as coverage probability, area spectral efficiency, area energy efficiency, and energy coverage efficiency. Simulation results show that RIS can significantly enhance system performance, particularly for UEs with high signal-to-interference-plus-noise ratios, exhibiting a phenomenon similar to the Matthew effect.
- [53] arXiv:2311.03036 (replaced) [pdf, ps, other]
-
Title: On regularized polynomial functional regressionComments: 26 pagesJournal-ref: Journal of Complexity, Volume 83, August 2024, 101853Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Statistics Theory (math.ST)
This article offers a comprehensive treatment of polynomial functional regression, culminating in the establishment of a novel finite sample bound. This bound encompasses various aspects, including general smoothness conditions, capacity conditions, and regularization techniques. In doing so, it extends and generalizes several findings from the context of linear functional regression as well. We also provide numerical evidence that using higher order polynomial terms can lead to an improved performance.
- [54] arXiv:2311.17146 (replaced) [pdf, ps, html, other]
-
Title: Calabi-Yau Four/Five/Six-folds as $\mathbb{P}^n_\textbf{w}$ Hypersurfaces: Machine Learning, Approximation, and GenerationComments: 37 pages; 13 figures; 12 tables. v2: published version (Phys. Rev. D), minor correctionsJournal-ref: Phys. Rev. D 109, 106006 (2024)Subjects: High Energy Physics - Theory (hep-th); Algebraic Geometry (math.AG); Machine Learning (stat.ML)
Calabi-Yau four-folds may be constructed as hypersurfaces in weighted projective spaces of complex dimension 5 defined via weight systems of 6 weights. In this work, neural networks were implemented to learn the Calabi-Yau Hodge numbers from the weight systems, where gradient saliency and symbolic regression then inspired a truncation of the Landau-Ginzburg model formula for the Hodge numbers of any dimensional Calabi-Yau constructed in this way. The approximation always provides a tight lower bound, is shown to be dramatically quicker to compute (with compute times reduced by up to four orders of magnitude), and gives remarkably accurate results for systems with large weights. Additionally, complementary datasets of weight systems satisfying the necessary but insufficient conditions for transversality were constructed, including considerations of the interior point, reflexivity, and intradivisibility properties. Overall producing a classification of this weight system landscape, further confirmed with machine learning methods. Using the knowledge of this classification, and the properties of the presented approximation, a novel dataset of transverse weight systems consisting of 7 weights was generated for a sum of weights $\leq 200$; producing a new database of Calabi-Yau five-folds, with their respective topological properties computed. Further to this an equivalent database of candidate Calabi-Yau six-folds was generated with approximated Hodge numbers.
- [55] arXiv:2312.00761 (replaced) [pdf, ps, other]
-
Title: Deep Unlearning: Fast and Efficient Training-free Approach to Class ForgettingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Machine unlearning is a prominent and challenging field, driven by regulatory demands for user data deletion and heightened privacy awareness. Existing approaches involve retraining model or multiple finetuning steps for each deletion request, often constrained by computational limits and restricted data access. In this work, we introduce a novel class unlearning algorithm designed to strategically eliminate specific classes from the learned model. Our algorithm first estimates the Retain and the Forget Spaces using Singular Value Decomposition on the layerwise activations for a small subset of samples from the retain and unlearn classes, respectively. We then compute the shared information between these spaces and remove it from the forget space to isolate class-discriminatory feature space. Finally, we obtain the unlearned model by updating the weights to suppress the class discriminatory features from the activation spaces. We demonstrate our algorithm's efficacy on ImageNet using a Vision Transformer with only $\sim 1.5\%$ drop in retain accuracy compared to the original model while maintaining under $1\%$ accuracy on the unlearned class samples. Further, our algorithm consistently performs well when subject to Membership Inference Attacks showing $7.8\%$ improvement on average across a variety of image classification datasets and network architectures, as compared to other baselines while being $\sim 6 \times$ more computationally efficient. Our code is available at this https URL.
- [56] arXiv:2312.05429 (replaced) [pdf, ps, other]
-
Title: Mitigating Nonlinear Algorithmic Bias in Binary ClassificationComments: 5 pages, 3 figures, 12 tables. arXiv admin note: text overlap with arXiv:2310.12421Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Applications (stat.AP)
This paper proposes the use of causal modeling to detect and mitigate algorithmic bias that is nonlinear in the protected attribute. We provide a general overview of our approach. We use the German Credit data set, which is available for download from the UC Irvine Machine Learning Repository, to develop (1) a prediction model, which is treated as a black box, and (2) a causal model for bias mitigation. In this paper, we focus on age bias and the problem of binary classification. We show that the probability of getting correctly classified as "low risk" is lowest among young people. The probability increases with age nonlinearly. To incorporate the nonlinearity into the causal model, we introduce a higher order polynomial term. Based on the fitted causal model, the de-biased probability estimates are computed, showing improved fairness with little impact on overall classification accuracy. Causal modeling is intuitive and, hence, its use can enhance explicability and promotes trust among different stakeholders of AI.
- [57] arXiv:2312.08927 (replaced) [pdf, ps, other]
-
Title: Limit Order Book Dynamics and Order Size Modelling Using Compound Hawkes ProcessComments: Presented at Market Microstructure 2023, Accepted at Quantitative Finance Workshop 2024. To be submitted for publication to a journalSubjects: Trading and Market Microstructure (q-fin.TR); Computational Engineering, Finance, and Science (cs.CE); Computational Finance (q-fin.CP); Applications (stat.AP)
Hawkes Process has been used to model Limit Order Book (LOB) dynamics in several ways in the literature however the focus has been limited to capturing the inter-event times while the order size is usually assumed to be constant. We propose a novel methodology of using Compound Hawkes Process for the LOB where each event has an order size sampled from a calibrated distribution. The process is formulated in a novel way such that the spread of the process always remains positive. Further, we condition the model parameters on time of day to support empirical observations. We make use of an enhanced non-parametric method to calibrate the Hawkes kernels and allow for inhibitory cross-excitation kernels. We showcase the results and quality of fits for an equity stock's LOB in the NASDAQ exchange and compare them against several baselines. Finally, we conduct a market impact study of the simulator and show the empirical observation of a concave market impact function is indeed replicated.
- [58] arXiv:2401.01404 (replaced) [pdf, ps, other]
-
Title: Scalable network reconstruction in subquadratic timeComments: 12 pages, 7 figures. Code and documentation available at this https URLSubjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Computation (stat.CO); Machine Learning (stat.ML)
Network reconstruction consists in determining the unobserved pairwise couplings between $N$ nodes given only observational data on the resulting behavior that is conditioned on those couplings -- typically a time-series or independent samples from a graphical model. A major obstacle to the scalability of algorithms proposed for this problem is a seemingly unavoidable quadratic complexity of $\Omega(N^2)$, corresponding to the requirement of each possible pairwise coupling being contemplated at least once, despite the fact that most networks of interest are sparse, with a number of non-zero couplings that is only $O(N)$. Here we present a general algorithm applicable to a broad range of reconstruction problems that significantly outperforms this quadratic baseline. Our algorithm relies on a stochastic second neighbor search (Dong et al., 2011) that produces the best edge candidates with high probability, thus bypassing an exhaustive quadratic search. If we rely on the conjecture that the second-neighbor search finishes in log-linear time (Baron & Darling, 2020; 2022), we demonstrate theoretically that our algorithm finishes in subquadratic time, with a data-dependent complexity loosely upper bounded by $O(N^{3/2}\log N)$, but with a more typical log-linear complexity of $O(N\log^2N)$. In practice, we show that our algorithm achieves a performance that is many orders of magnitude faster than the quadratic baseline -- in a manner consistent with our theoretical analysis -- allows for easy parallelization, and thus enables the reconstruction of networks with hundreds of thousands and even millions of nodes and edges.
- [59] arXiv:2402.00027 (replaced) [pdf, ps, html, other]
-
Title: Perspectives on locally weighted ensemble Kalman methodsSubjects: Numerical Analysis (math.NA); Probability (math.PR); Computation (stat.CO)
This manuscript derives locally weighted ensemble Kalman methods from the point of view of ensemble-based function approximation. This is done by using pointwise evaluations to build up a local linear or quadratic approximation of a function, tapering off the effect of distant particles via local weighting. This introduces a candidate method (the locally weighted Ensemble Kalman method for inversion) with the motivation of combining some of the strengths of the particle filter (ability to cope with nonlinear maps and non-Gaussian distributions) and the Ensemble Kalman filter (no filter degeneracy).
- [60] arXiv:2402.01932 (replaced) [pdf, ps, other]
-
Title: A Virtual Solar Wind Monitor at Mars with Uncertainty Quantification using Gaussian ProcessesA. R. Azari, E. Abrahams, F. Sapienza, J. Halekas, J. Biersteker, D. L. Mitchell, F. Pérez, M. Marquette, M. J. Rutala, C. F. Bowers, C. M. Jackman, S. M. CurryComments: submitted to JGR: Machine Learning and ComputationSubjects: Space Physics (physics.space-ph); Applications (stat.AP)
Single spacecraft missions do not measure the pristine solar wind continuously because of the spacecrafts' orbital trajectory. The infrequent spatiotemporal cadence of measurement fundamentally limits conclusions about solar wind-magnetosphere coupling throughout the solar system. At Mars, such single spacecraft missions result in limitations for assessing the solar wind's role in causing lower altitude observations such as auroral dynamics or atmospheric loss. In this work, we detail the development of a virtual solar wind monitor from the Mars Atmosphere and Volatile Evolution (MAVEN) mission; a single spacecraft. This virtual solar wind monitor provides a continuous estimate of the solar wind upstream from Mars with uncertainties. We specifically employ Gaussian process regression to estimate the upstream solar wind and uncertainty estimations that scale with the data sparsity of our real observations. This proxy enables continuous solar wind estimation at Mars with representative uncertainties for the majority of the time since since late 2014. We conclude by discussing suggested uses of this virtual solar wind monitor for statistical studies of the Mars space environment and heliosphere.
- [61] arXiv:2402.12875 (replaced) [pdf, ps, other]
-
Title: Chain of Thought Empowers Transformers to Solve Inherently Serial ProblemsComments: 38 pages. Accepted by ICLR 2024Subjects: Machine Learning (cs.LG); Computational Complexity (cs.CC); Machine Learning (stat.ML)
Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT. We first show an even tighter expressiveness upper bound for constant-depth transformers with constant-bit precision, which can only solve problems in $\mathsf{AC}^0$, a proper subset of $ \mathsf{TC}^0$. However, with $T$ steps of CoT, constant-depth transformers using constant-bit precision and $O(\log n)$ embedding size can solve any problem solvable by boolean circuits of size $T$. Empirically, enabling CoT dramatically improves the accuracy for tasks that are hard for parallel computation, including the composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.
- [62] arXiv:2404.04057 (replaced) [pdf, ps, html, other]
-
Title: Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step GenerationComments: ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By reformulating forward diffusion processes as semi-implicit distributions, we leverage three score-related identities to create an innovative loss mechanism. This mechanism achieves rapid FID reduction by training the generator using its own synthesized images, eliminating the need for real data or reverse-diffusion-based generation, all accomplished within significantly shortened generation time. Upon evaluation across four benchmark datasets, the SiD algorithm demonstrates high iteration efficiency during distillation and surpasses competing distillation approaches, whether they are one-step or few-step, data-free, or dependent on training data, in terms of generation quality. This achievement not only redefines the benchmarks for efficiency and effectiveness in diffusion distillation but also in the broader field of diffusion-based generation. The PyTorch implementation is available at this https URL
- [63] arXiv:2404.08803 (replaced) [pdf, ps, other]
-
Title: Random walks on simplicial complexesSubjects: Probability (math.PR); Machine Learning (stat.ML)
The notion of Laplacian of a graph can be generalized to simplicial complexes and hypergraphs, and contains information on the topology of these structures. Even for a graph, the consideration of associated simplicial complexes is interesting to understand its shape. Whereas the Laplacian of a graph has a simple probabilistic interpretation as the generator of a continuous time Markov chain on the graph, things are not so direct when considering simplicial complexes. We define here new Markov chains on simplicial complexes. For a given order~$k$, the state space is the set of $k$-cycles that are chains of $k$-simplexes with null boundary. This new framework is a natural generalization of the canonical Markov chains on graphs. We show that the generator of our Markov chain is the upper Laplacian defined in the context of algebraic topology for discrete structure. We establish several key properties of this new process: in particular, when the number of vertices is finite, the Markov chain is positive recurrent. This result is not trivial, since the cycles can loop over themselves an unbounded number of times. We study the diffusive limits when the simplicial complexes under scrutiny are a sequence of ever refining triangulations of the flat torus. Using the analogy between singular and Hodge homologies, we express this limit as valued in the set of currents. The proof of tightness and the identification of the limiting martingale problem make use of the flat norm and carefully controls of the error terms in the convergence of the generator. Uniqueness of the solution to the martingale problem is left open. An application to hole detection is carried.
- [64] arXiv:2404.10550 (replaced) [pdf, ps, other]
-
Title: Analytical Approximation of the ELBO Gradient in the Context of the Clutter ProblemComments: 19 pages, 5 figures, supporting code available at this https URLSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose an analytical solution for approximating the gradient of the Evidence Lower Bound (ELBO) in variational inference problems where the statistical model is a Bayesian network consisting of observations drawn from a mixture of a Gaussian distribution embedded in unrelated clutter, known as the clutter problem. The method employs the reparameterization trick to move the gradient operator inside the expectation and relies on the assumption that, because the likelihood factorizes over the observed data, the variational distribution is generally more compactly supported than the Gaussian distribution in the likelihood factors. This allows efficient local approximation of the individual likelihood factors, which leads to an analytical solution for the integral defining the gradient expectation. We integrate the proposed gradient approximation as the expectation step in an EM (Expectation Maximization) algorithm for maximizing ELBO and test against classical deterministic approaches in Bayesian inference, such as the Laplace approximation, Expectation Propagation and Mean-Field Variational Inference. The proposed method demonstrates good accuracy and rate of convergence together with linear computational complexity.
- [65] arXiv:2404.17626 (replaced) [pdf, ps, other]
-
Title: Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK BiobankSubjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Applications (stat.AP); Computation (stat.CO)
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals, underscoring a critical gap in genetic research. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data. We evaluate the performance of Group-LASSO INTERaction-NET (glinternet) and pretrained lasso in disease prediction focusing on diverse ancestries in the UK Biobank. Models were trained on data from White British and other ancestries and validated across a cohort of over 96,000 individuals for 8 diseases. Out of 96 models trained, we report 16 with statistically significant incremental predictive performance in terms of ROC-AUC scores (p-value < 0.05), found for diabetes, arthritis, gall stones, cystitis, asthma and osteoarthritis. For the interaction and pretrained models that outperformed the baseline, the PRS score was the primary driver behind prediction. Our findings indicate that both interaction terms and pre-training can enhance prediction accuracy but for a limited set of diseases and moderate improvements in accuracy
- [66] arXiv:2405.01507 (replaced) [pdf, ps, other]
-
Title: Accelerating Convergence in Bayesian Few-Shot ClassificationSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper seamlessly integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent achieves accelerated convergence by providing the steepest descent direction along the corresponding manifold. It also exhibits the parameterization invariance property concerning the variational distribution. Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared to baseline models. Additionally, we investigate the impact of hyperparameters and components. Code is publicly available at this https URL.