Statistics Theory
- [1] arXiv:2405.18562 [pdf, ps, html, other]
Title: Covariance Operator Estimation via Adaptive Thresholding
Comments: 38 pages, 5 figures
Subjects: Statistics Theory (math.ST); Probability (math.PR)
This paper studies sparse covariance operator estimation for nonstationary Gaussian processes with sharply varying marginal variance and small correlation lengthscale. We introduce a covariance operator estimator that adaptively thresholds the sample covariance function using an estimate of the variance components. Building on recent results from empirical process theory, we derive an operator norm bound on the estimation error in terms of the sparsity level of the covariance and the expected supremum of the normalized process. Our theory and numerical simulations demonstrate the advantage of adaptive threshold estimators over universal threshold and sample covariance estimators in nonstationary settings.
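The idea of thresholding each covariance entry relative to estimated variances can be sketched in finite dimensions as follows. This is a minimal illustration, not the paper's estimator: the function name, the threshold form `rho * sqrt(C_ii * C_jj)`, and the tuning parameter `rho` are all assumptions, since the abstract does not state the exact rule.

```python
import numpy as np

def adaptive_threshold_covariance(samples, rho):
    """Hard-threshold the sample covariance entrywise.

    Assumption (not from the abstract): entry (i, j) is kept only if
    |C_ij| >= rho * sqrt(C_ii * C_jj), so the cutoff scales with the
    estimated marginal standard deviations.
    """
    X = np.asarray(samples, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0, keepdims=True)   # center each coordinate
    cov = Xc.T @ Xc / n                      # sample covariance (1/n scaling)
    sd = np.sqrt(np.diag(cov))               # estimated standard deviations
    threshold = rho * np.outer(sd, sd)       # entry-specific cutoff
    keep = np.abs(cov) >= threshold
    np.fill_diagonal(keep, True)             # never zero out the variances
    return np.where(keep, cov, 0.0)
```

A universal threshold would instead compare every `|C_ij|` with one constant, which tends to over-shrink low-variance regions and under-shrink high-variance ones when the marginal variance changes sharply.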
- [2] arXiv:2405.18828 [pdf, ps, other]
Title: CHANI: Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
The present work aims to prove mathematically that a biologically inspired neural network can learn a classification task using only local transformations. To this end, we propose a spiking neural network named CHANI (Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration), whose neuronal activity is modeled by Hawkes processes. Synaptic weights are updated by an expert aggregation algorithm, providing a simple, local learning rule. We prove that our network can learn on average and asymptotically. Moreover, we demonstrate that it automatically produces neuronal assemblies, in the sense that the network can encode several classes and that the same neuron in the intermediate layers may be activated by more than one class, and we provide numerical simulations on a synthetic dataset. This theoretical approach contrasts with the traditional empirical validation of biologically inspired networks and paves the way for understanding how local learning rules enable neurons to form assemblies able to represent complex concepts.
- [3] arXiv:2405.19016 [pdf, ps, html, other]
Title: Adaptive posterior concentration rates for sparse high-dimensional linear regression with random design and unknown error variance
Authors: The Tien Mai
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
This paper investigates sparse high-dimensional linear regression, particularly examining the properties of the posterior under conditions of random design and unknown error variance. We provide consistency results for the posterior and analyze its concentration rates, demonstrating adaptiveness to the unknown sparsity level of the regression coefficient vector. Furthermore, we extend our investigation to establish concentration outcomes for parameter estimation using specific distance measures. These findings are in line with recent discoveries in frequentist studies. Additionally, by employing techniques to address model misspecification through a fractional posterior, we broaden our analysis through oracle inequalities to encompass the critical aspect of model misspecification for the regular posterior. Our novel findings are demonstrated using two different types of sparsity priors: a shrinkage prior and a spike-and-slab prior.
- [4] arXiv:2405.19248 [pdf, ps, html, other]
Title: Bivariate phase-type distributions for experience rating
Subjects: Statistics Theory (math.ST)
In this paper, we consider the problem of experience rating within the classic Markov chain life insurance framework. We begin by investigating various multivariate mixed Poisson models with mixing distributions encompassing independent Gamma, hierarchical Gamma, and multivariate phase-type. In particular, we demonstrate how maximum likelihood estimation for these proposed models can be performed using expectation-maximization algorithms, which might be of independent interest. Subsequently, we establish a link between mixed Poisson distributions and the problem of pricing group disability insurance contracts that exhibit heterogeneity. We focus on shrinkage estimation of disability and recovery rates, taking into account sampling effects such as right-censoring. Finally, we showcase the practicality of these shrinkage estimators through a numerical study based on simulated yet realistic insurance data. Our findings highlight that by allowing for dependency between latent group effects, estimates of recovery and disability rates mutually improve, leading to enhanced predictive performance.
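The simplest of the mixing distributions mentioned above, independent Gamma, gives a closed-form shrinkage estimate that can be sketched as follows. This is a hedged illustration, not the paper's estimator: it assumes event counts `N_g | Theta_g ~ Poisson(Theta_g * E_g)` with exposure `E_g` and `Theta_g ~ Gamma(shape=alpha, rate=beta)`, treats `alpha` and `beta` as known, and ignores the hierarchical and phase-type cases. The function name is hypothetical.

```python
def gamma_poisson_shrinkage(counts, exposures, alpha, beta):
    """Posterior mean of each group's rate under a Gamma(alpha, beta) prior.

    For Poisson counts with exposure E, the posterior is
    Gamma(alpha + N, beta + E), whose mean is (alpha + N) / (beta + E).
    Equivalently: z * (N / E) + (1 - z) * (alpha / beta), z = E / (beta + E).
    """
    return [(alpha + n) / (beta + e) for n, e in zip(counts, exposures)]

# Two groups: one with little exposure, one with a lot (illustrative numbers).
rates = gamma_poisson_shrinkage(counts=[0, 10], exposures=[1.0, 100.0],
                                alpha=2.0, beta=20.0)
```

Groups with little exposure are pulled strongly toward the prior mean `alpha / beta`, while groups with large exposure stay close to their raw rate `N / E`.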
New submissions for Thursday, 30 May 2024 (showing 4 of 4 entries)
- [5] arXiv:2405.18856 (cross-list from stat.ME) [pdf, ps, other]
Title: Inference under covariate-adaptive randomization with many strata
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Covariate-adaptive randomization is widely employed to balance baseline covariates in interventional studies such as clinical trials and experiments in development economics. Recent years have witnessed substantial progress in inference under covariate-adaptive randomization with a fixed number of strata. However, concerns have been raised about the impact of a large number of strata on its design and analysis, which is a common scenario in practice, such as in multicenter randomized clinical trials. In this paper, we propose a general framework for inference under covariate-adaptive randomization, which extends the seminal works of Bugni et al. (2018, 2019) by allowing for a diverging number of strata. Furthermore, we introduce a novel weighted regression adjustment that ensures efficiency improvement. On top of establishing the asymptotic theory, practical algorithms for handling situations involving an extremely large number of strata are also developed. Moreover, by linking design balance and inference robustness, we highlight the advantages of stratified block randomization, which enforces better covariate balance within strata compared to simple randomization. This paper offers a comprehensive landscape of inference under covariate-adaptive randomization, spanning from fixed to diverging to extremely large numbers of strata.
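Stratified block randomization, as contrasted with simple randomization above, can be sketched as follows. This is a generic permuted-block scheme under stated assumptions (two arms, equal allocation, a fixed even block size, sequential enrollment); the function name and parameters are hypothetical and the paper's designs may differ.

```python
import random
from collections import defaultdict

def stratified_block_randomization(strata, block_size=4, seed=0):
    """Assign arm 0 or 1 to each unit using permuted blocks within strata.

    `strata` lists each unit's stratum label in enrollment order.
    Within a stratum, every completed block holds equally many 0s and 1s.
    """
    if block_size <= 0 or block_size % 2 != 0:
        raise ValueError("block_size must be a positive even integer")
    rng = random.Random(seed)
    pending = defaultdict(list)      # unused assignments per stratum
    assignment = []
    for label in strata:
        if not pending[label]:       # start a fresh shuffled block
            block = [0, 1] * (block_size // 2)
            rng.shuffle(block)
            pending[label] = block
        assignment.append(pending[label].pop())
    return assignment
```

With many small strata, most blocks stay incomplete, which is one reason the regime of a diverging number of strata needs separate treatment.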
Cross submissions for Thursday, 30 May 2024 (showing 1 of 1 entries)
- [6] arXiv:2211.02472 (replaced) [pdf, ps, html, other]
Title: Posterior Contraction rate and Asymptotic Bayes Optimality for one-group shrinkage priors in sparse normal means problem
Subjects: Statistics Theory (math.ST)
We consider a high-dimensional sparse normal means model where the goal is to estimate the mean vector when the proportion of non-zero means is unknown. We model the mean vector by a one-group global-local shrinkage prior belonging to a broad class of such priors that includes the horseshoe prior. We address some questions related to asymptotic properties of the resulting posterior distribution of the mean vector for this class of priors. We consider two ways to model the global parameter. In the first, we treat it as an unknown fixed parameter and replace it by an empirical Bayes estimate. In the second, we take a hierarchical Bayes approach by assigning it a suitable non-degenerate prior distribution. We first show that for the class of priors under study, the posterior distribution of the mean vector contracts around the true parameter at a near minimax rate when the empirical Bayes approach is used. Next, we prove that in the hierarchical Bayes approach, the corresponding Bayes estimate attains the minimax risk asymptotically under the squared error loss function. We also show that the posterior contracts around the true parameter at a near minimax rate. These results generalize those of van der Pas et al. (2014, 2017), proved for the horseshoe prior. We also study the asymptotic Bayes optimality of global-local shrinkage priors when the number of non-null hypotheses is unknown. Here our aim is to propose conditions on the prior density of the global parameter such that the Bayes risk induced by the decision rule attains the optimal Bayes risk, up to a multiplicative constant. Under the asymptotic framework of Bogdan et al. (2011), we show that our proposed condition achieves this.
- [7] arXiv:2302.05139 (replaced) [pdf, ps, other]
Title: SCoRE Sets: A Versatile Framework for Simultaneous Inference
Subjects: Statistics Theory (math.ST)
We study asymptotic statistical inference in the space of bounded functions endowed with the supremum norm over an arbitrary metric space $S$ using a novel concept: Simultaneous COnfidence Region of Excursion (SCoRE) Sets. They simultaneously quantify the uncertainty of several lower and upper excursion sets of a target function. We investigate their connection to multiple hypothesis tests controlling the familywise error rate in the strong sense and show that they grant a unifying perspective on several statistical inference tools, such as simultaneous confidence bands, quantification of uncertainties in level set estimation (for example, CoPE sets), and multiple hypothesis testing over $S$ (for example, finding relevant differences or regions of equivalence within $S$). In particular, our abstract setting allows us to refine and reduce the assumptions in recent articles on CoPE sets and on relevance and equivalence testing using the supremum norm.
- [8] arXiv:2308.04916 (replaced) [pdf, ps, html, other]
Title: Heavy-tailed Bayesian nonparametric adaptation
Subjects: Statistics Theory (math.ST)
We propose a new Bayesian strategy for adaptation to smoothness in nonparametric models based on heavy-tailed series priors. We illustrate it in a variety of settings, showing in particular that the corresponding Bayesian posterior distributions achieve adaptive rates of contraction in the minimax sense (up to logarithmic factors) without the need to sample hyperparameters. Unlike many existing procedures, where a form of direct model (or estimator) selection is performed, the method can be seen as performing a soft selection through the prior tail. In Gaussian regression, such heavy-tailed priors are shown to lead to (near-)optimal simultaneous adaptation both in the $L^2$- and $L^\infty$-sense. Results are also derived for linear inverse problems, for anisotropic Besov classes, and for certain losses in more general models through the use of tempered posterior distributions. We present numerical simulations corroborating the theory.
- [9] arXiv:2311.11068 (replaced) [pdf, ps, html, other]
Title: Multi-block linearized alternating direction method for sparse fused Lasso modeling problems
Subjects: Statistics Theory (math.ST)
In many statistical modeling problems, such as classification and regression, it is common to encounter sparse and blocky coefficients. Sparse fused Lasso is specifically designed to recover these sparse and blocky structured features, especially in cases where the design matrix has ultrahigh dimensions, meaning that the number of features significantly surpasses the number of samples. Quantile loss is a well-known robust loss function that is widely used in statistical modeling. In this paper, we propose a new sparse fused Lasso classification model, and develop a unified multi-block linearized alternating direction method of multipliers algorithm that effectively selects sparse and blocky features for regression and classification. We prove that the algorithm converges and derive a linear convergence rate. Additionally, our algorithm has a significant advantage over existing methods for solving ultrahigh dimensional sparse fused Lasso regression and classification models due to its lower time complexity. Note that the algorithm can be easily extended to solve various existing fused Lasso models. Finally, we present numerical results for several synthetic and real-world examples, which demonstrate the robustness, scalability, and accuracy of the proposed classification model and algorithm.
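To make the "sparse and blocky" penalty concrete, the following sketch evaluates a sparse fused Lasso objective with quantile (pinball) loss for regression. It is an illustration under stated assumptions, not the paper's model or its ADMM algorithm: the function names, the averaging of the loss, and the parameters `lam1`, `lam2`, and `tau` are all hypothetical choices.

```python
import numpy as np

def pinball_loss(residuals, tau):
    """Mean quantile (pinball) loss at level tau in (0, 1)."""
    r = np.asarray(residuals, dtype=float)
    return float(np.mean(np.maximum(tau * r, (tau - 1.0) * r)))

def sparse_fused_lasso_objective(X, y, beta, lam1, lam2, tau=0.5):
    """Loss + lam1 * ||beta||_1 + lam2 * sum_j |beta_j - beta_{j-1}|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    residuals = y - X @ beta
    sparsity = lam1 * np.sum(np.abs(beta))          # pushes coefficients to zero
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))   # pushes neighbors to agree
    return pinball_loss(residuals, tau) + float(sparsity + fusion)
```

The two penalties act on different structures: the first counts magnitude in each coefficient, the second counts jumps between adjacent coefficients, so a piecewise-constant vector with few non-zero blocks is cheap under both.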
- [10] arXiv:2404.09605 (replaced) [pdf, ps, html, other]
Title: Finite-sample expansions for the optimal error probability in asymmetric binary hypothesis testing
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)
The problem of binary hypothesis testing between two probability measures is considered. New sharp bounds are derived for the best achievable error probability of such tests based on independent and identically distributed observations. Specifically, the asymmetric version of the problem is examined, where different requirements are placed on the two error probabilities. Accurate nonasymptotic expansions with explicit constants are obtained for the error probability, using tools from large deviations and Gaussian approximation. Examples are shown indicating that, in the asymmetric regime, the approximations suggested by the new bounds are significantly more accurate than the approximations provided by either of the two main earlier approaches -- normal approximation and error exponents.
- [11] arXiv:2405.10289 (replaced) [pdf, ps, html, other]
Title: Subgradient Convergence Implies Subdifferential Convergence on Weakly Convex Functions: With Uniform Rates Guarantees
Comments: This revision adds Lemma 1 and corrects several typos
Subjects: Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
In nonsmooth, nonconvex stochastic optimization, understanding the uniform convergence of subdifferential mappings is crucial for analyzing stationary points of sample average approximations of risk as they approach the population risk. Yet, characterizing this convergence remains a fundamental challenge.
This work introduces a novel perspective by connecting the uniform convergence of subdifferential mappings to that of subgradient mappings as empirical risk converges to the population risk. We prove that, for stochastic weakly-convex objectives, and within any open set, a uniform bound on the convergence of subgradients -- chosen arbitrarily from the corresponding subdifferential sets -- translates to a uniform bound on the convergence of the subdifferential sets themselves, measured by the Hausdorff metric.
Using this technique, we derive uniform convergence rates for subdifferential sets of stochastic convex-composite objectives. Our results do not rely on key distributional assumptions in the literature, which require the population and finite sample subdifferentials to be continuous in the Hausdorff metric, yet still provide tight convergence rates. These guarantees lead to new insights into the nonsmooth landscapes of such objectives within finite samples.
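What it means to compare subdifferential sets in the Hausdorff metric can be seen in a one-dimensional toy case. The sketch below is not taken from the paper: it assumes the convex empirical objective f_n(x) = mean(|x - z_i|), whose subdifferential at any point is a closed interval, and both function names are hypothetical.

```python
def subdifferential_mean_abs(x, points):
    """Subdifferential of f(x) = mean(|x - z|) over `points`, as (lo, hi).

    Each term contributes sign(x - z) when x != z and [-1, 1] when x == z;
    the intervals add because every term is convex and finite everywhere.
    """
    n = len(points)
    below = sum(1 for z in points if z < x)
    above = sum(1 for z in points if z > x)
    ties = n - below - above
    return ((below - above - ties) / n, (below - above + ties) / n)

def hausdorff_interval(a, b):
    """Hausdorff distance between closed intervals a = (lo, hi), b = (lo, hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

# Subdifferentials at x = 0 for two different samples (illustrative data).
d_one = subdifferential_mean_abs(0.0, [0.0])         # the full interval [-1, 1]
d_two = subdifferential_mean_abs(0.0, [-1.0, 1.0])   # the single point {0}
gap = hausdorff_interval(d_one, d_two)
```

Here a single subgradient chosen from each set could coincide (both contain 0) even though the sets differ, which is why passing from subgradient bounds to set-level bounds is not automatic.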