Presentation 2-2, 13:35~14:00.

Speaker
Euijoing Song (Seoul National University)

Title
Outlier-Robust Approach for High-dimensional Linear Structural Equation Models

Abstract
This study addresses learning sub-Gaussian linear structural equation models (SEMs) in high-dimensional settings with contaminated samples. First, it defines the Cellwise-Contamination Linear SEM (CCLSM) to account for outliers. It then develops an outlier-robust algorithm for learning high-dimensional sub-Gaussian CCLSMs. The method consists of two steps: (1) elementwise ordering estimation and (2) parent estimation. Both steps rely on ℓ1-regularized least trimmed squares (LTS): the elementwise ordering is determined via the truncated uncertainty score, which can be estimated by a debiased residual variance, while parent estimation is performed directly with ℓ1-regularized LTS. It is proven that not all outliers are influential for each LTS problem; only certain bad samples are. Hence, as its breakdown point indicates, the proposed method can recover the structure even when all observations on some nodes are outliers. Additionally, it is shown that a number of trimmed samples h = Ω((d + |B|) log p) and a number of samples for the truncated uncertainty score h' = Ω(d² log p) are sufficient for the proposed algorithm to learn a sub-Gaussian CCLSM, where p is the number of nodes, d is the maximum degree of the moralized graph, and |B| is the maximum number of bad samples. Simulations on various synthetic data sets demonstrate that the proposed algorithm is statistically consistent for learning the model in high-dimensional and contaminated sample settings. Finally, an application to gene expression data demonstrates its robustness to contamination.
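
To make the ℓ1-regularized LTS building block concrete, the following is a minimal sketch of a generic sparse LTS fit, not the speaker's algorithm. It assumes a plain coordinate-descent lasso and concentration steps (refit on the kept subset, then keep the samples with the smallest squared residuals). All names (lasso_cd, sparse_lts, n_keep) and the toy data are hypothetical, and for simplicity only the response is contaminated here, whereas the abstract considers cellwise contamination.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals for the current beta (all zeros initially)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

def squared_residuals(X, y, beta):
    return [(yi - sum(a * b for a, b in zip(xi, beta))) ** 2 for xi, yi in zip(X, y)]

def sparse_lts(X, y, n_keep, lam, n_starts=3, n_csteps=20, seed=0):
    """l1-regularized LTS via concentration steps from several random starts."""
    rng = random.Random(seed)
    n = len(X)
    best = None
    for _start in range(n_starts):
        subset = sorted(rng.sample(range(n), n_keep))
        for _step in range(n_csteps):
            beta = lasso_cd([X[i] for i in subset], [y[i] for i in subset], lam)
            sq = squared_residuals(X, y, beta)
            # Keep the n_keep samples with the smallest squared residuals.
            new_subset = sorted(sorted(range(n), key=sq.__getitem__)[:n_keep])
            if new_subset == subset:
                break
            subset = new_subset
        # Refit so the returned coefficients match the returned subset.
        beta = lasso_cd([X[i] for i in subset], [y[i] for i in subset], lam)
        sq = squared_residuals(X, y, beta)
        obj = sum(sq[i] for i in subset) / (2 * n_keep) + lam * sum(abs(b) for b in beta)
        if best is None or obj < best[0]:
            best = (obj, beta, subset)
    return best[1], best[2]

# Toy data (assumed for illustration): sparse linear model, first 10 responses shifted.
rng = random.Random(1)
n, p = 60, 5
beta_true = [2.0, 0.0, -1.5, 0.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(a * b for a, b in zip(row, beta_true)) + rng.gauss(0.0, 0.1) for row in X]
for i in range(10):
    y[i] += 25.0  # contaminated responses

beta, kept = sparse_lts(X, y, n_keep=45, lam=0.05)
```

In this sketch n_keep counts the samples retained by the trimmed fit; it is a parameter of the illustration and is not meant to reproduce the abstract's h or h'. Several random starts are used because a single concentration-step run can stop at a poor local solution.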