Research Projects

Current work

Selective inference

Principal component analysis identifies directions of maximum variation in the data. We provide inference guarantees on the extent of this variation, including when the number of principal components is selected using data-driven methods such as the “elbow rule”.

[Paper, Code]: Ronan Perry, Snigdha Panigrahi, Jacob Bien, and Daniela Witten. “Inference on the proportion of variance explained in principal component analysis”. Journal of the American Statistical Association (2025).
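
As a minimal illustration of the quantity involved (a point estimate only, not the inference procedure from the paper), the proportion of variance explained can be computed from the singular values of the centered data. The function names and the second-difference formalization of the "elbow rule" below are assumptions made for this sketch; other formalizations of the elbow rule exist.

```python
import numpy as np


def proportion_variance_explained(X):
    """Proportion of total variance explained by each principal component."""
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to PC variances
    return var / var.sum()


def elbow_rule(pve):
    """Hypothetical elbow heuristic: find the point of largest discrete
    curvature (second difference) in the scree curve and keep the
    components that come before it."""
    pve = np.asarray(pve, dtype=float)
    if pve.size < 3:
        raise ValueError("need at least three components to locate an elbow")
    curvature = np.diff(pve, n=2)            # pve[i] - 2*pve[i+1] + pve[i+2]
    return int(np.argmax(curvature)) + 1     # number of components retained


# Example: two dominant components followed by a flat tail.
k = elbow_rule([0.6, 0.3, 0.04, 0.03, 0.03])  # k == 2
```

Because the number of components is chosen from the same data, a naive confidence interval for the resulting proportion ignores the selection step, which is the issue the paper addresses.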

Numerous selective inference methods symmetrically widen classical confidence intervals to provide valid inference. We demonstrate that this approach is suboptimal compared to various conditional approaches.

[Paper, Code]: Ronan Perry, Zichun Xu, Olivia McGough, and Daniela Witten. “Infer-and-widen, or not?”. arXiv (2025).

Miscellaneous

Investigators often wish to understand the robustness of their analyses to unobserved variables. Here, we provide insight on when and how an unobserved variable can make your previously insignificant result significant.

[Paper]: Danielle Tsao, Ronan Perry, and Carlos Cinelli. “On the minimum strength of (unobserved) covariates to overturn an insignificant result”. arXiv (2024).

Prior work

Causal discovery from multi-environment data

Causal graphs are typically identifiable only up to an equivalence class under i.i.d. data. We prove nonparametric identifiability from heterogeneous data with natural (unknown) distribution shifts if causal mechanism shifts are sparse.

[Paper, Code]: Ronan Perry, Julius von Kügelgen, and Bernhard Schölkopf. “Causal Discovery in Heterogeneous Environments under the Sparse Mechanism Shift Hypothesis”. Conference and Workshop on Neural Information Processing Systems (NeurIPS) (2022).

Random forests

Posterior probabilities from machine learning classifiers are typically overconfident. We study multiple approaches to calibrating the random forest classifier across the OpenML-CC18 datasets. In particular, we study honest random forests, for which we provide multiclass consistency guarantees and applications to high-dimensional hypothesis testing via mutual information estimation.

[Paper, Code]: Ronan Perry, Ronak Mehta, Richard Guo, Eva Yezerets, Jesús Arroyo, Mike Powell, Hayden Helm, Cencheng Shen, and Joshua T Vogelstein. “Random Forests for Adaptive Nearest Neighbor Estimation of Information-Theoretic Quantities”. arXiv preprint arXiv:1907.00325 (2021).
[Package]: A scikit-learn compliant Python package for honest random forests.

Although random forest classifiers are extremely successful for tabular data, they are not state of the art for structured data. We develop a random forest algorithm better suited to structured data such as images and time series by using structured projections of features that take the data geometry into account.

[Paper, Code]: Adam Li, Ronan Perry, Chester Huynh, Tyler M. Tomita, Ronak Mehta, Jesus Arroyo, Jesse Patsolic, Benjamin Falk, and Joshua T. Vogelstein. “Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks”. SIAM Journal on Mathematics of Data Science (SIMODS) (2022).

fMRI data analysis

Neuroscience collaborators wanted to determine whether novice and expert meditators differ across meditation tasks and resting state. We provided (i) computationally efficient dimensionality reduction approaches via generalized CCA to reduce the spatial time series to interpretable spatial gradients, and (ii) high-dimensional distance correlation hypothesis tests with novel permutation strategies to account for implicit multilevel dependencies between scans of the same subject.

[Poster, code]: Organization for Human Brain Mapping (OHBM), 2020
[Talk]: Neuromatch 3.0, 2020
[Collaborator’s science paper]: Biological Psychiatry Global Open Science, 2024
[Methods paper, GitHub]: Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E Priebe, and Joshua T Vogelstein. “Universally Consistent K-Sample Tests via Dependence Measures”. Stat Probab Lett (2025).
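
To illustrate why the permutation scheme matters when each subject contributes several scans, here is a minimal sketch of a standard subject-level (restricted) permutation test. It is not the strategy developed in the project: it swaps in a simple difference in means for the distance correlation statistic, and the function name, arguments, and toy data are all hypothetical.

```python
import numpy as np


def subject_level_permutation_test(values, subject_ids, subject_labels,
                                   n_perm=1000, seed=0):
    """Two-group permutation test that shuffles labels across subjects,
    so all scans from one subject always share the same group label.

    values         : one scalar summary per scan
    subject_ids    : subject identifier for each scan
    subject_labels : dict mapping subject id -> group label (0 or 1)
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    subjects = np.array(sorted(subject_labels))
    labels = np.array([subject_labels[s] for s in subjects])
    # Index of each scan's subject within the sorted subject array.
    scan_to_subject = np.searchsorted(subjects, np.asarray(subject_ids))

    def statistic(subject_level_labels):
        scan_labels = subject_level_labels[scan_to_subject]
        return abs(values[scan_labels == 1].mean()
                   - values[scan_labels == 0].mean())

    observed = statistic(labels)
    # Permuting at the subject level keeps within-subject scans together.
    null = np.array([statistic(rng.permutation(labels))
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


# Toy data (made up): 6 subjects, 3 scans each, experts shifted by 10.
ids = np.repeat(np.arange(6), 3)
groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
vals = np.array([10.0 if groups[i] == 1 else 0.0 for i in ids])
observed, p_value = subject_level_permutation_test(vals, ids, groups)
```

Shuffling individual scans instead would treat repeated scans of a subject as exchangeable with scans from other subjects, which tends to understate the p-value when scans within a subject are dependent.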

As part of this project, we realized that no reliable code existed for the multiview methods we needed. We therefore developed an open-source Python package for multiview machine learning methods, featuring a unified API and easy integration with scikit-learn.

[Paper, Webpage, GitHub]: Ronan Perry, Gavin Mischler, Richard Guo, Theodore Lee, Alexander Chang, Arman Koul, Cameron Franz, Hugo Richard, Iain Carmichael, Pierre Ablin, et al. “mvlearn: Multiview Machine Learning in Python”. Journal of Machine Learning Research 22.109 (2021), pp. 1–7.

University projects

Also, for fun, check out some of the interesting projects completed as part of my university classes.

Ronan Perry