Research Projects

Current work

Selective inference

Standard inference on PCA does not account for the embedding dimension having been selected from the eigenvalues (e.g. via the elbow method), which leads to inflated false positive rates. We develop a selective inference approach that provides valid inference in such settings.
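To make the inflation concrete, here is a minimal simulation sketch. It is not the selective inference method itself, only an illustration of the problem under assumed settings (pure Gaussian noise, n = 50 samples, p = 10 features, a Monte Carlo reference distribution, all chosen for this example). The variance along the top principal direction is compared against a threshold that would be valid only for a direction fixed before seeing the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 50, 10, 2000  # illustrative sizes, not from the project

# Reference distribution: sample variance of pure noise along a direction
# that was fixed in advance. Its 95% quantile is the naive threshold.
null_var = rng.standard_normal((20000, n)).var(axis=1, ddof=1)
threshold = np.quantile(null_var, 0.95)

fixed_rejections = 0
selected_rejections = 0
for _ in range(reps):
    X = rng.standard_normal((n, p))          # no real structure at all
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (n - 1)
    eigvals = np.linalg.eigvalsh(cov)        # ascending order
    # Direction fixed in advance (first coordinate axis): the test is valid.
    fixed_rejections += cov[0, 0] > threshold
    # Direction chosen from the data (top eigenvector): same threshold,
    # but the selection step is ignored.
    selected_rejections += eigvals[-1] > threshold

fixed_rate = fixed_rejections / reps
selected_rate = selected_rejections / reps
print(f"fixed direction:    {fixed_rate:.3f}")
print(f"selected direction: {selected_rate:.3f}")
```

In this setup the fixed-direction test rejects at roughly the nominal 5% level, while the data-selected direction rejects far more often even though the data contain no signal.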

Prior work

Causal discovery from multi-environment data

Under i.i.d. data, causal graphs are typically identifiable only up to an equivalence class. We prove nonparametric identifiability from heterogeneous data with natural (unknown) distribution shifts, provided that causal mechanism shifts are sparse.

Random forests

Posterior probabilities from machine learning classifiers are typically overconfident. We study multiple calibration approaches for the random forest classifier across the OpenML-CC18 datasets, in particular honest random forests, for which we provide multiclass consistency guarantees and applications to high-dimensional hypothesis testing via mutual information estimation.
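As a rough illustration of what "overconfident" means in practice, the sketch below computes the expected calibration error (ECE) with equal-width confidence bins. ECE is a common calibration summary, but whether it is the metric used in this project is an assumption; the function name and the number of bins are hypothetical choices for the example.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between confidence and accuracy over bins.

    probs:  (n_samples, n_classes) predicted class probabilities
    labels: (n_samples,) integer class labels
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidence = probs.max(axis=1)                         # top-class probability
    correct = (probs.argmax(axis=1) == labels).astype(float)

    # Interior edges of equal-width bins on [0, 1]; digitize maps each
    # confidence to a bin index in 0..n_bins-1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_ids = np.digitize(confidence, edges)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            ece += mask.mean() * gap                       # weight by bin share
    return ece

# A classifier that is always 100% confident but right only half the time.
overconfident = [[1.0, 0.0]] * 4
print(expected_calibration_error(overconfident, [0, 0, 1, 1]))  # 0.5
```

A well-calibrated classifier would have an ECE near zero, since its stated confidence would match its empirical accuracy within each bin.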

Although random forest classifiers are extremely successful on tabular data, they are not state of the art on structured data. We develop a random forest algorithm better suited to structured data such as images and time series, using structured projections of features that take the data geometry into account.
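One possible reading of "structured projections" for images is sketched below: each candidate feature is the sum of pixel intensities over a randomly placed contiguous patch, so that neighboring pixels are combined rather than treated as unrelated columns. This is an assumption made for illustration; the function name, the patch-sum projection, and all parameters are hypothetical and may differ from the actual algorithm.

```python
import numpy as np

def sample_patch_features(images, n_features, max_patch, seed=0):
    """Project each image onto random contiguous patches (patch sums).

    images: (n_images, height, width) array
    Returns the (n_images, n_features) feature matrix and the list of
    sampled patches as (top, left, patch_height, patch_width).
    """
    rng = np.random.default_rng(seed)
    n, height, width = images.shape
    features = np.empty((n, n_features))
    patches = []
    for j in range(n_features):
        # Sample a patch size, then a location that keeps it inside the image.
        h = int(rng.integers(1, max_patch + 1))
        w = int(rng.integers(1, max_patch + 1))
        top = int(rng.integers(0, height - h + 1))
        left = int(rng.integers(0, width - w + 1))
        features[:, j] = images[:, top:top + h, left:left + w].sum(axis=(1, 2))
        patches.append((top, left, h, w))
    return features, patches

# Toy usage on random 8x8 "images".
images = np.random.default_rng(1).random((20, 8, 8))
features, patches = sample_patch_features(images, n_features=5, max_patch=4)
print(features.shape)
```

In a forest, features like these could be resampled at every split node and searched for the best threshold, in place of single raw pixels.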

fMRI data analyses

Neuroscience collaborators wanted to determine whether novice and expert meditators differ across meditation tasks and resting state. We provided (i) computationally efficient dimensionality reduction approaches via generalized CCA, which reduce the spatial time series to interpretable spatial gradients, and (ii) high-dimensional distance correlation hypothesis tests with novel permutation strategies that account for implicit multilevel dependencies between scans of the same subject.
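The need for a dedicated permutation strategy can be sketched as follows. Scans from the same subject are not exchangeable with scans from other subjects, so one simple option is to permute group labels at the subject level and keep each subject's scans together. This is a minimal sketch under that assumption, using the biased sample distance covariance as the test statistic; the function names are hypothetical and the project's actual strategies may differ.

```python
import numpy as np

def _centered_distances(Z):
    """Double-centered pairwise Euclidean distance matrix."""
    Z = np.asarray(Z, dtype=float).reshape(len(Z), -1)
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()

def dcov2(X, y):
    """Biased sample estimate of squared distance covariance."""
    return (_centered_distances(X) * _centered_distances(y)).mean()

def subject_permutation_test(X, groups, subjects, n_perm=500, seed=0):
    """Permute group labels across subjects, never across individual scans."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups, dtype=float)
    subjects = np.asarray(subjects)
    uniq, inverse = np.unique(subjects, return_inverse=True)
    # One label per subject (assumes the label is constant within a subject).
    subject_labels = np.array([groups[subjects == s][0] for s in uniq])
    observed = dcov2(X, subject_labels[inverse])
    null = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(subject_labels)   # shuffle whole subjects
        null[b] = dcov2(X, shuffled[inverse])        # map back to scans
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Toy data: 10 subjects with 3 scans each, a group shift plus subject effects.
rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(10), 3)
groups = (subjects >= 5).astype(float)
subject_effect = rng.standard_normal(10)[subjects]
X = (5.0 * groups + subject_effect + 0.5 * rng.standard_normal(30))[:, None]
observed, p_value = subject_permutation_test(X, groups, subjects)
print(observed, p_value)
```

Permuting individual scans instead would break the within-subject dependence under the null and tend to produce p-values that are too small.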

During this project, we found no reliable existing code for the multiview methods we needed, so we developed an open-source Python package for multiview machine learning methods, featuring a unified API and easy integration with scikit-learn.

University projects

Also, check out some of the interesting projects completed as part of my university classes.