Research projects are listed with the most recent first. Some of these projects are more theoretical in nature, while others are applied. My undergraduate degree was in statistics with an emphasis in actuarial science. I then decided to pursue a career in data science, coupled with a Master's degree in statistics. I graduated with my M.S. and B.S. in Statistics in April 2021.

Chronic Kidney Disease Costs and Progression

Description

Chronic Kidney Disease (CKD) affects many lives and has a large impact on health systems around the world. To better understand and predict costs for insurance plan members with CKD in the United States, we built a new model of their individual costs. Our model is the first to explicitly model both the CKD stage transition process and the distribution of costs given those stages. Additionally, it incorporates numerous covariates and comorbidities. We applied the models to two large and rich datasets, one commercial insurance and the other Medicare fee-for-service, totaling about 40 million beneficiary months. We found that XGBoost models best predict both stage transitions and costs. When XGBoost models are unavailable, a multivariate logistic regression model with regularization to predict stage, combined with a logit-gamma model of the costs given the stage, best predicts a member's healthcare costs in the next month.
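To illustrate the two-part structure behind the logit-gamma cost model, the sketch below simulates monthly costs as a logistic "any cost" indicator followed by a gamma draw for positive costs. The parameter values are purely illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters (not from the paper): the logit part gives the
# probability a member incurs any cost in a month; the gamma part gives
# the distribution of cost given that some cost is incurred.
p_any_cost = 0.65                      # logit part: P(cost > 0)
gamma_shape, gamma_scale = 2.0, 800.0  # gamma part: cost | cost > 0

# Simulate monthly costs under the two-part structure.
n = 200_000
has_cost = rng.random(n) < p_any_cost
costs = np.where(has_cost, rng.gamma(gamma_shape, gamma_scale, n), 0.0)

# Expected monthly cost factors as P(cost > 0) * E[cost | cost > 0],
# so the sample mean should be close to 0.65 * 2.0 * 800 = 1040.
expected = p_any_cost * gamma_shape * gamma_scale
print(costs.mean(), expected)
```

In the actual models, both parts depend on covariates (stage, comorbidities, and so on); this sketch only shows why the expected cost decomposes into the product of the two parts.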

Outcome

This project was done in collaboration with the Society of Actuaries (SOA), Dr. Brian Hartman, and Dr. Richard Warr, among others. The paper is currently under review at the North American Actuarial Journal and is available through the SOA website.

Inference in Semi-Markov Models with Panel Data

Description

Semi-Markov processes effectively model waiting times and transition probabilities for many multi-state scenarios. In practice, the state of the process is often observed only at specific, regularly spaced points in time rather than continuously. Intermittently observed measurements such as these are known as panel data. This project estimates the parameters of a semi-Markov model from panel data. The state-of-the-art technique (proposed by Aralis and Brookmeyer) uses the stochastic expectation-maximization (SEM) algorithm for inference. One drawback of this method is the large computational burden of the sampling step. Our goal is to improve upon this method by adjusting the expectation step to avoid such sampling. Key skills in this project include semi-Markov modeling and distributional truncation.
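To make the panel-data setting concrete, the sketch below simulates a toy three-state semi-Markov process (Weibull sojourn times, an embedded transition chain) and then records only the state occupied at regular inspection times. All parameters are illustrative, not from the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-state semi-Markov process (states 0, 1, 2; state 2 absorbing).
# The embedded-chain transition matrix and Weibull sojourn parameters
# below are illustrative only.
P = np.array([[0.0, 0.7, 0.3],
              [0.4, 0.0, 0.6],
              [0.0, 0.0, 1.0]])
shape, scale = 1.5, 2.0  # Weibull sojourn-time parameters

def simulate_path(horizon=20.0):
    """Simulate (jump time, state) pairs of the semi-Markov process."""
    t, state = 0.0, 0
    path = [(t, state)]
    while t < horizon and state != 2:
        t += scale * rng.weibull(shape)    # random sojourn time in the state
        state = rng.choice(3, p=P[state])  # next state from the embedded chain
        path.append((t, state))
    return path

def to_panel(path, dt=1.0, horizon=20.0):
    """Keep only the state occupied at regular inspection times (panel data)."""
    times = np.arange(0.0, horizon + dt, dt)
    jump_times = np.array([t for t, _ in path])
    states = np.array([s for _, s in path])
    # For each inspection time, find the most recent jump at or before it.
    idx = np.searchsorted(jump_times, times, side="right") - 1
    return list(zip(times, states[idx]))

panel = to_panel(simulate_path())
```

The inferential difficulty the project targets is visible here: the panel records which state the process occupies at each inspection time, but the exact jump times in between are never observed, which is what the SEM algorithm samples over.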

Outcome

This was a Master’s thesis project with Dr. Richard Warr as mentor and coauthor. I presented this project at BYU’s 2020 and 2021 Physical and Mathematical Spring Research Conference, winning first place in the 2020 session. The project was approved by a committee of BYU faculty in March 2021.

Error Bounds for Convolutions using the Fast Fourier Transform (FFT)

Description

The Laplace transform or moment-generating function of a distribution is not always analytically available. Convolutions in these scenarios can be estimated using the discrete Fourier transform. The computationally fast implementation of this transform (FFT) is widely available. This project specifies calculations for exact error bounds involved in convolutions using the FFT. Key skills involved in this project include Monte Carlo sampling, advanced probability theory, and statistical computing.
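A minimal sketch of the underlying computation: discretize two densities on a grid, renormalize them to probability masses, and convolve them via the FFT (zero-padding so the FFT's circular convolution matches linear convolution). The grid and distributions are illustrative; the paper's contribution is bounding the error that discretization and truncation introduce, which this sketch does not show.

```python
import numpy as np

# Discretize two densities on a common grid and renormalize to pmfs.
h = 0.05                      # grid spacing
x = np.arange(0, 20, h)
f = np.exp(-x); f /= f.sum()  # discretized exponential(1)
g = x * np.exp(-x); g /= g.sum()  # discretized gamma(2, 1)

# Zero-pad so the circular convolution the FFT computes equals the
# linear convolution of the two pmfs.
n = len(f) + len(g) - 1
conv_fft = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)

# Sanity check against direct convolution; the result approximates the
# pmf of the sum, here a discretized gamma(3, 1).
conv_direct = np.convolve(f, g)
print(np.max(np.abs(conv_fft - conv_direct)))  # floating-point-scale difference
```

The appeal of the FFT route is speed: direct convolution is O(n²) in the grid size while the FFT is O(n log n), which matters when convolutions are iterated.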

Outcome

This project was mentored and coauthored by Dr. Richard Warr. The paper was published in Methodology and Computing in Applied Probability in 2020. I presented this project at BYU’s 2018 Physical and Mathematical Spring Research Conference, winning first place in the session.