Journal articles

* Student or intern collaborator; + corresponding author

  1. H. Zhao* and S. Yang (2022). Outcome-adjusted balance measure for generalized propensity score model selection. Journal of Statistical Planning and Inference, accepted. [arxiv]

    ** Winner of the 2021 DISS Best Poster Award

  2. Z. Jiang, S. Yang, and P. Ding (2022). Multiply robust estimation of causal effects under principal ignorability. Journal of the Royal Statistical Society: Series B, doi:10.1111/rssb.12538. [arxiv]
  3. C. Gao*, K. J. Thompson, S. Yang and J. K. Kim (2022). Nearest neighbor ratio imputation with incomplete multinomial outcome in survey sampling. Journal of the Royal Statistical Society: Series A, doi:10.1111/rssa.12841.
  4. J.Y. Wang, R. Wong, S. Yang, and G. Chan (2022). Estimation of Partially Conditional Average Treatment Effect by Hybrid Kernel-covariate Balancing. Electronic Journal of Statistics, accepted. [arxiv]
  5. D. Johnson*, K. Pieper, and S. Yang+ (2022). Treatment-specific Marginal Structural Cox Model for the Effect of Treatment Discontinuation. Pharmaceutical Statistics, doi:10.1002/pst.2211.
  6. J. W. Yu, D. Bandyopadhyay, S. Yang, L. Kang, and G. Gupta (2022). Propensity score modeling in electronic health records with time-to-event endpoints: application to kidney transplantation. Journal of Data Science, accepted.
  7. M.Y. Huang and S. Yang+ (2022). Robust inference of conditional average treatment effects using dimension reduction. Statistica Sinica, 32, 547-567. [arxiv]
  8. A. Larsen*, S. Yang, A. Rappold, and B. Reich (2022). A spatial causal analysis of wildland fire-contributed PM2.5 using numerical model output. Annals of Applied Statistics, accepted. [arxiv]
  9. L. Wu* and S. Yang+ (2022). Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. Proceedings of the 1st Conference on Causal Learning and Reasoning, PMLR, 140, 1–S5.
  10. D. Kong, S. Yang, and L. Wang (2022). Identifiability of causal effects with multiple causes and a binary outcome. Biometrika, 109, 265–272. doi:10.1093/biomet/asab016. [arxiv]
  11. B. J. Reich, S. Yang, and Y. Guan (2022). Discussion on “Spatial+: a novel approach to spatial confounding” by Dupont, Wood and Augustin, Biometrics.
  12. N. Corder* and S. Yang+ (2022). Utilizing stratified generalized propensity score matching to approximate blocked trial designs with multiple treatment levels. Journal of Biopharmaceutical Statistics, doi:10.1080/10543406.2022.2065507.
  13. S. Yang and Y. Zhang (2022). Multiply robust matching estimators of average and quantile treatment effects. Scandinavian Journal of Statistics, doi:10.1111/sjos.12585. [arxiv]
  14. Y. Zhang*, S. Yang, W. Ye, D. E. Faries, I. Lipkovich, Z. Kadziola (2021). Best practices of double score matching for estimating causal effects, Statistics in Medicine, 42, 1421–1445.
  15. D. Lee*, S. Yang+, L. Dong, X. Wang, D. Zeng, J.W. Cai (2021). Improving trial generalizability using observational studies, Biometrics, doi:10.1111/biom.13609. [arxiv]

    ** Winner of the 2020 ENAR Distinguished Student Paper Competition Award

  16. S. Yang (2021). Semiparametric efficient estimation of structural nested mean models with irregularly spaced observations. Biometrics. [arxiv]
  17. B. J. Reich, S. Yang, Y. Guan, A. B. Giffin, M. J. Miller and A. G. Rappold (2021). A review of spatial causal inference methods for environmental and epidemiological applications. International Statistical Review, 89, 605-634. [arxiv]
  18. S. Yang, J. K. Kim, and Y. Hwang (2021). Integration of data from probability surveys and big found data for finite population inference using mass imputation. Survey Methodology, 47, 29–58.
  19. F. Cools, D. Johnson, A. J. Camm, J. P. Bassand, F. Verheugt, S. Yang, A. Tsiatis, D. A. Fitzmaurice, S. Z. Goldhaber, G. Kayani, S. Goto, S. Haas, F. Misselwitz, A. Turpie, K. Fox, K. Pieper, A. K. Kakkar (2021). Risks associated with discontinuation of oral anticoagulation in newly diagnosed patients with atrial fibrillation: results from the GARFIELD-AF Registry. Journal of Thrombosis and Haemostasis, doi:10.1111/jth.15415. (Collaboration work)
  20. S. Yang, Y. Zhang, G. Liu, and Q. Guan (2021). SMIM: a unified framework of Survival sensitivity analysis using Multiple Imputation and Martingale. Biometrics, doi:10.1111/biom.13555. [arxiv]
  21. S. Yang, J. K. Kim, and R. Song (2020). Doubly robust inference when combining probability and non-probability samples with high-dimensional data, Journal of the Royal Statistical Society: Series B, 82, 445–465.
  22. S. Yang, K. Pieper, and F. Cools (2020). Semiparametric estimation of structural failure time model in continuous-time processes, Biometrika, 107, 123-136.
  23. N. Corder* and S. Yang (2020). Estimating average treatment effects utilizing fractional imputation when confounders are subject to missingness, Journal of Causal Inference, 8, 249-271.
  24. L. Dong*, E. Laber, Y. Goldberg, R. Song, S. Yang (2020). Ascertaining properties of weighting in the estimation of optimal treatment regimes under monotone missingness, Statistics in Medicine, doi:10.1002/sim.8678.
  25. S. Yang and P. Ding (2020). Combining multiple observational data sources to estimate causal effects, Journal of the American Statistical Association, 115, 1540–1554.
  26. S. Yang and J. K. Kim (2020). Statistical data integration in survey sampling: a review, Japanese Journal of Statistics and Data Science, 3, 625–650.
  27. W. Li*, S. Yang+, and P. Han (2020). Robust estimation for moment condition models with data missing not at random, Journal of Statistical Planning and Inference, 207, 246–254.
  28. S. Yang and J. K. Kim (2020). Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework. Scandinavian Journal of Statistics, 47, 839–861.
  29. S. Chen, S. Yang, and J. K. Kim (2020). Nonparametric mass imputation for data integration. Journal of Survey Statistics and Methodology.
  30. S. Yang (2019). Book reviews: Flexible imputation of missing data, 2nd ed. Journal of the American Statistical Association, 114, 1421–1421.
  31. S. Yang, L. Wang, and P. Ding (2019). Causal inference with confounders missing not at random, Biometrika, 106, 875–888.
  32. S. Yang and D. Zeng (2018). Discussion on penalized spline of propensity methods for treatment comparison by Zhou, Elliott and Little, Journal of the American Statistical Association, 114, 30–32.
  33. S. Yang and J. J. Lok (2018). Sensitivity analysis for unmeasured confounding in coarse structural nested mean models, Statistica Sinica, 28, 1703–1723.
  34. S. Yang (2018). Propensity score weighting for causal inference with clustered data, Journal of Causal Inference.
  35. S. Yang and J. K. Kim (2018). Nearest neighbor imputation for general parameter estimation in survey sampling, Advances in Econometrics, 39, 211–236.
  36. S. Yang and P. Ding (2018). Asymptotic inference of causal effects with observational studies trimmed by the estimated propensity scores, Biometrika, 105, 487–493.
  37. Z. Wang, J. K. Kim, and S. Yang (2018). An approximate Bayesian inference under informative sampling, Biometrika, 105, 91–102.
  38. J. Lok, S. Yang, B. Sharkey, and M. Hughes (2018). Estimation of the cumulative incidence function under multiple dependent and independent censoring mechanisms, Lifetime Data Analysis, 24, 201–223.
  39. S. Yang, A. A. Tsiatis, and M. Blazing (2018). Modeling survival distribution as a function of time to treatment discontinuation: a dynamic treatment regime approach, Biometrics, 74, 900–909.
  40. S. Yang and J. K. Kim (2017). A semiparametric inference to regression analysis with missing covariates in survey data, Statistica Sinica, 27, 261–285.
  41. J. K. Kim and S. Yang (2017). A note on multiple imputation under complex sampling, Biometrika, 104, 221–228.
  42. S. Yang and J. K. Kim (2017). Discussion: dissecting multiple imputation from a multi-phase inference perspective: what happens when God’s, imputer’s and analyst’s models are uncongenial? by X. Xie and X. L. Meng, Statistica Sinica, 27, 1568–1573.
  43. S. Yang and J. J. Lok (2016). A goodness-of-fit test for structural nested mean models, Biometrika, 103, 734–741.
  44. S. Yang and J. K. Kim (2016). Fractional imputation in survey sampling: a comparative review, Statistical Science, 31, 415–432.
  45. S. Yang, G. Imbens, Z. Cui, D. Faries and Z. Kadziola (2016). Propensity score matching and stratification in observational studies with multi-level treatments, Biometrics, 72, 1055–1065. R package “multilevelMatching” available.
  46. S. Yang and J. K. Kim (2016). A note on multiple imputation for method of moments estimation, Biometrika, 103, 244–251.
  47. S. Yang and J. K. Kim (2015). Likelihood-based inference with missing data under missing-at-random, Scandinavian Journal of Statistics, 43, 436–454.

    ** Winner of the 2014 JSM Student Paper Competition Award

  48. L. Peyer, G. Welk, L. B. Davis, S. Yang, and J. K. Kim (2015). Factors associated with parent concern for child weight and parenting behaviors, Childhood Obesity, 11, 269–274. (Collaboration work)
  49. S. Yang and Z. Zhu (2015). Variance estimation and kriging prediction for a class of non-stationary spatial models, Statistica Sinica, 25, 135–149.
  50. J. K. Kim and S. Yang (2014). Fractional hot deck imputation for robust estimation under item nonresponse in survey sampling, Survey Methodology, 40, 211–230.
  51. J. K. Kim, Z. Zhu, and S. Yang (2013). Improved estimation for June Area Survey incorporating several information, Proceedings 59th ISI World Statistics Congress, Hong Kong, China, 199–204.
  52. S. Yang, J. K. Kim and D. W. Shin (2013). Imputation methods for quantile estimation under missing at random, Statistics and Its Interface, 6, 369–377.
  53. S. Yang, J. K. Kim and Z. Zhu (2013). Parametric fractional imputation for mixed models with nonignorable missing data, Statistics and Its Interface, 6, 339–347.


Technical Reports 


  1. P. Sang, D. Kong, and S. Yang+. Functional principal component analysis for longitudinal observations with sampling at random. [arxiv]
  2. S. Yang, D. Zeng, X. Wang. Improved Inference for Heterogeneous Treatment Effects Using Real-World Data Subject to Hidden Confounding. [arxiv]
  3. S. Yang, C. Gao*, X. Wang, and D. Zeng. Elastic integrative analysis of randomized trial and real-world data for treatment heterogeneity estimation. [arxiv]
  4. B. Colnet, I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.P. Vert, J. Josse+, S. Yang+. Causal inference methods for combining randomized trials and observational studies: a review. [arxiv]
  5. Y. Guan, G. L. Page, B. J. Reich, M. Ventrucci and S. Yang. A spectral adjustment for spatial confounding. [arxiv]
  6. A. B. Giffin*, B. J. Reich, S. Yang+, and A. Rappold. Generalized propensity score approach to causal inference with spatial interference. [arxiv]

    ** Winner of the 2021 ENAR Distinguished Student Paper Competition Award

  7. A. B. Giffin*, B. J. Reich, S. Yang, and A. Rappold. Instrumental variables, spatial confounding and interference. [arxiv]
  8. A. B. Giffin*, W. Gong, S. Majumder, A. Rappold, B. J. Reich, and S. Yang. Estimating intervention effects on infectious disease control: the effect of community mobility reduction on Coronavirus spread. [arxiv]
  9. L. Wu*, S. Yang, B. J. Reich, and A. Rappold. Estimating spatially varying health effects in app-based citizen science research. [arxiv]

    ** Winner of the 2021 ASA Section on Statistics in Epidemiology Young Investigator Award

    ** Winner of the IBM Student Research Award from the 34th New England Statistics Symposium

  10. L. Wu* and S. Yang. Transfer learning of individualized treatment rules from experimental to real-world data. [arxiv]
  11. Q. Guan* and S. Yang+. A unified framework for causal inference with multiple imputation using martingale. [arxiv]
  12. S. Tang*, S. Yang+, T. Wang, Z. Cui, L. Li, D. Faries. Causal inference of hazard ratio based on propensity score matching. [arxiv]

    ** Winner of the 2021 ENAR Distinguished Student Paper Competition Award

  13. X. Mao, Z. Wang, and S. Yang. Matrix completion for survey data prediction with multivariate missingness. [arxiv]
  14. S. Yang and Z. Zhu. Semiparametric estimation of spectral density and variogram with irregular observations. [arxiv]
  15. E. Cho* and S. Yang. Variable selection for doubly robust causal inference.
  16. C. Gao* and S. Yang. Pretest estimation in combining probability and non-probability samples.
  17. C. Gao*, S. Yang+, and J. K. Kim. Soft calibration for correcting selection bias under mixed-effects models. [arxiv]
  18. C. Gao*, P. Acharya, P. Shi, S. Yang, and A. Zhang. Self-supervised Single Image Denoising via Low-rank Tensor Approximated Convolutional Neural Network.
  19. H. Zhao*, X. Wang and S. Yang. Double score matching in observational studies with multi-level treatments.
  20. S. Liu*, S. Yang+, Y. Zhang, and G. Liu. Sensitivity analysis in longitudinal clinical trials via distributional imputation. [arxiv]
  21. S. Liu*, S. Yang+, Y. Zhang, and G. Liu. Multiply robust estimators in longitudinal studies with missing data under control-based imputation. [arxiv]
  22. S. Liu*, Y. Zhang, G. T. Golm, G. Liu, and S. Yang+. Robust analyses for longitudinal clinical trials with dropouts and non-normal continuous outcomes. [arxiv]
  23. S. Xu*, S. Yang, B. J. Reich. A Bayesian non-parametric method for estimating causal quantile effects.
  24. M. Yu*, W. Lu, S. Yang, and P. Ghosh. Multiplicative structural nested mean model for zero-inflated outcomes.
  25. J. Chu*, W. Lu, and S. Yang+. Targeted optimal treatment regime learning using summary statistics. [arxiv]
  26. D. Lee*, S. Yang+, and X. Wang. Generalizable survival analysis of randomized controlled trials with observational studies. [arxiv]




  • S. Yang (2014). Fractional imputation method of handling missing data and spatial statistics. Iowa State University. [Link]