Journal articles

* Student or intern collaborator; + corresponding author

  1. L. Wu*, C. Gao*, S. Yang+, B. J. Reich, and A. Rappold (2024). Estimating spatially varying health effects in app-based citizen science research. Journal of the Royal Statistical Society: Series C. [arxiv]

    ** Winner of the 2021 ASA Section on Statistics in Epidemiology Young Investigator Award

    ** Winner of the IBM Student Research Award from the 34th New England Statistics Symposium

  2. J. Coulombe and S. Yang (2024). Multiply robust estimation of marginal structural models in observational studies subject to covariate-driven observations. Biometrics. [arxiv]
  3. C. Gao*, Z. Zhang, and S. Yang+ (2024). Causal Customer Churn Analysis with Low-rank Tensor Block Hazard Model. ICML. [arxiv] [code]
  4. Y. Cheng* and S. Yang (2024). Inference for Optimal Linear Treatment Regimes in Personalized Decision-making. UAI (40th Conference on Uncertainty in Artificial Intelligence). [arxiv]

    ** Selected as an oral presentation

  5. T. Wang*, H. Zhao*, S. Yang+, S. Tang, Z. Cui, L. Li, and D. Faries (2024). Propensity score matching for estimating a marginal hazard ratio. Statistics in Medicine, doi:10.1002/sim.10103. [arxiv] [code]

    ** Winner of the 2021 ENAR Distinguished Student Paper Competition Award

  6. S. Fairfax* and S. Yang (2024). Distributional imputation for the analysis of censored recurrent events. Statistics in Medicine, 43, 2622–2640. [code]

    ** Winner of the 2023 JSM Poster Award Competition Honorable Mention

  7. B. Colnet, I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.P. Vert, J. Josse+, S. Yang+ (2024). Causal inference methods for combining randomized trials and observational studies: a review. Statistical Science, 1, 165–191. [arxiv]
  8. D. Gill, S. Lester, C. Free, A. Pfaff, E. Iversen, B. Reich, S. Yang, et al. (2024). A diverse portfolio of marine protected areas can better advance global conservation and equity. Proceedings of the National Academy of Sciences, doi:10.1073/pnas.2313205121.
  9. S. Yang+ and X. Zhang (2024). Response to comment on “Transporting survival of an HIV clinical trial to the external target populations” by Lee et al. (2024). Journal of Biopharmaceutical Statistics.
  10. D. Lee*, C. Gao*, S. Ghosh, and S. Yang+ (2024). Transporting survival of an HIV clinical trial to the external target populations. Journal of Biopharmaceutical Statistics. [arxiv] [code]
  11. D. Lee*, S. Yang, M. Berry, T. Stinchcombe, H. Cohen, and X. Wang (2024). genRCT: A Statistical Analysis Framework for Generalizing RCT Findings to Real-World Population. Journal of Biopharmaceutical Statistics. [code]
  12. X. Mao, H. Wang, Z. Wang, and S. Yang (2024). Mixed dataframe matrix completion in survey under heterogeneous missingness. Journal of Computational and Graphical Statistics. [arxiv]
  13. P. Zhao, J. Josse, A. Chambaz, and S. Yang (2024). Positivity-free Policy Learning with Observational Data. Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 238:1918-192. [arxiv] [code]

    ** Selected as an oral presentation (top 1%)

  14. S. Liu*, S. Yang+, Y. Zhang, and G. Liu (2024). Multiply robust estimators in longitudinal studies with missing data under control-based imputation. Biometrics, 80, ujad036. [arxiv]

    ** Winner of the 2023 ASA BIOP RISW Student Travel Award

    ** Winner of the 2024 ENAR RAB Student Poster Award Competition

  15. Q. Guan* and S. Yang+ (2024). A unified framework for causal inference with multiple imputation using martingales. Statistica Sinica, 34, 1649-1673. [arxiv]
  16. S. Yang, C. Gao*, X. Wang, and D. Zeng (2023). Elastic integrative analysis of randomized trial and real-world data for treatment heterogeneity estimation. Journal of the Royal Statistical Society: Series B, 85, 575-596. [arxiv]
  17. C. Gao*, S. Yang+, and J. K. Kim (2023). Soft calibration for correcting selection bias under mixed-effects models. Biometrika, 110, 897–911. [arxiv]
  18. C. Gao* and S. Yang (2023). Pretest estimation in combining probability and non-probability samples. Electronic Journal of Statistics, 17, 1492–1546. [arxiv]
  19. J. Chu*, S. Yang, and W. Lu (2023). Multiply robust off-policy evaluation and learning under truncation by death. Proceedings of the 40th International Conference on Machine Learning (ICML), PMLR, 202, 6195–6227.
  20. J. Chu*, W. Lu, and S. Yang+ (2023). Targeted optimal treatment regime learning using summary statistics. Biometrika, 110, 913–931. [arxiv]
  21. Y. Guan, G. L. Page, B. J. Reich, M. Ventrucci, and S. Yang (2023). A spectral adjustment for spatial confounding. Biometrika, 110, 699–719. [arxiv]
  22. M. Yu*, W. Lu, S. Yang, and P. Ghosh (2023). Multiplicative structural nested mean model for zero-inflated outcomes. Biometrika, 110, 519–536.
  23. Y. Cheng*, L. Wu, and S. Yang (2023). Enhancing treatment effect estimation: a model robust approach integrating randomized experiments and external controls using the double penalty integration estimator. UAI (39th Conference on Uncertainty in Artificial Intelligence). [arxiv]
  24. S. Yang, Y. Zhang, G. Liu, and Q. Guan (2023). SMIM: a unified framework of Survival sensitivity analysis using Multiple Imputation and Martingale. Biometrics, 79, 230–240. [arxiv]
  25. S. Liu*, Y. Zhang, G. T. Golm, G. Liu, and S. Yang+ (2023). Robust analyses for longitudinal clinical trials with dropouts and non-normal continuous outcomes. Statistical Theory and Related Fields, 8, 1-14. [arxiv]
  26. S. Liu*, S. Yang+, Y. Zhang, and G. Liu (2023). Sensitivity analysis in longitudinal clinical trials via distributional imputation. Statistical Methods in Medical Research, 32, 181–194. [link] [arxiv]
  27. S. Yang and Y. Zhang (2023). Multiply robust matching estimators of average and quantile treatment effects. Scandinavian Journal of Statistics, 50, 235–265. [arxiv]
  28. L. Wu* and S. Yang (2023). Transfer learning of individualized treatment rules from experimental to real-world data. Journal of Computational and Graphical Statistics, 32, 1036–1045. [arxiv]
  29. E. Cho* and S. Yang (2023). Variable selection for doubly robust causal inference. Statistics and Its Interface, accepted. [arxiv]
  30. D. Kong, S. Yang, and L. Wang (2022). Identifiability of causal effects with multiple causes and a binary outcome. Biometrika, 109, 265–272. doi:10.1093/biomet/asab016. [arxiv]
  31. Z. Jiang, S. Yang, and P. Ding (2022). Multiply robust estimation of causal effects under principal ignorability. Journal of the Royal Statistical Society: Series B, 84, 1423–1445. [arxiv] [code]
  32. D. Lee*, S. Yang+, L. Dong, X. Wang, D. Zeng, and J. W. Cai (2023). Improving trial generalizability using observational studies. Biometrics, 79, 1213–1225. [arxiv] [code]

    ** Winner of the 2020 ENAR Distinguished Student Paper Competition Award

  33. D. Lee*, S. Yang+, and X. Wang (2022). Doubly robust estimators for generalizing treatment effects on survival outcomes from randomized controlled trials to a target population. Journal of Causal Inference, 10, 415-440. [arxiv] [code]
  34. S. Yang and X. Wang (2022). RWD-integrated randomized clinical trial analysis. 2022 ASA Biopharmaceutical Report Real World Evidence (Editors: Herbert Pang, Ling Wang, Kristi L. Griffiths), 29, 15–21.
  35. X. Mao, Z. Wang, and S. Yang (2022). Matrix completion under complex survey sampling. Annals of the Institute of Statistical Mathematics, doi:10.1007/s10463-022-00851-5. [arxiv]
  36. C. Gao*, K. J. Thompson, S. Yang, and J. K. Kim (2022). Nearest neighbor ratio imputation with incomplete multinomial outcome in survey sampling. Journal of the Royal Statistical Society: Series A, 185, 1903-1930.
  37. J.Y. Wang, R. Wong, S. Yang, and G. Chan (2022). Estimation of Partially Conditional Average Treatment Effect by Hybrid Kernel-covariate Balancing. Electronic Journal of Statistics. [arxiv]
  38. A. B. Giffin*, B. J. Reich, S. Yang, and A. Rappold (2022). Generalized propensity score approach to causal inference with spatial interference. Biometrics, 79, 2220–2231. [arxiv]

    ** Winner of the 2021 ENAR Distinguished Student Paper Competition Award

  39. A. B. Giffin*, W. Gong, S. Majumder, A. Rappold, B. J. Reich, and S. Yang (2022). Estimating intervention effects on infectious disease control: the effect of community mobility reduction on Coronavirus spread. Spatial Statistics, 52, 100711. [arxiv]
  40. H. Zhao*, X. Zhang, and S. Yang (2022). Double score matching in observational studies with multi-level treatments. Communications in Statistics – Simulation and Computation.
  41. H. Zhao* and S. Yang (2022). Outcome-adjusted balance measure for generalized propensity score model selection. Journal of Statistical Planning and Inference, 221, 188–200. [arxiv]

    ** Winner of the 2021 DISS Best Poster Award

  42. D. Johnson*, K. Pieper, and S. Yang+ (2022). Treatment-specific Marginal Structural Cox Model for the Effect of Treatment Discontinuation. Pharmaceutical Statistics, 21, 988-1004.
  43. J. W. Yu, D. Bandyopadhyay, S. Yang, L. Kang, and G. Gupta (2022). Propensity score modeling in electronic health records with time-to-event endpoints: application to kidney transplantation. Journal of Data Science, 20, 188–208.
  44. M.Y. Huang and S. Yang+ (2022). Robust inference of conditional average treatment effects using dimension reduction. Statistica Sinica, 32, 547-567. [arxiv]
  45. A. Larsen*, S. Yang, A. Rappold, and B. Reich (2022). A spatial causal analysis of wildland fire-contributed PM2.5 using numerical model output. Annals of Applied Statistics, 16, 2714-2731. [arxiv]
  46. L. Wu* and S. Yang+ (2022). Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. Proceedings of the 1st Conference on Causal Learning and Reasoning, PMLR, 140, 1–S5.
  47. B. J. Reich, S. Yang, and Y. Guan (2022). Discussion on “Spatial+: a novel approach to spatial confounding” by Dupont, Wood, and Augustin. Biometrics.
  48. N. Corder* and S. Yang+ (2022). Utilizing stratified generalized propensity score matching to approximate blocked trial designs with multiple treatment levels. Journal of Biopharmaceutical Statistics, doi:10.1080/10543406.2022.2065507.
  49. Y. Zhang*, S. Yang, W. Ye, D. E. Faries, I. Lipkovich, and Z. Kadziola (2021). Best practices of double score matching for estimating causal effects. Statistics in Medicine, 42, 1421–1445.
  50. S. Yang (2022). Semiparametric efficient estimation of structural nested mean models with irregularly spaced observations. Biometrics, 78, 937–949. [arxiv]
  51. B. J. Reich, S. Yang, Y. Guan, A. B. Giffin, M. J. Miller, and A. G. Rappold (2021). A review of spatial causal inference methods for environmental and epidemiological applications. International Statistical Review, 89, 605-634. [arxiv]
  52. S. Yang, J. K. Kim, and Y. Hwang (2021). Integration of data from probability surveys and big found data for finite population inference using mass imputation. Survey Methodology, 47, 29–58.
  53. F. Cools, D. Johnson, A. J. Camm, J. P. Bassand, F. Verheugt, S. Yang, A. Tsiatis, D. A. Fitzmaurice, S. Z. Goldhaber, G. Kayani, S. Goto, S. Haas, F. Misselwitz, A. Turpie, K. Fox, K. Pieper, and A. K. Kakkar (2021). Risks associated with discontinuation of oral anticoagulation in newly diagnosed patients with atrial fibrillation: results from the GARFIELD-AF Registry. Journal of Thrombosis and Haemostasis, doi:10.1111/jth.15415. (Collaborative work)
  54. S. Yang, J. K. Kim, and R. Song (2020). Doubly robust inference when combining probability and non-probability samples with high-dimensional data. Journal of the Royal Statistical Society: Series B, 82, 445–465.
  55. S. Yang, K. Pieper, and F. Cools (2020). Semiparametric estimation of structural failure time model in continuous-time processes. Biometrika, 107, 123-136.
  56. N. Corder* and S. Yang (2020). Estimating average treatment effects utilizing fractional imputation when confounders are subject to missingness. Journal of Causal Inference, 8, 249-271.
  57. L. Dong*, E. Laber, Y. Goldberg, R. Song, and S. Yang (2020). Ascertaining properties of weighting in the estimation of optimal treatment regimes under monotone missingness. Statistics in Medicine, doi:10.1002/sim.8678.
  58. S. Yang and P. Ding (2020). Combining multiple observational data sources to estimate causal effects. Journal of the American Statistical Association, 115, 1540–1554.
  59. S. Yang and J. K. Kim (2020). Statistical data integration in survey sampling: a review. Japanese Journal of Statistics and Data Science, 3, 625–650.
  60. W. Li*, S. Yang+, and P. Han (2020). Robust estimation for moment condition models with data missing not at random. Journal of Statistical Planning and Inference, 207, 246–254.
  61. S. Yang and J. K. Kim (2020). Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework. Scandinavian Journal of Statistics, 47, 839–861.
  62. S. Chen, S. Yang, and J.K. Kim (2020). Nonparametric mass imputation for data integration. Journal of Survey Statistics and Methodology.
  63. S. Yang (2019). Book reviews: Flexible imputation of missing data, 2nd ed. Journal of the American Statistical Association, 114, 1421–1421.
  64. S. Yang, L. Wang, and P. Ding (2019). Causal inference with confounders missing not at random. Biometrika, 106, 875–888.
  65. S. Yang and D. Zeng (2018). Discussion on “Penalized spline of propensity methods for treatment comparison” by Zhou, Elliott, and Little. Journal of the American Statistical Association, 114, 30–32.
  66. S. Yang and J. J. Lok (2018). Sensitivity analysis for unmeasured confounding in coarse structural nested mean models. Statistica Sinica, 28, 1703–1723.
  67. S. Yang (2018). Propensity score weighting for causal inference with clustered data. Journal of Causal Inference.
  68. S. Yang and J. K. Kim (2018). Nearest neighbor imputation for general parameter estimation in survey sampling. Advances in Econometrics, 39, 211–236.
  69. S. Yang and P. Ding (2018). Asymptotic inference of causal effects with observational studies trimmed by the estimated propensity scores. Biometrika, 105, 487–493. [arxiv]
  70. Z. Wang, J. K. Kim, and S. Yang (2018). An approximate Bayesian inference under informative sampling. Biometrika, 105, 91–102.
  71. J. Lok, S. Yang, B. Sharkey, and M. Hughes (2018). Estimation of the cumulative incidence function under multiple dependent and independent censoring mechanisms. Lifetime Data Analysis, 24, 201–223.
  72. S. Yang, A. A. Tsiatis, and M. Blazing (2018). Modeling survival distribution as a function of time to treatment discontinuation: a dynamic treatment regime approach. Biometrics, 74, 900–909.
  73. S. Yang and J. K. Kim (2017). A semiparametric inference to regression analysis with missing covariates in survey data. Statistica Sinica, 27, 261–285.
  74. J. K. Kim and S. Yang (2017). A note on multiple imputation under complex sampling. Biometrika, 104, 221–228.
  75. S. Yang and J. K. Kim (2017). Discussion: “Dissecting multiple imputation from a multi-phase inference perspective: what happens when God’s, imputer’s and analyst’s models are uncongenial?” by X. Xie and X. L. Meng. Statistica Sinica, 27, 1568–1573.
  76. S. Yang and J. J. Lok (2016). A goodness-of-fit test for structural nested mean models. Biometrika, 103, 734–741.
  77. S. Yang and J. K. Kim (2016). Fractional imputation in survey sampling: a comparative review. Statistical Science, 31, 415–432.
  78. S. Yang, G. Imbens, Z. Cui, D. Faries, and Z. Kadziola (2016). Propensity score matching and stratification in observational studies with multi-level treatments. Biometrics, 72, 1055–1065. R package available: “multilevelMatching”.
  79. S. Yang and J. K. Kim (2016). A note on multiple imputation for method of moments estimation. Biometrika, 103, 244–251.
  80. S. Yang and J. K. Kim (2015). Likelihood-based inference with missing data under missing-at-random. Scandinavian Journal of Statistics, 43, 436–454.

    ** Winner of the 2014 JSM Student Paper Competition Award

  81. L. Peyer, G. Welk, L. B. Davis, S. Yang, and J. K. Kim (2015). Factors associated with parent concern for child weight and parenting behaviors. Childhood Obesity, 11, 269–274. (Collaborative work)
  82. S. Yang and Z. Zhu (2015). Variance estimation and kriging prediction for a class of non-stationary spatial models. Statistica Sinica, 25, 135–149.
  83. J. K. Kim and S. Yang (2014). Fractional hot deck imputation for robust estimation under item nonresponse in survey sampling. Survey Methodology, 40, 211–230.
  84. J. K. Kim, Z. Zhu, and S. Yang (2013). Improved estimation for June Area Survey incorporating several information. Proceedings of the 59th ISI World Statistics Congress, Hong Kong, China, 199–204.
  85. S. Yang, J. K. Kim, and D. W. Shin (2013). Imputation methods for quantile estimation under missing at random. Statistics and Its Interface, 6, 369–377.
  86. S. Yang, J. K. Kim, and Z. Zhu (2013). Parametric fractional imputation for mixed models with nonignorable missing data. Statistics and Its Interface, 6, 339–347.


Technical Reports 


  1. S. Yang, D. Zeng, X. Wang. Improved Inference for Heterogeneous Treatment Effects Using Real-World Data Subject to Hidden Confounding. [arxiv]
  2. P. Sang, D. Kong, and S. Yang+. Functional principal component analysis for longitudinal observations with sampling at random. [arxiv]
  3. A. B. Giffin*, B. J. Reich, S. Yang, and A. Rappold. Instrumental variables, spatial confounding and interference. [arxiv]
  4. S. Yang and Z. Zhu. Semiparametric estimation of spectral density and variogram with irregular observations. [arxiv]
  5. C. Gao*, S. Yang, and A. Zhang. Self-supervised Single Image Denoising via Low-rank Tensor Approximated Convolutional Neural Network. [arxiv]
  6. C. Gao*, S. Yang, M. Shan, W. Ye, I. Lipkovich, and D. Faries. Integrating randomized trial data with external controls: a semiparametric approach with selective borrowing. [arxiv] [code]

    ** Winner of the 2024 ICSA Student Paper Award

  7. S. Xu*, S. Yang, and B. J. Reich. A Bayesian non-parametric method for estimating causal quantile effects.
  8. X. Tan*, S. Yang, W. Ye, D. E. Faries, I. Lipkovich, and Z. Kadziola. When doubly robust methods meet machine learning for estimating treatment effects from real-world data: a comparative study. [arxiv]
  9. B. Smith, S. Yang, A. Apter, and D. Scharfstein. Trials with irregular and informative assessment times: a sensitivity analysis approach. [arxiv]
  10. T. Hong, W. Lu, S. Yang, and P. Ghosh. Multivariate choice models with irregularly spaced longitudinal observations: application to the lockdown effect on consumer behaviors. [arxiv]
  11. Y. Zhang, D. Kong, and S. Yang+. Towards R-learner of conditional average treatment effects with a continuous treatment: T-identification, estimation, and inference. [arxiv]

    ** Winner of the 2023 ASA Section on Nonparametric Statistics Student Paper Award

  12. Y. Zhang and S. Yang+. Semiparametric localized principal stratification analysis with continuous strata. [arxiv]
  13. T. Xu*, Y. Zhang, and S. Yang. Augmented match weighted estimators for average treatment effects.
  14. C. Cui, S. Yang, B. J. Reich, and D. Gill. Matching estimators of causal effects in clustered observational studies with application to quantifying the impact of marine protected areas on biodiversity. [arxiv]
  15. P. Zhao, J. Josse, and S. Yang. Efficient and robust transfer learning of optimal individualized treatment regimes with right-censored survival data.
  16. D. Faries, C. Gao, X. Zhang, C. Hazlett, J. Stamey, S. Yang, et al. Real effect or bias? Best practices for evaluating the robustness of real-world evidence through quantitative sensitivity analysis for unmeasured confounding. [Authorea]
  17. P. Ding, Y. Fang, D. Faries, S. Gruber, W. He, H. Lee, J.Y. Lee, P. Mishra-Kalyani, M. Shan, M. van der Laan, S. Yang, and X. Zhang (authors listed in alphabetical order). Sensitivity analysis for unmeasured confounding in medical product development and evaluation using real world evidence. [arxiv]
  18. Z. Wang, S. Yang, and J.K. Kim. Multiple bias-calibration for adjusting selection bias of non-probability samples using data integration. [arxiv]
  19. S. Yang and P. Ding. Two-phase rejective sampling. [arxiv]




Dissertation


  • S. Yang (2014). Fractional imputation method of handling missing data and spatial statistics. Iowa State University. [Link]