Journal articles

* Student or intern collaborator; + corresponding author

  1. J. Chu*, W. Lu, and S. Yang+ (2023). Targeted optimal treatment regime learning using summary statistics. Biometrika, doi:10.1093/biomet/asad020. [arxiv]
  2. C. Gao*, S. Yang+, and J. K. Kim (2023). Soft calibration for correcting selection bias under mixed-effects models. Biometrika, doi:10.1093/biomet/asad016. [arxiv]
  3. E. Cho* and S. Yang (2023). Variable selection for doubly robust causal inference. Statistics and Its Interface, accepted. [arxiv]
  4. S. Liu*, S. Yang+, Y. Zhang, and G. Liu (2023). Sensitivity analysis in longitudinal clinical trials via distributional imputation. Statistical Methods in Medical Research, 32, 181–194. [link] [arxiv]
  5. S. Yang and Y. Zhang (2023). Multiply robust matching estimators of average and quantile treatment effects. Scandinavian Journal of Statistics, 50, 235–265. [arxiv]
  6. S. Yang, C. Gao*, X. Wang, and D. Zeng (2022). Elastic integrative analysis of randomized trial and real-world data for treatment heterogeneity estimation. Journal of the Royal Statistical Society: Series B, accepted. [arxiv]
  7. Y. Guan, G. L. Page, B. J. Reich, M. Ventrucci, and S. Yang (2022). A spectral adjustment for spatial confounding. Biometrika, accepted. [arxiv]
  8. Q. Guan* and S. Yang+ (2022). A unified framework for causal inference with multiple imputation using martingales. Statistica Sinica, doi:10.5705/ss.202021.0404. [arxiv]
  9. D. Lee*, S. Yang+, and X. Wang (2022). Doubly robust estimators for generalizing treatment effects on survival outcomes from randomized controlled trials to a target population. Journal of Causal Inference, 10, 415-440. [arxiv]
  10. S. Yang and X. Wang (2022). RWD-integrated randomized clinical trial analysis. 2022 ASA Biopharmaceutical Report Real World Evidence (Editors: Herbert Pang, Ling Wang, Kristi L. Griffiths), 29, 15–21.
  11. X. Mao, Z. Wang, and S. Yang (2022). Matrix completion under complex survey sampling. Annals of the Institute of Statistical Mathematics, doi:10.1007/s10463-022-00851-5. [arxiv]
  12. M. Yu*, W. Lu, S. Yang, and P. Ghosh (2022). Multiplicative structural nested mean model for zero-inflated outcomes. Biometrika,
  13. D. Kong, S. Yang, and L. Wang (2022). Identifiability of causal effects with multiple causes and a binary outcome. Biometrika, 109, 265–272. doi:10.1093/biomet/asab016. [arxiv]
  14. Z. Jiang, S. Yang, and P. Ding (2022). Multiply robust estimation of causal effects under principal ignorability. Journal of the Royal Statistical Society: Series B, doi:10.1111/rssb.12538. [arxiv]
  15. C. Gao*, K. J. Thompson, S. Yang and J. K. Kim (2022). Nearest neighbor ratio imputation with incomplete multinomial outcome in survey sampling. Journal of the Royal Statistical Society: Series A, 185, 1903-1930.
  16. J.Y. Wang, R. Wong, S. Yang, and G. Chan (2022). Estimation of Partially Conditional Average Treatment Effect by Hybrid Kernel-covariate Balancing. Electronic Journal of Statistics, [arxiv]
  17. A. B. Giffin*, B. J. Reich, S. Yang, and A. Rappold (2022). Generalized propensity score approach to causal inference with spatial interference. Biometrics, doi:10.1111/biom.13745. [arxiv]

    ** Winner of the 2021 ENAR Distinguished Student Paper Competition Award

  18. A. B. Giffin*, W. Gong, S. Majumder, A. Rappold, B. J. Reich, and S. Yang (2022). Estimating intervention effects on infectious disease control: the effect of community mobility reduction on Coronavirus spread. Spatial Statistics, 52, 100711. [arxiv]
  19. H. Zhao*, X. Zhang and S. Yang (2022). Double score matching in observational studies with multi-level treatments. Communications in Statistics – Simulation and Computation,
  20. H. Zhao* and S. Yang (2022). Outcome-adjusted balance measure for generalized propensity score model selection. Journal of Statistical Planning and Inference, 221, 188–200. [arxiv]

    ** Winner of the 2021 DISS Best Poster Award

  21. D. Johnson*, K. Pieper, and S. Yang+ (2022). Treatment-specific Marginal Structural Cox Model for the Effect of Treatment Discontinuation. Pharmaceutical Statistics, 21, 988-1004.
  22. J. W. Yu, D. Bandyopadhyay, S. Yang, L. Kang, and G. Gupta (2022). Propensity score modeling in electronic health records with time-to-event endpoints: application to kidney transplantation. Journal of Data Science, 20, 188–208.
  23. M.Y. Huang and S. Yang+ (2022). Robust inference of conditional average treatment effects using dimension reduction. Statistica Sinica, 32, 547-567. [arxiv]
  24. A. Larsen*, S. Yang, A. Rappold, and B. Reich (2022). A spatial causal analysis of wildland fire-contributed PM2.5 using numerical model output. Annals of Applied Statistics, 16, 2714-2731. [arxiv]
  25. L. Wu* and S. Yang (2022). Transfer learning of individualized treatment rules from experimental to real-world data. Journal of Computational and Graphical Statistics, [arxiv]
  26. L. Wu* and S. Yang+ (2022). Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. Proceedings of the 1st Conference on Causal Learning and Reasoning, PMLR, 140, 1–S5.
  27. B. J. Reich, S. Yang, and Y. Guan (2022). Discussion on “Spatial+: a novel approach to spatial confounding” by Dupont, Wood and Augustin, Biometrics,
  28. N. Corder* and S. Yang+ (2022). Utilizing stratified generalized propensity score matching to approximate blocked trial designs with multiple treatment levels. Journal of Biopharmaceutical Statistics, doi:10.1080/10543406.2022.2065507.
  29. Y. Zhang*, S. Yang, W. Ye, D. E. Faries, I. Lipkovich, Z. Kadziola (2021). Best practices of double score matching for estimating causal effects, Statistics in Medicine, 42, 1421–1445.
  30. D. Lee*, S. Yang+, L. Dong, X. Wang, D. Zeng, J.W. Cai (2021). Improving trial generalizability using observational studies, Biometrics, doi:10.1111/biom.13609. [arxiv]

    ** Winner of the 2020 ENAR Distinguished Student Paper Competition Award

  31. S. Yang (2021). Semiparametric efficient estimation of structural nested mean models with irregularly spaced observations. Biometrics, [arxiv]
  32. B. J. Reich, S. Yang, Y. Guan, A. B. Giffin, M. J. Miller and A. G. Rappold (2021). A review of spatial causal inference methods for environmental and epidemiological applications. International Statistical Review, 89, 605-634. [arxiv]
  33. S. Yang, J. K. Kim, and Y. Hwang (2021). Integration of data from probability surveys and big found data for finite population inference using mass imputation. Survey Methodology, 47, 29–58.
  34. F. Cools, D. Johnson, A. J. Camm, J. P. Bassand, F. Verheugt, S. Yang, A. Tsiatis, D. A. Fitzmaurice, S. Z. Goldhaber, G. Kayani, S. Goto, S. Haas, F. Misselwitz, A. Turpie, K. Fox, K. Pieper, A. K. Kakkar (2021). Risks associated with discontinuation of oral anticoagulation in newly diagnosed patients with atrial fibrillation: results from the GARFIELD-AF Registry. Journal of Thrombosis and Haemostasis, doi:10.1111/jth.15415. (Collaboration work)
  35. S. Yang, Y. Zhang, G. Liu, and Q. Guan (2021). SMIM: a unified framework of Survival sensitivity analysis using Multiple Imputation and Martingale. Biometrics, doi:10.1111/biom.13555. [arxiv]
  36. S. Yang, J. K. Kim, and R. Song (2020). Doubly robust inference when combining probability and non-probability samples with high-dimensional data, Journal of the Royal Statistical Society: Series B, 82, 445–465.
  37. S. Yang, K. Pieper, and F. Cools (2020). Semiparametric estimation of structural failure time model in continuous-time processes, Biometrika, 107, 123-136.
  38. N. Corder* and S. Yang (2020). Estimating average treatment effects utilizing fractional imputation when confounders are subject to missingness, Journal of Causal Inference, 8, 249-271.
  39. L. Dong*, E. Laber, Y. Goldberg, R. Song, S. Yang (2020). Ascertaining properties of weighting in the estimation of optimal treatment regimes under monotone missingness, Statistics in Medicine, doi:10.1002/sim.8678.
  40. S. Yang and P. Ding (2020). Combining multiple observational data sources to estimate causal effects, Journal of the American Statistical Association, 115, 1540–1554.
  41. S. Yang and J. K. Kim (2020). Statistical data integration in survey sampling: a review, Japanese Journal of Statistics and Data Science, 3, 625–650.
  42. W. Li*, S. Yang+, and P. Han (2020). Robust estimation for moment condition models with data missing not at random, Journal of Statistical Planning and Inference, 207, 246–254.
  43. S. Yang and J. K. Kim (2020). Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework. Scandinavian Journal of Statistics, 47, 839–861.
  44. S. Chen, S. Yang, and J.K. Kim (2020). Nonparametric mass imputation for data integration. Journal of Survey Statistics and Methodology,
  45. S. Yang (2019). Book reviews: Flexible imputation of missing data, 2nd ed. Journal of the American Statistical Association, 114, 1421–1421.
  46. S. Yang, L. Wang, and P. Ding (2019). Causal inference with confounders missing not at random, Biometrika, 106, 875–888.
  47. S. Yang and D. Zeng (2018). Discussion on penalized spline of propensity methods for treatment comparison by Zhou, Elliott and Little, Journal of the American Statistical Association, 114, 30–32.
  48. S. Yang and J. J. Lok (2018). Sensitivity analysis for unmeasured confounding in coarse structural nested mean models, Statistica Sinica, 28, 1703–1723.
  49. S. Yang (2018). Propensity score weighting for causal inference with clustered data, Journal of Causal Inference,
  50. S. Yang and J. K. Kim (2018). Nearest neighbor imputation for general parameter estimation in survey sampling, Advances in Econometrics, 39, 211–236.
  51. S. Yang and P. Ding (2018). Asymptotic inference of causal effects with observational studies trimmed by the estimated propensity scores, Biometrika, 105, 487–493. [arxiv]
  52. Z. Wang, J. K. Kim, and S. Yang (2018). An approximate Bayesian inference under informative sampling, Biometrika, 105, 91–102.
  53. J. Lok, S. Yang, B. Sharkey, and M. Hughes (2018). Estimation of the cumulative incidence function under multiple dependent and independent censoring mechanisms, Lifetime Data Analysis, 24, 201–223.
  54. S. Yang, A. A. Tsiatis, and M. Blazing (2018). Modeling survival distribution as a function of time to treatment discontinuation: a dynamic treatment regime approach, Biometrics, 74, 900–909.
  55. S. Yang and J. K. Kim (2017). A semiparametric inference to regression analysis with missing covariates in survey data, Statistica Sinica, 27, 261–285.
  56. J. K. Kim and S. Yang (2017). A note on multiple imputation under complex sampling, Biometrika, 104, 221–228.
  57. S. Yang and J. K. Kim (2017). Discussion: dissecting multiple imputation from a multi-phase inference perspective: what happens when God’s, imputer’s and analyst’s models are uncongenial? by X. Xie and X. L. Meng, Statistica Sinica, 27, 1568–1573.
  58. S. Yang and J. J. Lok (2016). A goodness-of-fit test for structural nested mean models, Biometrika, 103, 734–741.
  59. S. Yang and J. K. Kim (2016). Fractional imputation in survey sampling: a comparative review, Statistical Science, 31, 415–432.
  60. S. Yang, G. Imbens, Z. Cui, D. Faries and Z. Kadziola (2016). Propensity score matching and stratification in observational studies with multi-level treatments, Biometrics, 72, 1055–1065. R package “multilevelMatching” available.
  61. S. Yang and J. K. Kim (2016). A note on multiple imputation for method of moments estimation, Biometrika, 103, 244–251.
  62. S. Yang and J. K. Kim (2015). Likelihood-based inference with missing data under missing-at-random, Scandinavian Journal of Statistics, 43, 436–454.

    ** Winner of the 2014 JSM Student Paper Competition Award

  63. L. Peyer, G. Welk, L. B. Davis, S. Yang, and J. K. Kim (2015). Factors associated with parent concern for child weight and parenting behaviors, Childhood Obesity, 11, 269–274. (Collaboration work)
  64. S. Yang and Z. Zhu (2015). Variance estimation and kriging prediction for a class of non-stationary spatial models, Statistica Sinica, 25, 135–149.
  65. J. K. Kim and S. Yang (2014). Fractional hot deck imputation for robust estimation under item nonresponse in survey sampling, Survey Methodology, 40, 211–230.
  66. J. K. Kim, Z. Zhu, and S. Yang (2013). Improved estimation for June Area Survey incorporating several information, Proceedings 59th ISI World Statistics Congress, Hong Kong, China, 199–204.
  67. S. Yang, J. K. Kim and D. W. Shin (2013). Imputation methods for quantile estimation under missing at random, Statistics and Its Interface, 6, 369–377.
  68. S. Yang, J. K. Kim and Z. Zhu (2013). Parametric fractional imputation for mixed models with nonignorable missing data, Statistics and Its Interface, 6, 339–347.


Technical Reports 


  1. S. Yang, D. Zeng, X. Wang. Improved Inference for Heterogeneous Treatment Effects Using Real-World Data Subject to Hidden Confounding. [arxiv]
  2. B. Colnet, I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.P. Vert, J. Josse+, S. Yang+. Causal inference methods for combining randomized trials and observational studies: a review. [arxiv]
  3. P. Sang, D. Kong, and S. Yang+. Functional principal component analysis for longitudinal observations with sampling at random. [arxiv]
  4. A. B. Giffin*, B. J. Reich, S. Yang, and A. Rappold. Instrumental variables, spatial confounding and interference. [arxiv]
  5. L. Wu*, S. Yang, B. J. Reich, and A. Rappold. Estimating spatially varying health effects in app-based citizen science research. [arxiv]

    ** Winner of the 2021 ASA Section on Statistics in Epidemiology Young Investigator Award

    ** Winner of the IBM Student Research Award from the 34th New England Statistics Symposium

  6. S. Tang*, S. Yang+, T. Wang, Z. Cui, L. Li, D. Faries. Causal inference of hazard ratio based on propensity score matching. [arxiv]

    ** Winner of the 2021 ENAR Distinguished Student Paper Competition Award

  7. S. Yang and Z. Zhu. Semiparametric estimation of spectral density and variogram with irregular observations. [arxiv]
  8. C. Gao* and S. Yang. Pretest estimation in combining probability and non-probability samples.
  9. C. Gao*, S. Yang, and A. Zhang. Self-supervised Single Image Denoising via Low-rank Tensor Approximated Convolutional Neural Network. [arxiv]
  10. S. Liu*, S. Yang+, Y. Zhang, and G. Liu. Multiply robust estimators in longitudinal studies with missing data under control-based imputation. [arxiv]
  11. S. Liu*, Y. Zhang, G. T. Golm, G. Liu, and S. Yang+. Robust analyses for longitudinal clinical trials with dropouts and non-normal continuous outcomes. [arxiv]
  12. S. Xu*, S. Yang, B. J. Reich. A Bayesian non-parametric method for estimating causal quantile effects.
  13. D. Lee*, S. Ghosh, and S. Yang+. Transporting survival of an HIV clinical trial to the external target populations. [arxiv]
  14. X. Tan*, S. Yang, W. Ye, D. E. Faries, I. Lipkovich, Z. Kadziola. When doubly robust methods meet machine learning for estimating treatment effects from real-world data: a comparative study. [arxiv]
  15. B. Smith, S. Yang, A. Apter, and D. Scharfstein. Trials with irregular and informative assessment times: a sensitivity analysis approach. [arxiv]
  16. Y. Zhang, D. Kong, and S. Yang+. Towards R-learner of conditional average treatment effects with a continuous treatment: T-identification, estimation, and inference. [arxiv]

    ** Winner of the 2023 ASA Section on Nonparametric Statistics Student Paper Award

  17. C. Cui, S. Yang, B. J. Reich, and D. Gill. Matching estimators of causal effects in clustered observational studies with application to quantifying the impact of marine protected areas on biodiversity. [arxiv]
  18. P. Zhao, J. Josse, and S. Yang. Efficient and robust transfer learning of optimal individualized treatment regimes with right-censored survival data. [arxiv]




  • S. Yang (2014). Fractional imputation method of handling missing data and spatial statistics. Iowa State University. [Link]