Software development for Stan to improve survey statistics for non-probability samples

NIH RePORTER · NIH · R01 · $233,137

Abstract

Project Summary: This proposal is a supplement to our NIH grant R01 AG067149-01: Improving Representativeness in Non-probability Surveys and Causal Inference with Regularized Regression and Poststratification. That project involves developing certain Bayesian methods for sampling adjustment in a general, flexible, and reliable way that can be used for a wide range of problems in public health research. The project requires extensive use of the Stan probabilistic programming platform, both as part of the research effort and as part of resulting methods. This NOSI is synergistic with that grant. It will support new software engineering initiatives to improve the core Stan platform in three ways: (1) Providing the option for JSON format outputs will improve interoperability and facilitate incorporating Bayesian methods into machine learning pipelines. (2) Extending and refactoring the core Stan inference algorithms for greater memory efficiency and increased parallel processing will improve the overall speed and scalability of inference, allowing for Bayesian methods to be used with increasingly complex models. This will allow researchers to compare a greater number and wider range of models in order to find those with optimal behaviors. (3) The addition of a standard logging framework will benefit both the Stan user community and the developer community.

The parent grant's research agenda is threefold. Firstly, it is directed to addressing the unique challenges posed by public health datasets and questions by investigating adaptations to state-of-the-art modelling techniques. Secondly, it strives to improve causal inferences for demographic subgroups. Thirdly, and more broadly, it seeks to improve current methodology by developing workflows to test and validate models with non-representative data in order to obtain better and more trustworthy population-based estimates.

The work in the NOSI is relevant in two ways. First, it will directly support the research in the main project. During our research, computational challenges arise. The progress in research reveals areas where the computational infrastructure needs to be improved; thus, the NOSI will enable us to do our NIH-funded research more effectively. Second, it's important for the results of our research to be used by others. The computing work in the NOSI will make it easier for applied practitioners to make use of the research we have been developing. Furthermore, the addition of a common data format for inputs and outputs, greater processing speed and efficiency, and standardized logging will make it easier to use Stan in complex processing pipelines, therefore improving overall cloud-readiness.

Key facts

NIH application ID
10405924
Project number
3R01AG067149-02S1
Recipient
COLUMBIA UNIV NEW YORK MORNINGSIDE
Principal Investigator
ANDREW GELMAN
Activity code
R01
Funding institute
NIH
Fiscal year
2021
Award amount
$233,137
Award type
3
Project period
2020-08-01 → 2023-04-30