Since the turn of the century, the integration of computer technology into medical practice has enabled scientists to collect massive volumes of electronic health records (EHR); over the same period, deep learning has become the major tool for massive data analysis. However, EHR data are heterogeneous (varying widely across patient groups) and fragmented (containing a high proportion of missing values), which poses a significant barrier to the applicability and generalizability of current deep neural networks. This project aims to build a health prediction system based on a new type of stochastic neural network (StoNet) for massive, heterogeneous, and fragmented data, while considering the integration of omics, imaging, and EHR data in training the system. The StoNet is formulated as a composition of many simple regressions; it is asymptotically equivalent to the deep neural network (DNN) in function approximation as the training sample size becomes large, but its structure is more flexible for dealing with the complexity of EHR data. The StoNet is trained by an adaptive stochastic gradient Markov chain Monte Carlo (MCMC) algorithm. By leveraging the flexible structure of the StoNet and the sophisticated adaptive stochastic gradient MCMC algorithm, this project provides a rigorous statistical framework for deep learning with massive, heterogeneous, and fragmented EHR data. We show that the StoNet forms a bridge from linear models to deep learning, enabling much of the theory and many of the methods developed for linear models to be transferred to deep learning.
In particular, we show that the sparse learning theory developed for linear models with the Lasso penalty can be transferred to the StoNet, leading to an innovative, consistent sparse deep learning method; we address the data heterogeneity issue by replacing each regression in the first hidden layer of the StoNet with a mixture regression; and we address the missing data issue by training the StoNet with an adaptive stochastic gradient MCMC algorithm in which the missing data are imputed, as for a linear model, using multiple imputation methods. The Markovian structure of the StoNet enables the network parameters to be learned locally with fragmented data and leads to an innovative approach to nonlinear sufficient dimension reduction of high-dimensional data, facilitating the integration of different types of data in StoNet training. We also show that the prediction uncertainty of the StoNet can be easily quantified with a recursive application of Eve's law.
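
For reference, Eve's law is the law of total variance, and the display below sketches how a recursive application might look for a network with a Markovian chain of latent layers. The notation is assumed purely for illustration and is not taken from the project description: $X$ is the input, $Y_1, \dots, Y_h$ are hypothetical latent hidden-layer variables forming a Markov chain $X \to Y_1 \to \cdots \to Y_h \to Y$, and $Y$ is the response.

```latex
% Eve's law (law of total variance), conditioning on the last latent layer Y_h.
% The Markov property is used to write Var(Y | Y_h, X) = Var(Y | Y_h), and similarly for the mean.
\begin{align*}
\operatorname{Var}(Y \mid X)
  &= \mathbb{E}\bigl[\operatorname{Var}(Y \mid Y_h) \,\big|\, X\bigr]
   + \operatorname{Var}\bigl(\mathbb{E}[Y \mid Y_h] \,\big|\, X\bigr). \\
\intertext{Writing $g_h(Y_h) = \mathbb{E}[Y \mid Y_h]$, the second term can be expanded by the same law one layer down:}
\operatorname{Var}\bigl(g_h(Y_h) \,\big|\, X\bigr)
  &= \mathbb{E}\bigl[\operatorname{Var}\bigl(g_h(Y_h) \mid Y_{h-1}\bigr) \,\big|\, X\bigr]
   + \operatorname{Var}\bigl(\mathbb{E}[g_h(Y_h) \mid Y_{h-1}] \,\big|\, X\bigr),
\end{align*}
% and so on down to Y_1, which gives a layer-by-layer decomposition of the predictive variance.
```

Under these assumptions, each step splits the variance into a within-layer part and a part propagated from the layer below, which is one plausible reading of the "recursive application" mentioned above.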