Biostatistics and Data Science


Announcements

  • Courses in Data Science Flyer
    • PH1975: Introduction to Data Science
      • This course is an introduction to modern data science. Prerequisite: PH1690. R and Python will be used in this course.
        Syllabus-PH1975.pdf
      • Topics
        • Data structure, Foundations of algorithms
        • R and Python programming for data science
        • Experimental design
        • Data collection and cleaning, data management
        • Database systems, SQL programming
        • Clinical databases
        • Data visualization, Report preparation, Exploratory analysis techniques
    • PH1976: Fundamentals of Data Analytics and Predictions
      • This course is an introduction to data analysis and prediction techniques and tools.
        Syllabus-PH1976.pdf
      • Topics
        • Introduction to Statistical Learning
        • Linear Regression, Logistic Regression, Linear Discriminant Analysis
        • Cross-validation, the Bootstrap
        • Model selection & Regularization (Ridge and Lasso)
        • Dimensionality reduction methods
        • Non-linear Models (Polynomial regression, Splines, Generalized additive models)
        • Tree-based methods (Trees, Bagging, Random Forests, Boosting)
        • Support Vector Machines
        • Neural Networks
        • Unsupervised Learning (Clustering Methods)
    • PH1977: Data Science Computing
      • This course is an introduction to mainstream programming and high performance computing techniques and tools in data science.
        Syllabus-PH1977.pdf
      • Topics
        • The fundamental Computer System
        • Profiling to find bottlenecks
        • Matrix and Vector computation
        • Concurrency & Multiprocessing modules
        • High Performance Computing (using Clusters & GPUs)
        • Introduction to big data files, text data, and web scraping
        • Data manipulation and management via Bash, Pandas, Relational Data Management Systems
        • Introduction to big data systems: Hadoop, Spark
        • Introduction to convex optimization
        • Introduction to deep learning via Pytorch
        • Introduction to (interactive) visualization
    • PH1978: Machine Learning in Practice
      • This course covers advanced data analysis and prediction techniques and tools, with applications.
        Syllabus-PH1978.pdf
      • Topics
        • Fundamental concepts of machine learning and its applications.
        • Overview of Supervised and Unsupervised Learning
        • Data representation and feature engineering
        • Model Evaluation and Improvement
        • Algorithm Chains and Pipelines
        • Introduction to Deep learning models
        • Working with Text data
        • Working with Image data
        • Introduction to Recommendation Systems
        • Machine learning applications in health sciences
    • PH1998 Special Topic: Advanced Data Science Analytic Methods
      • Learn the state-of-the-art concepts and algorithms in deep learning, generative adversarial networks and their application to imaging, EHR and big omics data analysis.
      • Topics
        • Deep neural networks
        • Deep residual neural networks
        • Dynamics of output of neural networks
        • Analytic solutions to deep wide neural networks
        • Gaussian processes
        • High dimensional data reduction
        • Variational autoencoders
        • Graphic variational autoencoders
        • Variational inference
        • Generative adversarial networks (GANs)
        • Wasserstein and conditional GANs
        • Deep learning counterfactual representations
    • PH1998 Special Topic: Big Data in Practice
      • New concepts of Big Data and Big Data analytic methods. Big Data research projects from different application areas.
      • Topics
        • Concepts of Data Science and Big Data
        • Application Examples of Big Data Practice
        • Data visualization and result summarization skills
        • Collaboration Skills in Big Data Practice
        • Communication skills, Teamwork skills
        • Machine Learning and Predictive Modeling Methods: A Big Data Perspective (Big Volume of data, Big Velocity of data, Big Variety of Data, Big Value of Data)
        • Leadership skills in Big Data Practice
        • Practice by participating in one of the Big Data projects: learning by doing
    • Data Science Computing
      • This course is an introduction to mainstream programming and high performance computing techniques and tools in data science. Topics include: the fundamental computer system; profiling to find bottlenecks; matrix and vector computation; concurrency and multiprocessing modules; high performance computing (using clusters and GPUs); an introduction to big data files, text data, and web scraping; data manipulation and management via Bash, Pandas, and relational data management systems; an introduction to big data systems (Hadoop, Spark); an introduction to convex optimization; an introduction to deep learning via PyTorch; and an introduction to (interactive) visualization.
    • Advanced Data Science Analytic Methods
      • In this course, we will introduce current developments in Big Data analytic methods. In particular, students will learn state-of-the-art concepts and algorithms in deep learning and generative adversarial networks, and their application to imaging, EHR and big omics data analysis. The emphasis will be on creative thinking, problem-solving skills, and hands-on data exploration to generate and address important scientific and business questions from a variety of complex Big Data theoretical developments and applications.
        flyer.pdf
    • Communication, Collaboration and Leadership for Biostatisticians and Data Scientists
      • Designed for aspiring biostatisticians and data scientists alike, our PH 1998: Communication, Collaboration and Leadership for Biostatisticians and Data Scientists course helps equip students with leadership and communication skills that are essential in every domain of the workforce and that complement our education program's curriculum.
        flyer.pdf

    • Spatial-Temporal Analysis for Population Health Data
      • This course is designed for students who wish to analyze spatial-temporal data for population health. Topics include research ethics, study design, databases for spatial-temporal population health data, data retrieval and processing, geocoding using the Google Maps API and the Census Geocoder API in the R programming environment, and exploratory data analysis and data visualization for spatial-temporal data. The course will also introduce a variety of statistical modeling methods for point-level and area-level population data, and focus on their application and interpretation.
        flyer.pdf

Biostatistics and Data Science Degree Programs

The Biostatistics Department of the UTHealth School of Public Health (SPH) offers graduate studies leading to the Master of Science (MS) and Doctor of Philosophy (PhD) degrees.

Biostatistics is a discipline encompassing the study and development of statistical, mathematical, and computer methods applied to the biological and health sciences. Biostatisticians play a key role in the design, conduct and analysis of research studies of health and disease. There is ample opportunity for experience in consulting and collaborative research. Alumni of the Biostatistics program are prominent in academia, industry and government.

Minors for all degree programs can be selected from the Health Promotion and Behavioral Sciences, Environmental and Occupational Health Sciences, Epidemiology and Disease Control and/or Management, Policy and Community Health Departments. In addition to courses at UTHealth SPH, a wide variety of courses are available through cross registration with other schools and institutions in the Texas Medical Center as well as Rice University and the University of Houston.



Admissions

Dejian Lai, PhD

Professor of Biostatistics
Faculty Admissions Representative
(713) 500-9270
Dejian.Lai@uth.tmc.edu

Emmanuel Moon

Academic and Admissions Advisor
(713) 500-9564
Emmanuel.J.Moon@uth.tmc.edu



Master of Science (MS)


Master of Science in Biostatistics Competencies

MS-B1. Use appropriate statistical methods and models to analyze data from the public health, biomedical, or bioinformatics arena.
MS-B2. Demonstrate the correct use of probability distributions and theory of statistical inference within biostatistics and public health.
MS-B3. Outline a statistical analysis strategy to appropriately answer a research question.
MS-B4. Use multiple statistical software packages to analyze data to answer public health research questions.


Doctor of Philosophy (PhD)


Doctor of Philosophy in Biostatistics Competencies

PhD-B1. Prove or derive a statistical theory and apply the result to public health, biomedical, or bioinformatics data.
PhD-B2. Design a simulation study to show or evaluate a method’s effectiveness.
PhD-B3. Use a unified methodological and theoretical framework to analyze various types of data (such as binary, count or continuous data).
PhD-B4. Use advanced computing techniques (for example, EM algorithm and Monte Carlo integration) to analyze complex data.
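
As a concrete illustration of the Monte Carlo integration named in PhD-B4, the following is a minimal Python sketch, not part of the department's course materials. The function name `mc_integrate`, the integrand x^2 on [0, 1], the sample size, and the seed are all illustrative assumptions. The sketch averages the integrand at uniformly drawn points and reports a standard error alongside the estimate.

```python
import math
import random


def mc_integrate(f, a, b, n=100_000, seed=0):
    """Estimate the integral of f over [a, b] by plain Monte Carlo.

    Returns (estimate, standard_error). The name, defaults, and seed
    are hypothetical choices made for this illustration only.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = f(rng.uniform(a, b))  # evaluate f at a uniform draw from [a, b]
        total += y
        total_sq += y * y
    mean = total / n
    # Population variance of the sampled f values, clipped at zero
    # to guard against tiny negative values from rounding.
    var = max(total_sq / n - mean * mean, 0.0)
    width = b - a
    return width * mean, width * math.sqrt(var / n)


# Example: the exact value of the integral of x^2 over [0, 1] is 1/3.
estimate, std_error = mc_integrate(lambda x: x * x, 0.0, 1.0)
print(f"estimate = {estimate:.4f}, standard error = {std_error:.4f}")
```

The standard error shrinks at a rate of 1/sqrt(n), so halving it requires roughly four times as many draws; that trade-off is the usual reason to pair the estimate with its error rather than report the estimate alone.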

Guidelines for Doctoral Qualifying Exam

Guidelines for the new doctoral exam, as approved by the Associate Dean for Academic Affairs.

  1. All doctoral students entering in August 2010 or after will follow this new policy and take the new preliminary exam, oral defense and dissertation defense. The Department of Biostatistics preliminary exam includes both take-home and in-class portions. The take-home portion is a two-day exam that typically focuses on material covered in PH1820, PH1821, PH1830 and PH1831. The in-class portion is a four-hour exam that focuses on material covered in PH1910, PH1911, PH1915 and PH1951; students may bring textbooks and lecture notes pertinent to the recommended courses listed above. Doctoral students who entered before August 2010 can choose either the new system or the old system. If you choose the new system, all of its rules apply. If you choose the old system, you will take the qualifying exam after completing 30 credit hours and demonstrating that you fulfill the minor and breadth requirements. However, if your qualifying exam takes place after January 2011, your biostatistics portion will follow the same format as the preliminary exam under the new system. January 2011 will be the last time the department gives the old format of the qualifying exam for biostatistics students.
  2. Those under the old exam system are expected to take biostatistics courses suggested by your advisory committee to meet the requirements of the degree program, regardless of which biostatistics exam you will take. Those under the new exam system are still expected to take additional biostatistics courses (beyond those recommended above) as well as the minor and breadth courses after the preliminary exam as you develop your proposal under the supervision of your dissertation committee.
  3. The biostatistics preliminary exam will be given once a year in August.
  4. It is the School’s policy that a student who fails the preliminary exam (new system) or the qualifying exam (old system) twice cannot remain in the doctoral program.

Addendum to PhD Students-Teaching Requirements

Doctoral students are required to obtain teaching experience in biostatistics courses for majors for at least one semester. A typical example is serving as a teaching assistant for a high-level biostatistics course after completing the preliminary exam.

Requirements for Thesis/Dissertation

The dissertation should be in a paper format and is expected to include two submitted papers. A dissertation proposal defense is required before the student advances to doctoral candidacy. Only the dissertation chair needs to verify that the papers have been submitted. For further information, see Dissertation and Thesis Proposal.


Preliminary (Qualifying) Exam Awards


All the Biostatistics and Data Science PhD students who pass the preliminary (qualifying) exam on their first attempt are eligible for the “Qualifying Exam Awards”:

  • Best Preliminary-Exam Award: Highest final score
  • Best Preliminary-Exam Award for Statistical Theories: Highest theory score
  • Best Preliminary-Exam Award for Statistical Applications: Highest application score

One student may get one or more of the three awards. The awardees will receive an award certificate and will be considered a high priority for TA or GRA positions if the positions are available and the awardees are eligible.

Career


Career opportunities abound in the field of biostatistics throughout academia, industry and government. Examples include the pharmaceutical industry, the chemical industry, medical research centers, schools of public health, medical schools and government agencies such as the National Institutes of Health, the Centers for Disease Control and Prevention, the National Center for Health Statistics, state and local health departments, and the World Health Organization.

Source: https://www.burning-glass.com/research-project/quant-crunch-data-science-job-market/

Source: https://www.burning-glass.com/wp-content/uploads/The_Quant_Crunch.pdf

Faculty members in the Department of Biostatistics & Data Science have led and contributed to the development of cutting-edge statistical and data science methods for many areas including clinical trial design and analysis, longitudinal and correlated data analysis, machine/statistical learning approaches, survival data analysis, imaging data analysis, Bayesian statistics, statistical genomics and genetics, bioinformatics, graphic and network modeling, stochastic processes, missing data, time series and streaming data, spatial-temporal data, dynamic modeling and prediction, and Big Data approaches. These methods are applied to a wide range of biomedical and health science problems including:

  • Clinical trials for cardiovascular cell therapy, hypertension, stem cells, early treatment of retinopathy of prematurity, heart attack prevention, early treatment of acute spinal cord injury
  • Alzheimer’s and Parkinson’s diseases
  • Brain injuries and neurosciences
  • Behavioral sciences and mental health
  • Infectious diseases: HIV prevention and AIDS treatment
  • Occupational and environmental exposures in the etiology of adult leukemia
  • Intrauterine growth through ultrasound measurement
  • Cancer and cervical cancer detection using optical spectroscopy
  • Analysis of infant mortality in developing countries
  • Queuing models for emergency medical services
  • Stochastic modeling of movement through the health care system
  • Health effects of air pollution
  • Analysis of health services utilization and health care technology assessment
  • US-Mexico border health issues
  • Molecular evolution and phylogenetics
  • Electronic health records (EHR) and medical insurance claim data

Our Department hosts two research centers:

Coordinating Center for Clinical Trials (CCCT)

Center for Big Data in Health Sciences (CBD-HS)

Folfec Atem, Ph.D, MSE
Assistant Professor

Research Interests

  » Modeling Longitudinal Studies
  » Single And Multiple Imputation
  » Clinical Trial Designs

Cici Bauer, PhD
Assistant Professor

Research Interests

  » Spatial-temporal Modeling
  » Bayesian hierarchical models
  » Analysis of wearable device data

Wenyaw Chan, Ph.D
Professor of Biostatistics and Data Science and UT Distinguished Teaching Professor
Research Interests

  » Longitudinal Study & Joint Models
  » Statistical Inference on Markov Chain Models
  » Statistical Methods for Public Health Studies

Baojiang Chen, PhD
Associate Professor

Research Interests

  » Missing data
  » Longitudinal data analysis
  » Survival analysis

Barry R Davis, M.D, Ph.D
Professor, Director of the CCCT, and Guy S. Parcel Chair in the School of Public Health
Research Interests

» Statistical Methods
» Clinical Trials
» Epidemiology

Stacia DeSantis, Ph.D
Professor

Research Interests

  » Bayesian statistics and modeling
  » Hidden Markov Modeling
  » Network meta-analysis

Yun-Xin Fu, Ph.D
Professor

Research Interests

  » Conservation Biology
  » Population & Quantitative Genetics
  » Computational biology

Dejian Lai, Ph.D
Professor

Research Interests

  » Biostatistics
  » Chaos
  » Demography

Vahed Maroufy, Ph.D
Assistant Professor

Research Interests

» Big EHR and Gene Expression Analysis
» Geometry of Statistical Models & Mixture Models
» Bayesian Inference & Sensitivity Analysis

Luis Leon Novelo, Ph.D
Assistant Professor

Research Interests

» Bayesian Statistics
» Electronic Health Record
» Traumatic Brain Surgery

Ruosha Li, PhD
Associate Professor

Research Interests

  » Survival Analysis, Quantile Regression
  » Prediction, Validation, Association
  » Clinical trials, Public Health Studies

Xi Luo, Ph.D
Associate Professor

Research Interests

  » Machine Learning & Big Data
  » Causal Inference & Predictive Modeling
  » Imaging, Omics Data, Mobile Health

Hongyu Miao, MS, PhD
Associate Professor, Consulting Service Center Director

Research Interests

  » Graphical Model & Time Series
  » Machine Learning
  » cHealth, EHR, neuroimaging, omics, clinical trials

Adriana Perez, Ph.D
Professor

Research Interests

  » Statistical Methods For Handling Missing Data
  » Sampling, sample size & power estimation
  » Group randomized trials; Statistical methods in epidemiology

Michael D. Swartz, PhD
Associate Professor

Research Interests

  » Hierarchical models
  » Statistical genetics
  » Statistical Applications for Public Health Studies

Hulin Wu, PhD
The Betty Wheless Trotter Professor & Chair

Research Interests

  » Big Healthcare & EHR data
  » Reuse of Public Big Data for Research
  » Dynamic modeling of time course data

Momiao Xiong, Ph.D
Professor

Research Interests

  » Statistical Genetics
  » Big data analysis in Omics & Imaging
  » Artificial Intelligence & Causal Inference

Jose-Miguel Yamal, PhD
Associate Professor of Biostatistics & Data Science

Research Interests

  » Clinical Trials
  » Machine learning & high-dimensional data
  » EHR and Big Data

James Yang, Ph.D
Associate Professor

Research Interests

  » Statistical genetics and genomics
  » Bioinformatics
  » Statistical methods for medical research

Ashraf Yaseen, Ph.D
Assistant Professor

Research Interests

  » Bioinformatics
  » Machine Learning
  » Data Analysis

Wei Zhang, Ph.D
Assistant Professor

Research Interests

  » Statistical methods
  » Clinical outcome research
  » Neuro-imaging data of pediatric epilepsy

Hongjian Zhu, Ph.D
Associate Professor

Research Interests

  » Clinical Trials, Adaptive Designs
  » Causal Inference, Large Sample Theory
  » Cardiovascular Disease


Adjunct Faculty

Veera Baladandayuthapani, PhD

Primary Affiliation: MD Anderson Cancer Center

Julia S. Benoit, PhD

Primary Affiliation: University of Houston

Scott B Cantor, PhD

Primary Affiliation: MD Anderson Cancer Center

John E. Cornell, PhD

Primary Affiliation:

Judith K Dunn, MS, Ph.D

Primary Affiliation:

Susan Galloway Hilsenbeck, Ph.D

Primary Affiliation: Breast Center at Baylor College of Medicine

Xuelin Huang, PhD

Primary Affiliation: MD Anderson Cancer Center

Kresimir Josic, PhD

Primary Affiliation: University of Houston

J. Jack Lee, PhD

Primary Affiliation: MD Anderson Cancer Center

Yuanyuan Liang, PhD

Primary Affiliation:

Sheng Luo, PhD

Primary Affiliation: Duke University

Clyde Martin, PhD

Primary Affiliation: Texas Tech University

Charles Minard, PhD, MS

Primary Affiliation:

Jing Ning, PhD

Primary Affiliation: MD Anderson Cancer Center

Leif Peterson, PhD, MPH

Primary Affiliation: The Methodist Hospital Research Institute

Joan Reisch, PhD

Primary Affiliation: UT Southwestern Medical School

Bruce Rodda, PhD, MBA

Primary Affiliation: Strategic Statistical Consulting LLC

Paul Scheet, PhD

Primary Affiliation: MD Anderson Cancer Center

E. O'Brian Smith, PhD

Primary Affiliation: Baylor College of Medicine

Cross Appointments

Alice Z. Chuang, PhD

UTHealth Science Center

Claudia Pedroza, PhD

UTHealth Science Center Medical School

Jim Zheng, PhD

UTHealth Science Center, School of Biomedical Informatics

Emeritus Professors

Ralph F. Frankowski

Ph.D. The University of Michigan, Rackham Graduate School, Ann Arbor, MI, 1967

Robert J. Hardy

Ph.D. University of California, Berkeley 1969

Asha S. Kapadia

Ph.D, Harvard University, 1969

Barbara C. Tilley

Ph.D, University of Texas School of Public Health, 1981

Staff

Marice Barahona

Administrative Manager
marice.barahona@uth.tmc.edu
(713) 500-9440

Kevin Banks

Research Coordinator
kevin.j.banks@uth.tmc.edu
(713) 500-9584

Michael Gonzalez

Programmer Analyst III
michael.o.gonzalez@uth.tmc.edu
(713) 500-9578

Emmanuel Moon

Academic & Admissions Advisor I
emmanuel.j.moon@uth.tmc.edu
(713) 500-9521

Leqing Wu

Programmer Analyst III
leqing.wu@uth.tmc.edu
(713) 500-9579

Title Job Link

Assistant/Associate Professor of Instruction (NTT)

The Department of Biostatistics & Data Science at the School of Public Health is the academic home for biostatisticians and data scientists at UTHealth, conducting high-quality education, research and service to improve the health of the people of Texas, the nation and the world. Our Department offers the following education programs:

  • Data Science Certificates
  • MS/PhD in Public Health with a major in Biostatistics

Our faculty members have a diverse array of statistical and data science methodology expertise, with application interests in a variety of biomedical and health science fields. For more details, please see our faculty profile webpage.

Our Department hosts three research and collaboration/service centers:

  • Coordinating Center for Clinical Trials (CCCT), which has a history of more than 40 years of successfully coordinating large multicenter clinical trials and providing comprehensive services for clinical trial operation, data management and data analysis.
  • Center for Big Data in Health Sciences (CBD-HS), which hosts several Big Data research projects, including the development of predictive models and heterogeneous data integration methods for Electronic Healthcare Record (EHR) data, insurance claim data and public databases or data repositories.
  • Center for Biostatistics Collaboration and Data Services, which will provide opportunities for our faculty and students to collaborate and interact with biomedical and health science investigators to address scientific questions using their statistical and data science skills.