Center for Big Data in Health Sciences

Developing and promoting innovative data science technologies that harness Big Data to improve the health of the people of Texas, the nation and the world.


About the Center for Big Data in Health Sciences (CBD-HS)

Revolutionizing public health through Big Data. 

The Center for Big Data in Health Sciences is a coalition of faculty and staff from across the Texas Medical Center, including the School of Public Health, School of Biomedical Informatics, MD Anderson Cancer Center, McGovern Medical School and more, who are working together to solve public health problems with one of science's largest untapped resources: Big Data.

  • Build a national and international Big Data research program for biomedical and health sciences by developing and promoting the use of state-of-the-art Big Data analytic approaches and technologies
  • Build a data-driven research platform to bridge the gap between computational/quantitative scientists and biomedical/health investigators
  • Support the development of data science education programs to train the next generation of health data scientists
  • Engage and develop partnerships with industry to promote individual health and community well-being by improving the diagnosis, treatment, and prevention of diseases and injuries using Big Data

Membership eligibility

We are seeking CBD-HS members with expertise in the following areas:

  • Statistical methodologies for Big Data analytics
  • Bioinformatics data analysis and modeling: Omics data analysis and integration
  • Biomathematical modeling and computational biology
  • Big Data analytics software development
  • Data mining and machine learning
  • Expertise and experience in novel data types: text documents, audio, video, EMR, EHR, mHealth, imaging data, EEG, sensor-based data, wearable device data, GPS data, location-based data, social media data, network data, etc.
  • High Performance Computing: parallel computing, cloud computing, high performance computing algorithms, numerical optimization algorithms
  • Any clinical, biomedical and health science investigators who are interested in using Big Data for their research and practice

Current research initiatives

If you are interested in learning more about any of the research initiatives we are working on, or would like to get involved, please contact Kevin Banks.

GEO Big Data Project

  1. Develop scalable Big Data analytic pipelines to analyze a large number of time course gene expression data sets from the GEO data repository.
  2. Develop a web-based collaboration platform to share the analysis results with genetic and biomedical collaborators, in order to extract scientific insights and disseminate the findings from the analytic pipelines via publications.

EHR Collaboration Working Group

Promoting collaborations between statisticians/data scientists and biomedical/clinical/epidemiological investigators who use EHR/EMR and medical insurance claim data to develop predictive models for disease risks, evaluate the effects of clinical treatments, and provide treatment recommendations based on real-world evidence

EHR Methodology Research Working Group

Developing novel statistical methods and predictive models for EHR and medical insurance claim data in order to address clinical and public health questions

UK Biobank Research Working Group

Developing novel predictive models and statistical methods to integrate heterogeneous types of data from the UK Biobank study in order to address epidemiological and public health questions

How can we help you?

Contributions to the UTHealth community: Collaboration/consulting service and support

We provide collaboration support and consulting services to biomedical and health science investigators:

  • Design research projects and tools/strategies for Big Data collection
  • Develop databases or data warehouses for Big Data management
  • Big Data harmonization and integration
  • Big Data visualization
  • Big Data analytics
  • Big Data modeling and predictions

We will also develop a Big Data research platform for data identification, management, integration, visualization, analytics, modeling and prediction to support Big Data research at UTHealth.

Industry engagement

We will actively develop collaborations and partnerships with related industries, including local companies and national and international corporations that own Big Data and need analytic support. These partnerships will benefit our Center's faculty in their research and give our students more opportunities for summer internships and jobs.

CBD-HS available resources

Data Resources: Cerner Health Facts

The Cerner Health Facts database covers all of the health care records for 85 systems with 750 facilities in the United States from 2000 to 2018. The patient-level data include longitudinal encounters with detailed records of diagnoses, medications, clinical events, procedures and lab procedures. The database represents a total of 69 million unique patients across the United States, of whom 52% are female, 42% are male and 6% are gender-unidentified. The racial makeup of the 69 million patients is 49.5% Caucasian, 11.8% African American, 2.9% Hispanic, 1.8% Asian and Native American, less than 1% Pacific Islander, Middle Eastern Indian, and 16.4% racial status unidentified. Patient marital status is 33% married, 22.6% single, 3.3% divorced and 3% widowed; the marital status of the remaining patients is unidentified. The mean patient age is 46.8 years, with a range of 0 to 90 years. In total, the database includes 487 million unique encounters with 939 million diagnoses, coded using the International Classification of Diseases (ICD-9). The database has 674 million medication records, 118 million procedure records, 5.3 billion clinical event records and 4.2 billion lab procedure records.
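
For investigators sizing a study, the totals quoted above can be turned into rough per-patient and per-encounter averages. The short Python sketch below does only that arithmetic; the variable and function names are illustrative and are not part of any Cerner or CBD-HS tool, and the results are approximate because the source figures are rounded to the nearest million or tenth of a billion.

```python
# Totals quoted in the Cerner Health Facts description (in millions).
# All names below are hypothetical and used for illustration only.
PATIENTS_M = 69
ENCOUNTERS_M = 487
DIAGNOSES_M = 939
LAB_RECORDS_M = 4200  # 4.2 billion

# Marital status percentages that are explicitly reported.
MARITAL_PCT = {"married": 33.0, "single": 22.6, "divorced": 3.3, "widowed": 3.0}


def summarize():
    """Return rough averages derived from the rounded totals above."""
    return {
        "encounters_per_patient": ENCOUNTERS_M / PATIENTS_M,
        "diagnoses_per_encounter": DIAGNOSES_M / ENCOUNTERS_M,
        "lab_records_per_patient": LAB_RECORDS_M / PATIENTS_M,
        # Share of patients whose marital status is not one of the listed categories.
        "marital_unidentified_pct": 100.0 - sum(MARITAL_PCT.values()),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

These averages are only order-of-magnitude guides: they say nothing about how records are distributed across patients, facilities or years.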

Hardware/Software Resources


The Department of Biostatistics and Data Science hosts several pieces of state-of-the-art high performance computing equipment. These include two recently acquired HPE servers, each with 36 cores (72 threads), 768 GB of memory, and 2 x NVIDIA V100 16GB GPUs. The two servers are connected to an HPE 3PAR storage node of 192 TB capacity by 2 x 10 Gbps fiber links, and are clustered into a Hadoop/HBase/Spark system for Big Data analysis. The department also has 3 other servers shown in the figure below. [PHOTO]

Technical Staff

A team of highly skilled technical staff provides support for computing, data management, and networking. The Department has a programmer analyst/system administrator who installs, maintains, and manages all hardware, software, and networks, and a database manager who assists with database creation, manipulation and retrieval. The School of Public Health also provides additional assistance through the IT Department with computer services, network services, telecom, administrative support, and help desk.

Texas Advanced Computing Center (TACC)

The Texas Advanced Computing Center (TACC) is a resource available to UT researchers that helps them use powerful advanced computing technologies. TACC designs and deploys the world's most powerful advanced computing technologies and innovative software solutions. TACC's environment includes a comprehensive cyberinfrastructure ecosystem of leading-edge resources in high performance computing, visualization, data analysis, storage, archive, cloud, data-driven computing, connectivity, tools, APIs, algorithms, consulting, and software. TACC provides systems and software support to researchers, and has worked on over 3000 projects by more than 1000 researchers at over 350 institutions nationally and worldwide that address scientific concepts to improve the quality of life. TACC's systems include:

  • “Stampede”: 6400 computing nodes, 102,656 cores, 205 terabytes of memory and a peak performance of 10 petaflops (PF); ranked #10 in the world (Top500 Supercomputers, November 2015)
  • “Lonestar”, to which UT System institution investigators have exclusive access: 1901 computing nodes, 22,256 cores and 302 TF theoretical peak performance
  • “Corral”: a collection of storage and data management resources primarily located at TACC, with 5 petabytes of storage installed in the UT data centers at TACC and in Arlington, and an additional petabyte of unreplicated storage for low-latency applications

Contact us

Kevin Banks
Research Coordinator 
[email protected]
(713) 500-9584

