Center for Big Data in Health Sciences

Developing and promoting innovative data science technologies that harness Big Data to improve the health of the people of Texas, the nation and the world.


About the Center for Big Data in Health Sciences (CBD-HS)

Revolutionizing public health through Big Data.

The Center for Big Data in Health Sciences is a coalition of faculty and staff from across the Texas Medical Center, including the School of Public Health, the School of Biomedical Informatics, MD Anderson Cancer Center, McGovern Medical School and others, who are working together to solve public health problems with one of science’s most untapped resources: Big Data.

Goals

Build a national- and international-level Big Data research program for the biomedical and health sciences by developing and promoting the use of state-of-the-art Big Data analytic approaches and technologies
Build a data-driven research platform to bridge the gap between computational/quantitative scientists and biomedical/health investigators
Support the development of data science education programs to train the next generation of health data scientists
Engage industry and develop partnerships that promote individual health and community well-being by using Big Data to improve the diagnosis, treatment and prevention of diseases and injuries

Membership eligibility

We are seeking CBD-HS members with expertise in the following areas:

Statistical methodologies for Big Data analytics
Bioinformatics data analysis and modeling: omics data analysis and integration
Biomathematical modeling and computational biology
Big Data analytics software development
Data mining and machine learning
Expertise and experience in novel data types: text documents, audio, video, EMR, EHR, mHealth, imaging data, EEG, sensor-based data, wearable device data, GPS data, location-based data, social media data, network data, etc.
High-performance computing: parallel computing, cloud computing, high-performance computing algorithms, numerical optimization algorithms
Any clinical, biomedical and health science investigators who are interested in using Big Data for their research and practice

Current research initiatives

If you are interested in learning more about any of the research initiatives we are working on, or would like to get involved, please contact Kevin Banks.

GEO Big Data Project

Goals:

Develop scalable Big Data analytic pipelines to analyze a large number of time-course gene expression datasets from the Gene Expression Omnibus (GEO) data repository
Develop a web-based collaboration platform for sharing the resulting analyses with genetics and biomedical collaborators, so that scientific insights can be extracted and the pipelines' findings disseminated through publications

EHR Collaboration Working Group

Promoting collaborations between statisticians/data scientists and biomedical, clinical and epidemiological investigators who use EHR/EMR and medical insurance claims data to develop predictive models of disease risk and to evaluate the effects of clinical treatments, in order to provide treatment recommendations based on real-world evidence

EHR Methodology Research Working Group

Developing novel statistical methods and predictive models for EHR and medical insurance claim data in order to address clinical and public health questions

UK Biobank Research Working Group

Developing novel predictive models and statistical methods that integrate heterogeneous data of different types from the UK Biobank study to address epidemiological and public health questions

How can we help you?

Contributions to the UTHealth community: collaboration/consulting services and support

We provide collaboration support and consulting services to biomedical and health science investigators:

Design research projects and tools/strategies for Big Data collection
Develop databases or data warehouses for Big Data management
Big Data harmonization and integration
Big Data visualization
Big Data analytics
Big Data modeling and predictions
A Big Data research platform for Big Data identification, management, integration, visualization, analytics, modeling and prediction will be developed to support Big Data research at UTHealth.

Industry engagement

We will actively develop collaborations and partnerships with related industries, including local companies and national and international corporations that may own Big Data and need analytic support. This will benefit not only our Center's faculty, through research opportunities, but also our students, through more opportunities for summer internships and jobs.

CBD-HS available resources

Data Resources, Cerner Health Facts

The Cerner Health Facts database covers all health care records from 85 health systems comprising 750 facilities in the United States from 2000 to 2018. The patient-level data include longitudinal encounters with detailed records of diagnoses, medications, clinical events, procedures and lab procedures, representing a total of 69 million unique patients across the United States. Of these 69 million patients, 52% are female, 42% are male and 6% are of unidentified gender. Their racial makeup is 49.5% Caucasian, 11.8% African American, 2.9% Hispanic, 1.8% Asian and Native American, less than 1% Pacific Islander and Middle Eastern Indian, and 16.4% unidentified. By marital status, 33% are married, 22.6% single, 3.3% divorced and 3% widowed; the remainder are unidentified. The mean patient age is 46.8 years, with a range of 0 to 90 years. In total, the database includes 487 million unique encounters with 939 million diagnoses, coded using International Classification of Diseases (ICD-9) codes. It also contains 674 million medication records, 118 million procedure records, 5.3 billion clinical event records and 4.2 billion lab procedure records.
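For a rough sense of scale, the figures above can be turned into approximate counts and ratios. The short Python sketch below does only that arithmetic. The constants are copied from the description above, and the function name `approximate_counts` is a hypothetical helper chosen for this illustration. The results are back-of-the-envelope estimates, not numbers queried from the database.

```python
# Back-of-the-envelope arithmetic on the published Cerner Health Facts figures.
# Constants are copied from the text; nothing here queries the actual database.
TOTAL_PATIENTS = 69_000_000
ENCOUNTERS = 487_000_000
DIAGNOSES = 939_000_000

# Reported sex breakdown, in percent of all patients.
SEX_SHARES = {"female": 52.0, "male": 42.0, "unidentified": 6.0}


def approximate_counts(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Hypothetical helper: convert percentage shares into rounded head counts."""
    return {label: round(total * pct / 100) for label, pct in shares.items()}


def main() -> None:
    # Sanity check: the three reported sex categories should cover every patient.
    assert abs(sum(SEX_SHARES.values()) - 100.0) < 1e-9
    for label, count in approximate_counts(TOTAL_PATIENTS, SEX_SHARES).items():
        print(f"{label}: ~{count / 1e6:.2f} million patients")
    # Simple averages implied by the totals.
    print(f"encounters per patient: ~{ENCOUNTERS / TOTAL_PATIENTS:.1f}")
    print(f"diagnoses per encounter: ~{DIAGNOSES / ENCOUNTERS:.2f}")


if __name__ == "__main__":
    main()
```

Run as a script, it reports about 35.88 million female, 28.98 million male and 4.14 million unidentified patients. It also reports roughly 7.1 encounters per patient and 1.93 diagnoses per encounter. The racial and marital-status percentages are left out because the listed categories do not sum to 100%.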

Hardware/Software Resources

Infrastructure

The Department of Biostatistics and Data Science hosts several pieces of state-of-the-art high-performance computing equipment. These include two recently acquired HPE servers, each with 36 cores (72 threads), 768 GB of memory and two NVIDIA V100 GPUs with 16 GB each. The two servers are connected to an HPE 3PAR storage node with 192 TB of capacity via two 10 Gbps fiber links and are clustered into a Hadoop/HBase/Spark system for big data analysis. The department also has three other servers, shown in the figure below. [PHOTO]
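To show what the two-server cluster offers in aggregate, the Python sketch below sums the per-server specifications listed above. The `Server` class and the `cluster_totals` function are hypothetical helpers written for this example. Only the per-server figures come from the text. The three additional servers are omitted because their specifications are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical record of one server's published specifications."""
    cores: int
    threads: int
    memory_gb: int
    gpus: int
    gpu_memory_gb: int  # memory per GPU


# Each of the two HPE servers, as described in the text.
HPE_SERVER = Server(cores=36, threads=72, memory_gb=768, gpus=2, gpu_memory_gb=16)


def cluster_totals(servers: list[Server]) -> dict[str, int]:
    """Sum compute resources across servers (shared storage is not included)."""
    return {
        "cores": sum(s.cores for s in servers),
        "threads": sum(s.threads for s in servers),
        "memory_gb": sum(s.memory_gb for s in servers),
        "gpus": sum(s.gpus for s in servers),
        "gpu_memory_gb": sum(s.gpus * s.gpu_memory_gb for s in servers),
    }


if __name__ == "__main__":
    for name, value in cluster_totals([HPE_SERVER, HPE_SERVER]).items():
        print(f"{name}: {value}")
```

For the two identical servers, the sketch gives 72 cores, 144 threads, 1,536 GB of memory and 4 GPUs with 64 GB of GPU memory in total. The 192 TB 3PAR storage node is shared, so it is not counted per server.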

Technical Staff

A team of highly skilled technical staff provides support for computing, data management and networking. The Department has a programmer analyst/system administrator who installs, maintains and manages all hardware, software and networks, and a database manager who assists with database creation, manipulation and retrieval. The School of Public Health's IT Department provides additional assistance with computer services, network services, telecom, administrative support and help-desk support.

Texas Advanced Computing Center (TACC)

The Texas Advanced Computing Center (TACC) is a service available to UT researchers that helps them make use of powerful advanced computing technologies. TACC designs and deploys the world's most powerful advanced computing technologies and innovative software solutions. Its environment includes a comprehensive cyberinfrastructure ecosystem of leading-edge resources in high performance computing, visualization, data analysis, storage, archive, cloud, data-driven computing, connectivity, tools, APIs, algorithms, consulting and software. TACC provides systems and software support to researchers and has worked on more than 3,000 projects by more than 1,000 researchers at over 350 institutions nationally and worldwide, addressing scientific questions that improve the quality of life.

TACC's resources include Stampede, an HPC cluster with 6,400 compute nodes, 102,656 cores, 205 terabytes of memory and a peak performance of 10 petaflops (PF), ranked #10 on the TOP500 list of the world's supercomputers in November 2015; Lonestar, an HPC cluster to which UT System institution investigators have exclusive access, with 1,901 compute nodes, 22,256 cores and 302 teraflops (TF) of theoretical peak performance; and Corral, a collection of storage and data management resources located primarily at TACC, with 5 petabytes of storage installed in the UT data centers at TACC and in Arlington, plus an additional petabyte of unreplicated storage for low-latency applications.

Additional databases and centers

Center for Biostatistics Collaboration and Data Services

Immunology and Infectious Diseases Databases

Dataset Recommendation

Research Platform for Time Course Gene Expression Data

Contact us

Kevin Banks
Research Coordinator
[email protected]
(713) 500-9584

SEE OUR IMPACT

Conducting needs assessments and "meeting people where they are"

UTHealth School of Public Health researchers work to bridge the gap between what intervention programs offer and what is actually needed, by creating programs based on input from the individuals who have lived the experiences.

Carol Huber appointed to the Value Based Payment and Quality Improvement Advisory Committee for Texas

Huber will serve as a member representing regional healthcare partnerships.

Meeting the public health education needs of the Permian Basin community

UTHealth School of Public Health and the University of Texas Permian Basin (UTPB) College of Business have partnered to provide graduate students with the opportunity to earn a Graduate Certificate in Public Health while simultaneously earning a Master of Business Administration (MBA) beginning Spring 2020.

Preventing and caring for HIV in homeless youth

Alexis Sims, a doctoral student in health promotion and behavioral sciences at The University of Texas Health Science Center at Houston (UTHealth) School of Public Health, has been awarded a $100,000 supplemental research grant from the National Institutes of Health to investigate HIV prevention and care in homeless youth.

Fighting back against the vaping epidemic among youth

As e-cigarette use by young people reaches epidemic proportions, researchers at The University of Texas Health Science Center at Houston (UTHealth) have received a $3.1 million grant from the National Institutes of Health to conduct the first-ever assessment of the long-term results of a nationwide nicotine vaping prevention program for youth called CATCH My Breath.

Leading data collection effort aimed at reducing teen pregnancy

The data collection effort, expected to take six months, is the second part of a yearlong planning phase to address the issue of pregnancy prevention among children in foster care. Melissa Peskin, PhD, associate professor with UTHealth School of Public Health, will lead the effort.

CENTER FACULTY

EXPERTS IN EDUCATION AND RESEARCH

DR. HULIN WU

MEET THE CENTER DIRECTOR

