About the data
Real-time data is messy, and the effects show up in a few of our graphs. Interpret this data in light of several limitations. Briefly:
- Data source: Texas Department of State Health Services.
- Day-to-day data at the state level (which we display) may differ from county-level dashboards because of the reporting lag from county to state.
- Testing and case counts are underestimates because not all clinics report data in a comprehensive, systematic manner. Over time, the data becomes more accurate and better reflects the "true" burden in Texas.
- Positive COVID-19 cases after May 19 do NOT include antibody tests.
- Some smaller counties have data dumps. A "data dump" is a large number of cases counted on one date instead of spread over time. We probably see this, for example, in Anderson County on June 17. Data dumps affect the projection, hot spot, and R(t) analyses. Interpret cautiously.
- Daily case counts are occasionally negative. This may be due to retrospective county corrections. The models “smooth” the data to account for these outliers, but larger outliers may still have significant influence on analyses.
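To make the last two points concrete, the sketch below derives daily new cases from cumulative totals and flags negative days and possible data dumps. It is a minimal Python illustration, not the dashboard's code: the function names, the example numbers, and the `dump_factor` threshold are all hypothetical.

```python
from statistics import median

def daily_from_cumulative(cumulative):
    """Daily new cases as day-over-day differences of cumulative totals."""
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]

def flag_anomalies(daily, dump_factor=10):
    """Return (negative_days, dump_days) as lists of indices into `daily`.

    A day is flagged as a possible data dump when its count exceeds
    `dump_factor` times the median of the positive daily counts.
    `dump_factor` is an arbitrary, hypothetical threshold.
    """
    negative_days = [i for i, n in enumerate(daily) if n < 0]
    positive = [n for n in daily if n > 0]
    typical = median(positive) if positive else 0
    dump_days = [i for i, n in enumerate(daily)
                 if typical > 0 and n > dump_factor * typical]
    return negative_days, dump_days

# Made-up cumulative totals: a downward correction, then a one-day dump.
cumulative = [100, 103, 105, 104, 107, 109, 209, 211]
daily = daily_from_cumulative(cumulative)
negative_days, dump_days = flag_anomalies(daily)
```

In this made-up series, the retrospective correction appears as a negative daily count and the backlog appears as a single very large day. Both would distort any trend fitted to the raw daily values.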
Details of the statistical modeling
R(t) was estimated using the R0 package in R. Generation times were modeled with a gamma distribution with mean 3.96 and standard deviation 4.75 days, based on estimates from Ganyani et al. (2020). We investigated several options from the literature and found that the choice of distribution and parameters did not significantly affect the results. R(t) estimates may be unreliable for counties with “data dumps” because we cannot observe the trend over time; interpret them cautiously for these counties. To smooth the data, we used 7-day moving averages in the R(t) estimation. Case reports lag test administration (at the beginning the lag was closer to 2-3 weeks; by May it was closer to 2-5 days), so what we can currently estimate reflects tests performed several days earlier. This lag may differ by county, hospital, and testing center. R(t) is estimated only for counties that have had at least 50 total cases.
Ganyani T, Kremer C, Chen D, Torneri A, Faes C, Wallinga J, Hens N. Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data, March 2020. Eurosurveillance. 2020 Apr 30;25(17):2000257.
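The preprocessing described above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the R code behind the dashboard: the function names are hypothetical, and the moving average is assumed to be trailing (the text does not say whether it is trailing or centered). The conversion from mean and standard deviation to shape and scale uses the standard gamma identities mean = shape × scale and variance = shape × scale².

```python
def gamma_shape_scale(mean, sd):
    """Convert a gamma distribution's mean and SD to (shape, scale)."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

def moving_average(values, window=7):
    """Trailing moving average (assumed); the first window-1 days are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def eligible_for_rt(daily_cases, min_total=50):
    """Apply the stated rule: estimate R(t) only with at least 50 total cases."""
    return sum(daily_cases) >= min_total

# Generation-time parameters quoted in the text (mean 3.96, SD 4.75).
shape, scale = gamma_shape_scale(3.96, 4.75)
```

With these parameters the shape is below 1 (about 0.70), so the assumed generation-time density is heavily concentrated on short intervals.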
To predict new cases, an autoregressive integrated moving average (ARIMA) model was fitted to 7-day moving averages of new cases and used to forecast 10 days into the future. The order of autoregression, degree of differencing, and order of the moving average were selected using the auto.arima function (forecast package) in R, which chose the best model based on the Akaike information criterion (AIC). If the maximum number of daily cases was too low, no estimates were produced. The limitations of these projections include: (1) the projections are built purely on previous data trends and do not account for any covariates at this time; (2) predictions may be unreliable for counties that have data dumps; (3) case reports lag test administration (at the beginning the lag was closer to 2-3 weeks; by May it was closer to 2-5 days), so what we can currently estimate reflects tests performed several days earlier. This lag may differ by county, hospital, and testing center.
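Two ingredients of this model selection are easy to show in isolation: differencing (the "integrated" part of ARIMA) and choosing among candidate orders by AIC, defined as 2k − 2 ln(L) for k estimated parameters and maximized likelihood L. The Python sketch below is illustrative only. It does not fit an ARIMA model or reproduce auto.arima, the function names are hypothetical, and the log-likelihoods are made-up numbers rather than results from the dashboard.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return list(series)

def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(candidates):
    """Pick the (p, d, q) order with the lowest AIC.

    `candidates` maps an order tuple to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda order: aic(*candidates[order]))

# Made-up fit summaries for three candidate orders (not real results).
candidates = {
    (1, 1, 0): (-120.0, 2),
    (0, 1, 1): (-118.5, 2),
    (2, 1, 2): (-118.0, 5),
}
chosen = best_by_aic(candidates)
```

The larger model has the highest likelihood here but is not chosen, because AIC penalizes each additional parameter.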
UTHealth School of Public Health team
Jose-Miguel Yamal, PhD, Associate Professor of Biostatistics and Data Science
Ashraf Yaseen, PhD, Assistant Professor of Data Science
Shreela Sharma, PhD, Professor of Epidemiology
Katelyn Jetelina, PhD, Assistant Professor of Epidemiology
Bijal Bala, PhD, Associate Professor of Epidemiology
Nalini Ranjit, PhD, Associate Professor of Health Promotion and Behavioral Sciences
Alanna Morrison, PhD, Professor and Chair, Department of Epidemiology, Human Genetics and Environmental Sciences
Sungjin (Elin) Cho
Support for the dashboard has been generously provided by the Department of Epidemiology, Human Genetics and Environmental Sciences. Faculty have appointments in both the Department of Biostatistics and Data Science and the Department of Epidemiology, Human Genetics and Environmental Sciences at the University of Texas Health Science Center, School of Public Health.