Correlation in the COVID-19 cases and deaths depending on the delay. Part 2.

By: Datahack
March 24, 2022
By Michal Deak.

Part 1 here.

What did we calculate? (Discussion)

The result TD = 17 days for the time from infection to death in the 2nd pandemic wave is very close to the “real” measured value TD = 18.5 days, the time it takes to die from COVID-19 after being infected by SARS-CoV-2. It looks like one can obtain some answers from a simple analysis of simple data. The discrepancy of the 1st-wave result, TD = 9 days, makes bold conclusions less tempting, though, and rightly so. Let us now go back to the beginning and try to explain what we actually calculated.

First of all, the day on which most cases are detected is not the day of infection, so the time TD we get from this kind of analysis should be somewhat shorter than the correct value. This can explain why the 1st wave TD is so much shorter than the correct one. There may also be a delay between a death and its classification as a COVID-19 death, which can differ from case to case and which works in the opposite direction, lengthening the measured time. With the improvement of contact tracing during the later stages of the pandemic, we may reasonably assume that the time between infection and a positive test got shorter in the 2nd wave.

If we assume that the real TD did not change between the pandemic waves, we may conclude that the time between infection and test and the time between death and its registration as a COVID-19 death either both got significantly shorter between the waves, or compensate each other in such a way that the result of our analysis for the 2nd wave is close to the real TD. By the very nature of the data we analyze, the information we can obtain from it is limited.
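The bookkeeping behind this argument can be written down explicitly. The following is only a sketch in notation that is not part of the original analysis: Δ_test (average time from infection to a registered positive test), Δ_reg (average time from death to its registration as a COVID-19 death) and T_meas (the delay read off from the correlation) are assumed symbols, and t_inf is the time of infection.

```latex
\begin{align*}
T_{\mathrm{meas}} &= (t_{\mathrm{inf}} + T_D + \Delta_{\mathrm{reg}}) - (t_{\mathrm{inf}} + \Delta_{\mathrm{test}})
                   = T_D + \Delta_{\mathrm{reg}} - \Delta_{\mathrm{test}} \\
\text{1st wave:}\quad \Delta_{\mathrm{test}} - \Delta_{\mathrm{reg}} &= 18.5 - 9 = 9.5 \ \text{days} \\
\text{2nd wave:}\quad \Delta_{\mathrm{test}} - \Delta_{\mathrm{reg}} &= 18.5 - 17 = 1.5 \ \text{days}
\end{align*}
```

Under the assumption that the real TD stayed at 18.5 days, the data fix only the difference of the two delays, which is why a shorter testing delay and mutually compensating delays cannot be told apart.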

Figure 6: Correlation plot of new SARS-CoV-2 cases versus new COVID-19 deaths. Delay of deaths: 17 days (right) and 9 days (left). Averaged over 7 days.
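For readers who want to try the idea themselves, a delay scan of this kind could be sketched as follows. This is a minimal illustration and not the code used for the article: the function names, the trailing 7-day average and the choice of the Pearson coefficient as the correlation measure are assumptions, and the input is synthetic toy data rather than real case counts.

```python
from math import exp, sqrt

def rolling_mean(values, window=7):
    """Trailing moving average; the first window-1 points are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def best_delay(new_cases, new_deaths, max_delay=30, window=7):
    """Return (delay, correlation) maximising corr(cases[t], deaths[t + delay])."""
    cases = rolling_mean(new_cases, window)
    deaths = rolling_mean(new_deaths, window)
    scores = {}
    for d in range(max_delay + 1):
        x = cases[:len(cases) - d]   # cases on day t
        y = deaths[d:]               # deaths on day t + d
        scores[d] = pearson(x, y)
    best = max(scores, key=scores.get)
    return best, scores[best]

# Synthetic toy data (NOT real): one wave of cases, and deaths equal to
# 2% of the cases reported 17 days earlier.
cases = [1000 * exp(-((t - 60) / 15) ** 2) for t in range(150)]
deaths = [0.0] * 17 + [0.02 * c for c in cases[:-17]]

delay, corr = best_delay(cases, deaths)
print(delay, corr)
```

On this toy input the scan should recover the 17-day shift that was built into the data. Real data are noisier, so the maximum of the correlation is broader and depends on the chosen time window.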

Results with recent data

Above we looked at the first months of the pandemic in the United States. Using Power BI and publicly available data from Johns Hopkins University [5], we have also created a dashboard for hands-on analysis of how the correlation in the COVID-19 data depends on the delay.

One can choose a country, a time window and a delay, and observe the correlation of new cases versus new deaths in an automatically generated plot. The dashboard also provides a plot overlaying the deaths and the delayed cases, and it calculates a fatality rate from the slope of the fit to the data in the correlation scatter plot. Examples using US and Spain data can be seen in Figs. 7 and 8.

Figure 7: Power BI dashboard for the correlation of US cases vs deaths data in the time window from 5th June 2021 till 15th November 2021.

Figure 8: Power BI dashboard for the correlation of Spain cases vs deaths data in the time window from 27th December 2020 till 23rd February 2021.
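The fatality rate obtained from the slope of the fit can be illustrated with a short sketch. It is not the dashboard's actual implementation: the helper names are hypothetical, an ordinary least-squares line with a free intercept is assumed (the dashboard may fit differently), and the numbers are toy values chosen so that the answer is known in advance.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def fatality_rate(new_cases, new_deaths, delay):
    """Slope of deaths[t + delay] versus cases[t], read as deaths per detected case."""
    x = new_cases[:len(new_cases) - delay]  # cases on day t
    y = new_deaths[delay:]                  # deaths on day t + delay
    slope, _intercept = fit_line(x, y)
    return slope

# Toy numbers (NOT real data): 2% of detected cases die 3 days later.
toy_cases = [100, 200, 400, 800, 400, 200, 100, 50]
toy_deaths = [0, 0, 0, 2, 4, 8, 16, 8]

rate = fatality_rate(toy_cases, toy_deaths, delay=3)
print(f"{100 * rate:.2f} % of detected cases")
```

The slope is a rate per detected case, so it changes whenever testing coverage changes, even if the disease itself does not.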

Final conclusions

Although the results seem interesting, we might be skeptical about whether we have learned anything new at all. We have definitely learned, though, that the correlation between SARS-CoV-2 cases and COVID-19 deaths changed in the middle of June 2020. Using the Power BI dashboard, one finds that each of the peaks in the data has different properties. This might be caused by changes in data collection, testing, the environment or the virus, or, more recently, by vaccination of the population.

We have also learned, however, that even a simple analysis can give interesting and illuminating results. Before the analysis we knew that a rise in COVID-19 cases later causes a rise in COVID-19 deaths. We formulated a hypothesis and, surprisingly, found that the results we obtained are not far from what we expected. It was definitely an exercise worth the time.

The code used to make the plots for this article is available here.







