Introduction
COVID-19 (previously 2019-nCoV) is caused by the new SARS-CoV-2 coronavirus and originated from Wuhan city, China (Zhu et al., Reference Zhu, Zhang, Wang, Li, Yang, Song, Zhao, Huang, Shi, Lu, Niu, Zhan, Ma, Wang, Xu, Wu, Gao and Tan2020). Although the first case was reported in December 2019, 2020, COVID-19 has continued to spread around the world and over 150 countries have been affected to this day (WHO, 2020). COVID-19 was first reported in the US on January 20, 2020 (Holshue et al., Reference Holshue, DeBolt, Lindquist, Lofy, Wiesman, Bruce, Spitters, Ericson, Wilkerson, Tural, Diaz, Cohn, Fox, Patel, Gerber, Kim, Tong, Lu, Lindstrom, Pallansch, Weldon, Biggs, Uyeki and Pillai2020). While little action was taken initially, an exponential increase in the number of cases have spurred immediate actions to contain COVID-19. For instance, social distancing is being enforced via the closure of educational institutions, restrictions on travel, and suspension of events. Unfortunately, containment has been hindered by various factors such as the initial production of defective test kits and a current lack of test kits and medical supplies (Yeo & Ganem, Reference Yeo and Ganem2020). As a result, it is currently unclear how well the US will cope with COVID-19 over the next few months, especially in view of the lack of adequate projections of the scale of COVID-19 infections and mortality across the country at the time of completing this study (March 19, 2020).
Objective
In this paper, we attempt to construct a mathematical model that simulates the scale of COVID-19 outbreak in the West and East Coast population regions of the US.
Methods
Designing the Model
To model the COVID-19 outbreak in the US, we use a variant of the SEIR model (see (Dureau et al., Reference Dureau, Kalogeropoulos and Baguelin2013; Lin et al., Reference Lin, Zhao, Gao, Lou, Yang, Musa, Wang, Cai, Wang, Yang and He2020) for some examples). We mainly focus on the West and East Coast population regions of the United States. We assume that there is no travel between the different population zones because people living at the opposite ends of the coasts seldom intertravel and travels will eventually halt while the US transitions to lockdown. We also assume that the natural birth and death rates are equal because the total population does not differ significantly during a short timeframe in the absence of a catastrophe.
Figure 1 depicts the basis of our model, focusing on a particular coast, where arrows indicate the flow of the population at different stages.
In Fig. 1, $ S $ represents the susceptible population, and $ E $ represents the exposed population (i.e. individuals who have been infected but are not yet themselves infectious). The infected population $ I $ is divided into two groups, $ {I}_H $ and $ {I}_C $, wherein the subscripts H and C stand for “hospital” and “community” respectively. $ {I}_H $ represents those that are infected and isolated (such as those that have been pre-tested and found to carry the virus), and $ {I}_C $ for those that are infected but not isolated (such as those with unreported cases or present mild symptoms that are overlooked). Thus, people in $ {I}_H $ are unable to spread the virus whereas those in $ {I}_C $ can spread the virus. While $ {I}_H $ may potentially contribute to in-hospital transmissions to health professionals, the relative number of cases would be much lower and can be negligible because hospitals are equipped with personal protective equipment and proper sanitization, and health professionals represent a low number of the population. As testing kits in the US are currently in low supply, the current model projects significantly higher levels of $ {I}_C $ than $ {I}_H $ at any given point in time. Once infected, there are two possibilities: either recovery or death; these outcomes are represented by the populations $ {R}_H $, $ {R}_C $ and $ {D}_H $, $ {D}_C $ respectively. We also assume the recovered population will have acquired immunity to the virus and are no longer susceptible (Bao et al., Reference Deng, Bao, Liu, Xiao, Liu, Xue, Lv, Qi, Gao, Yu, Xu, Qu, Li, Xiang, Yu, Gong, Liu, Wang, Wang, Song, Liu, Zhao, Han, Zhao, Liu, Wei and Qin2020; Lu et al., Reference Lu, Zhao, Li, Niu, Yang, Wu, Wang, Song, Huang, Zhu, Bi, Ma, Zhan, Wang, Hu, Zhou, Hu, Zhou, Zhao, Chen, Meng, Wang, Lin, Yuan, Xie, Ma, Liu, Wang, Xu, Holmes, Gao, Wu, Chen, Shi and Tan2020).
In summary, below are the key assumptions we used to design the model:
• no travel between coasts,
• equal birth and death rates,
• infected populations that arise from in-hospital transmissions are negligible,
• virus immunity after contracting COVID-19,
• $ {I}_C=1 $ at the start (see implementation of model).
The actual mathematical model underlying Fig 1 is as follows.
In the differential equations above, $ N=S+E+{I}_H+{I}_C+{R}_H+{R}_C $ represents the effective total population in the US, i.e. the population that matters for the purposes of virus transmission. The various parameters in the differential equations and their chosen values (based on (Lauer et al., Reference Lauer, Grantz, Bi, Jones, Zheng, Meredith, Azman, Reich and Lessler2020; Lazzerini et al., Reference Lazzerini and Putoto2020; Lin et al., Reference Lin, Zhao, Gao, Lou, Yang, Musa, Wang, Cai, Wang, Yang and He2020; Linton et al., Reference Linton, Kobayashi, Yang, Hayashi, Akhmetzhanov, Jung, Yuan, Kinoshita and Nishiura2020; Liu et al., Reference Liu, Hu, Xiao, He, Kang, Rong, Lin, Zhong, Huang, Deng, Zeng, Tan, Zeng, Zhu, Li, Gong, Wan, Chen, Guo, Li, Sun, Liang, Song, He and Ma2020)) are also summarized in the table below. In particular, in our implementation we account for the variations of incubation period $ {\alpha}^{-1} $ and infectious period $ {\gamma}^{-1} $ by sampling 100 values from an Erlang distribution of shape 2 with minimum restricted to be 1 day (i.e. minimum possible incubation period is 1 day). The West Coast also takes the lower of the $ R $ values to account for the lower average population density.
Notice that the reproduction rate $ R $ is not present in the differential equations. However, $ R $ is important even if the contact rate is hard to directly estimate, and we will use $ R $ and $ \gamma $ to estimate $ \beta $ using the relation $ \beta = R\gamma $. The reproduction rate is also not static over time; as hinted above in the table, we will use different values over the following three phases.
• Phase 1: There is no action done on the epidemic.
• Phase 2: Some action is done to slow down the epidemic while preparing for the next phase.
• Phase 3: Coast shutdown; schools moved online, events canceled, and most public areas closed.
To prepare for a quasi-worst-case-scenario, we assume $ R\ge 1 $, a necessary condition for any disease outbreak (Keeling et al., Reference Keeling, Rohani and Pourbohloul2008). This is because $ R=1 $ represents a neutral reproduction rate, whereas $ R<1 $ and $ R>1 $ represents a decline and increase of spread, respectively.
Implementing the Model
We simulated our model in MATLAB to prepare for a quasi-worst-case-scenario. We employed a fourth-order Runge-Kutta method (Hubbard & West, Reference Hubbard and West1991) to numerically solve the above specified ordinary differential equations, with the range $ 0\le t\le 250 $ and stepsize $ h=1/3 $ (both measured in days). As for the initial conditions for each coast, we assume that a single infected person is introduced, and that there were no pre-existing immune responses that may help defend against the virus due to SARS-CoV-2 being sufficiently divergent from other CoVs (Bao et al., Reference Deng, Bao, Liu, Xiao, Liu, Xue, Lv, Qi, Gao, Yu, Xu, Qu, Li, Xiang, Yu, Gong, Liu, Wang, Wang, Song, Liu, Zhao, Han, Zhao, Liu, Wei and Qin2020; Lu et al., Reference Lu, Zhao, Li, Niu, Yang, Wu, Wang, Song, Huang, Zhu, Bi, Ma, Zhan, Wang, Hu, Zhou, Hu, Zhou, Zhao, Chen, Meng, Wang, Lin, Yuan, Xie, Ma, Liu, Wang, Xu, Holmes, Gao, Wu, Chen, Shi and Tan2020). Hence, the entire population is initially susceptible due to the virus.
Results
Using elements outlined in the previous sections to model the situation for the West and East Coast population regions, as well as the entire US, we ran 50 simulations using the model described above. We then compared our simulation results to current data between January 20, 2020 and March 19, 2020 (Kaggle, 2020), assuming a delay in reporting time of about 8 to 10 days and using population estimates derived from (US Census Bureau, 2020).
The West Coast
The first case of COVID-19 in the West Coast was reported on January 20, 2020 in Seattle, WA. We used a population estimate of 53 million. If January 20, 2020 is designated Day 0, we assume Phase 2 started at around Day 45, and Phase 3 started at around Day 60. The simulation results suggest that, under quasi-worst-case-scenario, the data for the number of people infected by COVID-19 is far more than reported: about 80% of $ {I}_C $ (those infected but not isolated) are not accounted for. This is probable as many symptoms of COVID-19 may be passed off as just cases of mild flu. In the quasi-worst-case-scenario, we predict the number of reported infections at its peak to be about 25,000 and the actual number of infections at about 90,000 (Fig. 2A).
The East Coast
The first case of COVID-19 in the East Coast was reported on February 1, 2020 in Boston, MA. However, since no other new cases were reported for about two weeks, we assume a starting time of one month later than for the West Coast. In the graphs below, we will still plot them starting from the first reported case in the US (i.e. January 20, 2020). For the East Coast, we used a population estimate of 152 million. Based on government actions, we also assume that the starting dates for Phases 2 and 3 are about a week behind that of the West Coast. The simulation results are similar to that of the West Coast; for a quasi-worst-case scenario, the actual number of infections is predicted to be about 400,000 for reported, and about 1,450,000 for the actual number of infections (Fig. 2B).
The United States: Hypothetical 25% Infected Worse-Case Scenario
As COVID-19 has escalated into a pandemic, we decided to compare to the previous 2009 Influenza (H1N1) pandemic. Although H1N1 and SARS-CoV-2 are distinct viruses with different virology and epidemiology, 2009 H1N1 pandemic influenza is the only other pandemic to have emerged in modern society and can help potentially help project the full extent of the spread of COVID-19. Considering that around 24% of people were infected during the 2009 H1N1 pandemic (Kerkhove et al., Reference Kerkhove, Hirve, Koukounari and Mounts2013), as well as the non-static nature of our parameters, we ran another simulation on the entire US population with approximately 22% to 26% of the population infected (based on randomness) to predict such a scenario. For this simulation experiment, the values we used for $ R $ and the starting days of the phases was a rough weighted estimate of the two coasts. The values for the other parameters remained the same, as described in the previous section. We also assumed five infected individuals were introduced into the population. As Fig. 2 C illustrates, in this quasi-worst-case-scenario, about 3.8 million of the US population would die from COVID-19 assuming a 5% mortality rate, and the remaining 76.2 million would recover.
Discussion
This paper presents a model for simulating the scale of COVID-19 pandemic in the United States. The simulations focused on the West and East Coast population regions (Figs. 2A, 2B) and also modeled a hypothetical worst-case scenario for the entire US (Fig. 2C). The model is different from other computational models because factors specific to the US are considered, such as the three phases of government response, travel restrictions that restrict cross-boundary transmissions, limits to testing kits and healthcare accessibility resulting in semi-strict isolation, as well as different transmission dynamics during each response phase.
The model fits the reported data by assuming that about 80% of the non-isolated cases are not reported; however, the actual number of cases may be much higher (Figs. 2A, 2B). That assumption agrees with a prior study (Li et al., Reference Li, Pei, Chen, Song, Zhang, Yang and Shaman2020) and provides another evidence that COVID-19 is not easy to contain. We arrived at an 8 to 10 days reporting delay in a method similar to (Li et al., Reference Li, Pei, Chen, Song, Zhang, Yang and Shaman2020). While this is an important factor to consider in disease modeling, the actual reporting delay of COVID-19 is currently too early to tell, and reporting delays usually show very high variance for many reasons (e.g. nature of the disease, transparency of healthcare systems). It may be worthwhile to note that some common distinctive respiratory viral diseases were found to have a mean of 9 days in reporting delay elsewhere (Marinović et al., Reference Marinović, Swaan, Steenbergen and Kretzschmar2015), a value which aligns with our simulation estimates. The model also predicts that the peak of the outbreak would occur by early April, and the outbreak would wind down by the start of July. However, the model does not account for delays in reporting; as such, one should expect the peak reported number of infections to occur around mid-April. The peak would also occur sooner on the West Coast, which was affected first. All peaks should occur around mid-April unless there are major changes to the reproduction rates (for which there is no current evidence), significant lapses during the handling of COVID-19 or maintaining social distancing. Otherwise, COVID-19 will only continue to progress for the worse in terms of scale and duration; for instance, the July cool-down might be delayed, or worse, the US could even face subsequent outbreaks reminiscent of historical outbreaks. It should be noted that the trends in the model’s estimates share similarities with the epidemic curves in some nations during the SARS-CoV-1 outbreak (Wallinga et al., Reference Wallinga and Teunis2004).
One way to mitigate the spread of COVID-19 is to successfully isolate more infected people, i.e. to have a higher relative proportion of $ {I}_H $ over $ {I}_C $. That outcome should be achieved when defective testing kits are replaced and more testing kits are allocated, so that more exposed people can be tested and isolated accordingly. For instance, if 50% more people had been tested since the start of the outbreak, the model would predict 40% of the projected severity (Figs. 3A, 3B). As the US faced mishaps in the allocation of testing kits during the initial handling of COVID-19 (Yeo & Ganem, Reference Yeo and Ganem2020), the original peak is likely inevitable. Testing more people from this point onwards (i.e. late March) would help accelerate the reduction; a simulation for is shown in Figs. 3C, 3D. Additional ways to mitigate the mortality of COVID-19 include improving accessibility to healthcare and boosting quantities of essential supplies such as masks, respirators, and ventilators. However, the sudden onset of COVID-19 and the overall unpreparedness of the United States make these data inaccessible (Yeo & Ganem, Reference Yeo and Ganem2020). It should also be noted that while the currently global mortality of COVID-19 is 3.8%, it is noticeably different across nations (Kaggle, 2020; Lazzerini et al., Reference Lazzerini and Putoto2020). As such, the actual case-fatality rate of COVID-19 in the US cannot be accurately determined currently until further progress of COVID-19. As a result, we chose a 5% mortality rate in our quasi-worst-case scenario prediction of COVID-19 in the US, which is the weighted average of top 5 mortality rates in countries with noticeable number of COVID-19 cases (>1500) as of March 18 (Lazzerini et al., Reference Lazzerini and Putoto2020). It is also important to note that other factors (e.g. abiotic, immunological) can further modulate our parameters and affect our current projections. As a result, our study used data from prior studies based on other countries hit relatively hard by COVID-19 such as China and Italy, and enables us to construct a quasi-worst-case model of COVID-19 in the US.
Our model and projections can be further improved when transmission dynamics in the US begin to be documented and governmental response begin to be properly executed. As of today (August 8, 2020; manuscript revision), the trends observed initially bore resemblance to our simulations, albeit at a smaller scale as we are doing a quasi-worst case scenario. However, this has sharply deviated since the start of June where the US is currently experiencing a substantial surge of new cases, especially a spike of new cases in July that is twice the April peak (CDC, 2020). This can be attributed to many clear evidences that distancing measures have not been properly followed nor well-implemented. Some examples include several states actively pushing for re-opening despite warnings from health officials, social justice movements across the US, et cetera. In consideration of these unexpected events that have significantly perturbed the reproduction rate, we can introduce a “Phase 4” to account for this by increasing the reproduction rate in the East and West Coasts at Day 135 (see Supplementary Material).
We are mindful that computational simulations are, by their very nature, approximations. There are currently no predictive models that satisfactorily produce a picture of the spread or clinical impact of the disease, including ours, as too many factors can affect the spread of a disease. In our model, we consider the East and West Coasts of the US as strictly isolated regions with static total population. While this is not true in reality, minimal movements between the coasts is expected once distancing measures are implemented. Moreover, our predictions can vary noticeably with small variations in assumptions and assigned parameter values. To cite one example, minor changes in the distribution between $ {I}_H $, $ {I}_C $, and $ R $ can dramatically affect the model’s predictions (see Supplementary Material). Also, at the societal level, the length of time that the cohort of patients remain hospitalized is often unknown in the early stages of a pandemic and can greatly influence the use and deployment of medical supplies and personnel, thus altering the course of the infection and recovery rates. In addition, our test-and-isolate scenario is also quite optimistic as we assumed that hospitalized individuals can be fully isolated regardless of medical capacity despite a potential surge in COVID-19 patients. While some countries have been able to successfully contain COVID-19 by relying on efficient tests and isolation, it remains uncertain whether the US is able to similarly do so in light of constrains to testing, healthcare supplies, and capacities (Yeo & Ganem, Reference Yeo and Ganem2020). Nevertheless, the few scenarios presented and discussed in this paper strongly suggest that even with current containment and mitigation efforts, COVID-19 outbreak will significantly impact the health of the US population in the East and West Coast.
Conclusion
We constructed a modified SEIR computational model using current data that reflect the initial stages of COVID-19 in the US while considering different transmission dynamics across the anticipated phases of response to COVID-19 over the next months. Our current model projects that the number of new cases would peak around mid-April 2020, but our projections could potentially change over the course of COVID-19 in consideration of external factors (e.g. abiotic factors and governmental responses) that can significantly modulate transmission dynamics in the US, as well as the sensitivity of our model to adjustments in values assigned for each parameter. Future work will include adjusting the model accordingly to real-time COVID-19 situations (e.g. see Supplementary Material) and implementing the model to different disease outbreaks and other communities. We hope that this study provides an effective framework for modeling future disease outbreaks, help the mathematical modeling community to project COVID-19 and other past/present/future outbreaks, and make the nation more aware of the scope and nature of COVID-19.
Acknowledgments
We thank Professor Bruce Ganem (Department of Chemistry and Chemical Biology) and Professor John Muckstadt (School of Operations Research and Industrial Engineering) of Cornell University for their encouragement, invaluable advice, and dedicated guidance throughout the study. We also thank Amandine Gamble (Department of Ecology and Evolutionary Biology) of UCLA for helpful comments on the manuscript.
Author Contributions
YYY conceived the study and obtained the data. YRY and WJY designed and implemented the model and interpreted the results while YYY provided biological context. YYY, YRY and WJY wrote the manuscript. All authors contributed equally to the study.
Funding Information
This research received no grants from any funding agency, commercial, or not-for-profit sectors.
Data Availability Statement
The data used in the study are openly available at (Kaggle, 2020; US Census Bureau, 2020).
Conflict of Interest
The authors declare that they have no competing interests.
Supplementary Materials
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/exp.2020.45.
Comments
Comments to the Author: (Overall comment omitted due to word limit)
- Evidence not available at time of writing: I understand, but neither the readers nor myself being able to read the cover letter, the author should either (i) update/validate their assumptions with the latest data to justify their conclusions, or (ii) clarify their analysis and conclusions were rather historical and might not be in line with the latest data. The paper may provide a case study (if technical issues below are addressed), but the conclusions are not justified unless the underlying assumptions are up-to-date.
- Feasibility of test-and-trace: The predicted curves are 10 –100 times the SK and SG. I suggest focusing on isolation rather than hospitalization unless the bed occupancy is estimated using lengths-of-stay and medical capacity data.
- Change reflecting R=1: I realized x-axis shows timesteps: please use unit of day.
- 25% scenario: R0 determines the final size with the homogeneous-mixing assumption. Though R0 used for this scenario is not shown, I feel using the initial R throughout may be more appropriate for the quasi-worst-case-scenario.
- Code: It is nice that the authors uploaded their code. I found two problems:
1. 1/gamma and 1/alpha were drawn from Erlang distribution independently at each timestep; the resulting distribution of incubation/infectious periods are not following Erlang distribution.
2. betaa is calculated for each individual as R×gamma[l], which suggests that individuals change their infectiousness according to their duration of infectiousness so that the product becomes constant R, which is implausible in actual transmission dynamics.