With so much emphasis on modeling the epidemic and the use of model results to guide (really, mis-guide) measures to mitigate the effects of the epidemic, I thought it might be helpful to explain exactly how the Minnesota model works. Models are complicated, and explaining them in simple, understandable terms is difficult, but I will do my best. I have been aided in the effort by Darin Oenning, who helped me make sure I understood how the math in the formulas worked, and by others who gave me very useful insights. This is lengthy, so I am breaking it into multiple posts. In the first post, we will cover the basic scheme of the Minnesota model. In the second, we will go over the initial steps of the progression through the model, leading up to the Infected bucket. In the third post, we will cover movement from the Infected bucket through the rest of the scheme. The fourth post will go through the model runs as applied against various mitigation schemes. And finally, I am going to present an alternative model scheme, similar to the one I drew up for an earlier post. I appreciate all comments, especially any that show me the error of my ways in understanding this subject.
In working through the model, you will note two things. One is that I am describing the model based on a no-mitigation run, that is, how the model works without mitigation efforts. The model runs that look at mitigation measures primarily treat them as affecting contact rates. (In the real world, if an epidemic seems serious enough, some people will likely modify their behavior without government intervention, but that isn’t taken into account in the no-mitigation world.) The second is that I have ignored the issue of health resource overrun. Primarily this is because it is not clear to me that such overrun will occur in any reasonable scenario. But more importantly, the way the issue is handled in the Minnesota model is extremely confusing to track through the formulas. It is simpler to first go through the model assuming health resources are always adequate. Then, if on any day covered by a model run and scenario the patients needing care appear to exceed the resources available to provide that care, you can address that in a separate satellite model that identifies the impacts of such a shortfall. The Minnesota model materials, again, can be found at (Modeling Materials).
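As a rough idea of where such a satellite model would start, here is a minimal sketch in Python. It assumes you already have the main model's daily count of patients needing a bed and a single fixed capacity figure; both the census list and the capacity below are made up, and the function name is hypothetical. It only flags the days on which demand exceeds supply and by how much; what happens to the excess patients would be the job of the rest of the satellite model.

```python
# Hypothetical illustration: the census numbers and capacity are made up,
# not taken from the Minnesota model.

def capacity_shortfalls(daily_census, capacity):
    """Return (day, shortfall) pairs for days when patients needing care exceed capacity."""
    return [
        (day, census - capacity)
        for day, census in enumerate(daily_census, start=1)  # day 1 = first model day
        if census > capacity
    ]

census = [120, 340, 610, 880, 760, 430]
print(capacity_shortfalls(census, capacity=700))  # → [(4, 180), (5, 60)]
```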
So, let’s plunge in. Below is the basic scheme of the Minnesota model, as it appears in the technical paper. For the rest of the post, we are just being descriptive, generally not discussing the mathematical or formula aspects of the model.
This is a fairly standard approach to modeling epidemics, a so-called “SEIR” (Susceptible, Exposed, Infected, Recovered) model. You would strongly suspect that the basis for this model is the infamous Imperial College work, since the Minnesota modeling materials refer several times to estimates obtained from the paper describing that work. (One note about relying on the Imperial College work: it was based largely on data from China, data that we now know is extremely suspect, both incomplete and likely inaccurate.) The model basically moves people from the Susceptible group to the Exposed group to the Infected group, and from there the Infected group is split into three subgroups: those who end up Hospitalized, those who end up in an ICU and those who become Recovered. The Recovered group includes all those from the Hospitalized group, who by the modelers’ definition must go to Recovered, along with a portion of people from both the ICU group and the Infected group. Unfortunately, some number of people die from the infection, and in this model all those deaths are presumed to occur in the ICU, so you see the branch down to Dead. Now, what I described about the disposition of those in the Hospitalized group is what the paper says, but the formulas are actually more complicated and don’t quite work that way.
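To make the schematic a little more concrete, here is a minimal daily-step sketch in Python of the flows as the paper describes them: Susceptible to Exposed to Infected; Infected split three ways; Hospitalized all going to Recovered; ICU going to Recovered or Dead. This is a textbook-style SEIR structure, not a transcription of the Minnesota model's formulas, which we have not been given in full. The contact model is replaced by a single transmission rate `beta`, and every parameter value and the starting population (a round, roughly Minnesota-sized number) are hypothetical placeholders.

```python
# Hypothetical parameters for illustration only; they are not the Minnesota
# model's parameters, and the structure follows the paper's schematic, not its code.

def seir_step(s, p):
    """Advance one day. s holds bucket sizes, p holds rates and probabilities."""
    n = sum(s.values())                                # total population, held constant
    new_exposed = p["beta"] * s["S"] * s["I"] / n      # stands in for the contact model
    new_infected = p["sigma"] * s["E"]                 # everyone Exposed eventually moves on
    leaving_i = p["gamma"] * s["I"]
    to_hosp = p["p_hosp"] * leaving_i
    to_icu = p["p_icu"] * leaving_i
    to_rec = leaving_i - to_hosp - to_icu              # everyone else goes straight to Recovered
    hosp_out = p["hosp_exit"] * s["H"]                 # all Hospitalized go to Recovered
    icu_out = p["icu_exit"] * s["ICU"]
    icu_deaths = p["icu_death"] * icu_out              # deaths occur only from the ICU
    return {
        "S": s["S"] - new_exposed,
        "E": s["E"] + new_exposed - new_infected,
        "I": s["I"] + new_infected - leaving_i,
        "H": s["H"] + to_hosp - hosp_out,
        "ICU": s["ICU"] + to_icu - icu_out,
        "R": s["R"] + to_rec + hosp_out + (icu_out - icu_deaths),
        "D": s["D"] + icu_deaths,
    }

params = {"beta": 0.3, "sigma": 1 / 5, "gamma": 1 / 7, "p_hosp": 0.04, "p_icu": 0.01,
          "hosp_exit": 1 / 8, "icu_exit": 1 / 10, "icu_death": 0.4}
state = {"S": 5_600_000, "E": 100, "I": 50, "H": 0, "ICU": 0, "R": 0, "D": 0}
for _ in range(365):                                   # one model year, one step per day
    state = seir_step(state, params)
print({k: round(v) for k, v in state.items()})
```

Because every flow out of one bucket is added to another, the total stays constant from day to day, which is a useful sanity check for any compartment model.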
The Susceptible group is just your starting population, which could be a country, a state, or whatever group or subgroup you want to model the effects of the epidemic on; for this model it is presumably the population of Minnesota. An interesting side note is that a population is never really isolated, but models of anything smaller than the whole world usually assume that it is; otherwise, they would require very complex efforts to identify travel and contact effects among various populations. In reality, we know that the epidemic was initially spread in the United States by travelers from China and other countries.
The Exposed group is just what it sounds like, the people who got exposed to the virus. The Infected group is just what it sounds like, the people who got infected by the virus. The Hospitalized group is that portion of the Infected group who needed hospitalization. The ICU group is that portion of the Infected group who needed an ICU and a ventilator. The Recovered group is, as described above, what is left over from those who didn’t need a hospital or an ICU, plus those who lived to make it out of the Hospitalized or ICU groups. Those who don’t make it out obviously end up in the Dead bucket.
Now, the model has some clinically weird characteristics. Without going into the math in this post, and without being able to see what the actual model run looked like (the model was run for the year from March 23, 2020 to March 22, 2021), it basically assumes that, over time, everyone who is Susceptible becomes Exposed and everyone who is Exposed becomes Infected. From a clinical perspective these are very suspect assumptions. First, at some point epidemics reach a natural stasis point, even without interventions. As more and more of the population becomes infected and presumably develops antibodies to re-infection, it becomes harder and harder for the virus to find new hosts, and transmission basically ceases. The more infectious a virus is, the higher the percent of the population that needs to have developed that immunity to stop the transmission process. This coronavirus strain appears to be extremely infectious, so the percent of the population that needs to have immunity may be as high as 80% or 85%. But it is some percent, and that is the realistic maximum of the Susceptible group that can become Exposed. It is not 100%. It may be that the modelers have some limit built into the model, probably through the contact model, but we don’t know what that percent limit is from the materials we have been given.
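The link between infectiousness and the immune share needed to stop transmission has a standard textbook form for a population that mixes uniformly: the threshold is 1 - 1/R0, where R0 is the average number of people one case infects in a fully susceptible population. The sketch below just evaluates that formula; the R0 values are illustrative, not estimates for this virus, and real populations that do not mix evenly can have different thresholds.

```python
def herd_immunity_threshold(r0):
    """Textbook threshold for a homogeneously mixing population: 1 - 1/R0."""
    if r0 <= 1:
        return 0.0  # transmission cannot sustain itself in the first place
    return 1 - 1 / r0

for r0 in (2.0, 3.0, 5.0, 6.5):  # illustrative values only
    print(f"R0 = {r0}: {herd_immunity_threshold(r0):.0%} immune needed")
```

On this simple formula, the 80% to 85% range mentioned above corresponds to an R0 of roughly 5 to 6.5.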
The contact model just mentioned is the regulator of how many people in any given block of time move from Susceptible to Exposed. We don’t have the contact model formulas, but they are described to some extent in the technical paper. The contact model is based on European work on daily social contacts within and between age groups. The modelers say they intend to update this with more specific US and Minnesota data on social contact patterns. The model population is stratified into nine ten-year age groups, with the last one actually being everyone 80 years or older. The population is also stratified by whether a person does or doesn’t have one of several comorbidities, but the presence of these comorbidities does not affect the level of contacts. That, I would think, is a little clinically suspect as well; people who have multiple illnesses are not likely to have the same level of contacts as someone who is healthy. The contact model apparently does not account for variation in population density where an individual lives, nor does it pick up living situations that would increase contacts, such as a nursing home or even a family household. Think that might have something to do with how quickly a person gets Exposed?
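Since we don't have the contact model formulas, here is a generic sketch in Python of how an age-stratified contact matrix usually drives the Susceptible-to-Exposed flow: each group's daily infection hazard is a transmission probability times the sum, over groups, of contacts with that group times the share of that group currently infected. Everything here is hypothetical: three age groups instead of nine, made-up contact numbers, populations, infected counts and transmission probability. It does, however, mirror the simplification described above, in that every person in an age group gets the same row of contacts.

```python
# Hypothetical 3-group example: group boundaries, contact numbers, populations,
# infected counts and transmission probability are all made up. The Minnesota
# model uses nine age groups and European contact survey data, not reproduced here.

AGE_GROUPS = ["0-19", "20-59", "60+"]

# CONTACTS[i][j]: average daily contacts a person in group i has with people in group j.
# Every person in a group gets the same row, regardless of comorbidities, household or
# nursing-home setting, or local population density.
CONTACTS = [
    [10.0, 4.0, 1.0],
    [3.0, 8.0, 1.5],
    [1.0, 2.5, 3.0],
]

def force_of_infection(contacts, infected, population, p_transmit):
    """Per-person daily infection hazard for each group i:
    p_transmit * sum over j of contacts[i][j] * infected[j] / population[j]."""
    return [
        p_transmit * sum(c * infected[j] / population[j] for j, c in enumerate(row))
        for row in contacts
    ]

def new_exposures(susceptible, hazards):
    """Expected number moving from Susceptible to Exposed in each group in one day
    (treating the small daily hazard as a probability)."""
    return [s * h for s, h in zip(susceptible, hazards)]

population = [1_400_000, 2_900_000, 1_300_000]
infected = [500, 1_500, 400]
susceptible = [p - i for p, i in zip(population, infected)]  # early phase: no one recovered yet
hazards = force_of_infection(CONTACTS, infected, population, p_transmit=0.05)
for group, n in zip(AGE_GROUPS, new_exposures(susceptible, hazards)):
    print(f"{group}: {n:.1f} newly exposed")
```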
But since the contact model regulates movement from Susceptible to Exposed, and since everyone who is Exposed is then presumed to become Infected, my main concern about the contact model is that it does not appear to take into account variation in the susceptibility of individuals, apart from age. In reality, what we are seeing in Minnesota is that the most susceptible members of any age group are becoming infected early on. It does not appear to me that the model picks up this phenomenon, which could skew your perception of how serious the epidemic is. This is a classic statistics error: assuming that your sample is drawn evenly from the population. If it isn’t, and it certainly isn’t with this epidemic, you are going to over-estimate or under-estimate the results for the whole population.
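A small worked example shows how uneven sampling skews the picture. This sketch assumes a made-up population with two groups that differ in both susceptibility and severity, and assumes that early on (before many people have been infected) a group's share of infections is proportional to its population share times its relative susceptibility. None of these numbers come from Minnesota data.

```python
# Hypothetical two-group population; all numbers are made up for illustration.
groups = {
    # name: (share of population, relative susceptibility, hospitalization rate if infected)
    "more susceptible": (0.20, 3.0, 0.20),
    "less susceptible": (0.80, 1.0, 0.02),
}

def early_infection_mix(groups):
    """Share of early infections from each group, assuming infection risk is
    proportional to population share times relative susceptibility."""
    weights = {g: share * sus for g, (share, sus, _) in groups.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

mix = early_infection_mix(groups)
early_rate = sum(mix[g] * hosp for g, (_, _, hosp) in groups.items())
population_rate = sum(share * hosp for share, _, hosp in groups.values())
print(f"hospitalization rate among early infections: {early_rate:.3f}")
print(f"hospitalization rate for the whole population: {population_rate:.3f}")
```

With these made-up numbers, the more susceptible group is 20% of the population but about 43% of early infections, so the early hospitalization rate (about 9.7%) is well above the whole-population rate (5.6%). Flip the severity numbers and the bias runs the other way.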
Next, the model assumes everyone who ends up in the Exposed bucket moves to the Infected bucket. This is also clinically dubious. There is ample evidence that, for reasons that are poorly understood at this point, not everyone who is exposed to the virus becomes infected. One of the most striking examples to me is the testing done in the Italian village of Vo’, where children, and in some cases adults, who lived in a household with one or more infected persons did not themselves become infected. The Diamond Princess cruise ship, the aircraft carrier examples and the Boston homeless shelter results offer similar support to the notion of a group that does not get infected after exposure. (Please understand, I am assuming it is not that these people are infected but asymptomatic; they simply are not infected by the virus at a detectable level. An asymptomatic person has a detectable infection.) Given how infectious this virus is, these results are surprising, but good for the overall course of the epidemic. These exposed but uninfected, and apparently uninfectable, people are helping cut the transmission chain. Possible explanations include an insufficient dose, or number of virus particles, to cause infection, which would imply the possibility of later infection if a higher dose is transmitted; antibodies to other coronaviruses that cross-react with this strain; genetic variation in the receptors the virus uses to gain entry into a cell, or other genetic variations affecting infection; or other protective mechanisms. At this point, it seems to be a bad assumption that everyone who is Exposed becomes Infected; instead, some percent should be identified as Uninfected.
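The fix suggested here amounts to one extra branch out of the Exposed bucket. Below is a minimal Python sketch; `uninfected_fraction` is a hypothetical parameter (the model, as described, effectively sets it to zero), and 25% is an arbitrary illustrative value, not an estimate.

```python
def exposed_outflow(exposed_leaving, uninfected_fraction):
    """Split people leaving the Exposed bucket into Infected and Uninfected.
    uninfected_fraction is a hypothetical parameter, not part of the Minnesota model."""
    if not 0.0 <= uninfected_fraction <= 1.0:
        raise ValueError("uninfected_fraction must be between 0 and 1")
    uninfected = exposed_leaving * uninfected_fraction
    return exposed_leaving - uninfected, uninfected

print(exposed_outflow(1000, 0.0))   # everyone Exposed becomes Infected: (1000.0, 0.0)
print(exposed_outflow(1000, 0.25))  # illustrative 25% uninfected: (750.0, 250.0)
```

Where the Uninfected go next is a modeling choice. Treating them as no longer susceptible lets them help cut transmission chains; the insufficient-dose explanation would instead argue for returning them to Susceptible.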
Now it gets even stranger. You are in the Infected bucket. You are going to one of three places initially, and ultimately to one of two places. Clinically what would actually happen is that there are some people who are asymptomatic–they are infected but have no signs of the disease, seek no medical attention, are blissfully unaware that they have the virus. There is another group that has mild or moderate disease. These people have symptoms and may seek medical attention, although I suspect many with mild symptoms in the initial phase of the epidemic just thought they had a cold. Now everyone is hypersensitized to the symptoms and some people are more likely to seek medical attention. The mild/moderate illness group, by my definition, would not be hospitalized, although many may receive some medical care.
Then you have the group that develops severe illness, which by my definition would be anyone who is hospitalized. But it could also include some people who are not hospitalized. Some people have advance directives that might preclude any hospital or intensive care. Some people may deteriorate so rapidly that they die before they get to the hospital. We hear, and it seems believable, that there are people dying at home. Most people who do need a hospital will likely go directly to a non-ICU ward, and if their condition worsens, some portion may end up in the ICU, and some of those may return to a regular ward. Some portion of patients may go directly to the ICU. Some patients in the hospital may die, whether or not they are in the ICU at the time of death.
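The clinical flow just described can be sketched as a set of mutually exclusive paths, each of which ends in exactly one of Recovered or Dead. This Python sketch is a hypothetical structure with made-up shares, not the Minnesota model and not data; the point is the bookkeeping. Because the path shares must sum to 1, every infected person is accounted for.

```python
import math

# Hypothetical shares of all infected people, made up for illustration only.
# Each path ends in exactly one final state, so no one can go missing.
PATHS = {
    "asymptomatic": (0.40, "Recovered"),
    "mild/moderate, not hospitalized": (0.52, "Recovered"),
    "severe, not hospitalized, recovered": (0.003, "Recovered"),
    "severe, died at home": (0.002, "Dead"),
    "ward only, discharged": (0.05, "Recovered"),
    "died on ward": (0.003, "Dead"),
    "ward -> ICU -> ward, discharged": (0.01, "Recovered"),
    "ward -> ICU, died in ICU": (0.005, "Dead"),
    "direct to ICU -> ward, discharged": (0.004, "Recovered"),
    "direct to ICU, died in ICU": (0.003, "Dead"),
}

def final_outcomes(infected, paths):
    """Tally Recovered and Dead, refusing path shares that do not cover everyone."""
    total_share = sum(share for share, _ in paths.values())
    if not math.isclose(total_share, 1.0):
        raise ValueError(f"path shares sum to {total_share}, not 1")
    outcomes = {"Recovered": 0.0, "Dead": 0.0}
    for share, outcome in paths.values():
        outcomes[outcome] += infected * share
    return outcomes

print(final_outcomes(100_000, PATHS))
```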
The Minnesota model does not capture this clinical flow. The paper explaining the model basically says that once you are Infected, you either go to the hospital, go to the ICU or need neither, in which case you go right to Recovered. The model appears to start with the Hospitalized group and has a formula, based on the Imperial College paper and some other data, that uses hospitalization rates as a percent of cases by age band to move people from Infected to Hospitalized. At this point, that data on hospitalization rates is highly questionable. To determine who goes to the ICU bucket, the model uses a similar formula based on ICU rates. There is then some crosstalk between the formulas for Hospitalized and ICU that appears to be an attempt to correct for the errors in clinical flow, as well as for the dreaded shortfall of ICU capacity, but the crosstalk is inconsistent with the schematic and with what the technical paper says is happening. Whoever doesn’t go to the Hospitalized or ICU buckets ends up in Recovered. And, as I pointed out before, the paper says deaths only occur in the ICU group, so all Hospitalized must end up in the Recovered group (although, again, there is crosstalk between the actual formulas).
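For comparison, the three-way split as the paper describes it (leaving out the crosstalk, which we can't reconstruct from the materials) looks roughly like the Python sketch below. The age bands and rates are hypothetical placeholders, not the Imperial College figures the modelers used, and the function name is made up.

```python
# Hypothetical age bands and rates (as a share of cases), made up for illustration;
# the Minnesota model draws its rates from the Imperial College paper and other data.
RATES = {
    # age band: (hospitalization rate, ICU rate)
    "0-39": (0.01, 0.002),
    "40-69": (0.05, 0.015),
    "70+": (0.15, 0.05),
}

def split_infected(infected_by_age, rates):
    """Route each age band's Infected to Hospitalized, ICU, or straight to Recovered,
    following the three-way split the technical paper describes (no crosstalk)."""
    out = {"Hospitalized": 0.0, "ICU": 0.0, "Recovered": 0.0}
    for band, n in infected_by_age.items():
        hosp, icu = rates[band]
        out["Hospitalized"] += n * hosp
        out["ICU"] += n * icu
        out["Recovered"] += n * (1 - hosp - icu)  # whoever is left goes to Recovered
    return out

print(split_infected({"0-39": 50_000, "40-69": 30_000, "70+": 10_000}, RATES))
```

Because the remainder goes to Recovered, the three outflows always add back up to the Infected count.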
But now we must consider a truly interesting term in the formulas for moving people out of the Infected bucket to their intermediate destinations. That term is δ, the Greek letter delta. It interacts with the severity rates used in the formulas and appears in the Hospitalized, ICU and Recovered bucket formulas. It is the “detection rate”, the percentage of those actually infected who are detected, presumably by testing. As will be discussed more in Part 2, to make the model fit apparent real-world facts on Day 1, the modelers set this detection rate to 1% to match observed death rates. But after Day 1, the detection rate was set at 75%, or .75, based on an interview with the CDC director, who said that 25% of cases are asymptomatic. So the modelers’ reasoning was, in effect: we are going to miss 25% of the Infected because they are asymptomatic, so we will adjust each of the formulas downward. (And they put a wide range on this as well, from 50% to 90% of cases being detected.) Now, we pointed out before that there is a more logical way to deal with this flow: bucketing the Infected group by clinical severity. But because the modelers didn’t do that, and because they treated everyone as becoming Exposed and everyone who is Exposed as becoming Infected, they know they created too big a pool to go to the Hospitalized, ICU or Recovered groups.
But here is the most amazing thing. Because delta is in every one of the formulas, unless the modelers have some other formulas they haven’t shown us, the 25% of the Infected who are supposedly asymptomatic don’t go to the Recovered bucket, which is what you might expect would happen. No, they disappear; they go to limbo. Truly an amazing feat of legerdemain, just making people disappear like that. Just think about that for a minute: you would assume that the Recovered and the Dead must sum to 100% of the Infected, but nope, it is only 75%. Something is seriously screwed up.
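The bookkeeping problem is easy to see in a few lines of Python. This sketch assumes, as I read the paper, that δ multiplies each of the Hospitalized, ICU and Recovered outflows; it is an interpretation, not the model's actual code, and the hospitalization and ICU rates are placeholders.

```python
def outflows_with_delta(infected, hosp_rate, icu_rate, delta):
    """Outflows from Infected when delta multiplies all three destinations.
    This follows the post's reading of the paper's formulas, not the actual model code."""
    hospitalized = delta * hosp_rate * infected
    icu = delta * icu_rate * infected
    recovered = delta * (1 - hosp_rate - icu_rate) * infected
    return hospitalized, icu, recovered

infected = 100_000
h, c, r = outflows_with_delta(infected, hosp_rate=0.05, icu_rate=0.01, delta=0.75)  # placeholder rates
accounted = h + c + r
missing = infected - accounted
print(f"routed: {accounted:,.0f}  unaccounted: {missing:,.0f}")
```

Whatever rates you plug in, the three outflows add up to δ times the Infected count, so with δ = 0.75 a quarter of the Infected are never routed anywhere.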
And perhaps more importantly now, what is a reasonable estimate for the detection rate given the severity rates being used? Many emerging studies suggest a much higher level of infections than is shown by positive test results. Most of these studies indicate detection rates under 20%. The former head of the FDA today said he believes the detection rate is only 5% to 10%. Now look at what happens to the percents of the population who actually need hospitalization or ICU or who die when you change the detection rate down to that level. The results change dramatically.
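Here is a minimal Python sketch of why the detection rate matters so much. It assumes that a severity rate is measured per detected case (as with hospitalization rates expressed “as a percent of cases”), in which case the rate per actual infection is the per-case rate times the detection rate. The 10% per-case hospitalization rate is made up; the detection rates are the 75% the model uses and the lower levels cited above.

```python
def implied_rate_per_infection(rate_per_detected_case, detection_rate):
    """If a severity rate is measured per detected case, the rate per actual
    infection is the per-case rate times the share of infections detected."""
    return rate_per_detected_case * detection_rate

hosp_per_case = 0.10  # hypothetical: 10% of detected cases hospitalized
for delta in (0.75, 0.20, 0.10, 0.05):
    rate = implied_rate_per_infection(hosp_per_case, delta)
    print(f"detection {delta:.0%}: {rate:.2%} of infections hospitalized")
```

Going from 75% detection to 10% detection cuts the implied hospitalization rate per infection by a factor of 7.5, and the same scaling applies to ICU and death rates.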
I believe that one fundamental principle for constructing a model is that its schema should, as much as possible, reflect what actually happens. It lays the groundwork for how you think about the formulas that define the dynamics of the model over time. In this case, the model should reflect clinical processes. It doesn’t, and this results in some very basic errors that seriously skew the results away from reality. The most obvious examples are the assumptions regarding who becomes Exposed and Infected, and the 25% of Infected persons who simply vanish in the model.
One final point about the model construction. Most of the important formulas are probabilistic; they run across a range of possible values for a parameter or formula term. So the results from any given run of the model have a central estimate, which might be thought of as the most likely outcome, along with a range of uncertainty derived from those alternative possible values of the parameters or terms. The more uncertain you are about the parameters or terms, the wider the final range of uncertainty. The Minnesota model has a very, very wide range of uncertainty in its results. That is a warning sign about using it for decision-making.
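The way parameter uncertainty turns into a range of results can be sketched with a small Monte Carlo simulation in Python. The assumptions here are deliberately simple and are not the Minnesota model's: only two uncertain inputs, the detection rate (drawn from the 50% to 90% range the modelers gave) and a made-up hospitalization rate range; uniform distributions; and a simple rounded-index percentile. The output is hospitalizations per 100,000 infected.

```python
import random

def simulate(n_draws, delta_range, hosp_range, seed=1):
    """Monte Carlo draws of hospitalizations per 100,000 infected, sorted ascending."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    draws = [
        100_000 * rng.uniform(*delta_range) * rng.uniform(*hosp_range)
        for _ in range(n_draws)
    ]
    return sorted(draws)

def percentile(sorted_values, q):
    """Simple percentile: value at the rounded index for q in [0, 100]."""
    return sorted_values[round(q / 100 * (len(sorted_values) - 1))]

def summarize(draws):
    """Return (2.5th percentile, median, 97.5th percentile)."""
    return tuple(percentile(draws, q) for q in (2.5, 50, 97.5))

wide = summarize(simulate(10_000, (0.50, 0.90), (0.02, 0.10)))    # hypothetical wide ranges
narrow = summarize(simulate(10_000, (0.70, 0.80), (0.05, 0.07)))  # hypothetical narrow ranges
for label, (low, mid, high) in (("wide inputs", wide), ("narrow inputs", narrow)):
    print(f"{label}: central {mid:,.0f}, 95% range {low:,.0f} to {high:,.0f}")
```

Narrowing the input ranges narrows the output range; the central estimate alone says nothing about how wide that range is.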
So, after all that, key takeaways would be:
- the Minnesota model is clinically illogical, particularly in how it categorizes cases
- the model uses suspect or dated information for many assumptions and parameters
- the model overstates both Exposed and Infected persons, which makes all the result numbers too high
- the model attempts to correct for some of its shortcomings by the mysterious delta term, but this appears to result in 25% of Infected persons just disappearing from the model, not ending up in either the Recovered or Dead bucket
- the model inherently has a high level of uncertainty in its results