Biostatistics Seminar: BETS: The dangers of selection bias in early analyses of the coronavirus disease (COVID-19) pandemic

May 06, 2020
  • 00:03- All right, and it says the meeting is being recorded.
  • 00:06Okay, so thanks everyone,
  • 00:10for coming to this seminar.
  • 00:13And I hope everyone is doing well.
  • 00:17Today, I'm going to talk about some issues
  • 00:21of selection bias in early analyses
  • 00:24of the COVID-19 pandemic.
  • 00:28You can find the manuscript online, on arXiv,
  • 00:31and the slides of this talk are also available on my webpage.
  • 00:38So, here are the three collaborators,
  • 00:42involved in this project.
  • 00:45So Nianqiao is a PhD student at Harvard,
  • 00:48and we kind of only met online.
  • 00:50We never met in person, and I sort of created
  • 00:54a dataset in January, and I wanted some help,
  • 00:58and somehow she saw this and she said: I could help you.
  • 01:03And we kind of developed a collaboration.
  • 01:08And Sergio and Rajen are both, ah,
  • 01:12lecturers in the Stats Lab in Cambridge.
  • 01:18And I'd like to thank many, many people
  • 01:19who have given us very helpful suggestions.
  • 01:23This is just some of them.
  • 01:28I'd like to begin with just saying COVID-19
  • 01:32is personal for everyone, and what I would share
  • 01:37is partly my story, my personal story with COVID-19.
  • 01:44So here is a photo of me and my parents,
  • 01:50taken last September, when I went back to China,
  • 01:56to see my family.
  • 01:58So both myself and my parents,
  • 02:01we all grew up in Wuhan, China.
  • 02:06And on a sunny day in September, we went to,
  • 02:10well, this is the Yellow Crane Tower,
  • 02:13a sort of landmark building in Wuhan.
  • 02:17And the funny thing is, I think I've never been there,
  • 02:20on top of the tower, in my entire life.
  • 02:24And this is actually the first time I went there.
  • 02:27This is something like if you have a famous local attraction
  • 02:32for tourists, you actually don't go, as a local.
  • 02:39And so, on January 23, because the epidemic
  • 02:43was growing so fast in Wuhan, it started a lockdown.
  • 02:52So, if we went on top of the Yellow Crane Tower,
  • 02:56this is what we would see on a typical day,
  • 03:00before the lockdown.
  • 03:02And on the right, so, there's sort of what happens
  • 03:06after the lockdown, and I liked how the journalist
  • 03:10used sort of this gloomy weather as the background,
  • 03:13and certainly reflected everybody's mood,
  • 03:17after the lockdown.
  • 03:21So, this project begins on January 29.
  • 03:26So I had a conversation with my parents over the phone,
  • 03:30and they told me that a close relative of ours
  • 03:35was just diagnosed with, quote/unquote, viral pneumonia.
  • 03:41So, basically at that point, we all thought that must
  • 03:45be COVID-19, but because there were not enough tests,
  • 03:51this relative could not get confirmed.
  • 03:54And this prompted me to start looking
  • 03:56through the data available at the time.
  • 03:59But I quickly realized that the epidemiological data
  • 04:03from Wuhan are very unreliable.
  • 04:07And here is some anecdotal evidence.
  • 04:10The first evidence is about inadequate testing.
  • 04:16So actually this relative of mine could not get
  • 04:18an RT-PCR test until mid-February,
  • 04:22and she actually developed symptoms on about January 20.
  • 04:29So by mid-February, she was already recovering.
  • 04:33And she took, I think, several tests.
  • 04:36Her first test was actually negative,
  • 04:38and a few days later she was tested again,
  • 04:40and the result came back positive.
  • 04:43So there's also a lot of false negative tests.
  • 04:46I think, in general.
  • 04:49And another problem with the epidemiological data from Wuhan
  • 04:53is insufficient contact tracing.
  • 04:56So, her husband, this relative of mine's husband,
  • 05:03he also showed COVID symptoms, but he quickly recovered
  • 05:08from that, and in the end he was never tested for COVID.
  • 05:17So, you can also see the insufficient testing
  • 05:19from this incidence plot.
  • 05:22So this is the daily confirmed cases, up until mid-February,
  • 05:29and this is when the travel ban started,
  • 05:33or the lockdown started, January 23,
  • 05:36and on February 12, there was a huge spike
  • 05:41of over 10,000 cases, much more than the previous few weeks.
  • 05:50And the reason for that was not suddenly because people
  • 05:54were infected on that date.
  • 05:57It's because of a change in the diagnostic criteria.
  • 06:01So before February 12,
  • 06:04everybody needed to have a positive RT-PCR test
  • 06:10to be confirmed as a COVID-19 case.
  • 06:13But since February 12, because
  • 06:16the health system in Wuhan was so overwhelmed,
  • 06:20the government decided to change the diagnostic criteria.
  • 06:23So without RT-PCR tests, you can still be diagnosed
  • 06:28with COVID-19 if you satisfy several other criteria.
  • 06:34And this sort of change in diagnostic criteria
  • 06:37only happened in the Hubei Province
  • 06:41and not elsewhere in China.
  • 06:45So a solution, if we like to avoid these problems
  • 06:50with data from Wuhan, so one clever solution
  • 06:55is to use cases that are reported from, sorry,
  • 06:58exported from Wuhan.
  • 07:01So this has two benefits.
  • 07:03First of all, testing and contact tracing
  • 07:05were quite intensive in other locations.
  • 07:09So, it's reasonable to expect that a lot of the bias
  • 07:13due to sort of under-ascertainment will be less severe
  • 07:16if we use data from elsewhere.
  • 07:20And also, many locations, particularly in some cities
  • 07:26in China, published detailed case reports,
  • 07:31instead of just case counts.
  • 07:34And if you look at these detailed case reports there is
  • 07:36a lot of information that can be used for inference.
  • 07:44This is not our idea.
  • 07:47And I think at least one of the first uses
  • 07:51of this design was a report from Neil Ferguson's group
  • 07:57at Imperial College London,
  • 07:59which they published on January 17,
  • 08:03and what it did was a simple sort of division of the number
  • 08:07of cases detected internationally, over the number
  • 08:11of people who traveled from Wuhan, internationally.
  • 08:15And they found that it could be
  • 08:18over 1,700 cases by January 17, in Wuhan.
  • 08:26So, I started this on January 29,
  • 08:30and within about two weeks, managed to put something online.
  • 08:37In it, we also used internationally confirmed cases
  • 08:40to estimate epidemic growth.
  • 08:44And what we used were 46 coronavirus cases
  • 08:48who traveled from Wuhan and then were subsequently confirmed
  • 08:53in six Asian countries and regions.
  • 08:59And the main result was that the epidemic was doubling
  • 09:02in size every 2.9 days.
  • 09:06And we used a Bayesian analysis, and the 95 percent
  • 09:10credible interval was two to 4.1.
  • 09:14And of course, when I was writing this article,
  • 09:17I was mostly just working on this dataset that we collected,
  • 09:22very hard and (muttering), thinking about what model
  • 09:27is suitable for this kind of data.
  • 09:30And just before I posted this pre-print,
  • 09:34I realized there was a similar article
  • 09:38that was already published in The Lancet, on January 31.
  • 09:45And what's really puzzling is they used almost the same data
  • 09:51and very similar models, but somehow reached
  • 09:54completely different conclusions.
  • 09:58So they used data from December 31 to January 28,
  • 10:02that are exported from Wuhan internationally.
  • 10:05And they would like to infer the number
  • 10:07of infections in Wuhan.
  • 10:10And one of the main results,
  • 10:12which was this epidemic doubling time, was 6.4 days,
  • 10:16and the 95 percent credible interval was 5.8 to 7.1.
  • 10:21So that's drastically different from ours.
  • 10:24So again, ours was 2.9, within two to four,
  • 10:29and this was 6.4.
  • 10:33And this is talking about the doubling time.
  • 10:36So the doubling time of six days versus three days,
  • 10:40that's sort of really, really different.
  • 10:43And the confidence intervals, the credible intervals
  • 10:45didn't even overlap.
  • 10:49So I was really puzzled by this.
  • 10:52And before I tell you what I think,
  • 10:58how the Lancet paper got it wrong,
  • 11:01I'd like to just show you this plot.
  • 11:03You probably have seen this many times before,
  • 11:05in news articles, which is just sort of a logarithm
  • 11:10of the total cases versus the days
  • 11:16since some time zero, for each country.
  • 11:21And what you see is for both the total number of cases
  • 11:26and the total number of deaths,
  • 11:29it sort of grew about 100-fold in the first 20 days.
  • 11:35At least among these countries
  • 11:36that were most hard-hit by COVID-19.
  • 11:42And if you just use that as a very rough estimate
  • 11:45of the doubling time, that corresponds
  • 11:47to a doubling time of three days.
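The arithmetic behind that rough figure can be checked directly. The sketch below assumes pure exponential growth, which is an idealization; the function name `doubling_time` is chosen here for illustration and does not come from the talk.

```python
import math


def doubling_time(fold_increase, days):
    """Doubling time implied by a given fold increase over `days` days,
    assuming the count grows exponentially throughout."""
    return days * math.log(2) / math.log(fold_increase)


# About 100-fold growth in the first 20 days, as read off the plot.
print(round(doubling_time(100, 20), 2))  # prints 3.01
```

The same relation links the doubling time to the exponential growth rate r used later in the talk: doubling time = ln(2) / r.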
  • 11:52Of course, this is sort of very kind of anecdotal,
  • 11:56because these data were not collected in a very careful way,
  • 12:01and a lot of cases were not reported,
  • 12:04but this is just to show you that perhaps
  • 12:07the doubling time of 6.4 days was just a bit too long.
  • 12:14So, towards the end of the talk,
  • 12:17I'll tell you what we think led
  • 12:21to these very different results.
  • 12:24Just some spoilers, so the crucial difference
  • 12:30is that the Lancet study actually did not
  • 12:33take into account the travel ban on January 23.
  • 12:38And that actually had a very,
  • 12:39very substantial selection effect on the data.
  • 12:45And this will be made precise later on in the talk.
  • 12:53So, for the rest of the talk,
  • 12:54I'll first give you an overview of selection bias.
  • 12:57So no math, just sort of an outline of what kind
  • 13:01of selection bias you could encounter in COVID-19 studies.
  • 13:05Then I'll talk about how we sort of overcome them,
  • 13:08by sort of collecting the dataset very carefully
  • 13:12and building a model very carefully.
  • 13:17And then I'll talk about why
  • 13:20the Lancet study I just mentioned
  • 13:22and some other early analyses were severely biased.
  • 13:26If there is time, I will tell you a little bit
  • 13:29about our Bayesian nonparametric model.
  • 13:33And then I'll give you some lessons
  • 13:36I learned from this work.
  • 13:40So selection bias.
  • 13:42So we identified at least five kinds
  • 13:46of selection bias in COVID-19 studies.
  • 13:49So the first one is due to under-ascertainment.
  • 13:53So this may occur if symptomatic patients
  • 13:56do not seek healthcare, or could not be diagnosed.
  • 14:00So essentially, all studies using cases confirmed
  • 14:04when testing is insufficient,
  • 14:08would be susceptible to this kind of bias.
  • 14:11And there is no cure to this.
  • 14:14It may lead to bias of varied direction and magnitude,
  • 14:21and basically what we can do is
  • 14:27to think about a clever design to avoid this problem,
  • 14:32to focus on locations where the testing is intensive.
  • 14:42The second bias is due to non-random sample selection.
  • 14:48So, basically this means that the cases included
  • 14:51in the study are not representative of the population.
  • 14:56So this essentially applies to all studies,
  • 15:03because detailed information about COVID-19 cases
  • 15:06is usually sparse; it's not always published.
  • 15:11But especially for studies that do not have a clear
  • 15:14inclusion criterion, and if they just sort of simply
  • 15:19collect data out of convenience, then there could be
  • 15:25a lot of non-random sample selection bias.
  • 15:30And again, statistical models are not really gonna help you
  • 15:33with this kind of bias.
  • 15:35You'd use, you'd follow some protocol for data collection,
  • 15:40and you would exclude some data that do not meet
  • 15:44the sample inclusion criterion.
  • 15:47Even when that may lead to inefficient estimates.
  • 15:57The third bias is due to the travel ban.
  • 16:00This is kind of my spoiler about that Lancet study.
  • 16:06So basically, outbound travel from Wuhan
  • 16:09to anywhere else was banned from January 23 to April 8.
  • 16:16So if the study analyzed cases exported from Wuhan,
  • 16:21then they're susceptible to this selection effect.
  • 16:27And this would usually lead to underestimation
  • 16:31of epidemic growth, and the reason is that, so,
  • 16:35the epidemic is growing very fast,
  • 16:37but then you essentially can't observe cases
  • 16:41that were supposed to leave Wuhan after January 23.
  • 16:44So if you just wait for a long time,
  • 16:47and then look at the epidemic curve among the cases
  • 16:50exported from Wuhan, it may appear that, ah,
  • 16:55it sort of dies down a little bit,
  • 16:58but that's not because of the epidemic being controlled.
  • 17:01That's because of the travel ban.
  • 17:04And fortunately this bias, you can correct for it
  • 17:08by deriving some likelihood function
  • 17:10tailored for the travel restrictions.
  • 17:15The fourth bias is ignoring, is due to ignoring
  • 17:20the epidemic growth, and basically if you think about people
  • 17:25who have been in Wuhan before January 23,
  • 17:29they're much more likely to be infected
  • 17:31towards the end of their exposure period than early,
  • 17:37and that's because the epidemic was growing quickly.
  • 17:42So, there are many studies, or I should say
  • 17:45there are several studies of the incubation period
  • 17:48that simply treat infections as uniformly distributed
  • 17:52over the patients' exposure period to Wuhan.
  • 17:57And this will lead to overestimation
  • 17:59of the incubation period.
  • 18:02Because actually, the infection time is much,
  • 18:04much closer to sort of the end of their exposure.
  • 18:11And this is also a bias that can be corrected for,
  • 18:15by doing statistical analysis carefully.
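To see the size of this effect, here is a minimal sketch comparing the average infection time within an exposure window under two assumptions: infections spread uniformly over the window, or infection risk growing exponentially. The window length, onset day, and doubling time are hypothetical numbers picked for illustration, and `mean_infection_time` is a name introduced here, not something from the talk.

```python
import math


def mean_infection_time(b, e, r):
    """Mean infection time on [b, e] when the infection density is
    proportional to exp(r * t); r = 0 gives the uniform case."""
    w = e - b
    if abs(r) < 1e-12:
        return b + w / 2.0
    return b + w / (1.0 - math.exp(-r * w)) - 1.0 / r


# Hypothetical traveler: exposed to Wuhan from day 0 to day 10,
# symptom onset on day 12, epidemic doubling every 2.5 days.
r = math.log(2) / 2.5
uniform_mean = mean_infection_time(0.0, 10.0, 0.0)  # 5.0
growth_mean = mean_infection_time(0.0, 10.0, r)     # about 7.06

print("implied incubation, uniform assumption:", 12.0 - uniform_mean)
print("implied incubation, growth assumption: ", round(12.0 - growth_mean, 2))
```

Under these assumed numbers the uniform treatment places the infection about two days too early, which lengthens the implied incubation period by the same amount.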
  • 18:21The fifth and last bias is due to right-truncation.
  • 18:25So this happens in early analyses because,
  • 18:30to sort of win time in the battle against this epidemic,
  • 18:36and to publish sort of fast.
  • 18:38So as you all know, there's a race for publications
  • 18:43about COVID-19; a lot of people sort of truncated
  • 18:48the dataset before a certain time,
  • 18:52but by that time the epidemic maybe
  • 18:54was still quickly growing or evolving.
  • 18:58And this could lead to some right-truncation bias.
  • 19:03And this generally would lead to underestimation
  • 19:07of the incubation period.
  • 19:10So this is, so incubation period, I forgot to mention,
  • 19:13is just the time between infection to showing symptoms.
  • 19:20So, right-truncation would lead to underestimation
  • 19:22of incubation period, because people with longer
  • 19:26incubation periods may not have shown symptoms
  • 19:31by the time that these datasets were collected.
  • 19:38So the solution to this is we need to both collect cases
  • 19:45that meet the selection criterion, and continue
  • 19:48that data collection until a sufficiently long time.
  • 19:54Or, you derive some likelihood function to correct
  • 19:59for the right-truncation.
  • 20:00So we'll go over this later.
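A small simulation illustrates why right-truncation shortens the observed incubation periods. It is a sketch under assumed values (exponential growth with a 2.5-day doubling time, a gamma incubation period with mean 5 days, data frozen at day 30); none of these numbers or names come from the study itself.

```python
import math
import random


def simulate_incubation_means(r, shape, scale, cutoff, n=100_000, seed=0):
    """Return (mean incubation of all infected, mean incubation of those
    whose symptoms appear on or before `cutoff`)."""
    rng = random.Random(seed)
    all_periods, observed = [], []
    for _ in range(n):
        # Infection time on [0, cutoff] with density proportional to exp(r * t),
        # drawn by inverting the cumulative distribution function.
        u = rng.random()
        t = math.log(1.0 + u * (math.exp(r * cutoff) - 1.0)) / r
        incubation = rng.gammavariate(shape, scale)
        all_periods.append(incubation)
        if t + incubation <= cutoff:  # symptoms seen before the data freeze
            observed.append(incubation)
    return sum(all_periods) / len(all_periods), sum(observed) / len(observed)


true_mean, truncated_mean = simulate_incubation_means(
    r=math.log(2) / 2.5, shape=2.0, scale=2.5, cutoff=30.0
)
print("mean incubation, everyone infected:   ", round(true_mean, 2))
print("mean incubation, observed before cutoff:", round(truncated_mean, 2))
```

Because most infections happen shortly before the cutoff when the epidemic is growing, only the short incubation periods among them are observed, so the second mean comes out well below the first.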
  • 20:04So just to recap,
  • 20:07so on a very high level, there are at least five
  • 20:11kinds of biases in COVID-19 analysis.
  • 20:15And if you read sort of preprints or news articles,
  • 20:20I think you will find some kind, I mean,
  • 20:24some resemblance of these biases in many studies.
  • 20:30And the key to avoiding selection bias is basically,
  • 20:35I mean, this is simple in words,
  • 20:38but you just do everything carefully.
  • 20:40You design the study carefully,
  • 20:42and collect the sample carefully,
  • 20:45and analyze the data carefully.
  • 20:47But the reality, of course, is not that simple.
  • 20:51And what I will show below is an example
  • 20:55of our attempt to eliminate or to reduce selection bias,
  • 21:02as much as possible.
  • 21:06So, let me tell you about the dataset we collected.
  • 21:10So we found 14 locations in Asia,
  • 21:16some are international, so Japan, South Korea, Taiwan,
  • 21:21Hong Kong, Macau, Singapore.
  • 21:23Some are sort of in mainland China.
  • 21:27So there are several cities in mainland China.
  • 21:31So all these locations have published detailed case reports
  • 21:36from their first local case.
  • 21:40So, most of the Chinese locations, I mean,
  • 21:43they were done with the first wave of the epidemic
  • 21:46by the end of February.
  • 21:49So Japan, Korea and Singapore saw some resurgence
  • 21:54of the epidemic later on, and eventually,
  • 21:57they did not publish detailed case reports.
  • 22:02But for our purposes, these locations all published
  • 22:07detailed reports before mid-February,
  • 22:11and that's about three weeks after the lockdown of Wuhan.
  • 22:15So it's pretty much enough to find out
  • 22:19all the Wuhan exported cases.
  • 22:24So just to give you a sense of the kind of data
  • 22:28that we collected, this is sort of all
  • 22:32the important columns in the dataset,
  • 22:36and the particularly important columns are marked in red.
  • 22:42So, we collected, there was a case ID,
  • 22:49where the case lived, the gender, the age,
  • 22:54whether they had known epidemiological contact
  • 22:57with other confirmed cases, whether it has
  • 23:02known relationship with other confirmed cases.
  • 23:07This is sort of an interesting column
  • 23:09that basically we like to find out what cases were
  • 23:15exported from Wuhan, but that's, of course, not recorded.
  • 23:20I mean you can only infer that from what has been published.
  • 23:27So this is an attempt to do that.
  • 23:28So this "outside" column means
  • 23:32whether we, the data collectors, think
  • 23:35this case was infected outside Wuhan.
  • 23:39So most of the time, this is relatively easy to fill.
  • 23:45For example, if you've never been to Wuhan,
  • 23:47this entry must be yes.
  • 23:50But sometimes, this can be a little bit tricky.
  • 23:52For example, this person, the fifth case in Hong Kong,
  • 23:56is the husband of the fourth case in Hong Kong,
  • 24:00and they traveled together from Wuhan to Hong Kong.
  • 24:04So it's unclear if this case is transmitted
  • 24:11in or outside Wuhan, so we put a "likely" there.
  • 24:16And the other information are some dates,
  • 24:20the beginning of stay in Wuhan, the end of stay in Wuhan,
  • 24:26the period of exposure, which would be equal to the
  • 24:30beginning to the end of stay in Wuhan,
  • 24:33for Wuhan exported cases,
  • 24:35but can be different for other cases.
  • 24:41When the person, when the case arrived at a final location
  • 24:44where they are confirmed a COVID-19 case.
  • 24:48When the person showed symptoms.
  • 24:51When did they first go to a hospital,
  • 24:54and when were they confirmed a COVID-19 case.
  • 24:59So we collected about 1,400 cases with all this information.
  • 25:05And overall, I think our dataset was relatively high
  • 25:11in quality, and most of the cases had known symptom onset
  • 25:18dates; only nine percent of them have that entry missing.
  • 25:27So,
  • 25:30so one important step after this is to find out
  • 25:33which cases are actually exported from Wuhan.
  • 25:37So I've been using this terminology from the beginning
  • 25:41of the talk, but basically the case is Wuhan exported
  • 25:45if they are infected, if they were infected in Wuhan.
  • 25:50And then confirmed elsewhere.
  • 25:53So we had a sample selection criterion
  • 25:58to discern a Wuhan exported case.
  • 26:03I'm not going to go over it in detail,
  • 26:06but basically the principle we followed
  • 26:09is that we would only consider a case as Wuhan exported
  • 26:14if it passed a beyond a reasonable doubt test.
  • 26:19So basically, if we think there is a reasonable doubt
  • 26:21that the case could be infected elsewhere,
  • 26:26then we would say: let's exclude that from the dataset.
  • 26:31So this eventually gives us 378 cases.
  • 26:39Next I'm gonna talk about the model we used.
  • 26:46So the model is called: BETS.
  • 26:48It's named after sort of four key epidemiological events.
  • 26:53The beginning of exposure, the end of exposure,
  • 26:56time of transmission, which is usually unobserved,
  • 27:01and the time of symptom onset, S.
  • 27:06So what we will do below is we'll first define the support
  • 27:13of these variables, so we call that P.
  • 27:17Which basically represents the Wuhan exposed population.
  • 27:24So this is the population we would like to study.
  • 27:28We will then construct a generative model
  • 27:31for these random variables.
  • 27:34Basically, for everyone in the Wuhan exposed population.
  • 27:39Then, to consider the sample selection,
  • 27:42we'll define a sample selection set, D,
  • 27:46that corresponds to cases that are exported from Wuhan.
  • 27:51Then finally we will derive likelihood functions
  • 27:54to adjust for the sample selection.
  • 27:57So essentially, what we're trying to infer is
  • 28:01the disease dynamics in the population, P,
  • 28:05but we only have data from this sample, D.
  • 28:11So there's a lot of work that needs to be done
  • 28:14to correct for that sample selection.
  • 28:20So intuitively, this population P are just all people
  • 28:23who have stayed in Wuhan, between December first
  • 28:29and January 24, so anyone who has been in Wuhan
  • 28:36for maybe even just a few hours,
  • 28:39they would count as someone exposed to Wuhan.
  • 28:45And I'm going to make some conventions to simplify
  • 28:51this set, P, a little bit.
  • 28:54So B equals to zero has a special meaning.
  • 28:59So, so zero is the time zero,
  • 29:02which is 12 AM on December 1.
  • 29:06And it means that they actually started their stay in Wuhan
  • 29:11before time zero, so they live in Wuhan essentially.
  • 29:16And B greater than zero means these other cases
  • 29:21visited Wuhan sometime in the middle of this period,
  • 29:25and then they left Wuhan.
  • 29:29So E equals to infinity means that the case did not arrive
  • 29:33in the 14 locations we are considering
  • 29:36before this lockdown time, L.
  • 29:41So for the purpose of our study,
  • 29:42we did not need to differentiate between people who
  • 29:45have always stayed in Wuhan past time L,
  • 29:49or people who left Wuhan before time L,
  • 29:52but went to a different location
  • 29:55other than the ones we are considering.
  • 29:58So T equals to infinity means that the cases
  • 30:02were not infected during their stay in Wuhan.
  • 30:06So this could be infected outside Wuhan,
  • 30:08or it could be they were never infected.
  • 30:12And S equals to infinity means that the case
  • 30:16did not show symptoms of COVID-19,
  • 30:19and it can simply be, they were never infected.
  • 30:22Or the case was actually tested positive for COVID-19,
  • 30:27but never showed symptoms, so it's, they're asymptomatic.
  • 30:34So under these conventions, this is the set,
  • 30:38this is the support for this population, P.
  • 30:41So B is between zero and L,
  • 30:44E is between B and L or infinity,
  • 30:47T is between B and E, which means that they are
  • 30:51infected in Wuhan, or infinity.
  • 30:54And S is between T and infinity,
  • 30:56and S can be equal to infinity.
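Written out as a set, the support just described looks as follows. This is a transcription of the spoken description into notation; the calligraphic symbol and the lowercase variable names are chosen here and may differ from the slides.

```latex
\[
\mathcal{P} = \bigl\{ (b, e, t, s) :\;
  b \in [0, L],\;
  e \in [b, L] \cup \{\infty\},\;
  t \in [b, e] \cup \{\infty\},\;
  s \in [t, \infty] \bigr\}
\]
```

The interval for s is closed at infinity, so that s = infinity (no symptoms) is allowed.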
  • 31:00So now we have defined this population, P.
  • 31:04And now let's look at a generative model,
  • 31:09a data-generating model for this population.
  • 31:15So, by the basic law of probability,
  • 31:18we could decompose the joint distribution
  • 31:21of BETS into these four, and the first two
  • 31:25are the distribution of B and E.
  • 31:27They are related to travel.
  • 31:30The second one, sorry, the third one is the distribution
  • 31:32of T given B and E.
  • 31:35So that's about the disease transmission.
  • 31:38And the last one is the distribution of S,
  • 31:41given BET, and that's related to disease progression.
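In symbols, the decomposition described above is the usual chain rule for a joint density. The subscripted names for the four factors are notation introduced here for readability.

```latex
\[
f(b, e, t, s)
  = \underbrace{f_B(b)\, f_E(e \mid b)}_{\text{travel}}
    \cdot \underbrace{f_T(t \mid b, e)}_{\text{transmission}}
    \cdot \underbrace{f_S(s \mid b, e, t)}_{\text{progression}}
\]
```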
  • 31:47So we need to make two basic assumptions,
  • 31:50and they are important because we would like to infer
  • 31:54what's going on in the population P,
  • 31:57from the sample D, from these Wuhan exported cases.
  • 32:02So we need to sort of make assumptions
  • 32:05so we can make that extrapolation.
  • 32:08So the first assumption, we assume it's about
  • 32:11this disease transmission, and it basically means
  • 32:14that the disease transmission is independent of travel.
  • 32:18So there is a basic sort of function that's independent
  • 32:22of the travel that's growing over time.
  • 32:27And then the rest is a point mass at infinity.
  • 32:33This function, so, it will appear later on.
  • 32:37It's the epidemic growth function.
  • 32:40The second assumption is that the disease progression
  • 32:43is also independent of travel.
  • 32:46So, what's assumed here is basically
  • 32:49that there is one minus mu of the infections,
  • 32:56that are asymptomatic in that they didn't show symptoms.
  • 33:00Among the people who showed symptoms,
  • 33:03the incubation period, which is just S minus T,
  • 33:07follows this distribution, H.
  • 33:11Okay, so H is the density of the incubation period,
  • 33:14for symptomatic cases.
  • 33:17And this whole distribution does not depend on B and E.
  • 33:24So these are sort of the two basic assumptions
  • 33:26that we relied on.
  • 33:30There are two further parametric assumptions
  • 33:32that were useful to simplify the interpretation,
  • 33:37but they can be relaxed.
  • 33:41So one assumption is the epidemic
  • 33:45was growing exponentially before the lockdown.
  • 33:51And then the other assumption is that the incubation
  • 33:54period is gamma-distributed, okay?
  • 33:58So there's some parameters, kappa, R and alpha, beta.
  • 34:05So, don't worry about nuisance parameter mu,
  • 34:09which is the proportion of asymptomatic cases.
  • 34:12And kappa, which is some baseline transmission.
  • 34:16So it turns out that they would be canceled
  • 34:19in the likelihood function, so they won't appear
  • 34:23in the likelihood function.
  • 34:25And (muttering) these parametric assumptions,
  • 34:28they can be relaxed and they will be relaxed
  • 34:32in the Bayesian nonparametric analysis, if I can get there.
  • 34:38But essentially, these are very useful assumptions
  • 34:42that allow us to derive formulas explicitly.
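One way to write the four assumptions down is sketched below. The symbol g for the epidemic growth function is chosen here (the talk does not name it in the transcript), mu is taken to be the fraction of infections that show symptoms, following the statement that one minus mu are asymptomatic, and treating beta as a rate rather than a scale parameter is an assumption.

```latex
% Assumption 1: transmission is independent of travel
\[
f_T(t \mid b, e) = g(t) \quad \text{for } b \le t \le e,
\qquad
P(T = \infty \mid b, e) = 1 - \int_b^e g(x)\, dx
\]
% Assumption 2: progression is independent of travel
\[
f_S(s \mid b, e, t) = \mu\, h(s - t) \quad \text{for } t \le s < \infty,
\qquad
P(S = \infty \mid b, e, t) = 1 - \mu
\]
% Parametric assumptions: exponential growth and gamma incubation
\[
g(t) = \kappa\, e^{r t},
\qquad
h(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x},
\qquad
\text{doubling time} = \frac{\ln 2}{r}
\]
```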
  • 34:50So I have covered the full data BETS model
  • 34:56for the population P.
  • 34:58Now we need to look at what we can observe.
  • 35:02So what we can observe are people in P
  • 35:07that satisfy three additional restrictions.
  • 35:12The first restriction is that the transmission
  • 35:15happens during their exposure to Wuhan.
  • 35:23The second restriction is that the case needs to leave
  • 35:27Wuhan before the lockdown time, L.
  • 35:31The third restriction is that the case
  • 35:33needs to show symptoms.
  • 35:36So S is less than infinity.
  • 35:39So some of the locations we considered
  • 35:41did report a few asymptomatic cases, but overall,
  • 35:46asymptomatic ascertainment was very inconsistent.
  • 35:50So we only considered cases who showed symptoms.
  • 35:56So this gives us the set of samples
  • 36:01that we can observe in our data.
  • 36:09So, which likelihood function should we use?
  • 36:15For a moment, let's just pretend that the time
  • 36:17of transmission, T, is observed.
  • 36:20So if we had samples, i.i.d. samples from the population, P,
  • 36:25then we could just use this product of the density
  • 36:29of BETS as a likelihood function.
  • 36:34But this is not something we should use,
  • 36:36because we actually don't have samples from P.
  • 36:40We have samples from D, so what we should do is to condition
  • 36:46on the selection set, D, and use this likelihood function,
  • 36:52which is basically just the density divided by the
  • 36:56probability that someone is selected in this set, D.
  • 37:04Okay, this is called unconditional likelihood,
  • 37:07to contrast with the conditional likelihood.
  • 37:11So, in unconditional likelihood,
  • 37:14we consider the joint distribution of B, E, T, and S.
  • 37:18But in the conditional likelihood,
  • 37:20we consider the conditional distribution of T and S,
  • 37:25given B and E.
  • 37:26So this is the conditional distribution of the disease
  • 37:29transmission and progression, given the travel.
  • 37:32So this treats travel as fixed.
  • 37:35So to compute this conditional likelihood,
  • 37:38we need to further condition on B and E, okay?
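The three candidate likelihoods for n observed cases, pretending for now that T is observed, can be written as follows. The selection set D is written out from the three restrictions stated earlier; the subscripts on L are labels introduced here.

```latex
\[
\mathcal{D} = \bigl\{ (b, e, t, s) \in \mathcal{P} :\;
  b \le t \le e,\; e \le L,\; s < \infty \bigr\}
\]
% Naive likelihood (valid only for i.i.d. samples from P)
\[
L_{\text{naive}}(\theta) = \prod_{i=1}^{n} f(b_i, e_i, t_i, s_i)
\]
% Unconditional likelihood: condition on selection into D
\[
L_{\text{uc}}(\theta) = \prod_{i=1}^{n}
  \frac{f(b_i, e_i, t_i, s_i)}{P\bigl((B, E, T, S) \in \mathcal{D}\bigr)}
\]
% Conditional likelihood: additionally condition on the travel variables
\[
L_{\text{c}}(\theta) = \prod_{i=1}^{n}
  f\bigl(t_i, s_i \mid b_i, e_i,\, (B, E, T, S) \in \mathcal{D}\bigr)
\]
```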
  • 37:48But in reality, the time of transmission, T, is unobserved,
  • 37:52so we cannot directly use the likelihood function,
  • 37:55as on the last slide, so one possibility is to treat T
  • 38:01as a latent variable and use, for example, an EM algorithm.
  • 38:07The way we chose is to use an integrated likelihood
  • 38:11that just sort of marginalizes
  • 38:14over this unobserved variable, T.
  • 38:19So, the unconditional likelihood is the product
  • 38:23over the cases of the integral
  • 38:26of the density function over T.
  • 38:31And the conditional likelihood is just a product
  • 38:34of the integral of the conditional distribution of T and S,
  • 38:40over T.
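In notation, marginalizing over the unobserved transmission time gives the two integrated likelihoods below. The integration range, from b_i to the smaller of e_i and s_i, is inferred from the support (transmission happens during exposure and before symptom onset) rather than read from the slides.

```latex
\[
L_{\text{uc}}(\theta) = \prod_{i=1}^{n}
  \int_{b_i}^{\min(e_i, s_i)}
  f\bigl(b_i, e_i, t, s_i \mid (B, E, T, S) \in \mathcal{D}\bigr)\, dt
\]
\[
L_{\text{c}}(\theta) = \prod_{i=1}^{n}
  \int_{b_i}^{\min(e_i, s_i)}
  f\bigl(t, s_i \mid b_i, e_i,\, (B, E, T, S) \in \mathcal{D}\bigr)\, dt
\]
```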
  • 38:45So, the reason we sort of considered both
  • 38:49the unconditional likelihood and conditional likelihood
  • 38:51is that the unconditional likelihood is a little bit
  • 38:55more efficient, because it also uses information
  • 39:00in this density of B and E, given you're selected.
  • 39:06So that contains a little bit of information.
  • 39:09But a conditional likelihood is more robust.
  • 39:12So, because it does not need to specify how people traveled,
  • 39:18so it is robust to misspecifying those distributions.
  • 39:24So I'll stop here and take any questions up to now.
  • 39:36Is this clear to everyone?
  • 39:40If so, I'm gonna proceed.
  • 39:45Okay, so under these four assumptions
  • 39:49that I introduced earlier, you can sort of compute
  • 39:53the explicit forms of the conditional likelihood functions.
  • 39:57I'm not gonna go over the detailed forms,
  • 39:59but I just want to point out that first of all,
  • 40:02as I mentioned earlier, this does not depend on
  • 40:04the two nuisance parameters, mu and kappa.
  • 40:08And second of all, this actually reduces to a likelihood
  • 40:12function that's previously derived in this paper in 2009
  • 40:19by setting this R equals to zero.
  • 40:22So R equals to zero means that the epidemic
  • 40:24was not growing, so it's mostly a stationary epidemic.
  • 40:30So that's reasonable for maybe influenza, but not for COVID.
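The explicit formula is not shown in the transcript, but a form consistent with the stated assumptions can be worked out: conditional on travel and selection, the density of (T, S) is exp(r t) h(s - t) divided by the integral of exp(r t) over the exposure window, so mu and kappa cancel. The sketch below implements that derived form with simple numerical integration. It is an illustration under those assumptions, not the authors' code; the function names, the choice of beta as a rate parameter, and the three example cases are hypothetical.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta (assumed parameterization)."""
    if x <= 0.0:
        return 0.0
    log_pdf = (alpha * math.log(beta) + (alpha - 1.0) * math.log(x)
               - beta * x - math.lgamma(alpha))
    return math.exp(log_pdf)


def case_likelihood(b, e, s, r, alpha, beta, steps=2000):
    """Integrated conditional likelihood of one case with exposure [b, e]
    and symptom onset s, using the midpoint rule over the infection time."""
    upper = min(e, s)
    if upper <= b:
        return 0.0
    width = (upper - b) / steps
    total = 0.0
    for k in range(steps):
        t = b + (k + 0.5) * width
        # exp(r * (t - e)) is exp(r * t) rescaled to avoid overflow.
        total += math.exp(r * (t - e)) * gamma_pdf(s - t, alpha, beta)
    numerator = total * width
    # Integral of exp(r * (t - e)) over [b, e]; equals e - b when r = 0.
    if abs(r) < 1e-12:
        denominator = e - b
    else:
        denominator = (1.0 - math.exp(-r * (e - b))) / r
    return numerator / denominator


def log_likelihood(cases, r, alpha, beta):
    """Sum of log integrated conditional likelihoods over (b, e, s) triples."""
    return sum(math.log(case_likelihood(b, e, s, r, alpha, beta))
               for b, e, s in cases)


# Hypothetical cases: (begin of exposure, end of exposure, symptom onset),
# in days since time zero.
cases = [(0.0, 40.0, 45.0), (35.0, 38.0, 41.0), (20.0, 50.0, 52.0)]
print(log_likelihood(cases, r=0.0, alpha=2.0, beta=0.4))
print(log_likelihood(cases, r=math.log(2) / 2.5, alpha=2.0, beta=0.4))
```

Setting r to zero recovers the stationary case mentioned above, where infection times are treated as uniform over the exposure window. In practice one would maximize `log_likelihood` over r, alpha, and beta with a numerical optimizer.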
  • 40:40So for unconditional likelihood, we need to make
  • 40:42further assumptions about how people traveled,
  • 40:46the assumption we used was just a very simple,
  • 40:49sort of a uniform assumption,
  • 40:51uniform distribution assumption,
  • 40:52that assumes that the travel was stable
  • 40:55in the period that we considered.
  • 40:59And if we use those assumptions,
  • 41:00we can derive the closed-form unconditional likelihood.
  • 41:06There's a little bit of approximation that's needed,
  • 41:09but that's very, very reasonable in this case.
  • 41:18So, I'd like to show you the results
  • 41:22from fitting these parametric models.
  • 41:24So what we did is we obtained point estimates
  • 41:28of the parameters by maximizing the likelihood functions
  • 41:32I just showed you, and then we obtained 95 percent
  • 41:36confidence intervals, by a likelihood ratio test.
  • 41:41So, what you can see is broadly, over different locations,
  • 41:46the estimated doubling time was very consistent.
  • 41:52Also across the conditional and unconditional likelihoods,
  • 41:55so the doubling time was about two to 2.5 days.
  • 42:01And the median incubation period is about four days,
  • 42:07but there is a little bit of variability
  • 42:11in the estimates.
  • 42:14It turns out that the variability is mostly
  • 42:16because of the parametric assumptions that we used.
  • 42:21And then the 95 percent quantile is about,
  • 42:2712 to 14 days.
  • 42:29Or if you consider the sampling variability,
  • 42:31that is about 11 to 15 days.
  • 42:35Okay, but broadly speaking, across the different locations,
  • 42:40they seem to suggest very similar answers.
  • 42:47So, just to summarize, the initial doubling time
  • 42:51seems to be between two to 2.5 days.
  • 42:55Median incubation period is about four days,
  • 42:57and 95 percent quantile is about 11 to 15 days.
  • 43:03So, those sort of were our results,
  • 43:05using the parametric model.
  • 43:08And next I'm going to compare it with some other
  • 43:12earlier analyses, and give you a demonstration,
  • 43:18or an argument of why some of the other early analyses
  • 43:21were severely biased.
  • 43:23So first, let's look at this Lancet paper that I mentioned
  • 43:27in the beginning of the talk that estimated doubling time.
  • 43:30So the doubling time they estimated was 6.4 days.
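To put the two doubling times on a common scale, a doubling time can be converted into an exponential growth exponent through r = ln(2) / (doubling time). The sketch below is only an illustration of that arithmetic; the function names `growth_rate` and `doubling_time` are made up for this example and do not come from the talk or the manuscript.

```python
import math


def growth_rate(doubling_time_days: float) -> float:
    """Exponential growth exponent r (per day) implied by a doubling time."""
    return math.log(2) / doubling_time_days


def doubling_time(r_per_day: float) -> float:
    """Doubling time (days) implied by an exponential growth exponent r."""
    return math.log(2) / r_per_day


# Doubling times quoted in the talk: 2 to 2.5 days here, 6.4 days in the Lancet paper.
for days in (2.0, 2.5, 6.4):
    r = growth_rate(days)
    weekly_factor = math.exp(7 * r)  # multiplicative growth over one week
    print(f"doubling time {days} days: r = {r:.3f} per day, "
          f"one-week growth factor = {weekly_factor:.1f}")
```

A doubling time of 2 to 2.5 days corresponds to an exponent of roughly 0.28 to 0.35 per day, while 6.4 days corresponds to roughly 0.11 per day, so the two estimates imply very different weekly growth.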
  • 43:37So, what happened is these authors used a modified
  • 43:44SEIR model, so the SEIR model is very common
  • 43:48in epidemic modeling, so they modified that model
  • 43:51to account for traveling, but they did not account
  • 43:55for the travel ban.
  • 43:58So, basically to sort of simplify what's going on,
  • 44:05what they essentially did is they used the density
  • 44:09of the symptom onset S in the population P,
  • 44:15so they fitted this density, but they fit it using, ah,
  • 44:20samples from the set D.
  • 44:26So it is quite reasonable to assume that the incidence
  • 44:29of symptom onset was growing exponentially in the population
  • 44:34that is exposed to Wuhan.
  • 44:37So given P, this distribution, marginal distribution of S,
  • 44:42was perhaps growing exponentially before the lockdown.
  • 44:47But we don't actually have samples from P.
  • 44:49We have a sample from D.
  • 44:52So, we actually can derive the density of S in D,
  • 44:59and that looked very different from exponential growth.
  • 45:03So, basically the intuition is that if you look at
  • 45:06the distribution of the transmission, T,
  • 45:09it is growing exponentially, but it also has this effect,
  • 45:13this exponential of R times T, multiplied by L minus T.
  • 45:17So basically, if you are transmitted on time T,
  • 45:20then you only have the time between T to L
  • 45:25to leave Wuhan and be observed by us.
  • 45:29Okay, so that's why it's not only exponential growth,
  • 45:32but there's also a decreasing trend, L minus T,
  • 45:39for the distribution of the time of transmission.
  • 45:43So for the time of symptom onset,
  • 45:45it's just the time of transmission,
  • 45:48convolved with the distribution of the incubation period.
  • 45:52And that has this form that is approximately
  • 45:56an exponential growth, and then times this term,
  • 46:00that is L plus some quantity that depends
  • 46:03on the incubation period and the epidemic growth, minus S.
  • 46:10So this is a term that is not considered,
  • 46:13in this simple exponential growth model.
  • 46:18Which is basically what's used in that Lancet paper.
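The argument above can be written compactly. This is a sketch reconstructed from the spoken description, not the exact expressions in the manuscript. Here $r$ is the growth exponent, $L$ the lockdown time, $T$ the transmission time, $S$ the symptom onset time, $\mathcal{P}$ the exposed population and $\mathcal{D}$ the observed sample; $c$ is a placeholder for the "quantity that depends on the incubation period and the epidemic growth", whose exact form is not given in the talk.

```latex
\begin{align*}
\text{pure exponential growth model:}\quad
  & f_S(s \mid \mathcal{P}) \propto e^{rs}, \\
\text{transmission time in the sample:}\quad
  & f_T(t \mid \mathcal{D}) \propto e^{rt}\,(L - t), \qquad t \le L, \\
\text{symptom onset in the sample (approximately):}\quad
  & f_S(s \mid \mathcal{D}) \approx \text{const} \times e^{rs}\,(L + c - s).
\end{align*}
```

The linear factors $(L - t)$ and $(L + c - s)$ are what pull the observed curve below pure exponential growth as the lockdown approaches.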
  • 46:23Okay, so to illustrate this,
  • 46:26what I'm showing you here is a histogram
  • 46:29of the symptom onset of all the Wuhan exported cases,
  • 46:35who are also residents of Wuhan.
  • 46:37So they stayed from December first to January 23.
  • 46:43What you see is that it was kind of growing very fast,
  • 46:46perhaps exponentially in the beginning,
  • 46:49but then it slows down around the time of the lockdown.
  • 46:55Okay, so the orange curve is the theoretical fit
  • 47:00that we obtained in the last slide,
  • 47:04using the maximum likelihood estimator of the parameters.
  • 47:08So it fits the data quite well.
  • 47:12So what happened, I think, with the Lancet paper is,
  • 47:17so they basically stopped at about January 28th,
  • 47:20so it's about here, and they essentially tried to fit
  • 47:23an exponential growth from the beginning to January 28.
  • 47:29And that would lead to much slower growth
  • 47:33than fitting the whole model to account for the selection.
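A toy calculation can illustrate the effect of the decreasing term, L minus T. The sketch below is not the Lancet paper's SEIR fit and not the BETS likelihood: it takes a density proportional to exp(r t)(L - t), fits a straight line to its logarithm as a pure exponential model would, and compares the fitted exponent with the true one. The value of `r_true`, the day numbering (December 1 as day 0, lockdown at day 53) and the helper `ols_slope` are all assumptions made for this example.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


r_true = math.log(2) / 2.25   # assumed exponent: doubling time of 2.25 days
L = 53.0                      # assumed lockdown day (December 1 is day 0)
days = list(range(0, 53))     # days 0..52, so L - t stays positive

# Log of the selected-sample density, up to a constant: r t + log(L - t).
log_density = [r_true * t + math.log(L - t) for t in days]

# A pure exponential model fits a straight line to this curve.
r_naive = ols_slope(days, log_density)

print(f"true exponent:  {r_true:.3f} per day "
      f"(doubling time {math.log(2) / r_true:.2f} days)")
print(f"naive exponent: {r_naive:.3f} per day "
      f"(doubling time {math.log(2) / r_naive:.2f} days)")
```

Because log(L - t) decreases in t, the fitted slope always falls below the true exponent, so the implied doubling time is too long. The size of the gap in this toy setting should not be read as a reconstruction of the 6.4-day estimate.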
  • 47:41Okay.
  • 47:44So that's about epidemic growth.
  • 47:46Next I will talk about several studies
  • 47:49of the incubation period.
  • 47:52So, these studies are susceptible to two kinds of biases.
  • 47:57One is that some of them ignore the epidemic growth,
  • 48:01so instead of using this likelihood function,
  • 48:04this conditional likelihood function,
  • 48:06they just set this R equal to zero,
  • 48:08and then they use this likelihood function
  • 48:10that was derived in the early paper.
  • 48:15The other bias is sort of right-truncation.
  • 48:20And basically, they kind of stopped
  • 48:22the data collection early and only used cases
  • 48:24confirmed by then, so people with long incubation periods
  • 48:29are less likely to be included in the data,
  • 48:33so that leads to underestimation of the incubation period.
  • 48:38And a solution to this is you can actually derive
  • 48:40the likelihood with additional conditioning events,
  • 48:43that S is equal, sorry,
  • 48:45less than or equal to some threshold, M.
  • 48:48Suppose you stop the data collection a week after M,
  • 48:52and you say: perhaps we have found all the cases
  • 48:56who showed symptoms beforehand.
  • 48:59We can use this likelihood function.
  • 49:02I'm not gonna show you the exact form,
  • 49:04but basically you need to further divide by, ah,
  • 49:10the probability of S less than or equal to M,
  • 49:14and you can obtain closed-form expression for this
  • 49:18under our parametric assumptions.
  • 49:22Using integration by parts.
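The idea of dividing by the probability of the truncation event can be illustrated with a small simulation. This is a deliberately simplified sketch, not the BETS likelihood: it assumes an exponential incubation period with rate `lam_true`, and it assumes the transmission time is known, so that each case has a known gap `c` between transmission and the data cutoff M. Neither assumption holds in the study, and every name and number below is made up for the example.

```python
import math
import random

random.seed(0)
r = 0.3          # assumed growth exponent per day
lam_true = 0.2   # assumed incubation rate, so the true mean is 5 days

# c = M - T is the time between transmission and the data cutoff M.
# If transmissions grow like exp(r * T) up to M, then c is Exponential(r).
observed = []
for _ in range(30000):
    c = random.expovariate(r)
    x = random.expovariate(lam_true)   # incubation period
    if x < c:                          # symptom onset happens before the cutoff
        observed.append((x, c))

# Ignoring truncation: average the incubation periods that made it into the data.
naive_mean = sum(x for x, _ in observed) / len(observed)


def truncated_loglik(lam):
    """Sum of log[f(x) / P(X <= c)] for exponential f with rate lam."""
    return sum(math.log(lam) - lam * x - math.log(-math.expm1(-lam * c))
               for x, c in observed)


# The log-likelihood is concave in lam, so a ternary search finds its maximum.
lo, hi = 0.01, 2.0
for _ in range(40):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if truncated_loglik(m1) < truncated_loglik(m2):
        lo = m1
    else:
        hi = m2
corrected_mean = 1.0 / ((lo + hi) / 2)

print(f"true mean incubation period: {1 / lam_true:.2f} days")
print(f"naive mean (ignores truncation): {naive_mean:.2f} days")
print(f"mean from truncated likelihood: {corrected_mean:.2f} days")
```

In this toy setting the naive mean is far too short, because cases with long incubation periods have not yet shown symptoms by the cutoff, while the likelihood that divides by the truncation probability lands close to the true mean.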
  • 49:25So, I'd like to show you an experiment
  • 49:29to illustrate this selection bias.
  • 49:33So in this experiment, we kind of stop the data collection
  • 49:38between any day from January 23 to February 18,
  • 49:43and we fitted sort of this parametric BETS model,
  • 49:48using one of the following likelihoods.
  • 49:51So this is the likelihood that treats R equals to zero,
  • 49:54so it's adjusted for nothing,
  • 49:56and this is the likelihood derived earlier
  • 49:59and used in other studies.
  • 50:02This is the likelihood function that adjusts for the growth,
  • 50:05so R is treated as an unknown parameter.
  • 50:08And this is the likelihood on the last slide that adjusted
  • 50:12for both the growth and the right-truncation,
  • 50:16S less than or equal to M.
  • 50:21So the point estimates are obtained by MLEs,
  • 50:23and the confidence intervals are obtained
  • 50:25by nonparametric Bootstrap,
  • 50:28and we compared our results with three previous studies.
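For readers unfamiliar with the nonparametric bootstrap, the sketch below shows one common variant, the percentile interval: resample the cases with replacement, recompute the estimate on each resample, and take quantiles of the resampled estimates. The talk does not say which bootstrap interval was used, so the percentile choice is an assumption, the sample median stands in for the maximum likelihood estimator, and the data are simulated rather than taken from the study.

```python
import random
import statistics


def bootstrap_ci(data, estimator, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample n cases with replacement and recompute the estimate each time.
    estimates = sorted(estimator(rng.choices(data, k=n)) for _ in range(n_boot))
    lower_index = int((1 - level) / 2 * n_boot)
    upper_index = int((1 + level) / 2 * n_boot) - 1
    return estimates[lower_index], estimates[upper_index]


# Simulated incubation periods in days (made-up gamma parameters).
data_rng = random.Random(1)
sample = [data_rng.gammavariate(2.0, 2.5) for _ in range(300)]

low, high = bootstrap_ci(sample, statistics.median)
print(f"sample median: {statistics.median(sample):.2f} days")
print(f"95% bootstrap interval: ({low:.2f}, {high:.2f}) days")
```

To bootstrap a model-based estimate instead, pass a function that refits the model to each resample in place of `statistics.median`.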
  • 50:36So this is, basically summarizes this experiment.
  • 50:42This is a little bit complicated,
  • 50:43so let me walk you through slowly.
  • 50:48So there are three likelihood functions we used.
  • 50:50One adjusts for nothing; that's the orange.
  • 50:54One adjusts only for growth,
  • 50:57and one adjusts for both growth and truncation.
  • 51:02Okay, so what you can immediately see
  • 51:04is that if we adjusted for, ah,
  • 51:08if we adjusted for nothing, then this is much larger
  • 51:14than the other estimates.
  • 51:18So actually, if you adjusted for nothing,
  • 51:20and if you sort of used our entire data set,
  • 51:23the median incubation period would be about nine days.
  • 51:27And the 95 percent quantile would be about 25 days.
  • 51:31So that's just way too large.
  • 51:35And if you ignored right-truncation, for example,
  • 51:38if you used this likelihood function we derived earlier,
  • 51:43that only accounts for growth, you underestimate
  • 51:48the incubation period in the beginning, as expected,
  • 51:51but you slowly converge to this final estimate.
  • 51:57And if you use this likelihood function and adjust for both
  • 52:00growth and truncation, you actually get
  • 52:03some quite sensible results by the end of January.
  • 52:09So, it has large uncertainty, but it's roughly unbiased,
  • 52:14and it kind of eventually converges to that estimate.
  • 52:18The same estimate that we obtained
  • 52:23using the blue curve, but using the full data.
  • 52:28Okay.
  • 52:30So, for the sake of time, I think I'll skip the part
  • 52:36about Bayesian nonparametric inference.
  • 52:40One thing that's a little bit interesting, I think,
  • 52:43is there seems to be some difference between men
  • 52:48and women in their incubation period.
  • 52:51So these are sort of the posterior mean
  • 52:54and posterior credible intervals for nonparametric
  • 53:01incubation period, and you can see that men
  • 53:04seem to develop symptoms quicker than women.
  • 53:11So, that's a little bit interesting,
  • 53:14and maybe, I mean, I'm not a doctor,
  • 53:18but it could be related to the observation
  • 53:22that men seem to be more susceptible,
  • 53:24and die more frequently than women.
  • 53:31So let's, let me conclude this talk.
  • 53:34So these are some conclusions we found about COVID-19,
  • 53:40using our dataset and our model.
  • 53:43Initial doubling time in Wuhan was about two to 2.5 days.
  • 53:50The median incubation period is about four days,
  • 53:52and the proportion of incubation period
  • 53:55above 14 days is about five percent.
  • 54:00There are a number of limitations for our study.
  • 54:03For example, we used the symptom onset reported
  • 54:07by the patients and they are not always accurate.
  • 54:11There could be behavioral reasons for people
  • 54:13to report a later symptom onset.
  • 54:18Even though these locations are intensive in their testing
  • 54:21and contact tracing, some degree of under-ascertainment
  • 54:25is perhaps inevitable.
  • 54:28As I have shown you, in our dataset collection,
  • 54:34discerning the Wuhan exported case
  • 54:36is not a black and white decision.
  • 54:39We used this beyond a reasonable doubt kind of criterion,
  • 54:43but that's one criterion you can apply.
  • 54:47And the crucial assumptions are the first
  • 54:51two assumptions, which means that the travel
  • 54:53and disease are independent, and that can be violated.
  • 54:57For example, if I, if people tend to cancel
  • 55:02their travel plans when feeling sick.
  • 55:09Nevertheless, I think I have demonstrated some very
  • 55:12compelling evidence for selection bias in early studies.
  • 55:17Some of the biases you can correct by designing the study
  • 55:25more carefully, some require more sophisticated
  • 55:29statistical adjustments.
  • 55:33And basically, I think the conclusion is:
  • 55:37you should not make uncalculated BETS.
  • 55:41So, we should always carefully design the study
  • 55:44and adhere to our sample inclusion criteria.
  • 55:48And the statistical inference should not be based
  • 55:53on some intuitive calculations,
  • 55:55but should be based on first principles.
  • 55:58So in this study, we kind of went back all the way
  • 56:00to defining the support of random variables.
  • 56:05So that's sort of statistics 101.
  • 56:08But that's actually, it's extremely important.
  • 56:11So I found it really helpful to start all the way
  • 56:15from the beginning and develop a generative model.
  • 56:20And that avoids a lot of potential selection biases.
  • 56:25So the final lesson I'd like to share from this whole study
  • 56:29is that I think this demonstrates that data quality
  • 56:34and better design are much more important
  • 56:38than data quantity and better modeling,
  • 56:42in many real data studies.
  • 56:46Thanks for the attention,
  • 56:47and I'll take any questions from here.
  • 56:51- Thanks to you for the nice talk.
  • 56:54Does anyone have questions for Qingyuan?
  • 57:00So Qing, I think someone, ah,
  • 57:04yeah, Joe sent you a question.
  • 57:07- Okay.
  • 57:09- Is there any information in the dataset on whether a patient
  • 57:12is a healthcare worker?
  • 57:15- No, these are not usually healthcare workers.
  • 57:19These are exported from Wuhan, so they're usually
  • 57:21just people who traveled maybe for sightseeing,
  • 57:24or for the Chinese New Year, they traveled from Wuhan
  • 57:28to other places and were diagnosed there.
  • 57:34- Right, so also he has another question,
  • 57:38Joe has another question also: how can we evaluate
  • 57:41the effectiveness of social distancing and mask guidelines?
  • 57:49- I think this study we did was not designed
  • 57:53to answer those questions.
  • 57:57We did have a very, ah,
  • 58:01sort of preliminary analysis.
  • 58:03So we broke the study period into two parts.
  • 58:08So on January 20, it was confirmed publicly
  • 58:12that the disease was human-to-human transmissible,
  • 58:16so we broke the period into two parts:
  • 58:20those before January 20 and those after January 20.
  • 58:25But the after period is just three days.
  • 58:27So January 21, 22, 23, and we found that if we fit
  • 58:33different growths to these two periods, the second period,
  • 58:36it seemed that the growth was substantially slower.
  • 58:42The growth, the exponent R is not quite zero,
  • 58:48but it's close.
  • 58:50So it seems that the knowledge of sort
  • 58:52of human-to-human transmissibility and the fact that,
  • 58:58I think, masks are probably much more,
  • 59:01were much more available in Wuhan,
  • 59:03people started to do some social distancing
  • 59:08right after January 20.
  • 59:11I think that seemed to play a role.
  • 59:14But that's very, very preliminary,
  • 59:17and I think there are a lot of good studies about this now.
  • 59:25- Donna has a question.
  • 59:26Donna, do you want to say what your question is?
  • 59:32- [Donna] Yeah, sure, thanks.
  • 59:33That was a very interesting and clear talk.
  • 59:36I really appreciated the way you carefully went through,
  • 59:40step by step, to show-- (audio distorting)
  • 59:47Who aren't doing that, I feel.
  • 59:50But my question was, it was still hard for me to tell
  • 59:54to what extent your estimates were identifiable
  • 59:59due to assumptions and to what extent the data
  • 01:00:04made the estimates fairly identifiable.
  • 01:00:09- Yeah so essentially, I mean, selection bias,
  • 01:00:12usually you cannot always avoid it, unless you
  • 01:00:17make some kind of missing at random type of assumption.
  • 01:00:22Here, we don't have a random selection.
  • 01:00:25It's more like a deterministic selection,
  • 01:00:27and we can quantify that selection event,
  • 01:00:30but still, as you said, I think these are great questions
  • 01:00:37to sort of disentangle the nonparametric assumptions
  • 01:00:41needed for identification and the parametric assumptions
  • 01:00:44needed for sort of better and easier inference.
  • 01:00:51I don't have a formal result,
  • 01:00:53but my feeling is the first two assumptions
  • 01:00:56that are assumed, sort of the independence of travel
  • 01:01:00and disease, that's sort of essential to the identification.
  • 01:01:07And then later on, the assumptions are perhaps relaxable.
  • 01:01:14So we did try to relax those
  • 01:01:15in the Bayesian nonparametric analysis.
  • 01:01:19But that's not a proof, so that's my, ah,
  • 01:01:25best guess at this point.
  • 01:01:28- [Donna] Thank you.
  • 01:01:32- From Casey, said, ah, the estimates,
  • 01:01:39people have estimated about five to 80 percent
  • 01:01:41of asymptomatic infections, and isn't that a limitation
  • 01:01:46of your model that you did not account
  • 01:01:48for asymptomatic carriers?
  • 01:01:50And if so, how can we possibly model for it,
  • 01:01:52given the large range of estimates?
  • 01:01:55- So this is actually a feature of our study,
  • 01:01:59because we actually had a, let's see,
  • 01:02:05we had a term for the asymptomatic transmission.
  • 01:02:14So, but that's just that parameter was canceled.
  • 01:02:19So this parameter, mu, or one minus mu,
  • 01:02:22is the proportion of asymptomatic infections.
  • 01:02:29But then because we only observed cases who are,
  • 01:02:36who showed symptoms, so actually in likelihood,
  • 01:02:39this parameter mu got canceled.
  • 01:02:43So, of course the reason we could cancel that mu
  • 01:02:46is because of this assumption two,
  • 01:02:48that S is independent of the travel.
  • 01:02:53So that's important.
  • 01:02:55But once you assume that, you actually, ah,
  • 01:03:00sort of don't need to worry about asymptomatic transmission,
  • 01:03:04and on the other hand, this dataset, or this whole method
  • 01:03:08also provides more information about the proportion
  • 01:03:11of asymptomatic infection.
  • 01:03:15Hopefully that'll answer your question.
  • 01:03:17- [Casey] Yeah, thanks; so you account for it
  • 01:03:19by saying it's not really significant, in your estimate?
  • 01:03:24- Yeah, so in the likelihood, mu will get canceled.
  • 01:03:26So it doesn't appear in the likelihood.
  • 01:03:28So the likelihood of the data does not depend
  • 01:03:30on how much are asymptomatic, because we only look
  • 01:03:36at cases who are symptomatic.
  • 01:03:39So this incubation period that we estimated
  • 01:03:41is also the incubation period among
  • 01:03:44those people who showed symptoms.
  • 01:03:47- [Casey] So it's an elegant way of sidestepping
  • 01:03:49the question, (laughing) in a way.
  • 01:03:52- Well, it's not a sidestep, it's sort of,
  • 01:03:56it's a limitation of this design.
  • 01:04:00So the whole design should be robust
  • 01:04:04to asymptomatic transmission, and it also gives
  • 01:04:07no information about asymptomatic transmission.
  • 01:04:12- [Casey] Yeah, I was really impressed at the way
  • 01:04:13you took on that Lancet article and just really, ah,
  • 01:04:18it was really impressive; what a great talk.
  • 01:04:20Thank you so much.
  • 01:04:22- Well thank you.
  • 01:04:27- Hi Qing I have a question.
  • 01:04:29So you mentioned before that because the measurements
  • 01:04:33inside of Wuhan are the, or the, ah,
  • 01:04:37the measurements that we have inside Wuhan,
  • 01:04:39the numbers aren't very accurate due to various reasons.
  • 01:04:42So I'm wondering that if you calculate the doubling time
  • 01:04:46using the data for Wuhan city,
  • 01:04:50and then take into, that uses the measurements
  • 01:04:53before they changed the criterion for when it's counted
  • 01:04:58as a confirmed case, and using the data before, say,
  • 01:05:02you locked down, but taking into consideration
  • 01:05:04that the data, you only looked at data.
  • 01:05:07So you only looked at the confirmed cases before that date.
  • 01:05:10Will you get a similar measurement,
  • 01:05:13a similar estimate as if you're using the traveling data,
  • 01:05:16or it is much worse?
  • 01:05:19- Yeah, people have done an analysis on the data from Wuhan.
  • 01:05:26What I would like to point out is that this figure
  • 01:05:29is only the number of new, confirmed cases.
  • 01:05:34So what is usually done in epidemic analysis
  • 01:05:36is they don't look at the number of confirmed cases,
  • 01:05:40but the number of cases who showed symptoms on a certain day
  • 01:05:45because that's usually less variable, less noisy,
  • 01:05:51than this sort of confirmation,
  • 01:05:55because of the problem about confirmation.
  • 01:05:59So people have done that, and I don't see a doubling time
  • 01:06:06estimation from that; there was a journal paper on that.
  • 01:06:13And there was also a very interesting comment on it
  • 01:06:18that criticized some of its methodology.
  • 01:06:21I didn't see a doubling time estimate.
  • 01:06:25So they seemed to focus on the R-naught of the epidemic.
  • 01:06:31I actually had thought about that as well,
  • 01:06:34and we, in this study I have presented,
  • 01:06:37I intentionally avoided estimating R-naught.
  • 01:06:41Because I think there were a lot of issues with, ah,
  • 01:06:47finding out the unbiased estimate of the serial interval,
  • 01:06:52which is very important in estimating R-naught.
  • 01:06:56So, this estimate we found is not directly comparable
  • 01:07:05to that journal paper, I guess.
  • 01:07:08But so what happened, I think, is around late January,
  • 01:07:12early February, a lot of people tried to estimate
  • 01:07:17the R-naught and the doubling time of the epidemic,
  • 01:07:21and what I've found interesting was
  • 01:07:23there were kind of two modes.
  • 01:07:25There were several papers that estimated that the doubling time
  • 01:07:29was about six to seven days, and there were several papers
  • 01:07:31that estimated doubling times of about two to four days.
  • 01:07:37And I think, ah,
  • 01:07:41at least I have shown that the Lancet paper,
  • 01:07:45that their whole method seems to be very flawed.
  • 01:07:50But whether this means that our estimate is very close
  • 01:07:54to the truth, it doesn't necessarily mean so.
  • 01:07:58Because we also have a lot of limitations.
  • 01:08:02- Okay, thanks.
  • 01:08:09Any more questions for Qingyuan?
  • 01:08:14Okay, thanks Qing.
  • 01:08:16I guess that's all for today, and it's a great talk.
  • 01:08:20If you have any more questions for Qing,
  • 01:08:21you can send him an email, and you can find his email
  • 01:08:25on his website, okay?
  • 01:08:29- Okay.
  • 01:08:30(muttering)
  • 01:08:32All right, okay, thank you everyone.
  • 01:08:35- Thank you, oh, we got a new message?
  • 01:08:38(muttering)
  • 01:08:40- It's just a, Keyong said thank you.
  • 01:08:43- Okay, okay, bye!
  • 01:08:45- [Qingyuan] All right, bye.