

COVID Tracking Project Enhancements to Johns Hopkins Case/Fatality Data
Data VisualizationModelingCoronavirusCOVID 19posted by Steve Miller December 7, 2020 Steve Miller

Like many analytics geeks, I’ve been tracking data on the Covid pandemic since early spring. My main source is the Center for Systems Science and Engineering at Johns Hopkins University, with files for download made available at midnight Central time. I’ve established a pretty significant R infrastructure in JupyterLab to summarize daily developments. The combination of R data.table and tidyverse programming capabilities have made the COVID tracking project computation straightforward.
Much as I’ve been successful using the daily Hopkins data on new cases and fatalities with granularity county within the state to fuel graphs and dashboards, I recently started seeking hospitalization and ICU figures as well. My hope was to relate cases and hospitalizations to predict fatalities. I was able to identify a website COVID Tracking Project with such hospitalization data, though my confidence was more with date-aggregated alone rather than state within date, which I initially sought. Turns out COVID hospitalization data summarized by date is much better behaved than the more granular with state added, so I made a concession.
The code that follows first scrapes and munges daily hospitalization data from the COVID Tracking website, then reads a previously built data.table from the Johns Hopkins Covid data portal, ultimately joining the two on date. I next embellish the data with several covariates and set about to forecasting.
The modeling challenges the combined data purports to address involves predicting fatalities as a function of cases and hospitalizations lagged in time. And the model I investigate is akin to those based on “theories” I’ve seen in the news: fatalities(t) = fnct(cases(t-21) + hospitalizations(t-14) + ICU(t-7)). In this instance, the explanatory attributes are considered distributed lag (DL) for which the predictors are all exogenous. Now one could reasonably argue that both hospitalizations and ICU should be considered endogenous to cases, but I won’t address that here, my intention being to simply show the data integration and modeling steps. I also do not mine the data to find optimal lag levels, wary of overfitting (which, alas, is happening regardless). I simply fit one specification for each modeling function.
In the end, I consider two R functions here: the first, linear models with standby lm, the second, gradient boosting from the xgboost package. The precise model specification for both lm and xgboost is fatalities(t) = dayofweek(t) + aftermay(t) + cases(t-21) + hospitalizations(t-14) + ICU(t-7). After each model is fit, I then forecast next week fatalities with both models, settling finally on an average of the two predictions ensemble.
What I hope readers take away from this COVID tracking project is the thinking behind building multiple data.tables from web downloads/scrapings, combining the data via join/merge functions, then finally starting the exploration and modeling processes.
The software used below is JupyterLab 2.1.2 and R 4.0.2. R’s splendid tidyverse and data.table packages are featured for data analysis, while lm and xgboost are used for modeling.
Confirm current R library folders.
.libPaths()
‘C:/R/4.0.2/library’ . ‘C:/Program Files/R/R-4.0.2/library’
Load libraries and include personal functions.
In [2]:
options(warn=-1)
options(scipen = 20)
options(datatable.print.topn=100)
options(datatable.showProgress=FALSE)
usualsuspects <- c(
'tidyverse', 'data.table', 'pryr', 'plyr','readxl', 'bit', 'grid', 'gridExtra',
'rvest', 'magrittr','lubridate','rlist', 'tictoc', 'skimr','pracma','TTR',
'fst','feather','rio', 'Rcpp','tidyquant', 'Matrix', 'arrow',
'knitr', 'kableExtra', 'microbenchmark', "Metrics",
'ggplot2','RColorBrewer',
'magick',"webshot","IRdisplay",
'incidence','outbreaks','EpiEstim', 'modeltime',
'forecast', 'timetk', 'h2o','xgboost', 'catboost', 'lightgbm'
)
suppressMessages(invisible(lapply(usualsuspects, library, character.only = TRUE)))
funcsdir <- "/steve/r/functions"
funcsfile <- "rfunctions.r"
setwd(funcsdir)
source(funcsfile)
cppdir <- "c:/steve/c++/rstudio"
setwd(cppdir)
blanks(1)
lsf.str()
set.seed(543)
blanks(2)
allfreqs : function (dtn, catlim = 100)
blanks : function (howmany)
colsize : function (dt)
dtmeta : function (df)
freqsdt : function (DTstr, xstr, percent = TRUE)
meta : function (df, data = FALSE, dict = TRUE)
mksumfreq : function (freqalldt)
mksumfreq2 : function (dt)
mykab : function (dt)
obj_sz : function (obj)
prhead : function (df, howmany = 6)
prheadtail : function (df, howmany = 6)
Define type-saving functions for the lazy analyst.
diffna <- function (var) c(NA,diff(var))
prodna <- function (var) prod(var,na.rm=TRUE)
sumna <- function (var) sum(var,na.rm=TRUE)
meanna <- function (var) round(mean(var,na.rm=TRUE))
freadfac <- function (var) data.table::fread(var,stringsAsFactors=TRUE)
fstrddt <- function(var) fst::read_fst(var,as.data.table=TRUE)
movavg <- function (var, lags) round(data.table::frollmean(var,lags))
meancov <- function(var) round(mean(var))
wkday <- function(dt) lubridate::wday(dt,label=TRUE)
blanks(2)
Scrape CovidTracking hospitalization data using the tidyverse rvest library.
url <- "https://covidtracking.com/data/national/hospitalization"
css <- "td"
html <- (read_html(url) %>% html_nodes(css) %>% html_text())
head(html,20)
blanks(2)
‘DateDec 6, 2020’
‘Currently hospitalized/Now hospitalized101,487’
‘Currently in ICU/Now in ICU20,145’
‘Currently on ventilator/Now on ventilator7,094’
‘DateDec 5, 2020’
‘Currently hospitalized/Now hospitalized101,190’
‘Currently in ICU/Now in ICU19,950’
‘Currently on ventilator/Now on ventilator7,005’
‘DateDec 4, 2020’
‘Currently hospitalized/Now hospitalized101,276’
‘Currently in ICU/Now in ICU19,858’
‘Currently on ventilator/Now on ventilator6,999’
‘DateDec 3, 2020’
‘Currently hospitalized/Now hospitalized100,755’
‘Currently in ICU/Now in ICU19,723’
‘Currently on ventilator/Now on ventilator6,867’
‘DateDec 2, 2020’
‘Currently hospitalized/Now hospitalized100,322’
‘Currently in ICU/Now in ICU19,680’
‘Currently on ventilator/Now on ventilator6,855’
Wrangle said data with tidyverse chaining to produce a data.table suitable for downstream processing.
In [5]:
html <- html %>% gsub("Currently on ventilator/Now on ventilator","",.) %>% gsub("Currently in ICU/Now in ICU","",.) %>%
gsub("Currently hospitalized/Now hospitalized","",.) %>% gsub("Date","",.)
dt <- data.table(matrix(html,ncol=4,byrow=TRUE)) %>% setnames(.,c("date","hospitalized","ICU",'ventilator'))
meta(dt)
blanks(1)
mykab(head(dt))
blanks(2)
|name|class |rows|columns|size | |:---|:---------------------|:---|:------|:-------| |dt |data.table, data.frame|320 |4 |73.16 KB| Classes 'data.table' and 'data.frame': 320 obs. of 4 variables: $ date : chr "Dec 6, 2020" "Dec 5, 2020" "Dec 4, 2020" "Dec 3, 2020" ... $ hospitalized: chr "101,487" "101,190" "101,276" "100,755" ... $ ICU : chr "20,145" "19,950" "19,858" "19,723" ... $ ventilator : chr "7,094" "7,005" "6,999" "6,867" ... - attr(*, ".internal.selfref")=<externalptr> NULL
|date |hospitalized|ICU |ventilator|
|:———-|:———–|:—–|:———|
|Dec 6, 2020|101,487 |20,145|7,094 | |Dec 5, 2020|101,190 |19,950|7,005 | |Dec 4, 2020|101,276 |19,858|6,999 | |Dec 3, 2020|100,755 |19,723|6,867 | |Dec 2, 2020|100,322 |19,680|6,855 | |Dec 1, 2020|98,777 |19,295|6,649 |
Transform character data to numeric with data.table chaining.
In [6]:
invisible(dt[,c("date","hospitalized","ICU","ventilator"):=.(mdy(date),gsub(",","",hospitalized),gsub(",","",ICU),gsub(",","",ventilator))][,
hospitalized:=ifelse(hospitalized=="N/A",NA,as.integer(hospitalized))][,ICU:=ifelse(ICU=="N/A",NA,as.integer(ICU))][,
ventilator:=ifelse(ventilator=="N/A",NA,as.integer(ventilator))])
dt <- dt[order(date)]
setkey(dt,date)
meta(dt)
blanks(1)
prheadtail(dt,howmany=10)
blanks(2)
|name|class |rows|columns|size |
|:—|:———————|:—|:——|:——|
|dt |data.table, data.frame|320 |4 |7.96 KB|
Classes ‘data.table’ and ‘data.frame’: 320 obs. of 4 variables:
$ date : Date, format: “2020-01-22” “2020-01-23” …
$ hospitalized: int NA NA NA NA NA NA NA NA NA NA …
$ ICU : int NA NA NA NA NA NA NA NA NA NA …
$ ventilator : int NA NA NA NA NA NA NA NA NA NA …
– attr(*, “.internal.selfref”)=<externalptr>
– attr(*, “sorted”)= chr “date”
NULL
date hospitalized ICU ventilator
1: 2020-01-22 NA NA NA
2: 2020-01-23 NA NA NA
3: 2020-01-24 NA NA NA
4: 2020-01-25 NA NA NA
5: 2020-01-26 NA NA NA
6: 2020-01-27 NA NA NA
7: 2020-01-28 NA NA NA
8: 2020-01-29 NA NA NA
9: 2020-01-30 NA NA NA
10: 2020-01-31 NA NA NA
date hospitalized ICU ventilator
1: 2020-11-27 89950 18061 6030
2: 2020-11-28 91762 18249 6148
3: 2020-11-29 93357 18437 6245
4: 2020-11-30 96149 18801 6520
5: 2020-12-01 98777 19295 6649
6: 2020-12-02 100322 19680 6855
7: 2020-12-03 100755 19723 6867
8: 2020-12-04 101276 19858 6999
9: 2020-12-05 101190 19950 7005
10: 2020-12-06 101487 20145 7094
Save the finished table to an fst file.
wdir <- "/data/covidtracking"
setwd(wdir)
fname <- "hospitalizations.fst"
write_fst(dt,fname)
blanks(2)
Starting date with all hospitalization attributes non-missing.
prheadtail(dt[date>"2020-03-25"])
blanks(2)
date hospitalized ICU ventilator
1: 2020-03-26 7805 1299 258
2: 2020-03-27 10978 1792 324
3: 2020-03-28 12409 2174 390
4: 2020-03-29 14055 2456 439
5: 2020-03-30 15917 3087 451
6: 2020-03-31 18155 3487 507
date hospitalized ICU ventilator
1: 2020-12-01 98777 19295 6649
2: 2020-12-02 100322 19680 6855
3: 2020-12-03 100755 19723 6867
4: 2020-12-04 101276 19858 6999
5: 2020-12-05 101190 19950 7005
6: 2020-12-06 101487 20145 7094
Load the saved John Hopkins case/fatality data.
wd <- "c:/steve/covid/magrsave"
setwd(wd)
fname <- tail(list.files( pattern = "mfinal*"),1)
print(fname)
mfinalus <- import(fname,setclass="data.table")
meta(mfinalus)
print(max(mfinalus$date))
blanks(2)
|name |class |rows |columns|size |
|:——-|:———————|:——|:——|:——-|
|mfinalus|data.table, data.frame|1068800|17 |86.03 MB|
Classes ‘data.table’ and ‘data.frame’: 1068800 obs. of 17 variables:
$ uid : int 84045001 84045001 84045001 84045001 84045001 84045001 84045001 84045001 84045001 84045001 …
$ iso2 : Factor w/ 6 levels “AS”,”GU”,”MP”,..: 5 5 5 5 5 5 5 5 5 5 …
$ iso3 : Factor w/ 6 levels “ASM”,”GUM”,”MNP”,..: 5 5 5 5 5 5 5 5 5 5 …
$ code3 : int 840 840 840 840 840 840 840 840 840 840 …
$ fips : num 45001 45001 45001 45001 45001 …
$ admin2 : Factor w/ 1979 levels “”,”Abbeville”,..: 2 2 2 2 2 2 2 2 2 2 …
$ state : Factor w/ 58 levels “Alabama”,”Alaska”,..: 47 47 47 47 47 47 47 47 47 47 …
$ country_region: Factor w/ 1 level “US”: 1 1 1 1 1 1 1 1 1 1 …
$ lat : num 34.2 34.2 34.2 34.2 34.2 …
$ long_ : num -82.5 -82.5 -82.5 -82.5 -82.5 …
$ combined_key : Factor w/ 3340 levels “Abbeville, South Carolina, US”,..: 1 1 1 1 1 1 1 1 1 1 …
$ date : Date, format: “2020-01-22” “2020-01-23” …
$ cumcases : int 0 0 0 0 0 0 0 0 0 0 …
$ daycases : int NA 0 0 0 0 0 0 0 0 0 …
$ population : int 24527 24527 24527 24527 24527 24527 24527 24527 24527 24527 …
$ cumdeaths : int 0 0 0 0 0 0 0 0 0 0 …
$ daydeaths : int NA 0 0 0 0 0 0 0 0 0 …
– attr(*, “.internal.selfref”)=<externalptr>
NULL
[1] “2020-12-06”
Demonstrate the need for a few “covariates” to aid in forecasting. The first shows variation in cases/fatalities by day of the week. This is simply a reporting artifact.
mykab(mfinalus[,.(daycases=sumna(daycases),daydeaths=sumna(daydeaths)),.(date)][
,.(.N,daycases=sumna(daycases),daydeaths=sumna(daydeaths)),.(wkday(date))][
,`:=`(pctcases=round(100*daycases/sum(daycases),2),pctdeaths=round(100*daydeaths/sum(daydeaths),2))])
blanks(2)
|wkday|N |daycases|daydeaths|pctcases|pctdeaths|
|:—-|:–|:——-|:——–|:——-|:——–|
|Wed |46 |2120080 |51271 |14.37 |18.16 |
|Thu |46 |2237665 |46612 |15.16 |16.51 |
|Fri |46 |2453760 |45750 |16.63 |16.21 |
|Sat |46 |2216696 |38970 |15.02 |13.80 |
|Sun |46 |1880176 |24106 |12.74 |8.54 |
|Mon |45 |1817724 |26898 |12.32 |9.53 |
|Tue |45 |2030898 |48692 |13.76 |17.25 |
This one demonstrates the difference in the case mortality rate before and after the last day in May.
In [35]:
mykab(mfinalus[,.(daycases=sumna(daycases),daydeaths=sumna(daydeaths)),.(aftermay=date>"2020-05-31")][
,.(aftermay,fatalrate=round(100*(daydeaths/daycases),2))])
blanks(2)
|aftermay|fatalrate|
|:——-|:——–|
|FALSE |6.00 | |TRUE |1.35 |
Now join the case/fatality and hospitalization/ICU data.tables, adding dayofweek and aftermay covariates.
In [12]:
covidjhu <- mfinalus[,.(daycases=sumna(daycases),daydeaths=sumna(daydeaths)),.(date)]
covidfinal <- dt[covidjhu][,dayofweek:=wkday(date)][,aftermay:=date>"2020-05-31"][,idx:=.N:1]
meta(covidfinal)
blanks(1)
prheadtail(covidfinal)
blanks(2)
|name |class |rows|columns|size |
|:———|:———————|:—|:——|:——-|
|covidfinal|data.table, data.frame|320 |9 |15.77 KB|
Classes ‘data.table’ and ‘data.frame’: 320 obs. of 9 variables:
$ date : Date, format: “2020-01-22” “2020-01-23” …
$ hospitalized: int NA NA NA NA NA NA NA NA NA NA …
$ ICU : int NA NA NA NA NA NA NA NA NA NA …
$ ventilator : int NA NA NA NA NA NA NA NA NA NA …
$ daycases : int 0 0 1 0 3 0 0 1 0 2 …
$ daydeaths : int 0 0 0 0 0 0 0 0 0 0 …
$ dayofweek : Ord.factor w/ 7 levels “Sun”<“Mon”<“Tue”<..: 4 5 6 7 1 2 3 4 5 6 …
$ aftermay : logi FALSE FALSE FALSE FALSE FALSE FALSE …
$ idx : int 320 319 318 317 316 315 314 313 312 311 …
– attr(*, “sorted”)= chr “date”
– attr(*, “.internal.selfref”)=<externalptr>
NULL
date hospitalized ICU ventilator daycases daydeaths dayofweek aftermay
1: 2020-01-22 NA NA NA 0 0 Wed FALSE
2: 2020-01-23 NA NA NA 0 0 Thu FALSE
3: 2020-01-24 NA NA NA 1 0 Fri FALSE
4: 2020-01-25 NA NA NA 0 0 Sat FALSE
5: 2020-01-26 NA NA NA 3 0 Sun FALSE
6: 2020-01-27 NA NA NA 0 0 Mon FALSE
idx
1: 320
2: 319
3: 318
4: 317
5: 316
6: 315
date hospitalized ICU ventilator daycases daydeaths dayofweek
1: 2020-12-01 98777 19295 6649 180637 2597 Tue
2: 2020-12-02 100322 19680 6855 200055 2804 Wed
3: 2020-12-03 100755 19723 6867 217664 2879 Thu
4: 2020-12-04 101276 19858 6999 227885 2607 Fri
5: 2020-12-05 101190 19950 7005 213875 2254 Sat
6: 2020-12-06 101487 20145 7094 175663 1113 Sun
aftermay idx
1: TRUE 6
2: TRUE 5
3: TRUE 4
4: TRUE 3
5: TRUE 2
6: TRUE 1
Save the joined data in fst, feather, and parquet formats.
wdir <- "/data/covidtracking"
setwd(wdir)
fname <- "coviddaily.fst"
write_fst(covidfinal,fname)
fname <- "coviddaily.feather"
write_feather(covidfinal,fname)
fname <- "coviddaily.parquet"
write_parquet(covidfinal,fname)
blanks(2)
Build the data.table to be used for forecasting.
In [15]:
ncovidfinal <- covidfinal[date>"2020-03-25"][,c("daycases_lag21","hospitalized_lag14","ICU_lag7")
:=.(shift(daycases, 21L, fill = NA, type = "lag"),
shift(hospitalized, 14L, fill = NA, type = "lag"),
shift(ICU, 7L, fill = NA, type = "lag"))]
nr <- nrow(ncovidfinal)
meta(ncovidfinal)
blanks(2)
|name |class |rows|columns|size |
|:———-|:———————|:—|:——|:——-|
|ncovidfinal|data.table, data.frame|256 |12 |16.61 KB|
Classes ‘data.table’ and ‘data.frame’: 256 obs. of 12 variables:
$ date : Date, format: “2020-03-26” “2020-03-27” …
$ hospitalized : int 7805 10978 12409 14055 15917 18155 20906 22997 25777 30268 …
$ ICU : int 1299 1792 2174 2456 3087 3487 3937 4513 4928 5500 …
$ ventilator : int 258 324 390 439 451 507 561 574 623 656 …
$ daycases : int 17840 18680 19607 18879 22082 26316 25883 30387 31970 33015 …
$ daydeaths : int 412 518 633 592 696 1092 1180 1503 1393 1541 …
$ dayofweek : Ord.factor w/ 7 levels “Sun”<“Mon”<“Tue”<..: 5 6 7 1 2 3 4 5 6 7 …
$ aftermay : logi FALSE FALSE FALSE FALSE FALSE FALSE …
$ idx : int 256 255 254 253 252 251 250 249 248 247 …
$ daycases_lag21 : int NA NA NA NA NA NA NA NA NA NA …
$ hospitalized_lag14: int NA NA NA NA NA NA NA NA NA NA …
$ ICU_lag7 : int NA NA NA NA NA NA NA 1299 1792 2174 …
– attr(*, “sorted”)= chr “date”
– attr(*, “.internal.selfref”)=<externalptr>
NULL
A few visuals showing how the pandemic behaved differently before and after the end of May.
ggplot(ncovidfinal,aes(x=daycases,y=ICU, color=aftermay)) + geom_point() + geom_smooth(method='loess', se=FALSE)
ggplot(ncovidfinal[22:nr],aes(x=daycases_lag21,y=daydeaths,color=aftermay)) + geom_point() + geom_smooth(method='loess', se=FALSE)
ggplot(ncovidfinal[15:nr],aes(x=hospitalized_lag14,y=daydeaths,color=aftermay)) + geom_point() + geom_smooth(method='loess', se=FALSE)
ggplot(ncovidfinal,aes(x=ICU_lag7,y=daydeaths,color=aftermay)) + geom_point() + geom_smooth(method='loess', se=FALSE)
Now set up two models, a linear and a gradient boosting, to forecast a coming week of Covid fatalities.
depvar <- c("daydeaths")
indvars <- c("aftermay","dayofweek","daycases_lag21","hospitalized_lag14","ICU_lag7")
vars <- c(depvar,indvars)
blanks(2)
lst <- lm(as.formula(paste(depvar,"~",paste(indvars,collapse="+"),sep="")),data=ncovidfinal[22:nr,..vars])
summary(lst)
blanks(2)
Call:
lm(formula = as.formula(paste(depvar, “~”, paste(indvars, collapse = “+”), sep = “”)), data = ncovidfinal[22:nr, ..vars])
Residuals:
Min 1Q Median 3Q Max
-859.79 -125.32 -7.31 106.44 861.08
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 435.253915 130.749765 3.329 0.001019 **
aftermayTRUE -305.859254 85.929592 -3.559 0.000454 ***
dayofweek.L 393.326841 41.241091 9.537 < 0.0000000000000002 ***
dayofweek.Q -548.783281 39.272653 -13.974 < 0.0000000000000002 ***
dayofweek.C 4.239009 39.700356 0.107 0.915063
dayofweek^4 172.516526 39.434147 4.375 0.00001863 ***
dayofweek^5 -209.854809 39.440293 -5.321 0.00000025 ***
dayofweek^6 25.593071 39.524384 0.648 0.517955
daycases_lag21 0.002410 0.001399 1.723 0.086338 .
hospitalized_lag14 -0.007833 0.002501 -3.132 0.001967 **
ICU_lag7 0.120052 0.012290 9.768 < 0.0000000000000002 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Residual standard error: 227.6 on 224 degrees of freedom
Multiple R-squared: 0.8391, Adjusted R-squared: 0.8319
F-statistic: 116.8 on 10 and 224 DF, p-value: < 0.00000000000000022
v <- var(ncovidfinal[22:nr]$daydeaths)
m <- mse(ncovidfinal[22:nr]$daydeaths,predict(lst))
r2 <- 1-m/v
v
m
r2
blanks(2)
Next, gradient boosting from xgboost. First, though, create a model matrix from the analysis data.table.
trainspm <- xgb.DMatrix(sparse.model.matrix(as.formula(paste0(depvar,"~.")), data = ncovidfinal[22:nr,..vars]), label = ncovidfinal[22:nr][[depvar]])
str(trainspm)
bst <- xgb.train(data = trainspm, max_depth = 6, eta = .3, nthread = 2, nrounds = 10, subsample = .3, tree_method="exact",
gamma=5,verbose=1, subsample=.5,lambda=2,alpha=2,objective = "reg:squarederror",eval_metric = "rmse")
blanks(2)
Class ‘xgb.DMatrix’ <externalptr>
– attr(*, “.Dimnames”)=List of 2
..$ : NULL
..$ : chr [1:11] “(Intercept)” “aftermayTRUE” “dayofweek.L” “dayofweek.Q” …
Look at the results. ICU_lag7 is by far the most important “regressor”.
In [26]:
v <- var(ncovidfinal[22:nr]$daydeaths)
m <- mse(ncovidfinal[22:nr]$daydeaths,predict(bst,newdata=trainspm))
r2 <- 1-m/v
v
m
r2
blanks(2)
options(repr.plot.width=8,repr.plot.height=6)
importance <- xgb.importance(feature_names = colnames(trainspm), model = bst)
xgb.plot.importance(importance, rel_to_first = TRUE, xlab = "Relative Importance")
blanks(2)
Build the “newdata” data.table used for forecasting the next week.
newdt <- data.table(date=ncovidfinal[idx==1]$date+1:7,
aftermay=rep(TRUE,7),
dayofweek=ncovidfinal[between(idx,1,7),dayofweek],
daycases_lag21=ncovidfinal[between(idx,15,21),daycases],
hospitalized_lag14=ncovidfinal[between(idx,8,14),hospitalized],
ICU_lag7=ncovidfinal[between(idx,1,7),ICU]
)
meta(newdt)
blanks(1)
mykab(newdt)
blanks(2)
|name |class |rows|columns|size |
|:—-|:———————|:—|:——|:——|
|newdt|data.table, data.frame|7 |6 |2.87 KB|
Classes ‘data.table’ and ‘data.frame’: 7 obs. of 6 variables:
$ date : Date, format: “2020-12-07” “2020-12-08” …
$ aftermay : logi TRUE TRUE TRUE TRUE TRUE TRUE …
$ dayofweek : Ord.factor w/ 7 levels “Sun”<“Mon”<“Tue”<..: 2 3 4 5 6 7 1
$ daycases_lag21 : int 157820 160574 170513 188033 195518 178097 142807
$ hospitalized_lag14: int 85979 88174 90041 90564 89950 91762 93357
$ ICU_lag7 : int 18801 19295 19680 19723 19858 19950 20145
– attr(*, “.internal.selfref”)=<externalptr>
NULL
|date |aftermay|dayofweek|daycases_lag21|hospitalized_lag14|ICU_lag7|
|:———|:——-|:——–|:————-|:—————–|:——-|
|2020-12-07|TRUE |Mon |157820 |85979 |18801 |
|2020-12-08|TRUE |Tue |160574 |88174 |19295 |
|2020-12-09|TRUE |Wed |170513 |90041 |19680 |
|2020-12-10|TRUE |Thu |188033 |90564 |19723 |
|2020-12-11|TRUE |Fri |195518 |89950 |19858 |
|2020-12-12|TRUE |Sat |178097 |91762 |19950 |
|2020-12-13|TRUE |Sun |142807 |93357 |20145 |
newspm <- xgb.DMatrix(sparse.model.matrix(as.formula(paste0("~.")), data = newdt[,..indvars]))
dimnames(newspm)
blanks(2)
- NULL
- ‘(Intercept)’
- ‘aftermayTRUE’
- ‘dayofweek.L’
- ‘dayofweek.Q’
- ‘dayofweek.C’
- ‘dayofweek^4’
- ‘dayofweek^5’
- ‘dayofweek^6’
- ‘daycases_lag21’
- ‘hospitalized_lag14’
- ‘ICU_lag7’
round(predict(lst,newdata=newdt))
sum(round(predict(lst,newdata=newdt)))
blanks(1)
round(predict(bst,newdata=newspm))
sum(round(predict(bst,newdata=newspm)))
blanks(1)
round((predict(lst,newdata=newdt)+predict(bst,newdata=newspm))/2)
sum(round((predict(lst,newdata=newdt)+predict(bst,newdata=newspm))/2))
blanks(2)
1: 1753 2: 2390 3: 2504 4: 2405 5: 2416 6: 2180 7: 1702
vars = c("aftermay","dayofweek","daycases_lag21","hospitalized_lag14","ICU_lag7","daydeaths")
vars <- c("date",vars)
blanks(2)
newdtt <- rbind(tail(ncovidfinal[,..vars],7),
cbind(newdt,predicted=round((predict(lst,newdata=newdt)+predict(bst,newdata=newspm))/2))
,use.names=F)
mykab(newdtt)
blanks(2)
|:———|:——-|:——–|:————-|:—————–|:——-|:——–|
|2020-11-30|TRUE |Mon |121210 |73377 |17080 |1172 |
|2020-12-01|TRUE |Tue |137694 |77079 |17314 |2597 |
|2020-12-02|TRUE |Wed |143649 |79517 |17740 |2804 |
|2020-12-03|TRUE |Thu |161147 |80682 |18019 |2879 |
|2020-12-04|TRUE |Fri |177568 |82279 |18061 |2607 |
|2020-12-05|TRUE |Sat |166750 |83346 |18249 |2254 |
|2020-12-06|TRUE |Sun |135941 |83882 |18437 |1113 |
|2020-12-07|TRUE |Mon |157820 |85979 |18801 |1591 |
|2020-12-08|TRUE |Tue |160574 |88174 |19295 |2321 |
|2020-12-09|TRUE |Wed |170513 |90041 |19680 |2434 |
|2020-12-10|TRUE |Thu |188033 |90564 |19723 |2301 |
|2020-12-11|TRUE |Fri |195518 |89950 |19858 |2156 |
|2020-12-12|TRUE |Sat |178097 |91762 |19950 |1997 |
|2020-12-13|TRUE |Sun |142807 |93357 |20145 |1394 |
wd <- "c:/steve/covid/modeling"
setwd(wd)
ofile <- paste0("newdtt",newdtt[7]$date,".png")
setnames(newdtt,c("date","aftermay","weekday","cases_l21","hosp_l14","ICU_l7","deaths"))
kbl(newdtt[,-c("aftermay")]) %>%
kable_paper("striped", full_width = FALSE) %>%
column_spec(6, color = "white",
background = spec_color(c(rep(1,7),rep(3,7)), begin = 0.4, end = 0.7)) %>%
as_image(width=10,file=ofile)
[1] “newdtt2020-12-06.png”
attr(,”class”)
[1] “knit_image_paths” “knit_asis”
attr(,”dpi”)
[1] 32.6
setnames(newdtt,c("date","aftermay","weekday","cases_l21","hosp_l14","ICU_l7","deaths"))
kbl(newdtt[,-c("aftermay")]) %>%
kable_paper("striped", full_width = FALSE) %>%
column_spec(6, color = "white",
background = spec_color(c(rep(1,7),rep(3,7)), begin = 0.4, end = 0.7)) %>%
as.character() %>%
IRdisplay::display_html()