How can I find out how many people died on the previous day?

Last posted: 2020-07-09


Question

In a dataset that provides daily information, how do I select a date so that I get yesterday's information?

df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')

I want to print how many people died yesterday.

df1 <- aggregate(death~countryName, subset(df, region =="Europe"), sum)

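This code adds up the values from every day and gives me that total. I don't want the sum. What code can I use to find out how many people died on the previous day?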

Answer

Write a helper function yesterday and use it to subset the data.

yesterday <- function() Sys.Date() - 1L
yesterday()
# [1] "2020-05-02"

df1 <- aggregate(death ~ countryName, subset(df, region =="Europe" & day == yesterday()), sum)
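Here is a minimal self-contained check of the helper. It uses a hypothetical toy data frame (the name toy and the value 45 are made up for illustration). It assumes day is already of class Date, because the comparison with yesterday() is only reliable for Date values, not for the raw strings or factors from read.csv.

```r
# Helper from the answer: the date one day before today's system date
yesterday <- function() Sys.Date() - 1L

# Hypothetical toy data: running death totals for one country over three days,
# with `day` already stored as class Date
toy <- data.frame(
  day = Sys.Date() - 2:0,          # day before yesterday, yesterday, today
  countryName = "Andorra",
  region = "Europe",
  death = c(43L, 44L, 45L)
)

# Keep only yesterday's rows for Europe, then total per country
res <- aggregate(death ~ countryName,
                 subset(toy, region == "Europe" & day == yesterday()),
                 sum)
print(res)
```

Only the middle row survives the subset, so res holds a single Andorra row with death equal to 44.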

A dplyr solution:

library(dplyr)

df %>% 
  filter(day == yesterday(), region == "Europe") %>%
  group_by(countryName) %>%
  summarise(death = sum(death))

Data

Following r2evans's comment, here is the code for reading the data and converting the dates.

df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
df$day <- as.Date(df$day, "%Y/%m/%d")
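After day is parsed to Date, ordinary date arithmetic and comparisons behave as expected. The sketch below runs the same conversion on two hypothetical strings in the file's YYYY/MM/DD layout.

```r
# Hypothetical sample of raw `day` strings in the file's YYYY/MM/DD layout
raw_day <- c("2020/05/01", "2020/05/02")

# Parse with an explicit format, as in the answer
day <- as.Date(raw_day, "%Y/%m/%d")

print(class(day))                   # "Date"
print(day[2] - day[1])              # a difference of 1 day
print(day == as.Date("2020-05-02")) # only the second element matches
```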


Answer

Yesterday's values (these are cumulative totals, not single-day counts):

df1 <- subset(df, region =="Europe" & day == '2020/05/02')
head(df1)
#             day countryCode            countryName region      lat      lon confirmed recovered death
# 102  2020/05/02          AD                Andorra Europe 42.50000  1.50000       747       472    44
# 612  2020/05/02          AL                Albania Europe 41.00000 20.00000       789       519    31
# 1020 2020/05/02          AT                Austria Europe 47.33333 13.33333     15558     13180   596
# 1428 2020/05/02          BA Bosnia and Herzegovina Europe 44.00000 18.00000      1839       779    72
# 1734 2020/05/02          BE                Belgium Europe 50.83333  4.00000     49517     12211  7765
# 1938 2020/05/02          BG               Bulgaria Europe 43.00000 25.00000      1594       287    72

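I started from your aggregate code but realized there are flaws in its assumptions:

  1. because each row for a given country is a running total, using sum to summarize across multiple days is logically incorrect; and
  2. because you only want one day's data, there is no need for aggregate at all; we can just subset the data.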
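Point 1 can be made concrete with a small sketch. The hypothetical two-day data frame below mirrors the Andorra rows in this answer (43 deaths, then 44). It compares the summed running totals, the latest total, and the single-day change.

```r
# Hypothetical running totals for one country on two consecutive days
toy <- data.frame(
  day = as.Date(c("2020-05-01", "2020-05-02")),
  countryName = "Andorra",
  death = c(43L, 44L)
)

# Summing running totals double-counts: 43 + 44 = 87, which means nothing
summed <- aggregate(death ~ countryName, toy, sum)$death

# The cumulative total as of the last day is simply that day's value
latest <- toy$death[toy$day == max(toy$day)]

# Deaths reported on the last day alone are the difference between totals
new_deaths <- diff(toy$death)

print(c(summed = summed, latest = latest, new = new_deaths))
```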

"Proof":

tail(sort(df$day), n=1)
# [1] 2020/05/02
# 102 Levels: 2020/01/22 2020/01/23 2020/01/24 2020/01/25 ... 2020/05/02
head( subset(df, region == "Europe" & day == "2020/05/02") )
#             day countryCode            countryName region      lat      lon confirmed recovered death
# 102  2020/05/02          AD                Andorra Europe 42.50000  1.50000       747       472    44
# 612  2020/05/02          AL                Albania Europe 41.00000 20.00000       789       519    31
# 1020 2020/05/02          AT                Austria Europe 47.33333 13.33333     15558     13180   596
# 1428 2020/05/02          BA Bosnia and Herzegovina Europe 44.00000 18.00000      1839       779    72
# 1734 2020/05/02          BE                Belgium Europe 50.83333  4.00000     49517     12211  7765
# 1938 2020/05/02          BG               Bulgaria Europe 43.00000 25.00000      1594       287    72
head( subset(df, region == "Europe" & day == "2020/05/01") )
#             day countryCode            countryName region      lat      lon confirmed recovered death
# 101  2020/05/01          AD                Andorra Europe 42.50000  1.50000       745       468    43
# 611  2020/05/01          AL                Albania Europe 41.00000 20.00000       782       488    31
# 1019 2020/05/01          AT                Austria Europe 47.33333 13.33333     15531     13110   589
# 1427 2020/05/01          BA Bosnia and Herzegovina Europe 44.00000 18.00000      1781       755    70
# 1733 2020/05/01          BE                Belgium Europe 50.83333  4.00000     49032     11892  7703
# 1937 2020/05/01          BG               Bulgaria Europe 43.00000 25.00000      1555       276    68
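The newest date in the file (2020/05/02 above) will not equal Sys.Date() - 1 if the data lag behind the calendar. The sketch below subsets on the most recent date actually present instead. The toy data frame is hypothetical, and day is assumed to be parsed to Date first.

```r
# Hypothetical toy data; `day` parsed from the file's YYYY/MM/DD strings
toy <- data.frame(
  day = as.Date(c("2020/05/01", "2020/05/01", "2020/05/02", "2020/05/02"),
                "%Y/%m/%d"),
  countryName = c("Andorra", "Albania", "Andorra", "Albania"),
  region = "Europe",
  death = c(43L, 31L, 44L, 31L)
)

# Use the newest date in the data rather than assuming it is yesterday
latest_day <- max(toy$day)

latest <- subset(toy, region == "Europe" & day == latest_day,
                 select = c(countryName, death))
print(latest)
```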

If you only need the death column, you can always pick columns with select=.

df1 <- subset(df, region =="Europe" & day == '2020/05/02', select = c(countryName, death))
head(df1)
#                 countryName death
# 102                 Andorra    44
# 612                 Albania    31
# 1020                Austria   596
# 1428 Bosnia and Herzegovina    72
# 1734                Belgium  7765
# 1938               Bulgaria    72

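If you want the difference between yesterday's numbers and the previously reported ones (which should be the "previous day", though nothing here verifies that), a little dplyr will do it: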

library(dplyr)
as_tibble(df) %>%
  arrange(day) %>%
  group_by(countryCode) %>%
  mutate_at(vars(confirmed, recovered, death), list(~ c(NA, diff(.)))) %>%
  slice(n())
# Warning: Factor `countryCode` contains implicit NA, consider using `forcats::fct_explicit_na`
# Warning: Factor `countryCode` contains implicit NA, consider using `forcats::fct_explicit_na`
# # A tibble: 212 x 9
# # Groups:   countryCode [212]
#    day        countryCode countryName          region     lat   lon confirmed recovered death
#    <fct>      <fct>       <fct>                <fct>    <dbl> <dbl>     <int>     <int> <int>
#  1 2020/05/02 AD          Andorra              Europe    42.5   1.5         2         4     1
#  2 2020/05/02 AE          United Arab Emirates Asia      24    54         561       121     8
#  3 2020/05/02 AF          Afghanistan          Asia      33    65         134        21     4
#  4 2020/05/02 AG          Antigua and Barbuda  Americas  17.0 -61.8         0         0     0
#  5 2020/05/02 AI          Anguilla             Americas  18.2 -63.2         0         0     0
#  6 2020/05/02 AL          Albania              Europe    41    20           7        31     0
#  7 2020/05/02 AM          Armenia              Asia      40    45         125        33     0
#  8 2020/05/02 AO          Angola               Africa   -12.5  18.5         5         0     0
#  9 2020/05/02 AR          Argentina            Americas -34   -64           0        28     4
# 10 2020/05/02 AT          Austria              Europe    47.3  13.3        27        70     7
# # ... with 202 more rows

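I think it's a safe assumption, but here I'm relying on the unformatted day sorting correctly. Because the values are zero-padded YYYY/MM/DD strings, their alphabetical order matches chronological order. To be "complete", you could always convert explicitly to Date with mutate(day = as.Date(day, format = "%Y/%m/%d")) before the arrange.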
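A short sketch shows why the zero padding matters. The date strings below are hypothetical. Padded strings sort chronologically as text, unpadded ones do not, and parsing to Date fixes the order either way.

```r
# Zero-padded YYYY/MM/DD strings: alphabetical order equals date order
padded <- c("2020/05/02", "2020/04/30", "2020/05/01")
print(sort(padded))

# Without zero padding, text sorting puts 5/10 before 5/2
unpadded <- c("2020/5/2", "2020/4/30", "2020/5/10")
print(sort(unpadded))

# Parsing to Date gives the correct order regardless of padding
print(sort(as.Date(unpadded, format = "%Y/%m/%d")))
```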

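A few months ago I challenged myself to become more proficient with data.table, so here is an alternative solution in that dialect. (Note that I'm using magrittr's %>% operator here to break out each stage of processing; this could just as easily be written without it, as a more traditional data.table chain.)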

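library(data.table)
library(magrittr)  # provides %>%; data.table does not export it
cols <- c("confirmed", "recovered", "death")
as.data.table(df) %>%
  .[order(day)] %>%  # make sure diff() runs in date order within each country
  .[, (cols) := lapply(.SD, function(a) c(NA, diff(a))), by = .(countryName), .SDcols = cols] %>%
  .[, .SD[.N,], by = .(countryName) ]  # keep each country's last (latest) row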
#               countryName        day countryCode   region       lat       lon confirmed recovered death
#   1:              Andorra 2020/05/02          AD   Europe  42.50000   1.50000         2         4     1
#   2: United Arab Emirates 2020/05/02          AE     Asia  24.00000  54.00000       561       121     8
#   3:          Afghanistan 2020/05/02          AF     Asia  33.00000  65.00000       134        21     4
#   4:  Antigua and Barbuda 2020/05/02          AG Americas  17.05000 -61.80000         0         0     0
#   5:             Anguilla 2020/05/02          AI Americas  18.25000 -63.16667         0         0     0
#  ---                                                                                                   
# 208:                Yemen 2020/05/02          YE     Asia  15.00000  48.00000         3         0     0
# 209:              Mayotte 2020/05/02          YT   Africa -12.83333  45.16667         0         0     0
# 210:         South Africa 2020/05/02          ZA   Africa -29.00000  24.00000       385       167     7
# 211:               Zambia 2020/05/02          ZM   Africa -15.00000  30.00000        10         1     0
# 212:             Zimbabwe 2020/05/02          ZW   Africa -20.00000  30.00000        -6         0     0
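Neither the dplyr nor the data.table version checks that the previous report really is the previous day. The base-R sketch below does the same per-country differencing. It swaps in ave() for illustration, which is not part of the original answer, and sets the difference to NA when the gap between reports is not exactly one day. The toy data are hypothetical, and Albania's skipped day is invented to exercise the check.

```r
# Hypothetical toy data: running totals for two countries; AL skips 2020-05-01
toy <- data.frame(
  day = as.Date(c("2020-05-01", "2020-05-02", "2020-04-30", "2020-05-02")),
  countryCode = c("AD", "AD", "AL", "AL"),
  death = c(43L, 44L, 30L, 31L)
)

# Sort by country, then date, so diff() compares consecutive reports
toy <- toy[order(toy$countryCode, toy$day), ]

# Change since the previous report within each country (NA for the first)
toy$new_death <- ave(toy$death, toy$countryCode,
                     FUN = function(x) c(NA, diff(x)))

# Gap in days to the previous report; only a gap of 1 is truly "the day before"
toy$gap <- ave(as.numeric(toy$day), toy$countryCode,
               FUN = function(x) c(NA, diff(x)))
toy$new_death[!is.na(toy$gap) & toy$gap != 1] <- NA

# Keep each country's last (latest) row
last <- toy[!duplicated(toy$countryCode, fromLast = TRUE), ]
print(last)
```

In this toy data, AD gets a one-day change of 1. AL gets NA because its previous report is two days earlier.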