Join two data frames using the last row

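Our weather station records daily weather data for each week (about 7 rows/observations per week), and we collect disease data once a week (one observation/record per week). How can I join disease_df to the last row of each week in weather_df while keeping the other cells blank? I tried left_join, but it incorrectly adds the value from disease_df to every day of the week instead of recording it only on the last day of the week.

Reproducible example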

weather_df <- structure(list(week = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("1", "2"), class = "factor"), 
    date = structure(c(1401062400, 1401148800, 1401235200, 1401321600, 
    1401408000, 1401494400, 1401580800, 1401667200, 1402272000, 
    1402358400, 1402444800, 1402531200, 1402617600, 1402704000, 
    1402790400, 1402876800), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    rainfall = c(0.8, 0, 1.4, 3, 0, 1, 0, 0, 3, 0, 2.4, 1.2, 
    0, 0, 0, 0), temperature = c(23.6, 21.9, 22.6, 20.1, 21.9, 
    20.3, 17.3, 15.5, 23.1, 22.4, 21.1, 20.4, 21.2, 21.5, 20.2, 
    20.4)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", 
"data.frame"))


disease_df <- structure(list(week = structure(1:2, levels = c("1", "2"), class = "factor"), 
    disease_intensity = c(0.4, 0.3)), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))



library(dplyr)

combine_df <- left_join(weather_df, disease_df, by = "week")

In the output, 0.4 is added to every day of week 1 and 0.3 to every day of week 2. I only want these values on the last day of each week, with all other cells left blank.

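Best answer

You could use any of several join techniques, but in this case it is easier to tighten the join criteria. I added two criteria: the day of the week, and a running count of that weekday within the week, because each week in the data contains the same weekday twice.

From there, a normal left join works.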

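library(tidyverse)  # attaches dplyr and, since tidyverse 2.0.0, lubridate (for wday())

# Tag every weather row with its weekday (2 = Monday under lubridate's default
# week start) and a running count of Mondays within the week.
weather_augmented_tbl <- weather_df |>
  group_by(week) |>
  mutate(
    wday = wday(date),
    n_wday = cumsum(if_else(wday == 2, 1, 0))
  )

# The disease record belongs to the second Monday of each week.
disease_augmented_tbl <- disease_df |>
  mutate(
    wday = 2,
    n_wday = 2
  )

# Only the last row of each week satisfies all three join keys.
left_join(
  weather_augmented_tbl,
  disease_augmented_tbl,
  by = join_by(week, wday, n_wday)
)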
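
Why both criteria are needed can be checked on a single week. The sketch below is not part of the original answer: it rebuilds the eight dates of week 1 as a hypothetical `week1` tibble, and it passes `week_start = 7` explicitly (assumed here to match lubridate's default) so that Monday is coded as 2.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for week 1 of weather_df: 2014-05-26 to 2014-06-02 (UTC)
week1 <- tibble(
  date = seq(as.POSIXct("2014-05-26", tz = "UTC"), by = "day", length.out = 8)
)

week1_tagged <- week1 |>
  mutate(
    wday = wday(date, week_start = 7),  # Sunday = 1, Monday = 2, ..., Saturday = 7
    n_wday = cumsum(wday == 2)          # running count of Mondays seen so far
  )

week1_tagged
```

Both the first and the last row fall on a Monday, so `wday == 2` alone would match twice; only the last row also has `n_wday == 2`.
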

Other answers

You can join disease_df to weather_df keeping only the last match per week (multiple = "last"), and then join that result back onto weather_df.

library(dplyr)

left_join(disease_df, weather_df, by = "week", multiple = "last") %>%
  left_join(weather_df, .)
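
To make the two steps easier to follow, here is a minimal sketch on made-up data. The tibbles `weather_small` and `disease_small` and the intermediate name `last_rows` are hypothetical and not from the original answer; `multiple = "last"` requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical miniature versions of the two data frames
weather_small <- tibble(
  week = factor(c(1, 1, 1, 2, 2)),
  date = as.Date("2014-05-26") + c(0, 1, 2, 7, 8),
  rainfall = c(0.8, 0, 1.4, 3, 0)
)
disease_small <- tibble(
  week = factor(c(1, 2)),
  disease_intensity = c(0.4, 0.3)
)

# Step 1: one row per week, carrying the columns of the LAST matching weather row
last_rows <- left_join(disease_small, weather_small, by = "week", multiple = "last")

# Step 2: join back on all shared columns, so only that last row receives a value
result <- left_join(weather_small, last_rows, by = c("week", "date", "rainfall"))
result
```

Note that "last" refers to row order in the weather data, not to the latest date, so this approach assumes the rows are already sorted by date within each week.
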

Another option is to create a flag column in weather_df that marks the last day of each week, and then join with disease_df on both week and flag.

weather_df %>%
  mutate(flag = row_number() == which.max(date), .by = week) %>%
  left_join(mutate(disease_df, flag = TRUE), by = join_by(week, flag)) %>%
  select(-flag)

Output
# Joining with `by = join_by(week, date, rainfall, temperature)`
# # A tibble: 16 × 5
#    week  date                rainfall temperature disease_intensity
#    <fct> <dttm>                 <dbl>       <dbl>             <dbl>
#  1 1     2014-05-26 00:00:00      0.8        23.6              NA  
#  2 1     2014-05-27 00:00:00      0          21.9              NA  
#  3 1     2014-05-28 00:00:00      1.4        22.6              NA  
#  4 1     2014-05-29 00:00:00      3          20.1              NA  
#  5 1     2014-05-30 00:00:00      0          21.9              NA  
#  6 1     2014-05-31 00:00:00      1          20.3              NA  
#  7 1     2014-06-01 00:00:00      0          17.3              NA  
#  8 1     2014-06-02 00:00:00      0          15.5               0.4
#  9 2     2014-06-09 00:00:00      3          23.1              NA  
# 10 2     2014-06-10 00:00:00      0          22.4              NA  
# 11 2     2014-06-11 00:00:00      2.4        21.1              NA  
# 12 2     2014-06-12 00:00:00      1.2        20.4              NA  
# 13 2     2014-06-13 00:00:00      0          21.2              NA  
# 14 2     2014-06-14 00:00:00      0          21.5              NA  
# 15 2     2014-06-15 00:00:00      0          20.2              NA  
# 16 2     2014-06-16 00:00:00      0          20.4               0.3



