R: grouping and expanding the data.frame to include only possible pairs of names in a column

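My goal is to expand my data.frame in R so that it includes the possible combinations within one column (but not all possible combinations). This is similar to the expand.grid command, except that expand.grid gives you every possible combination rather than only the ones that exist in the data.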

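First, I need to group by each factor in column 1 and keep the information contained in column 2. Column 3 holds strings of animal names. I want to find, row by row, every possible pair that occurs in that column (but not all possible pairs). For example, if the first two rows are Dreadwing and Scorcher, that gives one pair: Dreadwing-Scorcher. It should not also include Scorcher-Dreadwing. However, if rows 4 and 5 are T-Rex and T-Rex, the pair T-Rex-T-Rex should appear once, because T-Rex occurs in two separate rows of the Animals column. If T-Rex occurred in three rows, the T-Rex-T-Rex pair should appear three times, and so on.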

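Finally, the pairs should extend the data.frame by 2 columns that store them. In other words, "Dreadwing" and "Scorcher" should each sit in their own column, but in the same row.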

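I have put this image together by hand to show what my output should look like given the data frame I have (note: Area_1 and Area_2 are separated only so that the result fits in a single screenshot). Left: I drew arrows from the first row to show the desired combinations for Dreadwing. Right: the desired result for all of Area_1 and Area_2.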

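In the desired result, Area_1 should have no Dreadwing-Dreadwing pair (the same holds for Area_2), because Dreadwing does not appear in any other row. However, T-Rex appears in two separate rows, so the T-Rex-T-Rex combination should be present, along with every T-Rex row combined with every Waterwing row. That gives 4 T-Rex-Waterwing combinations.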
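
To make the pairing rule concrete, here is a minimal base R sketch on a hypothetical four-element vector (toy values, not the data below). It assumes the rule can be read as "every pair of row positions i < j, exactly once", which is what `combn()` enumerates; that reading is my interpretation of the description above rather than something the question states in these words.

```r
# Hypothetical toy vector, used only to illustrate the pairing rule
animals <- c("Dreadwing", "Scorcher", "T-Rex", "T-Rex")

# combn() enumerates every pair of positions i < j exactly once,
# so a name that occurs in two rows is paired with itself once
pairs <- t(combn(animals, 2))
colnames(pairs) <- c("Animal_1", "Animal_2")
pairs
```

Under this reading, Dreadwing-Scorcher appears but Scorcher-Dreadwing does not, and T-Rex-T-Rex appears once because T-Rex occupies two positions.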

Reproducible data

Create the data frame

v <- c(rep("Area_1", 7), rep("Area_2", 7))
w <- c(rep("Forest", 7), rep("Cave", 7))
y <- c("Waterwing", "Scorcher", "Snapmaw", "T-Rex", "T-Rex", "Dreadwing", 
"Waterwing", "Snake", "T-Rex", "T-Rex", "Dreadwing", "Snapmaw", "Scorcher", 
"Waterwing")

stack_df <- data.frame(Area = v, Location = w, Animals = y)
stack_df <- stack_df[order(stack_df$Area, stack_df$Location, stack_df$Animals), ]
row.names(stack_df) <- 1:nrow(stack_df)

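Following the tidyr documentation, I found that combining the expand command with nesting (which keeps only the combinations that occur in the data) does not work. For example: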

library(tidyr)    
stack_df %>%
    dplyr::group_by(Area) %>%
    expand(nesting(Location, Animals, Animals))

returns only 11 of the 14 rows.
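
A plausible explanation for the 11 rows is that duplicate rows collapse into one. The sketch below does not call tidyr at all; it only counts the distinct rows of the same data in base R, which gives the same number. Treating that as the cause of the tidyr result is an assumption.

```r
# Same data as in the question, rebuilt so the sketch is self-contained
stack_df <- data.frame(
  Area     = c(rep("Area_1", 7), rep("Area_2", 7)),
  Location = c(rep("Forest", 7), rep("Cave", 7)),
  Animals  = c("Waterwing", "Scorcher", "Snapmaw", "T-Rex", "T-Rex", "Dreadwing",
               "Waterwing", "Snake", "T-Rex", "T-Rex", "Dreadwing", "Snapmaw",
               "Scorcher", "Waterwing")
)

n_rows     <- nrow(stack_df)          # all rows, repeats included
n_distinct <- nrow(unique(stack_df))  # repeated Area/Location/Animals rows collapse
c(n_rows, n_distinct)                 # 14 11
```

The three rows that disappear are the second T-Rex and second Waterwing in Area_1 and the second T-Rex in Area_2, which are exactly the repeats needed for the same-name pairs.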

I have tried several ways of using the expand and crossing commands. However, like expand.grid, these commands give all possible combinations.

Still, the expand command is the closest I have come to my goal.

stack_df %>%
    dplyr::group_by(Area) %>%
    expand(Location, Animals, Animals)

As you can see, every possibility is included, which is not the desired result.
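
The gap between "all combinations" and "only pairs of distinct rows" can be shown with base R alone. This is a small sketch on a hypothetical toy vector (not the data above), with expand.grid standing in for any full cross product: the cross product returns n * n rows, including reversed and self-paired rows, while the number of wanted pairs is choose(n, 2).

```r
# Hypothetical toy vector, used only for illustration
animals <- c("Dreadwing", "Scorcher", "T-Rex", "T-Rex")

# Full cross product: every row paired with every row, in both orders
all_combos <- expand.grid(first = animals, second = animals,
                          stringsAsFactors = FALSE)
n_all <- nrow(all_combos)               # 16

# Number of pairs of distinct row positions, each counted once
n_wanted <- choose(length(animals), 2)  # 6
```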

Any ideas on how I could do this?

Answer

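It seems to me that you want all pairs of animals (within each Area/Location group) in which the first animal of the pair appears in an earlier row than the second animal of the pair.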

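We can do this by adding a row-number index and doing a self-join with an inequality constraint on the row number. (This requires dplyr version 1.1.0 or later.)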

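library(dplyr)

# Number the rows within each Area/Location group
stack_df = stack_df |>
  mutate(group_i = row_number(), .by = c(Area, Location))

# Self-join within each group, keeping only pairs where the left row comes first
stack_df |>
  inner_join(
    stack_df,
    by = join_by(Area, Location, group_i < group_i),
    suffix = c("..2", "..3")
  ) |>
  select(-starts_with("group_"))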
#      Area Location Animals..2 Animals..3
# 1  Area_1   Forest  Dreadwing   Scorcher
# 2  Area_1   Forest  Dreadwing    Snapmaw
# 3  Area_1   Forest  Dreadwing      T-Rex
# 4  Area_1   Forest  Dreadwing      T-Rex
# 5  Area_1   Forest  Dreadwing  Waterwing
# 6  Area_1   Forest  Dreadwing  Waterwing
# 7  Area_1   Forest   Scorcher    Snapmaw
# 8  Area_1   Forest   Scorcher      T-Rex
# 9  Area_1   Forest   Scorcher      T-Rex
# 10 Area_1   Forest   Scorcher  Waterwing
# 11 Area_1   Forest   Scorcher  Waterwing
# 12 Area_1   Forest    Snapmaw      T-Rex
# 13 Area_1   Forest    Snapmaw      T-Rex
# 14 Area_1   Forest    Snapmaw  Waterwing
# 15 Area_1   Forest    Snapmaw  Waterwing
# 16 Area_1   Forest      T-Rex      T-Rex
# 17 Area_1   Forest      T-Rex  Waterwing
# 18 Area_1   Forest      T-Rex  Waterwing
# 19 Area_1   Forest      T-Rex  Waterwing
# 20 Area_1   Forest      T-Rex  Waterwing
# 21 Area_1   Forest  Waterwing  Waterwing
# 22 Area_2     Cave  Dreadwing   Scorcher
# 23 Area_2     Cave  Dreadwing      Snake
# 24 Area_2     Cave  Dreadwing    Snapmaw
# 25 Area_2     Cave  Dreadwing      T-Rex
# 26 Area_2     Cave  Dreadwing      T-Rex
# 27 Area_2     Cave  Dreadwing  Waterwing
# 28 Area_2     Cave   Scorcher      Snake
# 29 Area_2     Cave   Scorcher    Snapmaw
# 30 Area_2     Cave   Scorcher      T-Rex
# 31 Area_2     Cave   Scorcher      T-Rex
# 32 Area_2     Cave   Scorcher  Waterwing
# 33 Area_2     Cave      Snake    Snapmaw
# 34 Area_2     Cave      Snake      T-Rex
# 35 Area_2     Cave      Snake      T-Rex
# 36 Area_2     Cave      Snake  Waterwing
# 37 Area_2     Cave    Snapmaw      T-Rex
# 38 Area_2     Cave    Snapmaw      T-Rex
# 39 Area_2     Cave    Snapmaw  Waterwing
# 40 Area_2     Cave      T-Rex      T-Rex
# 41 Area_2     Cave      T-Rex  Waterwing
# 42 Area_2     Cave      T-Rex  Waterwing
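
For readers without dplyr 1.1.0, the same idea can be sketched in base R. This is not the answer's code: it swaps the inequality join for a plain `merge()` on the grouping columns followed by a filter on the row index, and the `_1` / `_2` suffixes and the name `pairs` are my own choices. The intermediate merge builds every within-group row combination, so it may be wasteful on large groups.

```r
# Same data as in the question, rebuilt so the sketch is self-contained
stack_df <- data.frame(
  Area     = c(rep("Area_1", 7), rep("Area_2", 7)),
  Location = c(rep("Forest", 7), rep("Cave", 7)),
  Animals  = c("Waterwing", "Scorcher", "Snapmaw", "T-Rex", "T-Rex", "Dreadwing",
               "Waterwing", "Snake", "T-Rex", "T-Rex", "Dreadwing", "Snapmaw",
               "Scorcher", "Waterwing")
)
stack_df <- stack_df[order(stack_df$Area, stack_df$Location, stack_df$Animals), ]

# Row index within each Area/Location group
stack_df$group_i <- ave(seq_len(nrow(stack_df)),
                        stack_df$Area, stack_df$Location,
                        FUN = seq_along)

# Self-join on the grouping columns, then keep only pairs with i < j
pairs <- merge(stack_df, stack_df,
               by = c("Area", "Location"), suffixes = c("_1", "_2"))
pairs <- pairs[pairs$group_i_1 < pairs$group_i_2,
               c("Area", "Location", "Animals_1", "Animals_2")]
nrow(pairs)  # 42
```

The row count matches the dplyr output above, although the row order produced by `merge()` may differ.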
