How to use a separate table to filter data
  • Time: 2012-05-18 18:21:32
  • Tags: r

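I have some data that I'm not sure how to analyze. I'm fairly sure it is not yet in a usable shape and needs some reworking before it will work in R. I have a set of targets, each with a size and a color. For each target I have users, conditions, and their scores.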

So the first table looks like:

Target, 1, 2, 3, 4, 5 ...
Size,   L, M, L, S, L ...
Color,  R, B, G, B, R ...

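After that, I have all the user data, where one column is the user id, one column is the condition, and then there is one column of scores for each target.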

User, Condition, 1, 2, 3, ...
1     A          5, 2, 8, ...
1     D          2, 4, 6, ...
2     A          1, 4, 6, ...
2     B          5, 8, 3, ...

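I basically want to run an ANOVA between the 4 conditions, so that I can see whether the average scores are the same for the L targets specifically, or for the R targets specifically.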

I have never used a second table to filter or look up data like this. How would I do it?

Best answer

A quick and dirty solution (because I'm sure someone will come up with a more clever solution that avoids the loop :):

tab1 <- list(Target=1:5, Size=c("L","M","L","S","L"), Color=c("R","B","G","B","R"))
tab2 <- data.frame(rep(1:2, each=2), c("A","D","A","B"),
                   c(5,2,1,5), c(2,4,4,8), c(8,6,6,3))
names(tab2) <- c("User", "Condition", 1:3)

library(reshape)
tab2.melt <- melt(tab2, measure.vars=3:5)

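# Look up Size and Color for each melted row by matching its target number.
# as.character() comes first: as.numeric() on a factor returns level codes.
for (i in seq_len(nrow(tab2.melt))) {
  target <- as.numeric(as.character(tab2.melt$variable[i]))
  tab2.melt$Size[i] <- tab1$Size[tab1$Target == target]
  tab2.melt$Color[i] <- tab1$Color[tab1$Target == target]
}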

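I assumed you are able to import your data into R, but you may want to adapt the above code if your actual data structure differs from the one shown in your excerpt. Basically, the idea is to treat the Target code as an index into the levels of Size and Color, which we need in the final data.frame for each repeated measurement (on the subject).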
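The indexing idea can also be sketched without a loop and without extra packages. This is only an illustration, assuming the same toy data as above: the lookup table is stored as a data frame rather than a list, and the name `long` is made up for this example. The base R function `match()` finds, for every score row, the position of its target in the lookup table.

```r
# Lookup table (one row per target) and wide score table, same toy data as above.
tab1 <- data.frame(Target = 1:5,
                   Size   = c("L", "M", "L", "S", "L"),
                   Color  = c("R", "B", "G", "B", "R"),
                   stringsAsFactors = FALSE)
tab2 <- data.frame(User      = rep(1:2, each = 2),
                   Condition = c("A", "D", "A", "B"),
                   `1` = c(5, 2, 1, 5), `2` = c(2, 4, 4, 8), `3` = c(8, 6, 6, 3),
                   check.names = FALSE, stringsAsFactors = FALSE)

# Wide -> long: stack() the score columns and repeat the id columns to match.
scores    <- stack(tab2[, -(1:2)])          # columns: values, ind
n_targets <- ncol(tab2) - 2
long <- data.frame(User      = rep(tab2$User, times = n_targets),
                   Condition = rep(tab2$Condition, times = n_targets),
                   Target    = as.numeric(as.character(scores$ind)),
                   value     = scores$values,
                   stringsAsFactors = FALSE)

# Vectorised lookup: position of each row's target in the lookup table.
idx        <- match(long$Target, tab1$Target)
long$Size  <- tab1$Size[idx]
long$Color <- tab1$Color[idx]
```

Targets that do not appear in the lookup table would get NA for Size and Color, which makes missing entries easy to spot.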

The updated data.frame looks like:

> head(tab2.melt)
  User Condition variable value Size Color
1    1         A        1     5    L     R
2    1         D        1     2    L     R
3    2         A        1     1    L     R
4    2         B        1     5    L     R
5    1         A        2     2    M     B
6    1         D        2     4    M     B

From there, you can perform a 3-way ANOVA or study specific contrasts.
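As a rough sketch of the filtering step the question asks about, the long table can be subset to the L targets and compared across conditions. The data below are copied from the toy output above; the names `long`, `large`, `cond_means` and `fit` are made up for this illustration. Note that this is a plain one-way ANOVA that ignores the repeated measurements per user, so treat it as a starting point rather than the recommended model.

```r
# Long-format toy data, typed in from the head(tab2.melt) layout above.
long <- data.frame(
  User      = rep(c(1, 1, 2, 2), times = 3),
  Condition = rep(c("A", "D", "A", "B"), times = 3),
  variable  = rep(1:3, each = 4),
  value     = c(5, 2, 1, 5, 2, 4, 4, 8, 8, 6, 6, 3),
  Size      = rep(c("L", "M", "L"), each = 4),
  Color     = rep(c("R", "B", "G"), each = 4)
)

# Keep only the rows whose target is of size L.
large <- subset(long, Size == "L")

# Mean score per condition within the L targets.
cond_means <- aggregate(value ~ Condition, data = large, FUN = mean)

# One-way ANOVA of score on condition (assumption: independence of rows,
# which does not hold for repeated measures on the same user).
fit <- aov(value ~ Condition, data = large)
summary(fit)
```

Replacing `Size == "L"` with `Color == "R"` gives the corresponding comparison for the R targets.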

Other answers

A possible alternative is to join the lookup table with the data frame:

1. Some example data (the same as @chl used in his answer, but with a data frame instead of a list for the lookup values):

lut <- data.frame(Target=1:5, Size=c("L","M","L","S","L"), Color=c("R","B","G","B","R"))
df1 <- data.frame(rep(1:2, each=2), c("A","D","A","B"),
                   c(5,2,1,5), c(2,4,4,8), c(8,6,6,3))
names(df1) <- c("user", "condition", 1:3)

2. With the data.table package you can convert the data frame to a data.table and reshape it to long format (melt works the same way as in reshape2):

library(data.table)

dt.melt <- melt(setDT(df1), id=c("user","condition"),
                variable.factor = FALSE)[, variable := as.numeric(variable)]

3. Join with the lookup table to add the Size and Color columns to the data.table:

dt.melt[lut, on = c("variable" = "Target"), nomatch=0]

or:

lut[dt.melt, on = c("Target" = "variable")]

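Both give the rows below (the second form returns the same rows, but with the join column named Target and the lut columns first):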

    user condition variable value Size Color
 1:    1         A        1     5    L     R
 2:    1         D        1     2    L     R
 3:    2         A        1     1    L     R
 4:    2         B        1     5    L     R
 5:    1         A        2     2    M     B
 6:    1         D        2     4    M     B
 7:    2         A        2     4    M     B
 8:    2         B        2     8    M     B
 9:    1         A        3     8    L     G
10:    1         D        3     6    L     G
11:    2         A        3     6    L     G
12:    2         B        3     3    L     G

You can also chain all of this together in a single call:

dt.melt <- melt(setDT(df1), id=c("user","condition"),
                variable.factor = FALSE)[, variable := as.numeric(variable)
                                         ][lut, on = c("variable" = "Target"), nomatch=0]

With the combination of dplyr and tidyr you can achieve the same:

library(dplyr)
library(tidyr)

df.new <- df1 %>% 
  gather(variable, value, -c(1:2)) %>% 
  mutate(variable = as.numeric(as.character(variable))) %>% 
  left_join(., lut, by = c("variable" = "Target"))

This gives the same result:

> df.new
   user condition variable value Size Color
1     1         A        1     5    L     R
2     1         D        1     2    L     R
3     2         A        1     1    L     R
4     2         B        1     5    L     R
5     1         A        2     2    M     B
6     1         D        2     4    M     B
7     2         A        2     4    M     B
8     2         B        2     8    M     B
9     1         A        3     8    L     G
10    1         D        3     6    L     G
11    2         A        3     6    L     G
12    2         B        3     3    L     G
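For comparison, the same lookup join can be sketched in base R with `merge()`, which needs no extra packages. This is an illustrative sketch only: the long table is typed in from the output above, and the names `long` and `joined` are made up for the example.

```r
# Lookup table, as defined in step 1.
lut <- data.frame(Target = 1:5,
                  Size   = c("L", "M", "L", "S", "L"),
                  Color  = c("R", "B", "G", "B", "R"))

# Long-format scores, typed in from the output above.
long <- data.frame(
  user      = rep(c(1, 1, 2, 2), times = 3),
  condition = rep(c("A", "D", "A", "B"), times = 3),
  variable  = rep(1:3, each = 4),
  value     = c(5, 2, 1, 5, 2, 4, 4, 8, 8, 6, 6, 3)
)

# Left join: keep every score row and attach Size and Color of its target.
# by.x / by.y name the key column in each table.
joined <- merge(long, lut, by.x = "variable", by.y = "Target", all.x = TRUE)
```

Here `merge()` puts the key column first and sorts by it. Targets 4 and 5 have no scores in the toy data, so they do not appear in the result; `all.y = TRUE` would keep them with NA scores.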



