English 中文(简体)
R 组合多个观测
原标题:Combine multiple observations in R

我有一个平坦的推文档案,

例如:

user1, hashtag1, hashtag2 
user1, hashtag3, hashtag4 
user2, hashtag5, hashtag6 
user2, hashtag7, hashtag8

我想把它变成:

user1, hashtag1, hashtag2, hashtag3, hashtag4
user2, hashtag5, hashtag6, hashtag7, hashtag8 

有没有一种优雅的方法来做到这一点?

最佳回答

除非每个用户的标签页数始终相同, 否则我将把结果汇总到列表中。 列表的每个元素将是一个用户标签页的矢量( 可能是变量长度 ) 。

# Read in your example data
df <- read.table(text="user1, hashtag1, hashtag2 
user1, hashtag3, hashtag4 
user2, hashtag5, hashtag6 
user2, hashtag7, hashtag8", sep=",", header=FALSE, stringsAsFactors=FALSE)


lapply(split(df[-1], df[1]), function(X) unname(unlist(X)))
# $user1
# [1] " hashtag1"  " hashtag3"  " hashtag2 " " hashtag4 "
# 
# $user2
# [1] " hashtag5"  " hashtag7"  " hashtag6 " " hashtag8" 
问题回答

您正在寻找重塑。 要么是 < code> reshape 命令( 它含有痛苦的语法, 但基本上您想要从“ long” 到“ load”, 以“ user” 作为您的 id 变量), 要么是 < code> reshape2 套件, 配有 < code> melt , 后加上 < code> dcast , 会做你想做的事 。

或者,既然标签的数量可能不同,你可以使用plyr :

> colnames(x) <- c("user","tag1","tag2")
> 
> library(plyr)
> extract.hashtags <- function(x) {
+   x <- subset(x,select=c(-user))
+   mat <- as.matrix(x)
+   dim(mat) <- c(1,length(mat))
+   as.data.frame(mat)
+ }
> ddply(x, .(user), extract.hashtags )
   user       V1       V2       V3       V4
1 user1 hashtag1 hashtag3 hashtag2 hashtag4
2 user2 hashtag5 hashtag7 hashtag6 hashtag8

一种方法是使用 aggagate () 函数。 从 \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form

首先,在数据中读取(你今后应在你的问题中这样做,以提供一个可复制的例子,见::

txt <- "user1, hashtag1, hashtag2 
user1, hashtag3, hashtag4 
user2, hashtag5, hashtag6 
user2, hashtag7, hashtag8"

x <- read.delim(file = textConnection(txt), header = F, sep = ",", 
        strip.white = T, stringsAsFactors = F)

然后,使用 agnetate () 将数据分成子集,并将每个子集转换为一维数组 :

aggregate(x[-1], by = x[1], function(z)
        {
            dim(z) <- c(length(z)) # Change dimensions of z to 1-dimensional array
            z
        })
#      V1     V2.1     V2.2     V3.1     V3.2
# 1 user1 hashtag1 hashtag3 hashtag2 hashtag4
# 2 user2 hashtag5 hashtag7 hashtag6 hashtag8

< 强力 > 编辑 < /强 >

只有所有用户都有相同数量的标签, 才能使用这个方法。 @Josh O Briens回答是更好的方法。





相关问题
what is wrong with this mysql code

$db_user="root"; $db_host="localhost"; $db_password="root"; $db_name = "fayer"; $conn = mysqli_connect($db_host,$db_user,$db_password,$db_name) or die ("couldn t connect to server"); // perform query ...

Users asking for denormalized database

I am in the early stages of developing a database-driven system and the largest part of the system revolves around an inheritance type of relationship. There is a parent entity with about 10 columns ...

Easiest way to deal with sample data in Java web apps?

I m writing a Java web app in my free time to learn more about development. I m using the Stripes framework and eventually intend to use hibernate and MySQL For the moment, whilst creating the pages ...

join across databases with nhibernate

I am trying to join two tables that reside in two different databases. Every time, I try to join I get the following error: An association from the table xxx refers to an unmapped class. If the ...

How can I know if such value exists in database? (ADO.NET)

For example, I have a table, and there is a column named Tags . I want to know if value programming exists in this column. How can I do this in ADO.NET? I did this: OleDbCommand cmd = new ...

Convert date to string upon saving a doctrine record

I m trying to migrate one of my PHP projects to Doctrine. I ve never used it before so there are a few things I don t understand. In my current code, I have a class similar to this: class ...