Question

I have a flat file of tweets, for example:

user1, hashtag1, hashtag2 
user1, hashtag3, hashtag4 
user2, hashtag5, hashtag6 
user2, hashtag7, hashtag8

I want to turn it into:

user1, hashtag1, hashtag2, hashtag3, hashtag4
user2, hashtag5, hashtag6, hashtag7, hashtag8

Is there an elegant way to do this?

Answer 1

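Unless every user always has the same number of hashtags, I would collect the results in a list. Each element of the list will be a vector of one user's hashtags (possibly of varying length).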

# Read in your example data
df <- read.table(text="user1, hashtag1, hashtag2 
user1, hashtag3, hashtag4 
user2, hashtag5, hashtag6 
user2, hashtag7, hashtag8", sep=",", header=FALSE, stringsAsFactors=FALSE)


lapply(split(df[-1], df[1]), function(X) unname(unlist(X)))
# $user1
# [1] " hashtag1"  " hashtag3"  " hashtag2 " " hashtag4 "
# 
# $user2
# [1] " hashtag5"  " hashtag7"  " hashtag6 " " hashtag8"
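
If you need the exact comma-separated lines from the question rather than a list, the same idea can be pushed one step further. This is only a minimal sketch: strip.white = TRUE is my addition to drop the stray spaces, as.vector(t(X)) is used so that the tags come out in row order, and the names txt, tags and out are made up for illustration.

```r
# Example data from the question (trailing spaces removed)
txt <- "user1, hashtag1, hashtag2
user1, hashtag3, hashtag4
user2, hashtag5, hashtag6
user2, hashtag7, hashtag8"

df <- read.table(text = txt, sep = ",", header = FALSE,
                 stringsAsFactors = FALSE, strip.white = TRUE)

# One character vector of hashtags per user, read row by row
tags <- lapply(split(df[-1], df[[1]]), function(X) as.vector(t(X)))

# Glue each user's name and hashtags back into a single line
out <- paste(names(tags), sapply(tags, paste, collapse = ", "), sep = ", ")
writeLines(out)
```

Because the intermediate result is still a list, this should not depend on every user having the same number of rows.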

Answer 2

You are looking for reshaping. Either the reshape command (it has painful syntax, but basically you want to go from "long" to "wide", with "user" as your id variable), or the reshape2 package, with melt followed by dcast, will do what you want.
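
As a rough illustration of the long-to-wide idea, here is a sketch that uses only base reshape(), not reshape2. The column names user, tag1 and tag2, the helper column n, and the way the data frame is built by hand are all assumptions of mine, so treat it as one possible way to set this up.

```r
# Example data, typed in by hand with assumed column names
x <- data.frame(user = c("user1", "user1", "user2", "user2"),
                tag1 = c("hashtag1", "hashtag3", "hashtag5", "hashtag7"),
                tag2 = c("hashtag2", "hashtag4", "hashtag6", "hashtag8"),
                stringsAsFactors = FALSE)

# Long format: one row per (user, tag) pair, tags taken row by row
long <- data.frame(user = rep(x$user, each = ncol(x) - 1),
                   tag  = as.vector(t(x[-1])),
                   stringsAsFactors = FALSE)

# Number the tags within each user; reshape() uses this as the "time" variable
long$n <- ave(seq_along(long$user), long$user, FUN = seq_along)

# Wide format: one row per user, one column per tag position
wide <- reshape(long, idvar = "user", timevar = "n", direction = "wide")
wide
```

The result has one row per user and columns tag.1 to tag.4. If users had different numbers of tags, I would expect the missing positions to show up as NA.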

Alternatively, since the number of tags may vary, you can use plyr:

> colnames(x) <- c("user","tag1","tag2")
> 
> library(plyr)
> extract.hashtags <- function(x) {
+   x <- subset(x,select=c(-user))
+   mat <- as.matrix(x)
+   dim(mat) <- c(1,length(mat))
+   as.data.frame(mat)
+ }
> ddply(x, .(user), extract.hashtags )
   user       V1       V2       V3       V4
1 user1 hashtag1 hashtag3 hashtag2 hashtag4
2 user2 hashtag5 hashtag7 hashtag6 hashtag8

Answer 3

One way is to use the aggregate() function. From its help page:

  Splits the data into subsets, computes summary statistics for each, 
  and returns the result in a convenient form


First, read in the data (in future you should do this in your question, to provide a reproducible example):


txt <- "user1, hashtag1, hashtag2 
user1, hashtag3, hashtag4 
user2, hashtag5, hashtag6 
user2, hashtag7, hashtag8"

x <- read.delim(file = textConnection(txt), header = FALSE, sep = ",", 
        strip.white = TRUE, stringsAsFactors = FALSE)


Then use aggregate() to split the data into subsets and convert each subset into a 1-dimensional array:

aggregate(x[-1], by = x[1], function(z)
        {
            dim(z) <- c(length(z)) # Change dimensions of z to 1-dimensional array
            z
        })
#      V1     V2.1     V2.2     V3.1     V3.2
# 1 user1 hashtag1 hashtag3 hashtag2 hashtag4
# 2 user2 hashtag5 hashtag7 hashtag6 hashtag8


EDIT

This approach only works if all users have the same number of hashtags. @Josh O'Brien's answer is the better approach.
