One way is to use the aggregate() function. From ?aggregate:

"Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form."

First, read in the data (in future, you should do this in your question, to provide a reproducible example):

    txt <- "user1, hashtag1, hashtag2
    user1, hashtag3, hashtag4
    user2, hashtag5, hashtag6
    user2, hashtag7, hashtag8"
    # strip.white = TRUE drops the spaces that follow each comma
    x <- read.delim(file = textConnection(txt), header = FALSE, sep = ",",
                    strip.white = TRUE, stringsAsFactors = FALSE)

Then use aggregate() to split the data into subsets, and convert each subset into a one-dimensional array:

    aggregate(x[-1], by = x[1], function(z) {
        dim(z) <- length(z)  # Change dimensions of z to 1-dimensional array
        z
    })
    #      V1     V2.1     V2.2     V3.1     V3.2
    # 1 user1 hashtag1 hashtag3 hashtag2 hashtag4
    # 2 user2 hashtag5 hashtag7 hashtag6 hashtag8

<strong>Edit</strong>

This approach only works if every user has the same number of hashtags. @Josh O'Brien's answer is a better approach.
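
For the case where users do not all have the same number of hashtags, here is a minimal base R sketch of one possible workaround. It is not necessarily the method used in the other answer. The sample data (user2 with only one row), the `tag1`, `tag2`, ... column names, and the `tags`, `padded`, and `result` variables are assumptions made for illustration. The idea is to collect each user's hashtags with split(), then pad the shorter vectors with NA so they fit into one rectangular data frame.

```r
# Hypothetical data: user1 has two rows, user2 has only one
txt <- "user1, hashtag1, hashtag2
user1, hashtag3, hashtag4
user2, hashtag5, hashtag6"
x <- read.delim(file = textConnection(txt), header = FALSE, sep = ",",
                strip.white = TRUE, stringsAsFactors = FALSE)

# Collect all hashtags for each user, reading each user's rows in order
tags <- lapply(split(x[-1], x[[1]]),
               function(d) as.vector(t(as.matrix(d))))

# Pad the shorter vectors with NA so every user has the same length
n <- max(lengths(tags))
padded <- t(sapply(tags, function(v) c(v, rep(NA, n - length(v)))))
colnames(padded) <- paste0("tag", seq_len(n))

# One row per user, one column per hashtag position
result <- data.frame(user = rownames(padded), padded,
                     row.names = NULL, stringsAsFactors = FALSE)
result
```

Note that this sketch lists each user's hashtags row by row, whereas the aggregate() call above groups them by column (all of V2, then all of V3).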