Grouping Values into Brackets
  • Date: 2012-01-11 23:47:07
  • Tags: r

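I have a question about grouping data into specific categories.

In general, if I have a factor variable, I do something like the following to group the data into a prespecified pattern: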

educ = NA
educ[educ2 %in% levels(educ2)[c(5,8)]] <- "HS or Some College"
educ[educ2 %in% levels(educ2)[2:3]] <- "College Degree"
educ[educ2 %in% levels(educ2)[c(4,6)]] <- "Advanced Degree" 
educ[educ2 %in% levels(educ2)[c(1,7,9)]] <- NA
educ = factor(educ)

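However, I am struggling to regroup a factor variable, the time column shown below, which has 10,000+ levels. The structure of the data is as follows: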

> levels(wj$time)
    [1] "0:00:05"  "0:00:07"  "0:00:08"  "0:00:10"  "0:00:13"  "0:00:15"  "0:00:18"  "0:00:23"  "0:00:31"  "0:00:34"  "0:00:36" 
   [12] "0:00:39"  "0:00:41"  "0:00:47"  "0:00:48"  "0:00:54"  "0:00:55"  "0:00:56"  "0:00:59"  "0:01:01"  "0:01:02"  "0:01:03" 
   [23] "0:01:13"  "0:01:17"  "0:01:31"  "0:01:33"  "0:01:41"  "0:01:44"  "0:01:48"  "0:01:50"  "0:01:52"  "0:01:53"  "0:01:55" 
   [34] "0:02:08"  "0:02:12"  "0:02:13"  "0:02:21"  "0:02:26"  "0:02:27"  "0:02:30"  "0:02:32"  "0:02:33"  "0:02:36"  "0:02:37" 
   [45] "0:02:38"  "0:02:43"  "0:02:45"  "0:02:53"  "0:02:56"  "0:03:07"  "0:03:15"  "0:03:19"  "0:03:21"  "0:03:22"  "0:03:24" 
   [56] "0:03:30"  "0:03:36"  "0:03:39"  "0:03:41"  "0:03:49"  "0:03:56"  "0:03:59"  "0:04:02"  "0:04:04"  "0:04:07"  "0:04:10" 
   [67] "0:04:11"  "0:04:12"  "0:04:14"  "0:04:16"  "0:04:17"  "0:04:19"  "0:04:22"  "0:04:27"  "0:04:28"  "0:04:30"  "0:04:37" 
   [78] "0:04:39"  "0:04:41"  "0:04:49"  "0:04:51"  "0:04:52"  "0:04:53"  "0:04:54"  "0:05:05"  "0:05:06"  "0:05:20"  "0:05:22" 

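With so many factor levels, I am not sure how to quickly group the data into specific brackets. I would like to code the times into brackets, for example everything up to `0:05:00`, then `0:05:01` up to the next cutoff, and so on. With this many levels, I am a bit lost on how to work out where each bucket starts and ends. Can anyone offer any help? With 10,000+ levels, the way I have traditionally done this becomes a problem.

Thanks!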

Best answer

You can split the times into their components; the buckets are then easy to compute.

# Sample data
n <- 10
d <- data.frame(
  time = paste( 
    sample(0:23, n, replace=TRUE), 
    sample(0:59, n, replace=TRUE), 
    sample(0:59, n, replace=TRUE), 
    sep=":" 
  ),
  value = rnorm(n)
)

# Split the "time" column into its components
d$time <- as.character( d$time )
times <- strsplit( d$time, ":" )
times <- lapply( times, as.numeric )
times <- do.call(rbind, times)
colnames(times) <- c("hour", "minute", "second")
d <- cbind(times, d)

# Build the buckets
d$bucket <- paste(
  sprintf( "%02d:%02d:00", d$hour, floor( d$minute / 5 ) * 5 ),
  sprintf( "%02d:%02d:59", d$hour, floor( d$minute / 5 ) * 5 + 4 ),
  sep=" to "
)
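
A variation on the same idea, as a minimal sketch: convert each time to total seconds and use integer division to find the bucket, which also handles single-digit hours such as "0:00:05" from the question. The helper names `to_seconds` and `bucket_label` are hypothetical, and the sketch assumes the values are hours:minutes:seconds.

```r
# Hypothetical helper: convert "H:MM:SS" strings (or a factor of them) to seconds
to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical helper: label the fixed-width bucket that each time falls into
bucket_label <- function(x, width = 300) {
  start <- (to_seconds(x) %/% width) * width   # first second of the bucket
  end <- start + width - 1                     # last second of the bucket
  fmt <- function(s) {
    sprintf("%d:%02d:%02d", s %/% 3600, (s %% 3600) %/% 60, s %% 60)
  }
  paste(fmt(start), fmt(end), sep = " to ")
}

time <- factor(c("0:00:05", "0:04:59", "0:05:00", "0:12:30"))
bucket_label(time)
```

Because the bucket width is an argument in seconds, the same helper gives, say, 15-minute buckets with `width = 900`.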

Other answers

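The problem you are running into is that you have an effectively continuous variable, represented in a particular character format and stored as a factor. A factor is not appropriate here, because the levels only record which values happen to occur in your data, not a predetermined set of possible values. The fact that it is a character vector just reflects a particular formatting convention for this type of data (namely, time). I would assume the times are hours:minutes:seconds, but given your example they could be days(?):hours:minutes. If they are hours:minutes:seconds, it would be best to store them using a time class from the `chron` package. Once you do that, the problem becomes how to divide a continuous variable into groups, and for that you can use the `cut` function.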
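
As a rough illustration of that approach, here is a sketch that uses only base R instead of `chron`: the times are converted to seconds (assuming hours:minutes:seconds) and then passed to `cut` with breaks every five minutes. The variable names are made up for the example.

```r
time <- factor(c("0:00:05", "0:04:59", "0:05:00", "0:05:01", "0:12:30"))

# Assumption: the values are hours:minutes:seconds
parts <- strsplit(as.character(time), ":", fixed = TRUE)
secs <- vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))

# Break points every 300 seconds, extended to cover the largest value
breaks <- seq(0, 300 * ceiling(max(secs) / 300), by = 300)

# Intervals are closed on the right, so 0:05:00 falls in the first bracket
# and 0:05:01 in the second; include.lowest = TRUE keeps 0 in the first one
bracket <- cut(secs, breaks = breaks, include.lowest = TRUE)
table(bracket)
```

The right-closed intervals match the boundaries described in the question, where one bracket ends at 0:05:00 and the next begins at 0:05:01.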

• Building on @Brian Diggs and @Vincent Zoonekynd, I suggest looking at the following functions:

?strptime
?POSIXlt
?cut.POSIXt


#create factorized time vector within data frame
n <- 10
d <- data.frame(
  time =  as.factor(paste( 
    sample(0:23, n, replace=TRUE), 
    sample(0:59, n, replace=TRUE), 
    sample(0:59, n, replace=TRUE), 
    sep=":" 
  )),
  value = rnorm(n)
)

# Convert the factor to a date-time, then cut it into one-hour brackets
(d$time <- cut(strptime(as.character(d$time), format = "%H:%M:%S"), breaks = "hour"))

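If you want to group by a different time span, you can use "day" or another unit for the breaks. A related question also answers this, which I found by searching the site.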
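
Since the question asks for five-minute brackets, here is a sketch of the same call with a multi-minute interval. `cut.POSIXt` accepts an integer and a space before the unit, for example "5 min". The sample values and the fixed "UTC" time zone are assumptions made for the example.

```r
time <- factor(c("0:00:05", "0:04:59", "0:05:00", "0:12:30"))

# Parse with a fixed time zone so the result does not depend on local settings;
# strptime fills in the current date because the strings contain only a time
parsed <- as.POSIXct(strptime(as.character(time), format = "%H:%M:%S", tz = "UTC"))

# Five-minute brackets, each closed on the left by default
bracket <- cut(parsed, breaks = "5 min")
table(bracket)
```

Note that the brackets start at the minute of the earliest observation, so they line up with 0:00:00 only when the data begin in that minute.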

HTH.




