Mixing other languages with R
  • Date: 2011-10-25 16:13:31
  •  Tags:
  • r
  • unix

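I use R for most of my statistical analysis. However, cleaning and processing data, especially at sizes of 1 GB and up, is quite cumbersome in R, so I use common UNIX tools for that. My question is: can these tools be run interactively in the middle of an R session? For example, say file1 is the output dataset of one R process, with 100 rows. For my next R process I need a specific subset of columns 1 and 2, file2, which can easily be extracted with cut and awk. So the workflow is something like: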

Some R process => file1
cut --fields=1,2 <file1 | awk something something >file2
Next R process using file2

Apologies in advance if this is a foolish question.

Best answer

Try this (adding other read.table arguments if needed):

# 1
DF <- read.table(pipe("cut --fields=1,2 < data.txt | awk something_else"))
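
As a concrete sketch of approach #1, the snippet below writes a small tab-separated file to a temporary location and reads back only the first two columns through a shell pipeline. The file contents, the column names, and the awk filter (`$1 > 1`) are illustrative assumptions, and it presumes a Unix-like system with cut and awk on the PATH:

```r
# Hypothetical sample data: three tab-separated columns, no header
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\ta\tx", "2\tb\ty", "3\tc\tz"), tmp)

# Build the shell pipeline; shQuote() protects the path from the shell
cmd <- sprintf("cut -f1,2 %s | awk '$1 > 1'", shQuote(tmp))

# pipe() runs the command and lets read.table() read its standard output
DF <- read.table(pipe(cmd), sep = "\t", col.names = c("id", "label"))
print(DF)

unlink(tmp)
```

Only the filtered two-column output ever reaches R, so the unwanted columns are never parsed or held in memory.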

Or in pure R:

# 2
DF <- read.table("data.txt")[1:2]

Or do not even read the unwanted fields, assuming there are 4 fields:

# 3
DF <- read.table("data.txt", colClasses = c(NA, NA, "NULL", "NULL"))

If we know we want the first two fields but do not know how many other fields there are:

# 3a
n <- count.fields("data.txt")[1]
read.table("data.txt", header = TRUE, colClasses = c(NA, NA, rep("NULL", n-2)))
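
A self-contained sketch of variant #3a, using a small temporary file in place of data.txt; the file contents and column names are made up for illustration:

```r
# Hypothetical sample file: a header line plus two rows, four fields each
tmp <- tempfile(fileext = ".txt")
writeLines(c("a b c d", "1 2 3 4", "5 6 7 8"), tmp)

# Number of fields on the first line
n <- count.fields(tmp)[1]

# NA lets read.table() guess the type; "NULL" skips the column entirely
DF <- read.table(tmp, header = TRUE,
                 colClasses = c(NA, NA, rep("NULL", n - 2)))
print(DF)

unlink(tmp)
```

Note that count.fields() scans the whole file to count fields on every line, so for very large inputs it may be worth counting the fields of the first line only.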

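The sqldf package can also be used. In this example we assume a csv file, data.csv, in which the desired fields are called a and b. If it is not a csv file, use the appropriate arguments to read.csv.sql to specify another separator, etc.: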

# 4
library(sqldf)
DF <- read.csv.sql("data.csv", sql = "select a, b from file")

Other answers

I think you may be looking for littler which integrates R into the Unix command-line pipelines.

Here is a simple example that computes the file size distribution of /bin:

edd@max:~/svn/littler/examples$ ls -l /bin/ | awk '{print $5}' | ./fsizes.r 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      4    5736   23580   61180   55820 1965000       1 

  The decimal point is 5 digit(s) to the right of the |

   0 | 00000000000000000000000000000000111111111111111111111111111122222222+36
   1 | 01111112233459
   2 | 3
   3 | 15
   4 | 
   5 | 
   6 | 
   7 | 
   8 | 
   9 | 5
  10 | 
  11 | 
  12 | 
  13 | 
  14 | 
  15 | 
  16 | 
  17 | 
  18 | 
  19 | 6

edd@max:~/svn/littler/examples$ 

and all it takes for that is three lines:

edd@max:~/svn/littler/examples$ cat fsizes.r 
#!/usr/bin/r -i

fsizes <- as.integer(readLines())
print(summary(fsizes))
stem(fsizes)

See ?system for how to run shell commands from within R.
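
For instance, the workflow from the question could be run without leaving the R session roughly as follows. This is only a sketch: the file names, the sample data, and the awk filter are placeholders, and it assumes a Unix-like system with cut and awk on the PATH:

```r
# Stand-in for "some R process => file1" (hypothetical data)
file1 <- tempfile(fileext = ".txt")
file2 <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:3, x = c(10, 20, 30), y = c("p", "q", "r")),
            file1, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Run the shell pipeline; system() returns the exit status of the command
cmd <- sprintf("cut -f1,2 < %s | awk '$2 >= 20' > %s",
               shQuote(file1), shQuote(file2))
status <- system(cmd)
stopifnot(status == 0)

# "Next R process using file2"
DF2 <- read.table(file2, sep = "\t", col.names = c("id", "x"))
print(DF2)

unlink(c(file1, file2))
```

Calling system() with intern = TRUE instead returns the command's output as a character vector, which avoids the intermediate file when the result is small.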

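Staying in the tradition of literate programming, tools such as org-mode and org-babel will do the job nicely:

You can combine several different programming languages in one script and then execute them separately, in sequence, exporting the results or the code...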

The code blocks can be Python, bash, R, and numerous other languages. See http://orgmode.org/

Apart from that, I think org-mode and babel are the perfect way of writing even pure R scripts.

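Preparing data before working with it in R is quite common. I have many scripts for Unix and Perl pre-processing, and at various times I have maintained scripts/programs for MySQL, MongoDB, Hadoop, C, etc. for the same purpose.

However, you may get better mileage in terms of portability if you do some kinds of pre-processing in R. You could try asking new questions that focus on some of these particulars. For example, for loading large amounts of data into memory-mapped files, I keep recommending bigmemory. Another example can be found in the answers (especially JD Long's) to a related question.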




