How to read in a text file to R using textConnection and library(sqldf)
  • Date: 2010-02-23 20:09:02
  • Tags:
  • r

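I'm trying to read a text file into R so that I can use the sqldf functions. I'm following the steps in this example: https://stat.ethz.ch/pipermail/r-help/2008-January/152040.html, but my data is in a text file rather than pasted inline as in the example. My text file looks like this: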

# The "test.table.1.0" file has these contents:
id  Source
1     A10
2     A32
3     A10
4     A25

I tried to follow the example:

test_table <- read.table(textConnection("test.table.1.0"))

I can see that the problem is that textConnection is supposed to take a character vector, and I'm giving it a data.frame, but converting it via as.character also fails. Ultimately, I want to run a query like this:

sqldf("select test_table.source from test_table");
Best answer

Aniko's comment contains almost everything you need (along with header=TRUE):

R> data <- read.table("test.table.1.0", header=TRUE)
R> data
  id Source
1  1    A10
2  2    A32
3  3    A10
4  4    A25
R> 

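In other words, if your data is in a file, read it from the file. textConnection is useful when the data and the commands sit in the same place, as in the email you mentioned.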
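The difference between the two situations can be sketched in base R as follows. This is a minimal illustration, not part of the original answer: a temporary file stands in for "test.table.1.0", and the names `lines`, `path`, `from_text` and `test_table` are chosen here for the example.

```r
# The sample data from the question, held as a character vector.
lines <- c("id  Source",
           "1     A10",
           "2     A32",
           "3     A10",
           "4     A25")

# Case 1: data and commands in the same place.
# textConnection() turns the character vector into a readable connection.
con <- textConnection(lines)
from_text <- read.table(con, header = TRUE)
close(con)

# Case 2: data in a file. Pass the file name straight to read.table().
# A temporary file is used here in place of "test.table.1.0".
path <- tempfile()
writeLines(lines, path)
test_table <- read.table(path, header = TRUE)

# Check whether both routes produce the same data frame.
identical(from_text, test_table)
```

Once `test_table` exists as a data frame, and assuming the sqldf package is installed and loaded, the query from the question should be runnable as `sqldf("select Source from test_table")`.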

Answers

You can read the data directly into SQLite with read.csv.sql() or read.csv2.sql() from the sqldf package.

From the online manual:


Example 13. read.csv.sql and read.csv2.sql

read.csv.sql is an interface to sqldf that works like read.csv in R except that it also provides an sql= argument and not all of the other arguments of read.csv are supported. It uses (1) SQLite's import facility via RSQLite to read the input file into a temporary disk-based SQLite database which is created on the fly. (2) Then it uses the provided SQL statement to read the table so created into R. As the first step imports the data directly into SQLite without going through R, it can handle larger files than R itself can handle, as long as the SQL statement filters it to a size that R can handle. Here is Example 6c redone using this facility:

# Example 13a. 
library(sqldf) 

write.table(iris, "iris.csv", sep = ",", quote = FALSE, row.names = FALSE) 
iris.csv <- read.csv.sql("iris.csv",  
        sql = "select * from file where Sepal_Length > 5") 

# Example 13b.  read.csv2.sql.  Commas are decimals and ; is sep. 

library(sqldf) 
Lines <- "Sepal.Length;Sepal.Width;Petal.Length;Petal.Width;Species 
5,1;3,5;1,4;0,2;setosa 
4,9;3;1,4;0,2;setosa 
4,7;3,2;1,3;0,2;setosa 
4,6;3,1;1,5;0,2;setosa 
" 
cat(Lines, file = "iris2.csv") 

iris.csv2 <- read.csv2.sql("iris2.csv", sql = "select * from file where Sepal_Length > 5") 

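If your data is not very large, read.table() works perfectly well. If you have several GB of data, you may find that read.table or read.csv is a bit slow. In that case, you can use the sqldf package to read the data directly into SQLite from R. Here is an example: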

library(sqldf)
f <- file("test.table.1.0")
bigdf <- sqldf("select * from f", dbname = tempfile(),
   file.format = list(header = TRUE, row.names = FALSE))

A few months ago I wrote up a personal anecdote about my experience with this method.

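In my experience, loading data directly into SQLite is faster than reading it into R, but if a plain read.csv() or read.table() works well for you, the extra complexity is not worth it.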
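One rough way to judge whether the SQLite route pays off for a given file is to time both readers on it. The sketch below is only an illustration and assumes the sqldf package is installed; the generated data, the temporary file and the column names `id` and `value` are made up for the example, and timings will vary by machine and file size.

```r
library(sqldf)

# Hypothetical test data: 200,000 rows written to a temporary CSV file.
n <- 200000
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = seq_len(n),
                     value = sample.int(1000, n, replace = TRUE)),
          path, quote = FALSE, row.names = FALSE)

# Base R reader.
t_base <- system.time(df_base <- read.csv(path))

# sqldf reader: imports the file into a temporary SQLite database,
# then runs the SQL statement against it ("file" names the imported table).
t_sql <- system.time(df_sql <- read.csv.sql(path, sql = "select * from file"))

# Compare user, system and elapsed times side by side.
rbind(read.csv = t_base[1:3], read.csv.sql = t_sql[1:3])
```

For small files the difference is usually negligible, which matches the advice above to stay with read.csv() or read.table() when they are fast enough.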




