R strsplit with multiple unordered split arguments?
  • Asked: 2012-05-24 13:45:51
  • Tags:
  • r
  • split

Given the character strings

test_1<-"abc def,ghi klm"
test_2<-"abc, def ghi klm"

I would like to obtain

"abc"
"def"
"ghi"

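However, using strsplit, one must know the order of the splitting values in the string, because strsplit uses the first value to do the first split, the second value to do the second split, and then recycles.

So these do not work: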

strsplit(test_1, c(",", " "))
strsplit(test_2, c(" ", ","))

strsplit(test_2, split=c("[:punct:]","[:space:]"))[[1]]

I am looking to split the string wherever any of my splitting values occurs, in a single step.
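
A small sketch of what the vector form does, for readers comparing it with the attempts above. According to the strsplit documentation, a `split` vector of length greater than one is recycled along the elements of `x`; it is not applied in successive passes over one string. The variable names `one_input` and `two_inputs` are illustrative only.

```r
test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"

# With a single input string, only the first element of `split` is used.
one_input <- strsplit(test_1, c(",", " "))
print(one_input[[1]])
# [1] "abc def" "ghi klm"

# With two input strings, split[1] is paired with x[1] and split[2] with x[2].
two_inputs <- strsplit(c(test_1, test_2), c(",", " "))
print(two_inputs[[2]])
# [1] "abc," "def"  "ghi"  "klm"
```

Note also that POSIX classes such as [:punct:] only take effect inside a bracket expression, that is, written as "[[:punct:]]".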

Best answer

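Actually, strsplit uses regular-expression (grep-style) patterns as well. The commas in the pattern below are double-escaped; strictly speaking, a comma is only special inside a quantifier such as {1,3}, so the escape is harmless rather than required. Likewise, using "\\s" in place of a literal space is more about readability than necessity:

> strsplit(test_1, "\\, |\\,| ")  # three possibilities ORed
[[1]]
[1] "abc" "def" "ghi" "klm"

> strsplit(test_2, "\\, |\\,| ")
[[1]]
[1] "abc" "def" "ghi" "klm"

Without using both "\\, " (note the trailing space, which SO does not show) and "\\,", you would get some empty-string ("") values. It might have been clearer had I written:

> strsplit(test_2, "\\,\\s|\\,|\\s")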
[[1]]
[1] "abc" "def" "ghi" "klm"

@Fojtasek is so right: using character classes often simplifies the task:

> strsplit(test_2, "[, ]+")
[[1]]
[1] "abc" "def" "ghi" "klm"

> strsplit(test_1, "[, ]+")
[[1]]
[1] "abc" "def" "ghi" "klm"
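
When the delimiters arrive as a vector rather than a hand-written class, the same idea can be sketched as a small wrapper. `split_any` is a hypothetical helper, not a base R function; it assumes each delimiter is a literal string and that none of them contains the sequence "\E".

```r
# Hypothetical helper (not part of base R): split on any of several
# literal delimiters, treating runs of delimiters as a single separator.
split_any <- function(x, delims) {
  # \Q...\E quotes each delimiter literally under PCRE (perl = TRUE).
  # Assumption: no delimiter itself contains the sequence "\E".
  quoted <- paste0("\\Q", delims, "\\E", collapse = "|")
  pattern <- paste0("(?:", quoted, ")+")
  pieces <- strsplit(x, pattern, perl = TRUE)
  # Drop the empty string that appears when x starts with a delimiter.
  lapply(pieces, function(p) p[nzchar(p)])
}

split_any(c("abc def,ghi klm", "abc, def ghi klm"), c(",", " "))
split_any("a.b|c", c(".", "|"))  # regex metacharacters are taken literally
```

Quoting with \Q...\E avoids hand-escaping metacharacters such as "." or "|", at the cost of requiring perl = TRUE.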

Other answers

If you don't like regular expressions, you can call strsplit() multiple times:

strsplits <- function(x, splits, ...)
{
    for (split in splits)
    {
        x <- unlist(strsplit(x, split, ...))
    }
    return(x[x != ""]) # Remove empty values
}

strsplits(test_1, c(" ", ","))
# "abc" "def" "ghi" "klm"
strsplits(test_2, c(" ", ","))
# "abc" "def" "ghi" "klm"

UPDATE with added examples:

strsplits(test_1, c("[[:punct:]]","[[:space:]]"))
# "abc" "def" "ghi" "klm"
strsplits(test_2, c("[[:punct:]]","[[:space:]]"))
# "abc" "def" "ghi" "klm"

But if you are going to use regular expressions, you are better off using @DWin's approach:

strsplit(test_1, "[[:punct:][:space:]]+")[[1]]
# "abc" "def" "ghi" "klm"
strsplit(test_2, "[[:punct:][:space:]]+")[[1]]
# "abc" "def" "ghi" "klm"

You can use strsplit(test_1, "\\W")
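
A short sketch of how this behaves on the second string, where a comma is followed by a space. The variable names `single` and `run` are illustrative; the "\\W+" variant is a suggested adjustment, not part of the answer above.

```r
test_2 <- "abc, def ghi klm"

# "\\W" matches one non-word character at a time, so ", " yields an empty piece.
single <- strsplit(test_2, "\\W")[[1]]
print(single)
# [1] "abc" ""    "def" "ghi" "klm"

# "\\W+" treats a run of non-word characters as one separator.
run <- strsplit(test_2, "\\W+")[[1]]
print(run)
# [1] "abc" "def" "ghi" "klm"
```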

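library(stringr)  # provides str_c() and str_extract_all()

test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"
key_words <- c("abc", "def", "ghi")
# Build the alternation pattern "abc|def|ghi"
matches <- str_c(key_words, collapse = "|")
# Extract every occurrence of a key word instead of splitting on delimiters
str_extract_all(test_1, matches)
str_extract_all(test_2, matches)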
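
If stringr is not available, roughly the same extraction can be sketched in base R with gregexpr() and regmatches(). This is an assumed equivalent for illustration; `pattern`, `found_1` and `found_2` are names introduced here.

```r
test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"
key_words <- c("abc", "def", "ghi")

# Build the alternation pattern "abc|def|ghi".
pattern <- paste(key_words, collapse = "|")

# gregexpr() locates every match; regmatches() pulls the matched text out.
found_1 <- regmatches(test_1, gregexpr(pattern, test_1))[[1]]
found_2 <- regmatches(test_2, gregexpr(pattern, test_2))[[1]]
print(found_1)
# [1] "abc" "def" "ghi"
print(found_2)
# [1] "abc" "def" "ghi"
```

As with the stringr version, this returns only the listed key words, so "klm" is left out.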



