translation (recoding) error in r

Here is a small example:

X1 <- c("AC", "AC", "AC", "CA", "TA", "AT", "CC", "CC")
X2 <- c("AC", "AC", "AC", "CA", "AT", "CA", "AC", "TC")
X3 <- c("AC", "AC", "AC", "AC", "AA", "AT", "CC", "CA")
mydf1 <- data.frame(X1, X2, X3)

Input data frame

  X1 X2 X3
1 AC AC AC
2 AC AC AC
3 AC AC AC
4 CA CA AC
5 TA AT AA
6 AT CA AT
7 CC AC CC
8 CC TC CA

Function

# Function 
atgc <- function(x) {
 xlate <- c( "AA" = 11, "AC" = 12, "AG" = 13, "AT" = 14,
"CA"= 12, "CC" = 22, "CG"= 23,"CT"= 24,
 "GA" = 13, "GC" = 23, "GG"= 33,"GT"= 34,
 "TA"= 14,  "TC" = 24, "TG"= 34,"TT"=44,
"ID"= 56, "DI"= 56, "DD"= 55, "II"= 66
 )
  x =   xlate[x]
 }
outdataframe <- sapply (mydf1, atgc)
outdataframe
   X1 X2 X3
AA 11 11 12
AA 11 11 12
AA 11 11 12
AG 13 13 12
CA 12 12 11
AC 12 13 13
AT 14 11 12
AT 14 14 14

The problem: in the output, AC comes out as 11 rather than 12, and the other codes are wrong in the same way. It is just a mess!

(Extra: I also do not know how to get rid of the row names.)

Best answer

Just use apply and transpose:

t(apply (mydf1, 1, atgc))

To keep using sapply, do one of the following:

  1. Use stringsAsFactors=FALSE when creating your data frame, i.e.:

    mydf1 <- data.frame(X1, X2, X3, stringsAsFactors=FALSE)
    

  2. Change the last line of your function to: x = xlate[as.vector(x)]
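
To see why both fixes help, it may be useful to look at how R indexes a named vector with a factor. The following is a minimal sketch, assuming the data frame columns were created as factors (the default before R 4.0.0); the names `lookup`, `f`, `by_factor` and `by_name` are illustrative and not part of the question.

```r
# Illustrative sketch: `lookup` and `f` are made-up names, not from the question.
lookup <- c("AA" = 11, "AC" = 12, "AG" = 13, "AT" = 14, "CA" = 12)

# A factor stores integer codes; its levels are sorted here as AC, AT, CA
f <- factor(c("AC", "CA", "AT"))
codes <- as.integer(f)               # 1 3 2

# Indexing with a factor uses those integer codes as positions, not the labels
by_factor <- lookup[f]               # elements 1, 3, 2 of `lookup`

# Converting to character first makes R look the values up by name
by_name <- lookup[as.character(f)]

print(unname(by_factor))             # 11 13 12
print(unname(by_name))               # 12 12 14
```

Here as.character() plays the same role as as.vector() in the second fix: both turn the factor back into its labels before the lookup.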

Other answers

The match function can take a factor argument together with a target matching vector of class "character":

atgc <- function(fac){ c(11, 12, 13, 14, 
12, 22, 23, 24, 
13, 23, 33, 34, 
14, 24, 34,44, 
56, 56, 55, 66 )[ 
match(fac, 
  c("AA", "AC", "AG", "AT",
    "CA", "CC", "CG","CT",
    "GA", "GC", "GG","GT" ,
    "TA",  "TC", "TG","TT",
    "ID", "DI", "DD", "II") )
                ]}
#The match function returns an index that is designed to pull from a vector.
 sapply(mydf1, atgc)
     X1 X2 X3
[1,] 12 12 12
[2,] 12 12 12
[3,] 12 12 12
[4,] 12 12 12
[5,] 14 14 11
[6,] 14 12 14
[7,] 22 12 22
[8,] 22 24 12
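
One property of the match approach that may be worth keeping in mind is that match() returns NA for any code missing from the key vector, so unexpected genotypes surface as NA instead of being silently mistranslated. A small sketch under that assumption; `keys`, `vals` and `recode_pairs` are hypothetical names, and the key list is deliberately shortened.

```r
# Hypothetical, shortened key/value vectors (same positions pair up)
keys <- c("AA", "AC", "CA", "CC")
vals <- c(11, 12, 12, 22)

recode_pairs <- function(x) {
  # as.character() makes the intent explicit; match() also accepts factors
  idx <- match(as.character(x), keys)
  vals[idx]                          # NA index gives NA result
}

res <- recode_pairs(factor(c("AC", "CA", "ZZ")))
unknown <- is.na(res)                # flags codes that had no match

print(res)                           # 12 12 NA
print(unknown)                       # FALSE FALSE TRUE
```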

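Here is a way to do this in which you only have to supply the replacement value for each letter in your matrix, rather than double-checking that you have covered every combination and matched each one correctly, although your example suggests the combinations are limited.

Define a list of values and their replacements: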

trans <- list(c("A","1"),c("C","2"),c("G","3"),c("T","4"),
  c("I","6"),c("D","5"))

Define replacement function using gsub()

atgc2 <- function(myData, x) gsub(x[1], x[2], myData)

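Create a matrix with the replaced values (in this case, mydf1 is converted to a matrix so that gsub() returns character values as desired, but you should check that this conversion is suitable for any other data before proceeding):

mymat <- Reduce(atgc2, trans, init = as.matrix(mydf1))

The values in mymat are still in their original order, so "AC" = "12" and "CA" = "21"; reorder the digits (and convert the result to numeric values):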

ansVec <- sapply( strsplit( mymat, split = ""),
  function(x) as.numeric( paste0( sort( as.numeric(x) ), collapse = "")))

The object ansVec is a vector, so convert it into a data frame:

( mydf2 <- data.frame( matrix( ansVec, nrow = nrow(mydf1) ) ) )
#   X1 X2 X3
# 1 12 12 12
# 2 12 12 12
# 3 12 12 12
# 4 12 12 12
# 5 14 14 11
# 6 14 12 14
# 7 22 12 22
# 8 22 24 12

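For this case, the other answers will certainly be faster. However, as the replacement operations become more complicated, I think this solution may offer some benefits. One aspect of this approach to keep in mind, though, is checking a string such as "ATTGCG" for "TTG".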
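
As a variation on the per-letter idea, base R's chartr() can translate all single characters in one call. This is only a sketch of an alternative, not the method used in this answer; the small matrix `m` and the helper `sort_digits` are made up for illustration.

```r
# Made-up example data; stringsAsFactors = FALSE keeps the columns as character
m <- as.matrix(data.frame(X1 = c("AC", "CA", "TA"),
                          X2 = c("AT", "CC", "DI"),
                          stringsAsFactors = FALSE))

# chartr() maps characters one-for-one: A->1, C->2, G->3, T->4, I->6, D->5
digits <- m
digits[] <- chartr("ACGTID", "123465", m)   # `[]<-` keeps the matrix shape

# Sort the two digits within each cell so that "21" becomes 12
sort_digits <- function(s) {
  as.numeric(paste(sort(strsplit(s, "")[[1]]), collapse = ""))
}
recoded <- apply(digits, c(1, 2), sort_digits)

print(recoded)
```

Because chartr() only handles one-character substitutions, this shortcut would not extend to multi-character patterns, which is where the gsub()-based version is more flexible.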

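Actually, I think you want to represent your original vectors as factors, because they represent a well-defined set of levels (DNA dinucleotides) rather than arbitrary character values.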

lvls = c("AA", "AC", "AG", "AT", "CA", "CC", "CG", "CT", "GA", "GC", 
         "GG", "GT", "TA", "TC", "TG", "TT", "ID", "DI", "DD", "II")
X1 <- factor(c("AC", "AC", "AC", "CA", "TA", "AT", "CC", "CC"), levels=lvls)
X2 <- factor(c("AC", "AC", "AC", "CA", "AT", "CA", "AC", "TC"), levels=lvls)
X3 <- factor(c("AC", "AC", "AC", "AC", "AA", "AT", "CC", "CA"), levels=lvls)
mydf1 <- data.frame(X1, X2, X3)

Likewise, "11" is a level of a factor rather than the number 11. So the mapping between levels is

xlate <- c("AA" = "11", "AC" = "12", "AG" = "13", "AT" = "14",
           "CA"= "12", "CC" = "22", "CG"= "23","CT"= "24",
           "GA" = "13", "GC" = "23", "GG"= "33","GT"= "34",
           "TA"= "14",  "TC" = "24", "TG"= "34","TT"="44",
           "ID"= "56", "DI"= "56", "DD"= "55", "II"= "66")

and to recode a single variable

levels(X1) <- xlate

To recode all columns of the data frame,

as.data.frame(lapply(mydf1, `levels<-`, xlate))

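Using sapply would not be appropriate here, because it creates a matrix (of character values) even if you name the result outdataframe. The distinction could be important for SNP data: with millions of SNPs across thousands of samples, a matrix would need a single vector longer than the longest vector R can store (large-vector support was being introduced in R-devel), whereas a data frame is just a list of vectors, one per element.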
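
The difference between the two return types can be checked directly. A minimal sketch on a made-up three-level example; `lv`, `df`, `map`, `as_matrix` and `as_frame` are illustrative names only. Note that `levels<-` replaces levels by position, so `map` must be in the same order as the factor levels.

```r
lv <- c("AC", "CA", "CC")
df <- data.frame(X1 = factor(c("AC", "CA", "CC"), levels = lv),
                 X2 = factor(c("CC", "AC", "AC"), levels = lv))

# Positional mapping: the i-th entry replaces the i-th level
map <- c("AC" = "12", "CA" = "12", "CC" = "22")

# sapply() simplifies the per-column results into one character matrix
as_matrix <- sapply(df, function(col) map[as.character(col)])

# lapply() keeps one vector per column; duplicated new labels merge levels
as_frame <- as.data.frame(lapply(df, `levels<-`, map))

print(is.matrix(as_matrix))          # TRUE
print(is.data.frame(as_frame))       # TRUE
print(levels(as_frame$X1))           # "12" "22"
```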




