Remove variable labels attached with foreign/Hmisc SPSS import functions

As usual, I got an SPSS file that I've imported into R with the spss.get function from the Hmisc package. I'm bothered by the labelled class that Hmisc::spss.get adds to all variables in the data.frame, and I want to remove it.

The labelled class gives me headaches when I try to run ggplot, or even when I want to do some menial analysis! One solution would be to remove the labelled class from each variable in the data.frame. How can I do that? Is that possible at all? If not, what are my other options?

I really want to avoid re-editing variables "from scratch" with as.data.frame(lapply(x, as.numeric)) and as.character where applicable... And I certainly don't want to run SPSS and remove labels manually (I don't like SPSS, nor care to install it)!

Thanks!

Best answer

A belated note/warning regarding class membership in R objects. The correct way to identify a "labelled" object is not to test with an is function or with equality (==), but rather with inherits. Methods that test a specific position in the class vector will miss cases where the existing classes are not in the assumed order.
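As a minimal sketch of that point, the snippet below compares a position-based test with inherits. It assumes a labelled factor can be imitated in base R with structure(), so Hmisc is not loaded; first_is_labelled is a hypothetical helper written only for this illustration.

```r
# Imitate a vector labelled by Hmisc without loading the package
# (assumption: "labelled" sits in the class vector next to "factor").
x <- structure(factor(c("a", "b", "a")),
               class = c("labelled", "factor"),
               label = "Example label")

# Same object, but with "labelled" no longer in the first position.
y <- x
class(y) <- c("factor", "labelled")

# Hypothetical position-based test, as used in some of the answers.
first_is_labelled <- function(v) class(v)[1] == "labelled"

first_is_labelled(x)      # TRUE
first_is_labelled(y)      # FALSE: the position-based test misses it
inherits(x, "labelled")   # TRUE
inherits(y, "labelled")   # TRUE
```

The inherits call gives the same answer regardless of where "labelled" appears in the class vector.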

You can avoid creating "labelled" variables in spss.get with the argument use.value.labels=FALSE:

w <- spss.get("/tmp/my.sav", use.value.labels=FALSE, datevars=c("birthdate", "deathdate"))

The code from Bhattacharya could fail if the class of the labelled vector were simply "labelled" rather than c("labelled", "factor") in which case it should have been:

class(x[[i]]) <- NULL  # no error from assignment of empty vector

The error you report can be reproduced with this code:

> b <- 4:6
> label(b) <- "B Label"
> str(b)
Class 'labelled'  atomic [1:3] 4 5 6
  ..- attr(*, "label")= chr "B Label"
> class(b) <- class(b)[-1]
Error in class(b) <- class(b)[-1] : 
  invalid replacement object to be a class string
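One way to sidestep that failure is to compute the remaining classes first and assign NULL when none are left. The following is a sketch under assumptions: strip_labelled is a hypothetical helper (not an Hmisc function), and the inputs are built by hand with structure() instead of Hmisc's label<-.

```r
# Remove only the "labelled" class and the "label" attribute from one vector.
# strip_labelled is a hypothetical name used for illustration.
strip_labelled <- function(v) {
  keep <- setdiff(oldClass(v), "labelled")
  # Assign NULL rather than an empty character vector when nothing is left.
  oldClass(v) <- if (length(keep) > 0) keep else NULL
  attr(v, "label") <- NULL
  v
}

# Hand-built inputs covering both cases discussed above.
b <- structure(4:6, class = "labelled", label = "B Label")
f <- structure(factor(c("lo", "hi")),
               class = c("labelled", "factor"),
               label = "F Label")

b2 <- strip_labelled(b)   # plain integer vector
f2 <- strip_labelled(f)   # ordinary factor
```

Because the helper removes "labelled" by name, it also leaves vectors without that class unchanged.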
Other answers

Here's how I get rid of the labels altogether. It is similar to Jyotirmoy's solution, but it works for a vector as well as a data.frame. (Partial credit to Frank Harrell.)

clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in 1 : length(x)) class(x[[i]]) <- setdiff(class(x[[i]]), "labelled")
    for(i in 1 : length(x)) attr(x[[i]],"label") <- NULL
  }
  else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}

Use as follows:

my.unlabelled.df <- clear.labels(my.labelled.df)

EDIT

Here's a slightly cleaner version of the function, with the same results:

clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in seq_along(x)) {
      class(x[[i]]) <- setdiff(class(x[[i]]), "labelled")
      attr(x[[i]],"label") <- NULL
    } 
  } else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}
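The same idea can be written without an explicit loop. The sketch below is an alternative formulation rather than the answer's own code: it uses the df[] <- lapply(...) idiom so the result stays a data.frame, and it assumes the labelled columns can be imitated with structure() instead of a real spss.get import. The column names are made up for the example.

```r
# Hand-built data frame standing in for an spss.get result (assumption).
df <- data.frame(id = 1:3)
df$age <- structure(c(31L, 45L, 28L), class = "labelled",
                    label = "Age in years")
df$sex <- structure(factor(c("f", "m", "f")),
                    class = c("labelled", "factor"), label = "Sex")

# Strip "labelled" column by column; df[] keeps the data.frame structure.
df[] <- lapply(df, function(col) {
  if (inherits(col, "labelled")) {
    keep <- setdiff(oldClass(col), "labelled")
    oldClass(col) <- if (length(keep) > 0) keep else NULL
    attr(col, "label") <- NULL
  }
  col
})

sapply(df, function(col) class(col)[1])   # first class of each column
```

Columns that were factors remain factors, so code that expects a factor should keep working after the labels are removed.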

You can try out the read.spss function from the foreign package.

A rough and ready way to get rid of the labelled class created by spss.get:

for (i in 1:ncol(x)) {
    z <- class(x[[i]])
    if (z[[1]] == "labelled") {
       class(x[[i]]) <- z[-1]
       attr(x[[i]], "label") <- NULL
    }
}

But can you please give an example where labelled causes problems?

If I have a variable MAED in a data frame x created by spss.get, I have:

> class(x$MAED)
[1] "labelled" "factor"  
> is.factor(x$MAED)
[1] TRUE

So well-written code that expects a factor (say) should not have any problems.

Suppose:

library(Hmisc)
w <- spss.get( ... )

You could remove the labels of a variable called "var1" by using:

attributes(w$var1)$label <- NULL

If you also want to remove the class "labelled", you could do:

class(w$var1) <- NULL 

or if the variable has more than one class:

class(w$var1) <- class(w$var1)[-which(class(w$var1)=="labelled")]

Hope this helps!

Well, I figured out that the unclass function can be used to remove classes (who would have thought?!):

library(Hmisc)
# let's presuppose that variable x is gathered through spss.get() function
# and that x is factor
> class(x)
[1] "labelled" "factor"
> foo <- unclass(x)
> class(foo)
[1] "integer"

It's not the most fortunate solution; just imagine back-converting a bunch of vectors... If anyone tops this, I'll accept it as the answer...
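To make the drawback concrete, here is a small sketch of what unclass does to a labelled factor, compared with removing only the "labelled" class. It assumes a labelled factor built by hand with structure(), not one produced by spss.get.

```r
# Hand-built labelled factor (assumption: imitates an Hmisc-labelled one).
f <- structure(factor(c("lo", "hi", "lo")),
               class = c("labelled", "factor"),
               label = "Level")

u <- unclass(f)       # drops every class, not only "labelled"
is.factor(u)          # FALSE
typeof(u)             # "integer"
attr(u, "levels")     # level names survive only as an attribute
attr(u, "label")      # the label attribute is still attached

# Removing just "labelled" keeps the factor intact.
g <- f
class(g) <- setdiff(class(g), "labelled")
attr(g, "label") <- NULL
is.factor(g)          # TRUE
```

So unclass leaves integer codes that would have to be converted back to a factor, while the targeted removal needs no back-conversion.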




