As you allude to, the default behavior in R (in versions before 4.0.0) is to treat character columns in data frames as a special data type called a factor. This is a feature, not a bug, but like any useful feature, if you're not expecting it and don't know how to use it properly, it can be quite confusing.
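A `factor` is meant to represent a categorical (rather than numerical or quantitative) variable, which comes up often in statistics.
The subsetting operations you used do in fact work normally. That is, they return the correct subset of your data frame. However, the `levels` attribute of that variable remains unchanged, with all of its original levels still present.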
This means that any method written in R that is designed to take advantage of factors will treat that column as a categorical variable with a bunch of levels, many of which just aren't present. In statistics, one often wants to track the presence of missing levels of categorical variables.
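To make this concrete, here is a minimal sketch using a made-up data frame (the names `df`, `group`, and `value` are hypothetical, not taken from your question). `stringsAsFactors = TRUE` is passed explicitly so the result does not depend on which R version you run.

```r
# Hypothetical example data; the factor conversion is requested explicitly.
df <- data.frame(
  group = c("a", "b", "c", "a", "b"),
  value = c(1, 2, 3, 4, 5),
  stringsAsFactors = TRUE
)

# Keep only the rows that do not belong to group "c".
sub <- subset(df, group != "c")

# The rows are correct, but the factor still carries the level "c".
levels(sub$group)   # "a" "b" "c"

# Functions that respect factors report the empty level with a count of 0.
table(sub$group)
```

The zero count for `c` in the `table()` output is the "missing level" being tracked.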
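I actually also prefer to work with `stringsAsFactors = FALSE`, but many people frown on that because it can reduce code portability. (`TRUE` is the default in R before 4.0.0, so sharing your code with others may be risky unless you preface every script with a call to `options`.)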
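As a rough illustration of what working without factors looks like, the sketch below sets `stringsAsFactors = FALSE` in the `data.frame()` call itself instead of through `options`. That is a swapped-in variant of the approach, chosen here because it keeps the behavior local to one call. The data and the names `df_chr` and `sub_chr` are hypothetical.

```r
# Hypothetical data; the argument is set per call rather than globally.
df_chr <- data.frame(
  group = c("a", "b", "c"),
  value = 1:3,
  stringsAsFactors = FALSE
)

# The column stays a plain character vector, so there are no levels to drop.
is.character(df_chr$group)   # TRUE

sub_chr <- subset(df_chr, group != "c")
unique(sub_chr$group)        # "a" "b"
```

The trade-off is that anything downstream that expects a factor (for example, modelling functions that use the levels) will have to convert the column itself.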
A potentially more convenient solution, particularly for data frames, is to combine the `subset` and `droplevels` functions:
```r
subsetDrop <- function(...) {
  # Subset as usual, then drop any factor levels that no longer occur.
  droplevels(subset(...))
}
```
and use this function to take subsets of your data in a way that is guaranteed to drop any unused levels.
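A short usage sketch, again on made-up data (`df`, `group`, and `value` are hypothetical names). The wrapper is repeated so that the snippet runs on its own.

```r
# Same wrapper as above, repeated so this snippet is self-contained.
subsetDrop <- function(...) {
  droplevels(subset(...))
}

# Hypothetical example data with an explicit factor column.
df <- data.frame(
  group = c("a", "b", "c", "a", "b"),
  value = 1:5,
  stringsAsFactors = TRUE
)

# Subset and drop the now-unused level "c" in one step.
sub <- subsetDrop(df, group != "c")
levels(sub$group)   # "a" "b"
```

Because the arguments are passed through `...`, the condition `group != "c"` is still evaluated inside the data frame, just as it would be in a direct call to `subset`.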