Question

I'm using R to clean a dataset. Part of my dataset looks like this:

record_id | organization | other_work_loc
1         | 12           | CCC
2         | 12           | AMG
3         | 12           | TAO
4         | 1            |
5         | 2            |
6         | 7            |

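other_work_loc is a free-response column, and its entries vary widely. It only contains data when organization = 12. I want to recode organization and other_work_loc into a single column (org_cat) with three categories (1, 2, 3). Most of the other_work_loc data will be recoded as 3.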

library(dplyr)
# org 1 or 2 -> 1, org 3 to 11 -> 2, listed free responses (org 12) -> 3
dataset <- dataset %>%
  mutate(org_cat = case_when(organization == 1 | organization == 2 ~ 1,
                             organization >= 3 & organization < 12 ~ 2,
                             other_work_loc == "CCC" | other_work_loc == "AMG" ~ 3))
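A self-contained way to try the recode above on the sample rows is sketched below. The tibble is typed in from the sample table (blank cells assumed to be NA), and `%in%` is used as a shorter stand-in for chaining `==` with `|`. As written, any free response not named in the last condition (here TAO) falls through to NA, which is why every response would otherwise need its own condition.

```r
library(dplyr)

# Sample rows typed in from the table above (blank cells as NA)
dataset <- tibble::tribble(
  ~record_id, ~organization, ~other_work_loc,
  1, 12, "CCC",
  2, 12, "AMG",
  3, 12, "TAO",
  4,  1, NA,
  5,  2, NA,
  6,  7, NA
)

dataset <- dataset %>%
  mutate(org_cat = case_when(
    organization %in% c(1, 2)             ~ 1,
    organization >= 3 & organization < 12 ~ 2,
    # Inside case_when(), this behaves like chaining == with |
    other_work_loc %in% c("CCC", "AMG")   ~ 3
  ))

# TAO matches no condition, so case_when() leaves it as NA
dataset$org_cat
```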

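This code works, but there are 100 free responses in other_work_loc. Most of them will be recoded as 3; however, 22 need to be classified as 1 or 2. Is there a more elegant way to do this than writing a recode for each individual response?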

Answer 1

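Use Excel or a similar tool to create a table with the columns organization, other_work_loc, and newvar, where the last two hold your free-response answers and their corresponding numeric replacement values: essentially a lookup table (LUT). I named it lut.csv, and it looks like this: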

organization    other_work_loc  newvar
12              CCC             3
12              AMG             3
12              TAO             2
1                               1

I saved your sample data as df.csv and, after loading the tidyverse, used left_join() to do the replacement:


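library(tidyverse)

df <- read_csv("df.csv") %>% print()
lut <- read_csv("lut.csv") %>% print()

# With no `by`, left_join() joins on all shared columns (organization, other_work_loc)
left_join(df, lut)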

Joining with `by = join_by(organization, other_work_loc)`
# A tibble: 6 x 4
  record_id organization other_work_loc newvar
      <dbl>        <dbl> <chr>           <dbl>
1         1           12 CCC                 3
2         2           12 AMG                 3
3         3           12 TAO                 2
4         4            1 NA                  1
5         5            2 NA                 NA
6         6            7 NA                 NA
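If you want to try the join without creating CSV files, the sketch below builds both tables in memory with `tibble::tribble()`. The contents are assumed to mirror df.csv and lut.csv as shown above, and the join keys are spelled out with `by` instead of being inferred.

```r
library(dplyr)

# In-memory stand-ins for df.csv and lut.csv (assumed contents, copied
# from the tables above)
df <- tibble::tribble(
  ~record_id, ~organization, ~other_work_loc,
  1, 12, "CCC",
  2, 12, "AMG",
  3, 12, "TAO",
  4,  1, NA,
  5,  2, NA,
  6,  7, NA
)

lut <- tibble::tribble(
  ~organization, ~other_work_loc, ~newvar,
  12, "CCC", 3,
  12, "AMG", 3,
  12, "TAO", 2,
   1, NA,    1
)

# Join on both key columns; by default (na_matches = "na") an NA key
# matches an NA key, which is how organization 1 picks up newvar = 1
result <- left_join(df, lut, by = c("organization", "other_work_loc"))
result
```

left_join() keeps every row of df in its original order, so organizations missing from the LUT (2 and 7 here) simply get NA.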

Key points:

Even though I left other_work_loc blank in the LUT for organization #1, it still matched the corresponding row of your original file: read_csv() reads both blank cells as NA, and left_join() treats NA as equal to NA by default (na_matches = "na"), so the match effectively depends on organization alone.
I didn't fill out the entire LUT, so organizations #2 and #7 ended up with NA for newvar.
For organization #12, it is much easier to add further free responses and their corresponding newvar values to the LUT file than to write additional lines of case_when code.
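Since most organization-12 responses should become 3, one variation (not part of the answer above, and only a sketch under that assumption) is to keep just the exceptions in the lookup table and fall back to 3 with dplyr's `coalesce()`, while organizations 1 to 11 are still recoded by rule. The one-row table `lut` and the helper column `loc_cat` are hypothetical; the real 22 exceptions would go in the table.

```r
library(dplyr)

# Sample rows typed in from the question (blank cells as NA)
df <- tibble::tribble(
  ~record_id, ~organization, ~other_work_loc,
  1, 12, "CCC",
  2, 12, "AMG",
  3, 12, "TAO",
  4,  1, NA,
  5,  2, NA,
  6,  7, NA
)

# Hypothetical lookup table listing only the exceptions: free responses
# that should become 1 or 2 instead of the default 3
lut <- tibble::tribble(
  ~other_work_loc, ~loc_cat,
  "TAO", 2
)

df <- df %>%
  left_join(lut, by = "other_work_loc") %>%
  mutate(org_cat = case_when(
    organization %in% c(1, 2)             ~ 1,
    organization >= 3 & organization < 12 ~ 2,
    # Responses not found in the LUT have loc_cat = NA, so they default to 3
    organization == 12                    ~ coalesce(loc_cat, 3)
  )) %>%
  select(-loc_cat)

df$org_cat
```

With this layout, the LUT only grows when a response needs something other than 3.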
