Update columns with information from a different dataset in tidyverse

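I am trying to get R (tidyverse) to check the country abbreviations in two columns that need updating against a master list of countries, and to replace each abbreviation with the full country name. I tried an ifelse statement, but the results are not reliable. Samples of the two datasets are shown below. Any suggestions would be very helpful.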

# head of df1 
df1 <- structure(list(CountryCode = c("BF", "BG", "BM", "BR", "CA", 
"CE", "CH", "GH", "GM", "HA", "IC", "IN", "IR", "IT", "IZ", "JO", 
"KE", "KS", "LE", "MX", "NI", "NL", "NP", "PK", "QA", "SA", "SF", 
"SP", "TC", "TD", "TU", "TW", "UK", "US", "VM", "JA", "EI"), 
    CountryName = c("BAHAMAS, THE", "BANGLADESH", "MYANMAR", 
    "BRAZIL", "CANADA", "SRI LANKA", "CHINA", "GHANA", "GERMANY", 
    "HAITI", "ICELAND", "INDIA", "IRAN", "ITALY", "SYRIA", "JORDAN", 
    "KENYA", "KOREA, REPUBLIC OF (SOUTH )", "LEBANON", "MEXICO", 
    "NIGERIA", "NETHERLANDS, THE", "NEPAL", "PAKISTAN", "QATAR", 
    "SAUDI ARABIA", "SOUTH AFRICA", "SPAIN", "UNITED ARAB EMIRATES", 
    "TRINIDAD AND TOBAGO", "TURKEY", "CHINA (TAIWAN)", "UNITED KINGDOM", 
    "UNITED STATES", "VIETNAM", "JAPAN", "IRELAND")), row.names = c(NA, 
-37L), class = c("tbl_df", "tbl", "data.frame"))

# df2 (with the NAs removed)
df2 <- structure(list(ID = c("E23531197", "Q07441087", "U79148472", 
"Y43292349", "A40257720", "Y64624318", "B97628594", "T06694322", 
"J67643839", "B11219391", "V72937405", "C22564030", "B90485180", 
"B56635832", "J44870077", "Y05510846", "X82045887", "V14380989", 
"J87108024", "X61041595", "A60573885", "Y23860927", "T74687928", 
"G60127163", "P45475749", "D40096957", "F73581752", "M76164536", 
"X57076671", "K30511805", "B41693626", "E50532024", "H47908538"
), `MA Nation` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "IN", 
NA, NA, "CA", NA, NA, NA, NA), `PR Nation` = c("PK", "BG", "MX", 
"PK", "IN", "CH", "JA", "EI", "UK", "CH", "UK", "IN", "TU", "BG", 
"IN", "CA", "CA", "PK", "CH", "BG", "LE", "IN", "IN", "TW", "BG", 
"IN", "CH", "BG", "CA", "BF", "CH", "CH", "CH")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -33L))
Answers

Join the two data frames by country code.

library(tidyverse)
library(readxl)

df <- read_xlsx("~Data SU23 Enroll R AY22-23 2023-08-23 2 Stack Overflow.xlsx")
countrydata <- read_xlsx("~TBL Country codes.xlsx")

glimpse(df)
#> Rows: 542
#> Columns: 3
#> $ ID          <chr> "F31769765", "E23531197", "Q07441087", "Y92280507", "F2688…
#> $ `MA Nation` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ `PR Nation` <chr> NA, "PK", "BG", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "M…
glimpse(countrydata)
#> Rows: 37
#> Columns: 2
#> $ CountryCode <chr> "BF", "BG", "BM", "BR", "CA", "CE", "CH", "GH", "GM", "HA"…
#> $ CountryName <chr> "BAHAMAS, THE", "BANGLADESH", "MYANMAR", "BRAZIL", "CANADA…

df %>%
  # Put all columns with country codes in a long layout
  pivot_longer(-ID) %>%
  filter(!is.na(value)) %>%
  # Join with the country code table
  left_join(countrydata,
            by = join_by(value == CountryCode)) %>%
  # Drop the country code column
  select(-value) %>%
  # Return to the two country columns layout
  pivot_wider(names_from = name,
              values_from = CountryName) %>%
  # Append the rows for IDs without country data
  bind_rows(df %>% filter(is.na(`PR Nation`) & is.na(`MA Nation`)))
#> # A tibble: 542 × 3
#>    ID        `PR Nation`    `MA Nation`
#>    <chr>     <chr>          <chr>      
#>  1 E23531197 PAKISTAN       <NA>       
#>  2 Q07441087 BANGLADESH     <NA>       
#>  3 U79148472 MEXICO         <NA>       
#>  4 Y43292349 PAKISTAN       <NA>       
#>  5 A40257720 INDIA          <NA>       
#>  6 Y64624318 CHINA          <NA>       
#>  7 B97628594 JAPAN          <NA>       
#>  8 T06694322 IRELAND        <NA>       
#>  9 J67643839 UNITED KINGDOM <NA>       
#> 10 B11219391 CHINA          <NA>       
#> # ℹ 532 more rows

Created on 2023-08-24 with reprex v2.0.2.
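
If reshaping is not wanted, one possible alternative is a named lookup vector applied to both columns with across(). This is only a sketch: the small df and countrydata below are stand-ins built from a few values in the question, not the full spreadsheets, and the name lookup is introduced here for illustration. Row order and the IDs without any country data are left untouched.

```r
library(dplyr)
library(tibble)

# Stand-in lookup table and data (a few values copied from the question)
countrydata <- tibble(
  CountryCode = c("PK", "BG", "IN", "CA"),
  CountryName = c("PAKISTAN", "BANGLADESH", "INDIA", "CANADA")
)
df <- tibble(
  ID = c("E23531197", "Q07441087", "D40096957", "X57076671", "F31769765"),
  `MA Nation` = c(NA, NA, "IN", "CA", NA),
  `PR Nation` = c("PK", "BG", "IN", "CA", NA)
)

# Named vector: names are the codes, values are the full country names
lookup <- setNames(countrydata$CountryName, countrydata$CountryCode)

# Replace each code by indexing the lookup vector; NA codes stay NA
result <- df %>%
  mutate(across(c(`MA Nation`, `PR Nation`), ~ unname(lookup[.x])))

result
```

A code that is missing from the lookup table also becomes NA with this approach, so it may be worth checking for unmatched codes first.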

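# Alternative using df1 and df2 from the question:
# reshape to long, join on CountryCode, then reshape back to wide
df2 |> 
  pivot_longer(-ID, names_to = "Type", values_to = "CountryCode", values_drop_na = TRUE) |>
  left_join(df1, by = "CountryCode") |> 
  select(-CountryCode) |> 
  # names_from must be given because the name column was renamed to "Type"
  pivot_wider(names_from = "Type", values_from = "CountryName")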

Output:

# A tibble: 33 × 3
   ID        `PR Nation`    `MA Nation`
   <chr>     <chr>          <chr>      
 1 E23531197 PAKISTAN       NA         
 2 Q07441087 BANGLADESH     NA         
 3 U79148472 MEXICO         NA         
 4 Y43292349 PAKISTAN       NA         
 5 A40257720 INDIA          NA         
 6 Y64624318 CHINA          NA         
 7 B97628594 JAPAN          NA         
 8 T06694322 IRELAND        NA         
 9 J67643839 UNITED KINGDOM NA         
10 B11219391 CHINA          NA         
# ℹ 23 more rows
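
With a left join, a code that is absent from the country table silently turns into NA. A quick check with anti_join() can list such codes before replacing anything. This is a hedged sketch: the tables below are tiny stand-ins, and the ID "Z00000000" and the code "XX" are made up purely to trigger a mismatch.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Stand-in lookup table (two rows from the question)
df1 <- tibble(
  CountryCode = c("PK", "BG"),
  CountryName = c("PAKISTAN", "BANGLADESH")
)

# Stand-in data; the last row is hypothetical and carries an unknown code "XX"
df2 <- tibble(
  ID = c("E23531197", "Q07441087", "Z00000000"),
  `MA Nation` = c(NA, NA, "XX"),
  `PR Nation` = c("PK", "BG", "PK")
)

# Rows whose code has no match in the lookup table
unmatched <- df2 |>
  pivot_longer(-ID, names_to = "Type", values_to = "CountryCode",
               values_drop_na = TRUE) |>
  anti_join(df1, by = "CountryCode")

unmatched
```

If unmatched has zero rows, every non-missing code has a full name in the lookup table.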



