Question

I have a merged dataset that combines two longitudinal studies, with measurements recorded separately for study1 and study2. Here is an example:

data <- data.frame(
  weight_study1 = rnorm(100, mean = 60, sd = 5),  #numerical
  weight_study2 = rnorm(100, mean = 62, sd = 6),  #numerical
  gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE),  #categorical
  gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE)  #categorical
)

I want to create a table that shows the results from study1 and study2 side by side, in adjacent columns. How can this be done?

I wasn't sure how to comment on this post, so I'll just add more detail about the table I'm expecting:

Characteristics   Pilot study 1   Pilot study 2
Age               34 (3)          67 (8)
Gender            Male (40%)      Male (30%)
                  Female (60%)    Female (70%)

Answer 1

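On the face of it, your problem seems relatively straightforward to solve. However, your data break tidy data principles. I encourage you to read up on this, especially if you need to modify this code in any way.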
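For reference, here is a minimal sketch of what a tidy version of the question's data could look like, assuming every column follows the `weight_studyN` / `gender_studyN` naming pattern. `pivot_longer()` with the special `.value` name keeps weight and gender as separate columns and moves the study number into its own `study` column, so there is one row per participant per study.

```r
library(dplyr)
library(tidyr)

# Same structure as the question's example data
set.seed(1)
data <- data.frame(
  weight_study1 = rnorm(100, mean = 60, sd = 5),
  weight_study2 = rnorm(100, mean = 62, sd = 6),
  gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE),
  gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Tidy layout: one row per participant per study.
# names_pattern splits e.g. "weight_study1" into "weight" (.value) and "1" (study);
# ".value" turns each variable name into its own output column.
tidy <- data %>%
  pivot_longer(
    cols = everything(),
    names_to = c(".value", "study"),
    names_pattern = "(.*)_study(\\d+)"
  )

tidy
```

The 100 wide rows become 200 tidy rows (100 per study), and each output column keeps a single type.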

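One way to produce your expected output is to first create a "tidy" dataset. In this case, that means pivoting your data to long format and then back to wide format. You have two groups of 100 (n = 200) and three numeric variables per study, which makes for a lot of long data: the 100 rows become 600. Your expected output is further complicated by the fact that you want three different variables in a single column. I added an "age" column to the sample data for completeness.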
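The long-then-wide idea can be sketched more compactly under some assumptions: the columns follow the question's `weight_studyN` / `gender_studyN` pattern, weight is summarised overall (not by gender), and the display labels ("Weight, mean (sd)", "Pilot study 1") are illustrative choices. Each characteristic is summarised per study in long format, then `pivot_wider()` puts the studies side by side.

```r
library(dplyr)
library(tidyr)

set.seed(1)
data <- data.frame(
  weight_study1 = rnorm(100, mean = 60, sd = 5),
  weight_study2 = rnorm(100, mean = 62, sd = 6),
  gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE),
  gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# 1) Long format: one row per participant per study
long <- data %>%
  pivot_longer(everything(),
               names_to = c(".value", "study"),
               names_pattern = "(.*)_study(\\d+)") %>%
  mutate(study = paste0("Pilot study ", study))

# 2) Summarise each characteristic per study as a display string
weight_row <- long %>%
  group_by(study) %>%
  summarise(value = sprintf("%.1f (%.1f)", mean(weight), sd(weight)),
            .groups = "drop") %>%
  mutate(Characteristics = "Weight, mean (sd)", level = "")

gender_rows <- long %>%
  count(study, gender) %>%
  group_by(study) %>%
  mutate(value = sprintf("%s (%.0f%%)", gender, 100 * n / sum(n))) %>%
  ungroup() %>%
  transmute(study, Characteristics = "Gender", level = gender, value)

# 3) Back to wide format: one column per study, side by side
table_wide <- bind_rows(weight_row, gender_rows) %>%
  pivot_wider(id_cols = c(Characteristics, level),
              names_from = study, values_from = value)

table_wide
```

The `level` column keeps the Male and Female rows distinct so that `pivot_wider()` has a unique key for every row.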

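There are probably cleaner ways to do this, for example using `…()` to fill in the empty cells, but what I came up with is a `…` solution. It works on the data you provided. It isn't ideal, because it gets quite hacky (though not impossible) if you want to include another variable, for example. However, there is plenty of information on SO to help you with that if necessary. Another consideration is that this has only been tested on study groups of equal length. The code that creates "df1" may need some modification to accommodate groups of different lengths. Again, SO is your friend.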
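One possible way to handle groups of different lengths is sketched below, assuming that in the merged wide data the shorter study is simply padded with `NA` (the sizes `n1 = 100` and `n2 = 80` are hypothetical). After pivoting to long format, the padding rows are dropped, so each study's percentages are computed from its own n rather than from a shared row count.

```r
library(dplyr)
library(tidyr)

set.seed(1)
n1 <- 100  # hypothetical size of study 1
n2 <- 80   # hypothetical size of study 2 (smaller)

# In a merged wide data set, the shorter study is padded with NA
data <- data.frame(
  weight_study1 = rnorm(n1, mean = 60, sd = 5),
  weight_study2 = c(rnorm(n2, mean = 62, sd = 6), rep(NA, n1 - n2)),
  gender_study1 = sample(c("Male", "Female"), n1, replace = TRUE),
  gender_study2 = c(sample(c("Male", "Female"), n2, replace = TRUE), rep(NA, n1 - n2)),
  stringsAsFactors = FALSE
)

long <- data %>%
  pivot_longer(everything(),
               names_to = c(".value", "study"),
               names_pattern = "(.*)_study(\\d+)") %>%
  # Drop padding rows so each study keeps only its own participants
  filter(!(is.na(weight) & is.na(gender)))

# Percentages use each study's own n, not a shared row count
gender_pct <- long %>%
  count(study, gender) %>%
  group_by(study) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  ungroup()

gender_pct
```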

Credit for help with `imap()`: solution
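To illustrate what `imap()` contributes in Step 2 of the code below, here is a small sketch with a hypothetical toy lookup table. For an unnamed vector, `imap()` passes each element as `.x` and its position as `.y`, which is what lets Step 2 read row `.y` from the gender column that matches the study label in `.x`.

```r
library(purrr)

study <- c("Pilot_study1", "Pilot_study2", "Pilot_study1")

# For an unnamed vector, imap() passes each element as .x and its position as .y
labels <- imap_chr(study, ~ paste0("row ", .y, ": ", .x))
labels

# Toy lookup table (hypothetical values) with one gender column per study
genders <- data.frame(
  gender_study1 = c("Male", "Female", "Female"),
  gender_study2 = c("Female", "Male", "Male"),
  stringsAsFactors = FALSE
)

# Same idea as Step 2: take row .y, and the column whose name matches the study in .x
picked <- imap_chr(study, ~ genders[.y, sub("Pilot_study", "gender_study", .x)])
picked
```

With a plain `data.frame`, `[i, j]` on a single cell returns a bare value, so `imap_chr()` works directly. On a tibble (as `df1` is in the answer), `[i, j]` returns a one-cell tibble instead, which is presumably why the answer wraps the call in `unlist()`.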

library(dplyr)
library(tidyr) 
library(purrr)
library(stringr)

# Generate sample data
set.seed(1)
df <- data.frame(weight_study1 = rnorm(100, mean = 60, sd = 5),
                 weight_study2 = rnorm(100, mean = 62, sd = 6),
                 gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE),
                 gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE),
                 age1 = sample(18:65, 100, replace = TRUE), # Added column
                 age2 = sample(18:65, 100, replace = TRUE)) # Added column

# Step 1: get gender counts by study; 
#         get all numeric variables into a single column;
#         create "study" column and define study groups.
df1 <- df %>%
  group_by(gender_study1) %>%
  mutate(count1 = n()) %>%
  group_by(gender_study2) %>%
  mutate(count2 = n()) %>%
  pivot_longer(cols = starts_with(c("weight", "age", "count"))) %>%
  mutate(study = paste0("Pilot_study", str_sub(name, start = -1))) %>%
  ungroup()

# Step 2: use imap() to create a single (tidy) "gender" column;
#         select only necessary columns;
#         group by study and derive "age" and "count" strings for final output;
#         group by study and gender and derive "weight" strings for final output;
#         ungroup, select only necessary columns, and "tidy" the "Category" names;
#         get distinct/unique rows, and pivot data to wide format.
df2 <- df1 %>%
  # If the number at the end of the gender column's name matches the number at
  # the end of the "study" value, get the corresponding gender value
  mutate(gender = unlist(imap(study,
                              ~df1[.y,
                                   str_replace(.x, "Pilot_study", "gender_study")] %>%
                                ifelse(is.null(.), NA, .)))) %>%
  select(-starts_with("gender_")) %>% # Not "tidy" and no longer needed
  group_by(study) %>%
  mutate(temp = ifelse(str_detect(name, "age"),
                       paste0(round(mean(value), 1),
                              "(",
                              round(sd(value), 1),
                              ")"), NA),
         # Workaround needed to get count % per study because study group
         # count gets multiplied by the number of variables. n_distinct()
         # returns the count of unique values
         temp = ifelse(str_detect(name, "count"),
                       paste0(as.integer(value),
                              "(",
                              round(100 / (n() / n_distinct(name)) * value, 1),
                              ")"), temp))  %>%
  group_by(study, gender) %>%
  mutate(temp = ifelse(str_detect(name, "weight"),
                       paste0(round(mean(value), 1),
                              "(",
                              round(sd(value), 1),
                              ")"), temp)) %>%
  ungroup() %>%
  select(-value) %>%
  mutate(name = str_sub(name, end = -2),
         name = ifelse(str_detect(name, "weight"), "weight mean(sd)", name),
         name = ifelse(str_detect(name, "count"), "count(%)", name),
         gender = ifelse(name == "age", "", gender),
         name = ifelse(name == "age", "age mean(sd)", name)) %>%
  distinct() %>%
  pivot_wider(names_from = study,
              values_from = temp) %>%
  arrange(name) %>%
  rename(Category = "name")

df2
#> # A tibble: 5 × 4
#>   Category        gender   Pilot_study1 Pilot_study2
#>   <chr>           <chr>    <chr>        <chr>
#> 1 age mean(sd)    ""       51.1(12.2)   50.9(12.1)
#> 2 count(%)        "Male"   55(55)       52(52)
#> 3 count(%)        "Female" 45(45)       48(48)
#> 4 weight mean(sd) "Male"   52.3(11.9)   51.2(12.2)
#> 5 weight mean(sd) "Female" 49.6(12.4)   50.5(12.1)
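When there are only two studies, the row-by-row `imap()` lookup in Step 2 could probably be replaced by a vectorised `case_when()`. The sketch below uses a hypothetical toy stand-in for `df1` rather than the real one; it picks, for each row, the gender column that matches that row's study.

```r
library(dplyr)

# Toy stand-in for df1 (hypothetical values): the two gender columns plus "study"
df1_toy <- tibble(
  gender_study1 = c("Male", "Female", "Male", "Female"),
  gender_study2 = c("Female", "Female", "Male", "Male"),
  study = c("Pilot_study1", "Pilot_study2", "Pilot_study2", "Pilot_study1")
)

# Vectorised alternative to the row-by-row imap() lookup:
# pick the gender column that matches each row's study
out <- df1_toy %>%
  mutate(gender = case_when(
    study == "Pilot_study1" ~ gender_study1,
    study == "Pilot_study2" ~ gender_study2
  ))

out$gender
```

The trade-off is that `case_when()` needs one branch per study, whereas the `imap()` version finds the matching column by name and so extends to more studies without code changes.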
