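On the surface, solving your problem seems relatively straightforward. However, your data are not in tidy format. I encourage you to read up on this, particularly if you need to modify this code in any way.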
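One way to produce your expected output is to first create a "tidy-ish" dataset. In this case, that means converting (pivoting) your data to long format, then pivoting back to wide format. You have two groups of 100 (n = 200) and three numeric variables' worth of long data. In other words, 100 observations become 600. Your expected output is further complicated by the fact that you want three different variables in a single column. I added "age" columns to the sample data for completeness.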
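To make the row arithmetic concrete, here is a minimal sketch of the long/wide round trip on toy data. The names `wide`, `long`, `back` and the `id` column are my own, introduced only for illustration, and the constant values are placeholders rather than your data.

```r
library(tidyr)

# Toy wide data: 100 rows, two studies, three numeric variables per study.
# The values are arbitrary placeholders.
wide <- data.frame(
  id            = seq_len(100),   # hypothetical row id so the reshape can be undone
  weight_study1 = rep(60, 100),
  weight_study2 = rep(62, 100),
  age1          = rep(30, 100),
  age2          = rep(40, 100),
  count1        = rep(55, 100),
  count2        = rep(52, 100)
)

# Long format: every numeric column is stacked into a name/value pair
long <- pivot_longer(wide, cols = -id)
nrow(long)   # 100 rows x 6 numeric columns = 600 rows

# Back to wide format: one row per id again
back <- pivot_wider(long, names_from = name, values_from = value)
nrow(back)   # 100
```

The `id` column is what makes the round trip lossless; without some identifier, `pivot_wider()` cannot tell which values belong on the same row.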
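There are no doubt more elegant ways to do this, for example by using a function to fill in the gaps, but this is the solution I came up with. It works with the data you provided. It is not ideal, because including another variable would be awkward (though not impossible). There is plenty of information on SO to help you with that if necessary. Another consideration is that this has only been tested with study groups of equal size. The code that creates `df1` may need some modification to accommodate groups of different sizes. Again, SO is your friend.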
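For the unequal-size case, one possible approach (a sketch I have not tested against your real data) is to pad the shorter study with `NA` and drop those rows while pivoting, using the `values_drop_na` argument of `pivot_longer()`. The helper `pad()` and the data frame `df_uneven` are hypothetical names used only for this illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: pad a vector with NA up to length n
pad <- function(x, n) c(x, rep(NA, n - length(x)))

# Study 1 has 5 participants, study 2 has only 3
df_uneven <- data.frame(
  weight_study1 = c(60, 61, 59, 62, 58),
  weight_study2 = pad(c(63, 64, 62), 5)
)

long <- df_uneven %>%
  # values_drop_na = TRUE removes the padding rows while reshaping
  pivot_longer(everything(), values_drop_na = TRUE) %>%
  mutate(study = paste0("Pilot_study", substr(name, nchar(name), nchar(name))))

# Group sizes and means now reflect the real number of participants
by_study <- long %>%
  group_by(study) %>%
  summarise(n = n(), mean_weight = mean(value), .groups = "drop")
by_study
```

Note that the percentage calculation in the answer's code relies on `n() / n_distinct(name)`, so it would need to be rechecked if some variables have more missing rows than others.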
Credit for help with `imap()`: solution
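If `imap()` is unfamiliar: for an unnamed vector it passes each element as `.x` and its position as `.y`. The sketch below shows the same lookup idea on three rows, using `imap_chr()` so the result is already a character vector. The `lookup` data frame is a made-up stand-in for `df1`.

```r
library(purrr)

# One "study" label per row, as in the long data
study <- c("Pilot_study1", "Pilot_study2", "Pilot_study1")

# Hypothetical stand-in for the two gender columns in df1
lookup <- data.frame(
  gender_study1 = c("Male", "Female", "Male"),
  gender_study2 = c("Female", "Female", "Male"),
  stringsAsFactors = FALSE
)

# .x is the study label, .y is the row position:
# pick the gender column whose suffix matches the study, at that row
gender <- imap_chr(study, ~ lookup[.y, sub("Pilot_study", "gender_study", .x)])
gender
```

Row 1 and row 3 read from `gender_study1`, row 2 reads from `gender_study2`, which is exactly the per-row column switch the full solution needs.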
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
# Generate sample data
set.seed(1)
df <- data.frame(weight_study1 = rnorm(100, mean = 60, sd = 5),
                 weight_study2 = rnorm(100, mean = 62, sd = 6),
                 gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE),
                 gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE),
                 age1 = sample(18:65, 100, replace = TRUE), # Added column
                 age2 = sample(18:65, 100, replace = TRUE)) # Added column
# Step 1: get gender counts by study;
# get all numeric variables into a single column;
# create "study" column and define study groups.
df1 <- df %>%
  group_by(gender_study1) %>%
  mutate(count1 = n()) %>%
  group_by(gender_study2) %>%
  mutate(count2 = n()) %>%
  pivot_longer(cols = starts_with(c("weight", "age", "count"))) %>%
  mutate(study = paste0("Pilot_study", str_sub(name, start = -1))) %>%
  ungroup()
# Step 2: use imap() to create a single (tidy) "gender" column;
# select only necessary columns;
# group by study and derive "age" and "count" strings for final output;
# group by study and gender and derive "weight" strings for final output;
# ungroup, select only necessary columns, and tidy the "Category" names;
# get distinct/unique rows, and pivot data to wide format.
df2 <- df1 %>%
  # If the number at the end of a gender column's name matches the number at
  # the end of the value in the "study" column, get the corresponding gender value
  mutate(gender = unlist(imap(study,
                              ~df1[.y,
                                   str_replace(.x, "Pilot_study", "gender_study")] %>%
                                ifelse(is.null(.), NA, .)))) %>%
  select(-starts_with("gender_")) %>% # Not "tidy" and no longer needed
  group_by(study) %>%
  # Note: mean(value) and sd(value) here use every row in the study group
  # (weight, age and count values together); ifelse() only decides which
  # rows receive the resulting string
  mutate(temp = ifelse(str_detect(name, "age"),
                       paste0(round(mean(value), 1),
                              "(",
                              round(sd(value), 1),
                              ")"), NA),
         # Workaround needed to get count % per study because study group
         # count gets multiplied by the number of variables. n_distinct()
         # returns count of unique values
         temp = ifelse(str_detect(name, "count"),
                       paste0(as.integer(value),
                              "(",
                              round(100 / (n() / n_distinct(name)) * value, 1),
                              ")"), temp)) %>%
  group_by(study, gender) %>%
  # As above, mean(value) and sd(value) use every row in the study/gender group
  mutate(temp = ifelse(str_detect(name, "weight"),
                       paste0(round(mean(value), 1),
                              "(",
                              round(sd(value), 1),
                              ")"), temp)) %>%
  ungroup() %>%
  select(-value) %>%
  mutate(name = str_sub(name, end = -2),
         name = ifelse(str_detect(name, "weight"), "weight mean(sd)", name),
         name = ifelse(str_detect(name, "count"), "count(%)", name),
         gender = ifelse(name == "age", "", gender),
         name = ifelse(name == "age", "age mean(sd)", name)) %>%
  distinct() %>%
  pivot_wider(names_from = study,
              values_from = temp) %>%
  arrange(name) %>%
  rename(Category = "name")
df2
# A tibble: 5 × 4
#   Category        gender   Pilot_study1 Pilot_study2
#   <chr>           <chr>    <chr>        <chr>
# 1 age mean(sd)    ""       51.1(12.2)   50.9(12.1)
# 2 count(%)        "Male"   55(55)       52(52)
# 3 count(%)        "Female" 45(45)       48(48)
# 4 weight mean(sd) "Male"   52.3(11.9)   51.2(12.2)
# 5 weight mean(sd) "Female" 49.6(12.4)   50.5(12.1)
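One thing to be aware of: in the code above, `mean(value)` and `sd(value)` inside `ifelse()` are evaluated over all rows of the group, so the weight, age and count values are pooled together. If you want statistics for each variable on its own, a different reshape may be easier. The following is only a sketch of that alternative, not the method used above: it renames the age columns so every column follows the `<variable>_study<n>` pattern, then uses the `.value` sentinel of `pivot_longer()` to get one row per participant per study. The names `long`, `fmt`, `id`, `weight_by_gender` and `age_by_study` are my own.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer
set.seed(1)
df <- data.frame(weight_study1 = rnorm(100, mean = 60, sd = 5),
                 weight_study2 = rnorm(100, mean = 62, sd = 6),
                 gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE),
                 gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE),
                 age1 = sample(18:65, 100, replace = TRUE),
                 age2 = sample(18:65, 100, replace = TRUE))

# One row per participant per study, one column per variable
long <- df %>%
  rename(age_study1 = age1, age_study2 = age2) %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id,
               names_to = c(".value", "study"),
               names_sep = "_study")

# Hypothetical helper: format a numeric vector as "mean(sd)"
fmt <- function(x) paste0(round(mean(x), 1), "(", round(sd(x), 1), ")")

# Weight summarised within each study and gender, using weight values only
weight_by_gender <- long %>%
  group_by(study, gender) %>%
  summarise(weight = fmt(weight), .groups = "drop")

# Age summarised within each study, using age values only
age_by_study <- long %>%
  group_by(study) %>%
  summarise(age = fmt(age), .groups = "drop")

weight_by_gender
age_by_study
```

Because each variable stays in its own column, adding another variable only needs another column that follows the same naming pattern, and the summaries cannot accidentally mix variables.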