Creating side by side table for longitudinal analysis in R
  • Time: 2023-11-30 01:27:51
  • Tags:
  • r
  • dplyr

I have a merged dataset from two longitudinal studies, with measurements recorded for study1 and study2. An example is below:

data <- data.frame(
  weight_study1 = rnorm(100, mean = 60, sd = 5),  #numerical
  weight_study2 = rnorm(100, mean = 62, sd = 6),  #numerical
  gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE),  #categorical
  gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE)  #categorical
)

I want to create a table with the results from study1 on one side and the results from study2 on the other, adjacent to each other. How can this be done?

I am not sure how to comment on this post, so I just want to clarify that the table I am expecting is:

Characteristics Pilot study 1 Pilot study 2
Age             34(3)         67(8)
Gender          Male (40%)    Male (30%)
                Female (60%)  Female (70%)
Answers

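On the surface, the solution to your problem seems relatively simple. However, your data are not "tidy". I encourage you to read up on tidy data, especially if you need to modify this code in any way.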

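One way to produce your desired output is to first create a "tidy" dataset. In this case, that means pivoting (transposing) your data to long format and then back to wide format. You have two groups of 100 (n = 200) and three numeric variables whose values go into the long data. In other words, 100 observations become 600 rows. Your expected output is further complicated by the fact that you want three different variables in a single column. I added "age" columns to the sample data for completeness.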
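The long-then-wide round trip described above can be sketched on a toy data frame. This is a minimal illustration, not the answer's code: the data frame `wide`, the `id` column and the values are made up, and only `tidyr::pivot_longer()` and `tidyr::pivot_wider()` are used.

```r
library(tidyr)

# Toy data (hypothetical): two participants, one variable measured in each study
wide <- data.frame(
  id = 1:2,
  weight_study1 = c(60, 65),
  weight_study2 = c(62, 70)
)

# Wide -> long: one row per participant per study.
# "weight_study1" is split at "_study" into variable = "weight", study = "1".
long <- pivot_longer(
  wide,
  cols = starts_with("weight"),
  names_to = c("variable", "study"),
  names_sep = "_study",
  values_to = "value"
)

# Long -> wide again, now with one column per study
back <- pivot_wider(
  long,
  names_from = study,
  values_from = value,
  names_prefix = "Pilot_study"
)

back
```

With two columns pivoted, the 2 input rows become 4 long rows; in the same way, six pivoted columns turn 100 rows into 600.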

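There will be more elegant ways to do this, including ways that use `…` to fill in the gaps. But this is the solution I came up with, using the packages loaded below. It works with the data you provided. It is not ideal, because including another variable, for example, would be tricky (though not impossible). However, there is plenty of information on SO to help you achieve this if necessary. Another consideration is that it has only been tested with study groups of equal size. The code that creates "df1" may need some modification to accommodate groups of different sizes. Again, SO is your friend.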
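As a hedged illustration of the unequal-group caveat: assuming a merged data frame pads the shorter study with NA (an assumption, the question does not say how the studies were merged), the `values_drop_na` argument of `pivot_longer()` removes the padding during reshaping. The data frame `uneven` and its values are hypothetical.

```r
library(tidyr)

# Hypothetical merged data: study 2 has one participant fewer, padded with NA
uneven <- data.frame(
  weight_study1 = c(60, 65, 58),
  weight_study2 = c(62, 70, NA)
)

# names_prefix strips "weight_study" so that study holds "1" or "2";
# values_drop_na = TRUE drops the padded row instead of carrying an NA forward
long <- pivot_longer(
  uneven,
  cols = everything(),
  names_to = "study",
  names_prefix = "weight_study",
  values_to = "weight",
  values_drop_na = TRUE
)

# Group sizes now differ by study
table(long$study)
```

Summaries computed per study on data like this use the actual group sizes rather than assuming the same number of rows in each study.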

Credit for help with imap(): solution

library(dplyr)
library(tidyr) 
library(purrr)
library(stringr)

# Generate sample data
set.seed(1)
df <- data.frame(weight_study1 = rnorm(100, mean = 60, sd = 5),
                 weight_study2 = rnorm(100, mean = 62, sd = 6),
                 gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE),
                 gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE),
                 age1 = sample(18:65, 100, replace = TRUE), # Added column
                 age2 = sample(18:65, 100, replace = TRUE)) # Added column

# Step 1: get gender counts by study; 
#         get all numeric variables into a single column;
#         create "study" column and define study groups.
df1 <- df %>%
  group_by(gender_study1) %>%
  mutate(count1 = n()) %>%
  group_by(gender_study2) %>%
  mutate(count2 = n()) %>%
  pivot_longer(cols = starts_with(c("weight", "age", "count"))) %>%
  mutate(study = paste0("Pilot_study", str_sub(name, start = -1))) %>%
  ungroup()

# Step 2: use imap() to create a single (tidy) "gender" column;
#         select only necessary columns;
#         group by study and derive "age" and "count" strings for final output;
#         group by study and gender and derive "weight" strings for final output;
#         ungroup, select only necessary columns, and "tidy" the "Category" names;
#         get distinct/unique rows, and pivot data to wide format.
df2 <- df1 %>%
  # If the number at the end of the gender column's name matches the end number
  # of the value in the "study" column, get the corresponding gender value
  mutate(gender = unlist(imap(study,
                              ~df1[.y,
                                   str_replace(.x, "Pilot_study", "gender_study")] %>%
                                ifelse(is.null(.), NA, .)))) %>%
  select(-starts_with("gender_")) %>% # Not "tidy" and no longer needed
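  group_by(study) %>%
  # Caution: mean(value) and sd(value) in this pipeline are taken over every
  # row in the group, so weight, age and count values are pooled; subset value
  # by name (e.g. value[str_detect(name, "age")]) to summarise one variable
  mutate(temp = ifelse(str_detect(name, "age"),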
                       paste0(round(mean(value), 1),
                              "(",
                              round(sd(value), 1),
                              ")"), NA),
         # Workaround needed to get count % per study because the study group
         # count gets multiplied by the number of variables. n_distinct()
         # returns the count of unique values
         temp = ifelse(str_detect(name, "count"),
                       paste0(as.integer(value),
                              "(",
                              round(100 / (n() / n_distinct(name)) * value , 1),
                              ")"), temp))  %>%
  group_by(study, gender) %>%
  mutate(temp = ifelse(str_detect(name, "weight"),
                       paste0(round(mean(value), 1),
                              "(",
                              round(sd(value), 1),
                              ")"), temp)) %>%
  ungroup() %>%
  select(-value) %>%
  mutate(name = str_sub(name, end = -2),
         name = ifelse(str_detect(name, "weight"), "weight mean(sd)", name),
         name = ifelse(str_detect(name, "count"), "count(%)", name),
         gender = ifelse(name == "age", "", gender),
         name = ifelse(name == "age", "age mean(sd)", name)) %>%
  distinct() %>%
  pivot_wider(names_from = study,
              values_from = temp) %>%
  arrange(name) %>%
  rename(Category = "name")

df2
# A tibble: 5 × 4
  Category        gender   Pilot_study1 Pilot_study2
  <chr>           <chr>    <chr>        <chr>       
1 age mean(sd)    ""       51.1(12.2)   50.9(12.1)  
2 count(%)        "Male"   55(55)       52(52)      
3 count(%)        "Female" 45(45)       48(48)      
4 weight mean(sd) "Male"   52.3(11.9)   51.2(12.2)  
5 weight mean(sd) "Female" 49.6(12.4)   50.5(12.1)
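Note that the age and weight cells in the output above are all close to 51(12), although weight was simulated around 60 and age was drawn from 18:65. This happens because mean(value) and sd(value) are evaluated over every row in the group, which pools the weight, age and count values. A minimal alternative sketch that summarises each variable separately is shown here; it reuses the sample data from the answer, but the helper `fmt_mean_sd()`, the `id` column, the renamed age columns and the object names are my own assumptions, and weight is reported per study rather than per gender to match the table in the question.

```r
library(dplyr)
library(tidyr)

set.seed(1)
df <- data.frame(weight_study1 = rnorm(100, mean = 60, sd = 5),
                 weight_study2 = rnorm(100, mean = 62, sd = 6),
                 gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE),
                 gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE),
                 age1 = sample(18:65, 100, replace = TRUE),
                 age2 = sample(18:65, 100, replace = TRUE))

# Hypothetical helper: format a numeric vector as "mean (sd)"
fmt_mean_sd <- function(x) sprintf("%.1f (%.1f)", mean(x), sd(x))

# One row per participant per study; ".value" keeps weight, gender and age
# as separate columns, so their types are never mixed
long <- df %>%
  mutate(id = row_number()) %>%
  rename(age_study1 = age1, age_study2 = age2) %>%
  pivot_longer(cols = -id,
               names_to = c(".value", "study"),
               names_sep = "_study")

# Numeric variables: mean (sd) per study
numeric_rows <- long %>%
  group_by(study) %>%
  summarise(Age = fmt_mean_sd(age),
            Weight = fmt_mean_sd(weight),
            .groups = "drop") %>%
  pivot_longer(-study, names_to = "Characteristics", values_to = "stat") %>%
  mutate(Level = "mean (sd)")

# Categorical variable: n (%) per study
gender_rows <- long %>%
  count(study, gender) %>%
  group_by(study) %>%
  mutate(stat = sprintf("%d (%.0f%%)", n, 100 * n / sum(n))) %>%
  ungroup() %>%
  transmute(study, Characteristics = "Gender", Level = gender, stat)

# Stack both summaries and spread the studies side by side
side_by_side <- bind_rows(numeric_rows, gender_rows) %>%
  pivot_wider(names_from = study,
              values_from = stat,
              names_prefix = "Pilot study ")

side_by_side
```

Because each statistic is computed from its own column, adding another numeric variable only requires one more entry in summarise().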



