Question

I'm working with a massive spreadsheet of over 5 million rows. I mutated 2 character columns to create factors with defined levels: one column is a factor with 3 levels, and the other is a factor with 2 levels. After filtering sets of data from this source, I saved them in separate .csv files to continue working on them later. Now, when I read any of the .csv files back into RStudio, all of those adjusted columns in all the tables are treated as characters again. Do I have to redo the factor work every time I open RStudio?

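Before using the read functions, I loaded all of the libraries I had been using, except for one library that created a series of conflicts when I tried to manipulate the data.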

Libraries currently loaded:

library(data.table)
library(readr)
library(tidyverse)
library(lubridate)
library(dplyr)

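It still retains the dbl columns, and it saved and correctly read back the other columns I had modified, so why do my factor columns revert? I have tried a few different ways of reading the files into RStudio with the read functions, but I don't know the specific way to read these files in so that I can get back to where I left off without redundant work.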

I've tried the read.csv, read_csv, and data.table::fread import functions, but I feel like I'm shooting in the dark here; I thought that just importing a .csv file would get me right back to where I was when I left it. I use glimpse(df) to check whether it's being read correctly, but it's never as I left it, or it gets warped by the other import functions. If there's some special function to use in conjunction with "stringsAsFactors = FALSE, UTF-8", or a special way to write the .csv file initially that I didn't use, maybe that's my answer. I'm just trying NOT to have to rerun all my factor and factor-level work on my now separate data sets every time I open them.

Answer 1

Both Phil and Onyambu make valid points, but I thought the question was how to properly read in CSV files that would be stacked and have some or all of the character-valued columns converted to factors, which the stringsAsFactors argument controls, as you already appear to understand. The read.* functions formerly brought character columns in as factors by default, but since R 4.0.0 the default for stringsAsFactors is FALSE, so character-valued columns are now read as plain character vectors. If you are considering stacking the results of reading multiple csv files and converting to factors, then by all means do the stacking first, and convert the columns to factors only after that succeeds. Otherwise you will experience the grief of trying to concatenate factor columns whose levels and underlying integer codes differ.
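The stack-first, convert-second order can be sketched in base R as follows. This is only an illustration under assumptions: the two small tables, the column names (size, status), and the level orderings are made up to stand in for your filtered subsets, and the files are written to temporary paths.

```r
# Hypothetical stand-ins for two filtered subsets of the big table.
part_a <- data.frame(id = 1:3,
                     size = c("small", "large", "medium"),
                     status = c("open", "closed", "open"))
part_b <- data.frame(id = 4:5,
                     size = c("medium", "small"),
                     status = c("closed", "closed"))

# Save each subset as CSV. A CSV is plain text, so it carries no
# information about factor types or level order.
file_a <- tempfile(fileext = ".csv")
file_b <- tempfile(fileext = ".csv")
write.csv(part_a, file_a, row.names = FALSE)
write.csv(part_b, file_b, row.names = FALSE)

# Read the files back with text columns kept as character.
read_a <- read.csv(file_a, stringsAsFactors = FALSE)
read_b <- read.csv(file_b, stringsAsFactors = FALSE)

# Stack first, while the columns are still plain character vectors.
stacked <- rbind(read_a, read_b)

# Convert to factors once, on the combined data, with explicit levels
# (assumed orderings: 3 levels for size, 2 levels for status).
stacked$size   <- factor(stacked$size, levels = c("small", "medium", "large"))
stacked$status <- factor(stacked$status, levels = c("open", "closed"))

str(stacked)
```

Converting after rbind means every row is mapped against one shared set of levels, so there is nothing to reconcile between files.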

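I confess I don't know whether data.table's fread changed its default at the same time as, or after, the change to the R read.* functions. It shouldn't be difficult to determine by experimenting.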
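One way to run that experiment is sketched below. It assumes the data.table package is installed; the tiny file and its column names (id, group) are made up for the test. It reads the same file once with fread's defaults and once with stringsAsFactors = TRUE, then prints the class of the text column in each case.

```r
library(data.table)

# Write a tiny CSV with one text column to a temporary file (made-up data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group", "1,a", "2,b", "3,a"), csv_path)

# Read with fread's defaults and inspect how the text column comes in.
default_read <- fread(csv_path)
print(class(default_read$group))

# Read again, explicitly asking for text columns as factors.
factor_read <- fread(csv_path, stringsAsFactors = TRUE)
print(class(factor_read$group))
```

Running the same check with read.csv on the same file shows whether the two readers agree in your installed versions.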
