Question

I am trying to import my data with fixed column widths. After the first special character appears, the number of characters per column changes. What could be the reason for this? File: https://gist.github.com/1902r/0b596431e15bde833e9d9a0640e12ba7

library(readr)
columns_widths <- c(3, 8, 7)
source_data <- "data.rpt"
source_data_raw <- read_fwf(source_data, fwf_widths(columns_widths))

我尝试用读到数据中。但是,没有任何理由支持固定列宽。我在座的脚石是这样做的。

Answer 1

以下代码应当使用<代码>utils:read.fwf(<>/code>从R基数中解决您的问题。如果(未引述)特性说明本身含有 b(例如“共同人”)的话,你需要做这项工作。否则,data.table:fread(),而无任何理由,则应予罚款。

UTF-8-BOM encodings should not be used (if possible). See https://cran.r-project.org/doc/manuals/r-patched/R-data.pdf and https://en.wikipedia.org/wiki/Byte_order_mark

。因此,含有多种特性的行文是流离失所的。

library(readr)
columns_widths <- c(3, 8, 7)
source_data <- "https://gist.githubusercontent.com/1902r/0b596431e15bde833e9d9a0640e12ba7/raw/215e50d1db79b4a7aeb3560680a55bba8c9f1503/data.rpt"
source_data_raw <- read_fwf(source_data, fwf_widths(columns_widths))
#> Rows: 4 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> 
#> chr (3): X1, X2, X3
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
source_data_raw
#> # A tibble: 4 × 3
#>   X1    X2     X3    
#>   <chr> <chr>  <chr> 
#> 1 ID    Name   Amount
#> 2 1     Jo hn  100   
#> 3 2     Bílušk a 450 
#> 4 3     Jane   200

## first read colnames from file
datar_names <- read.fwf(source_data,
  widths = columns_widths,
  n = 1, fileEncoding = "UTF-8-BOM",
  strip.white = TRUE
)

## read data using names from above
datar <- read.fwf(source_data,
  widths = columns_widths,
  skip = 1, col.names = datar_names,
  fileEncoding = "UTF-8-BOM",
  strip.white = TRUE
)
datar
#>   ID    Name Amount
#> 1  1   Jo hn    100
#> 2  2 Bíluška    450
#> 3  3    Jane    200
str(datar)
#>  data.frame :    3 obs. of  3 variables:
#>  $ ID    : int  1 2 3
#>  $ Name  : chr  "Jo hn" "Bíluška" "Jane"
#>  $ Amount: int  100 450 200

^{Created on 2024-04-26 with reprex v2.1.0}

友情链接