How to most efficiently repeat the same merge on different columns in R (preferably data.table)

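I have a case where I repeatedly merge on the same column of table A while changing the column used from table B, using data.table's merge. The repeated merge calls are fairly slow, so I am curious whether there is a faster way to do this.

For example:

Say I have a table A, "fruits", with 2 columns: "fruit_name" and "price".

Another table B, "baskets", has 3 columns: "fruit_1", "fruit_2" and "fruit_3".

I want the total price of the fruits in table B, row by row. I can do 3 merges, all using "fruit_name" from the fruits table, the first using "fruit_1" from the baskets table, then "fruit_2", then "fruit_3".

However, each individual merge takes a long time. Is there a way to calculate this more efficiently? Below is code that sets up the example and produces the desired result, but slowly.

I generally use data.table, and it is usually efficient, so I would prefer it, but I am open to other tools if they are faster than 3 data.table merges.

I also imagine I could put the data in long format and do a single merge. Ideally I would avoid that, since my data is inherently wide, it is large, and it needs to be exported wide. But if that is definitely the best practice, then I guess that is fine.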

Thanks all for your time!

library(data.table)

# Price lookup table: one row per fruit
fruits <- data.table(fruit_name = c("orange", "apple", "pear", "kiwi", "blueberry")
                     , price = c(1, 1.531, 2.1, 2.25, 3.03)
                     )

# One row per basket, one column per fruit slot
baskets <- data.table(fruit_1 = c("orange", "apple", "apple", "pear")
                      ,fruit_2 = c("apple", "pear", "kiwi", "kiwi")
                      ,fruit_3 = c("pear", "kiwi", "blueberry", "blueberry"))

result <- copy(baskets)

# One merge per fruit column; rename price after each merge to avoid a name clash
result <- merge(result, fruits, by.x = "fruit_1", by.y = "fruit_name")
setnames(result, "price", "price_1")

result <- merge(result, fruits, by.x = "fruit_2", by.y = "fruit_name")
setnames(result, "price", "price_2")

result <- merge(result, fruits, by.x = "fruit_3", by.y = "fruit_name")
setnames(result, "price", "price_3")

result[, price_total := price_1 + price_2 + price_3]
Answers

I am not very familiar with data.table, but the following may give you some ideas (I am using dtplyr so that dplyr works on data.table objects).

I think doing this in long form and then summarizing should give you good performance. Packages such as DuckDB and rpolars may even beat data.table on performance for this kind of task.

Note that I am inferring from the row-wise requirement that each row is a "basket".

Getting back to wide format afterwards takes one more pivot step.

library(data.table)
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
library(tidyr)

fruits <- data.table(fruit_name = c("orange", "apple", "pear", "kiwi", 
                                    "blueberry"), 
                     price = c(1, 1.531, 2.1, 2.25, 3.03)
)

baskets <- data.table(fruit_1 = c("orange", "apple", "apple", "pear")
                      ,fruit_2 = c("apple", "pear", "kiwi", "kiwi")
                      ,fruit_3 = c("pear", "kiwi", "blueberry", "blueberry"))

baskets_long <-
    baskets |>
    mutate(basket_id = row_number()) |>
    pivot_longer(cols = -basket_id, 
                 values_to = "fruit_name") |>
    separate_wider_delim(name, "_", 
                         names = c("category", "pos")) 

merged <-
    baskets_long |>
    inner_join(fruits, by = "fruit_name")

vals <-
    merged |> 
    group_by(basket_id) |>
    summarize(price_total = sum(price))

baskets_long |> 
    pivot_wider(names_from = c("category", "pos"), 
                values_from = "fruit_name", 
                names_glue = "{category}_{pos}") |>
    inner_join(vals)
#> Joining with `by = join_by(basket_id)`
#> # A tibble: 4 × 5
#>   basket_id fruit_1 fruit_2 fruit_3   price_total
#>       <int> <chr>   <chr>   <chr>           <dbl>
#> 1         1 orange  apple   pear             4.63
#> 2         2 apple   pear    kiwi             5.88
#> 3         3 apple   kiwi    blueberry        6.81
#> 4         4 pear    kiwi    blueberry        7.38

Created on 2024-01-11 with reprex v2.0.2
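
If reshaping is unattractive because the data has to stay wide, one alternative that may be worth benchmarking is to skip the merges and look each fruit column up in the price table with base R's match(). This is a minimal sketch and not part of the answers here: fruit_cols and prices are illustrative names, and it assumes fruit_name is unique in fruits. Unlike an inner merge, a fruit that is missing from the price table yields an NA total instead of dropping the basket.

```r
library(data.table)

fruits <- data.table(
  fruit_name = c("orange", "apple", "pear", "kiwi", "blueberry"),
  price      = c(1, 1.531, 2.1, 2.25, 3.03)
)
baskets <- data.table(
  fruit_1 = c("orange", "apple", "apple", "pear"),
  fruit_2 = c("apple", "pear", "kiwi", "kiwi"),
  fruit_3 = c("pear", "kiwi", "blueberry", "blueberry")
)

# Illustrative name: every column of baskets that holds a fruit
fruit_cols <- grep("^fruit_", names(baskets), value = TRUE)

result <- copy(baskets)

# For each fruit column, find the row of each fruit in `fruits`
# and pull the matching price (NA if the fruit is not listed).
prices <- lapply(fruit_cols, function(col) {
  fruits$price[match(result[[col]], fruits$fruit_name)]
})

# Sum the per-column price vectors element-wise; row order is untouched.
result[, price_total := Reduce(`+`, prices)]
```

Because no join is involved, the rows stay in their original order and the table never leaves wide format. Whether this is faster than three merges on large data is something to measure rather than assume.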

Using data.table directly:

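# Melt to long, join prices, sum per basket, then cast back to wide.
# Caution: rowid(variable) recovers the basket row only because the join keeps
# basket order within each fruit column for this example data, and the dcast
# on price assumes no two baskets share the same total.
baskets[, melt(.SD, measure.vars = patterns("fruit"), value.name = "fruit_name")
    ][fruits, on = "fruit_name"
    ][, price := sum(price), by = rowid(variable)
    ][, dcast(.SD, price ~ variable, value.var = "fruit_name")]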

   price fruit_1 fruit_2   fruit_3
1: 4.631  orange   apple      pear
2: 5.881   apple    pear      kiwi
3: 6.811   apple    kiwi blueberry
4: 7.380    pear    kiwi blueberry
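
The chain above identifies baskets by row position after the join, which depends on how the rows happen to be ordered. A variant that carries an explicit basket identifier through the reshape might look like the sketch below. It is an assumption-laden illustration and not the answerer's code: basket_id, long and totals are names introduced here, and the price is attached with an update join so that the long table keeps its own row order.

```r
library(data.table)

fruits <- data.table(
  fruit_name = c("orange", "apple", "pear", "kiwi", "blueberry"),
  price      = c(1, 1.531, 2.1, 2.25, 3.03)
)
baskets <- data.table(
  fruit_1 = c("orange", "apple", "apple", "pear"),
  fruit_2 = c("apple", "pear", "kiwi", "kiwi"),
  fruit_3 = c("pear", "kiwi", "blueberry", "blueberry")
)

# Tag each basket with its row number before reshaping (hypothetical column name).
long <- melt(
  copy(baskets)[, basket_id := .I],
  id.vars      = "basket_id",
  measure.vars = patterns("^fruit_"),
  value.name   = "fruit_name"
)

# Update join: add the price column to `long` by reference, one lookup in total.
long[fruits, price := i.price, on = "fruit_name"]

# One total per basket, sorted by basket_id so it lines up with the original rows.
totals <- long[, .(price_total = sum(price)), keyby = basket_id]

# Attach the totals to an untouched wide copy of the baskets.
result <- copy(baskets)[, price_total := totals$price_total]
```

The wide table is never rebuilt with dcast here; only the totals are carried back, so the fruit columns keep their original order and duplicate totals are harmless.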



