Julia: Return Minimum Date in DataFrame

The question is fairly simple. How do I return the minimum purchase date for each customer using Tidier?

using Tidier, DataFrames, Plots, CSV


#params
f = "path"

df = CSV.File(f) |> DataFrame
df = @chain df begin
    @select(SHOPIFY_ORDER_ID, CUSTOMER_ID, SHIPMONTH, GROSS_REVENUE, Country)
    @rename(order_id = SHOPIFY_ORDER_ID,
            customer_id = CUSTOMER_ID,
            date = SHIPMONTH,
            revenue = GROSS_REVENUE,
            country = Country)
    @filter(country != "CA")
    @filter(!ismissing(date))
    @filter(revenue != 0.0)
end 


# logic to calculate summary stats
df_sum = @chain df begin
    @group_by(customer_id)
    @mutate(
        cohort = min(date)
    )
end

min(df[!, :date])

For df_sum, I receive the following error:

ERROR: ArgumentError: argument is not a permutation
Stacktrace:
 [1] invperm(a::Vector{Int64})
   @ Base .combinatorics.jl:282
 [2] groupby(df::DataFrame, cols::Cols{Tuple{Symbol}}; sort::Bool, skipmissing::Bool)
   @ DataFrames C:path.juliapackagesDataFramesLteElsrcgroupeddataframegroupeddataframe.jl:264
 [3] top-level scope
   @ path.jl:453

When attempting to find the minimum date in the DataFrame, I receive this error:

ERROR: MethodError: no method matching min(::Vector{Union{Missing, Dates.Date}})

Closest candidates are:
  min(::Any, ::Missing)
   @ Base missing.jl:134
  min(::Any, ::Any)
   @ Base operators.jl:481
  min(::Any, ::Any, ::Any, ::Any...)
   @ Base operators.jl:578
  ...

Stacktrace:
 [1] top-level scope
   @ c:pathscript.jl:28

This suggests to me that min doesn't work when the column contains Missing values, but I'm not sure how to proceed from there.

Answers

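You probably need to use minimum instead of min: min compares two or more separate arguments, while minimum reduces over a collection. I cannot see your data, but if the column contains missing values, minimum will not throw an error; it returns missing, so wrap the column in skipmissing first if you want to ignore those entries. The same applies to maximum.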
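
A minimal sketch of the difference between min and minimum, using only the Dates standard library. The sample dates are made up for illustration and are not from the asker's data; only the element type is taken from the error message above.

```julia
using Dates

# Hypothetical sample data; the element type matches the one in the error message.
dates = Union{Missing, Date}[Date(2023, 3, 1), missing, Date(2023, 1, 15)]

# min compares separate arguments, so it needs two or more values.
earlier = min(Date(2023, 3, 1), Date(2023, 1, 15))

# minimum reduces over a collection, but a missing entry propagates to the result.
with_missing = minimum(dates)

# skipmissing drops the missing entries before the reduction.
earliest = minimum(skipmissing(dates))

println(earlier)
println(with_missing)
println(earliest)
```

Here with_missing ends up as missing, while earliest is the earliest non-missing date. Calling min(dates) on the vector itself raises the MethodError shown in the question, because no one-argument method of min accepts a vector.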

Elaborating on Bogumił Kamiński’s answer, you could try the following code:

df_sum = @chain df begin
    @group_by(customer_id)
    @mutate(
        minimum_date = minimum(skipmissing(date))
    )
end

The other thing to consider is whether you want to add a column to your existing dataset, or whether you simply want to return the minimum date only for each customer.

Here are two alternative approaches:

The first one will return only the customer_id and the minimum date for each customer.

df_sum = @chain df begin
    @group_by(customer_id)
    @summarize(
        minimum_date = minimum(skipmissing(date))
    )
end
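
To make clear what the grouped summary computes, here is a rough equivalent in plain Julia without Tidier or DataFrames. The rows and the first_purchase name are hypothetical, chosen only for illustration; this is a sketch of the idea, not how Tidier implements @summarize.

```julia
using Dates

# Hypothetical rows of (customer_id, date); not taken from the asker's data.
rows = [
    (customer_id = 1, date = Date(2023, 3, 1)),
    (customer_id = 2, date = Date(2023, 2, 10)),
    (customer_id = 1, date = Date(2023, 1, 15)),
    (customer_id = 2, date = missing),
]

# Keep the earliest non-missing date seen so far for each customer.
first_purchase = Dict{Int, Date}()
for row in rows
    ismissing(row.date) && continue  # same effect as skipmissing
    current = get(first_purchase, row.customer_id, row.date)
    first_purchase[row.customer_id] = min(current, row.date)
end

for (id, d) in sort(collect(first_purchase))
    println(id, " => ", d)
end
```

Each customer ends up with exactly one entry, which is the shape of result that the @summarize version returns: one row per customer_id.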

In case you want to return the whole row, here’s the second approach:

df_sum = @chain df begin
    @group_by(customer_id)
    @filter(
        date == minimum(skipmissing(date))
    )
    @ungroup
end

Without having access to the original dataset, it’s hard to confirm if these will work for you. If these don’t work, please let us know!

Thanks for using Tidier.jl, and congrats on asking the first-ever Tidier.jl question on StackOverflow!




