Calculate consecutive days in Python-pandas
  • Time: 2024-03-27 02:00:37
  • Tags: pandas

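I have a table of visit dates for different customer ids, shown below. Some customers visited several days in a row, stayed away for a few days, and then came back for several more days. I want to count the consecutive days for each customer. In my case, each customer may have more than one run of consecutive days. For example, customer 1 has a run of 3 consecutive days and a run of 4 consecutive days. How can I get the length of every such run, and then the largest one?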

id  visit_date
1   1/2/2022
1   1/3/2022
1   1/4/2022
1   1/7/2022
1   1/8/2022
1   1/9/2022
1   1/10/2022
2   1/1/2022
2   1/2/2022
2   1/4/2022
2   1/6/2022
2   1/7/2022
2   1/8/2022
2   1/9/2022
2   1/10/2022
3   1/3/2022
3   1/4/2022
3   1/5/2022
4   1/3/2022
4   1/4/2022
4   1/8/2022

I tried different approaches and did not find a solution. I expect:

id  consecutive_days
1   3
1   4
2   2
2   5
3   2
4   0

I appreciate your help!

Thanks.

Answers

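Assuming the visit dates for each customer are sorted, you can take the difference between consecutive rows and then count the rows where that difference is exactly one day.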

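# Per id: count the rows that fall exactly one day after the previous row.
# Diffing the full dates (not just the day of month) keeps month boundaries correct.
consecutive_days_per_id = df.groupby("id")["visit_date"].apply(lambda s: (pd.to_datetime(s).diff().dt.days == 1).sum())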

Note: if your column is already a datetime, you can omit the pd.to_datetime part.
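As a rough check of what this counting approach returns, here is a minimal sketch on the sample data from the question. It assumes the date column is named visit_date and is formatted as month/day/year; the variable name steps is only illustrative. Note that it yields one total per id (the number of day-to-day steps), not the length of each separate run.

```python
import pandas as pd

# Sample data from the question (dates assumed to be month/day/year).
df = pd.DataFrame({
    "id": [1] * 7 + [2] * 8 + [3] * 3 + [4] * 3,
    "visit_date": [
        "1/2/2022", "1/3/2022", "1/4/2022", "1/7/2022",
        "1/8/2022", "1/9/2022", "1/10/2022",
        "1/1/2022", "1/2/2022", "1/4/2022", "1/6/2022",
        "1/7/2022", "1/8/2022", "1/9/2022", "1/10/2022",
        "1/3/2022", "1/4/2022", "1/5/2022",
        "1/3/2022", "1/4/2022", "1/8/2022",
    ],
})
df["visit_date"] = pd.to_datetime(df["visit_date"], format="%m/%d/%Y")

# For each id, count rows whose date is exactly one day after the previous row.
steps = df.groupby("id")["visit_date"].apply(
    lambda s: (s.diff().dt.days == 1).sum()
)
print(steps)
```

Because all runs of one customer are summed together, this does not separate the 3-day and 4-day runs of customer 1; splitting the rows into runs requires a grouping key per run.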


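# Start a new group whenever the gap to the previous row is not exactly one day.
grp = pd.to_datetime(df["visit_date"]).diff().ne("1day").cumsum()

# Size of each (id, run) group; keep only runs longer than one day.
out = (df.groupby(["id", grp]).size()[lambda x: x > 1]
       .droplevel(1).reset_index(name="consecutive_days")
)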

Output:

    id  consecutive_days
0   1   3
1   1   4
2   2   2
3   2   5
4   3   3
5   4   2

# Same idea as a single chain: c labels each run, then count rows per (id, c).
(df.assign(c = pd.to_datetime(df["visit_date"]).diff().ne("1 day").cumsum())
   .groupby(["id", "c"], as_index = False)
   .count().loc[lambda x: x.visit_date > 1]
   .drop(columns = "c")
   .rename(columns = {"visit_date": "consecutive_days"}))

   id  consecutive_days
0   1                 3
1   1                 4
2   2                 2
4   2                 5
5   3                 3
6   4                 2
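The question also asks for the largest run per customer, which the snippets above stop short of. The following is a minimal end-to-end sketch under a few assumptions: dates are month/day/year, rows are sorted by date within each id, and the names run_id, runs, and longest are illustrative. Unlike the snippets above, it takes the difference within each id, which is a small variation rather than the answerers' exact method.

```python
import pandas as pd

# Sample data from the question (dates assumed to be month/day/year).
df = pd.DataFrame({
    "id": [1] * 7 + [2] * 8 + [3] * 3 + [4] * 3,
    "visit_date": [
        "1/2/2022", "1/3/2022", "1/4/2022", "1/7/2022",
        "1/8/2022", "1/9/2022", "1/10/2022",
        "1/1/2022", "1/2/2022", "1/4/2022", "1/6/2022",
        "1/7/2022", "1/8/2022", "1/9/2022", "1/10/2022",
        "1/3/2022", "1/4/2022", "1/5/2022",
        "1/3/2022", "1/4/2022", "1/8/2022",
    ],
})
dates = pd.to_datetime(df["visit_date"], format="%m/%d/%Y")

# A new run starts wherever the gap to the previous visit of the same id
# is not exactly one day (the first visit of each id has a NaT gap).
new_run = dates.groupby(df["id"]).diff().ne(pd.Timedelta(days=1))
run_id = new_run.cumsum().rename("run")

# Length of every run, keeping only runs longer than one day.
runs = df.groupby(["id", run_id]).size()
runs = runs[runs > 1].droplevel(1).reset_index(name="consecutive_days")

# Longest run per customer.
longest = runs.groupby("id")["consecutive_days"].max()
print(runs)
print(longest)
```

A customer with no run longer than one day would be absent from longest; reindexing on the unique ids with a fill value of 0 would be one way to include them.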



