Question

我有一个数据框架:

sensor_id	timestamp	current	apparent_power
0	2024-02-01 00:00:00.944369920	1.550	101.5
1	2024-02-01 00:00:00.959425024	0.284	20.4
0	2024-02-01 00:00:01.945563136	1.549	101.6
1	2024-02-01 00:00:01.950393856	0.286	20.2
0	2024-02-01 00:00:02.944880896	1.547	100.6
1	2024-02-01 00:00:02.949997056	0.290	21.6
0	2024-02-01 00:00:03.944605184	1.547	100.9
1	2024-02-01 00:00:03.949342976	0.290	21.8
0	2024-02-01 00:00:04.944967936	1.548	100.9
1	2024-02-01 00:00:04.950715136	0.285	20.5
0	2024-02-01 00:00:05.944571904	1.547	100.7
1	2024-02-01 00:00:05.952841984	0.284	20.2
0	2024-02-01 00:00:06.945287936	1.548	100.8
1	2024-02-01 00:00:06.950289920	0.287	20.8
0	2024-02-01 00:00:07.944841984	1.547	100.7
1	2024-02-01 00:00:07.962969088	0.290	21.8
0	2024-02-01 00:00:08.945434112	1.550	103.1
1	2024-02-01 00:00:08.950361088	0.281	20.1
0	2024-02-01 00:00:09.944396032	1.551	103.8
1	2024-02-01 00:00:09.949807872	0.285	21.1

我最后的目的是将传感器1推向感官2线,把每个传感器的排位推到两行:

timestamp_0	timestamp_1	total_current	total_apparent_power
2024-02-01 00:00:00.944369920	2024-02-01 00:00:00.959425024	1.834	121.9
2024-02-01 00:00:01.945563136	2024-02-01 00:00:01.950393856	1.835	121.6
2024-02-01 00:00:02.944880896	2024-02-01 00:00:02.949997056	xxx	xxx
2024-02-01 00:00:03.944605184	2024-02-01 00:00:03.949342976	xxx	xxx
2024-02-01 00:00:04.944967936	2024-02-01 00:00:04.950715136	xxx	xxx
2024-02-01 00:00:05.944571904	2024-02-01 00:00:05.952841984	xxx	xxx
2024-02-01 00:00:06.945287936	2024-02-01 00:00:06.950289920	xxx	xxx
2024-02-01 00:00:07.944841984	2024-02-01 00:00:07.962969088	xxx	xxx
2024-02-01 00:00:08.945434112	2024-02-01 00:00:08.950361088	xxx	xxx
2024-02-01 00:00:09.944396032	2024-02-01 00:00:09.949807872	xxx	xxx

我花了时间,按次和小时将其分类:

sensor_id	timestamp	current	apparent_power	seconds
0	2024-02-01 00:00:00.944369920	1.550	101.5	0
1	2024-02-01 00:00:00.959425024	0.284	20.4	0
0	2024-02-01 00:00:01.945563136	1.549	101.6	1
1	2024-02-01 00:00:01.950393856	0.286	20.2	1
0	2024-02-01 00:00:02.944880896	1.547	100.6	2
1	2024-02-01 00:00:02.949997056	0.290	21.6	2
0	2024-02-01 00:00:03.944605184	1.547	100.9	3
1	2024-02-01 00:00:03.949342976	0.290	21.8	3
0	2024-02-01 00:00:04.944967936	1.548	100.9	4
1	2024-02-01 00:00:04.950715136	0.285	20.5	4
0	2024-02-01 00:00:05.944571904	1.547	100.7	5
1	2024-02-01 00:00:05.952841984	0.284	20.2	5
0	2024-02-01 00:00:06.945287936	1.548	100.8	6
1	2024-02-01 00:00:06.950289920	0.287	20.8	6
0	2024-02-01 00:00:07.944841984	1.547	100.7	7
1	2024-02-01 00:00:07.962969088	0.290	21.8	7
0	2024-02-01 00:00:08.945434112	1.550	103.1	8
1	2024-02-01 00:00:08.950361088	0.281	20.1	8
0	2024-02-01 00:00:09.944396032	1.551	103.8	9
1	2024-02-01 00:00:09.949807872	0.285	21.1	9

我有一些事情可以开始做,但如果数据缺乏传感器读物,则落在后面:

second_tracker = 0
a_row = 0
b_row = 0

for i, row in test_file_df.iterrows():
   
   if row[ seconds ] != second_tracker:
       second_tracker += 1
       # store totals
       a_cur = a_row[ current ]
       b_cur = b_row[ current ]
       total_current = a_cur + b_cur
       
       a_app_power = a_row[ apparent_power ]
       b_app_power = b_row[ apparent_power ]
       total_app_power = a_app_power + b_app_power
       
       new_row = { timestamp_0 : a_row[ timestamp ], timestamp_1 : b_row[ timestamp ],  total_current : total_current,  total_app_power : total_app_power}
       print(new_row)
       power_sum_df.loc[len(power_sum_df)] = new_row
       a_row = 0
       b_row = 0
   
   if row[ sensor_id ] == 0:
       a_row = row
   else:
       b_row = row

当由于传感器缺失而没有配对传感器电流时,该电网就停止了,即,下行是传感器0,下游也是传感器0,因为以前的传感器1丢失。

我试图分成两个编组,并试图将第1栏+第1行加起来,但可以理解,如何将这两行加进新的数据框架,只加一栏,同时抄送下一个浏览时间。

Can anyone help - even a better way of pulling out the alternate rows into one row. I tried to use unstack, group and agg to no avail.

更新***** 的数据显示,在单一时间段内记录不正规:

line no	sensor_id	timestamp	current	apparent_power
4	0	2024-01-25 00:00:02.977937920	1.502	370.9
5	1	2024-01-25 00:00:02.983158016	0.293	72.4
6	0	2024-01-25 00:00:03.977388032	1.503	371.1
7	1	2024-01-25 00:00:03.995464960	0.284	70.3
8	0	2024-01-25 00:00:04.978688000	1.500	370.6
9	1	2024-01-25 00:00:05.025767168	0.287	70.9
10	0	2024-01-25 00:00:05.980681984	1.495	369.0
11	1	2024-01-25 00:00:06.031785984	0.286	70.7
12	0	2024-01-25 00:00:06.977396992	1.497	369.7
13	1	2024-01-25 00:00:06.984870912	0.280	69.3
14	0	2024-01-25 00:00:07.979064832	1.495	369.1
15	1	2024-01-25 00:00:07.983921920	0.282	69.6

Desired Result: There is logic applied where there are more than 2 records in a single second - always taking one of sensor0 and one of sensor 1. If there is only a single sensor reading within the second, then use the previous second corresponding sensor value for the sum.

lines joined	sensor_1	timestamp_0	timestamp_1	total_current	total_apparent_power	Comment
4 & 5	1	2024-01-25 00:00:02.977937920	2024-01-25 00:00:02.983158016	1.795	443.2	Normal
6 & 7	1	2024-01-25 00:00:03.977388032	2024-01-25 00:00:03.995464960	1.787	441.4	Normal
8 & 7	1	2024-01-25 00:00:04.978688000	use line 7 sensor 1 as no sensor 1 in this second	1.784	440.9	use previous sensor 1
10 & 9	1	2024-01-25 00:00:05.980681984	2024-01-25 00:00:05.025767168	1.782	439.9	swap so ts_0 is sensor_0
12 & 13	1	2024-01-25 00:00:06.977396992	2024-01-25 00:00:06.984870912	1.777	439.0	use last sens0, and last sens1 of second
14 & 15	1	2024-01-25 00:00:07.979064832	2024-01-25 00:00:07.983921920	1.777	438.7	Normal

我也修改了这个小组,时间为小时、分钟和整段;其次,将其归入我之后的一组。

power_agg_df = (test_file_df.groupby([ts.dt.hour, ts.dt.minute, ts.dt.second], sort=False)
               .agg(** {
                   "sensor_0" : ("sensor_id", "first"),
                   "sensor_1" : ("sensor_id", "last"),
                   "timestamp_0" : ("timestamp", "first"),
                   "timestamp_1" : ("timestamp", "last"),
                   "total_current" : ("current", "sum"),
                   "total_apparent_power" : ("apparent_power", "sum"),
                   
               }))

Answer 1

www.un.org/Depts/DGACM/index_spanish.htm 拟议解决办法

grouped = (df.groupby(np.arange(len(df)) // 2)
             .agg({ timestamp :  first ,  current :  sum ,  apparent_power :  sum }))

grouped[ timestamp2 ] = grouped.groupby(level=0)[ timestamp ].transform( last )

grouped = (grouped[[ timestamp ,  timestamp2 ,  current ,  apparent_power ]]
             .rename(columns={ current :  total_current ,  apparent_power :  total_apparent_power })
             )

http://www.ohchr.org。

>>> grouped.columns
Index([ timestamp ,  timestamp2 ,  total_current ,  total_apparent_power ], dtype= object )

>>> grouped
                      timestamp  ... total_apparent_power
0 2024-02-01 00:00:00.944369920  ...                121.9
1 2024-02-01 00:00:01.945563136  ...                121.8
2 2024-02-01 00:00:02.944880896  ...                122.2
3 2024-02-01 00:00:03.944605184  ...                122.7
4 2024-02-01 00:00:04.944967936  ...                121.4
5 2024-02-01 00:00:05.944571904  ...                120.9
6 2024-02-01 00:00:06.945287936  ...                121.6
7 2024-02-01 00:00:07.944841984  ...                122.5
8 2024-02-01 00:00:08.945434112  ...                123.2
9 2024-02-01 00:00:09.944396032  ...                124.9

引自“无时能力”主张,另一种利用<条码>/在一行中减少该守则的方法:

grouped = (df.groupby(np.arange(len(df)) // 2)
             .agg(**{
            "timestamp_0": ("timestamp", "first"),
            "timestamp_1": ("timestamp", "last"),
            "total_current": ("current", "sum"),
            "total_apparent_power": ("apparent_power", "sum")
        })
             )

Answer 2

http://pandas.pydata.org/docs/ reference/api/pandas.Series.dt.hour.html 您需要的行动:

ts = pd.to_datetime(df["timestamp"])
grp = {"hour": ts.dt.hour, "second": ts.dt.second} # +more ?
# grp = {"time": ts.dt.floor("s").dt.time} # 3-components of time

#1 reduce the groups with more than two identical censors
uniq = (df.assign(**grp).sort_values([*grp, "sensor_id"]).loc[
        lambda x: ~x.duplicated(subset=[*grp, "sensor_id"], keep="last")])

#2 look for the previous cursors to complete the groups having a single one
miss = df.groupby(list(grp.values()))["sensor_id"].transform("nunique").eq(1)
exte = (uniq.mask(miss | uniq["sensor_id"].eq(uniq["sensor_id"].shift(-1)))
           .ffill().loc[miss].convert_dtypes())#.assign(timestamp=None))

#3 swap the censors in the groups with a leading censor-1
cln = (pd.concat([uniq, exte if not exte.empty else None])
         .sort_values(["sensor_id", *grp]))
out = (
        cln.astype({"line no": "str"})
        .groupby(list(grp.values()))
            .agg(**{
                # feel free to comment (#) any agg
            "lines_joined": ("line no", " & ".join),
            "sensor_0": ("sensor_id", "first"),
            "sensor_1": ("sensor_id", "last"),
            "timestamp_0": ("timestamp", "first"),
            "timestamp_1": ("timestamp", "last"),
            "total_current": ("current", "sum"),
            "total_apparent_power": ("apparent_power", "sum")
        })
     .reset_index(names=[*grp]) # to show up the groups
    #.reset_index(drop=True) # or just drop them if optional
)

友情链接