我有一个数据框架:
sensor_id | timestamp | current | apparent_power |
---|---|---|---|
0 | 2024-02-01 00:00:00.944369920 | 1.550 | 101.5 |
1 | 2024-02-01 00:00:00.959425024 | 0.284 | 20.4 |
0 | 2024-02-01 00:00:01.945563136 | 1.549 | 101.6 |
1 | 2024-02-01 00:00:01.950393856 | 0.286 | 20.2 |
0 | 2024-02-01 00:00:02.944880896 | 1.547 | 100.6 |
1 | 2024-02-01 00:00:02.949997056 | 0.290 | 21.6 |
0 | 2024-02-01 00:00:03.944605184 | 1.547 | 100.9 |
1 | 2024-02-01 00:00:03.949342976 | 0.290 | 21.8 |
0 | 2024-02-01 00:00:04.944967936 | 1.548 | 100.9 |
1 | 2024-02-01 00:00:04.950715136 | 0.285 | 20.5 |
0 | 2024-02-01 00:00:05.944571904 | 1.547 | 100.7 |
1 | 2024-02-01 00:00:05.952841984 | 0.284 | 20.2 |
0 | 2024-02-01 00:00:06.945287936 | 1.548 | 100.8 |
1 | 2024-02-01 00:00:06.950289920 | 0.287 | 20.8 |
0 | 2024-02-01 00:00:07.944841984 | 1.547 | 100.7 |
1 | 2024-02-01 00:00:07.962969088 | 0.290 | 21.8 |
0 | 2024-02-01 00:00:08.945434112 | 1.550 | 103.1 |
1 | 2024-02-01 00:00:08.950361088 | 0.281 | 20.1 |
0 | 2024-02-01 00:00:09.944396032 | 1.551 | 103.8 |
1 | 2024-02-01 00:00:09.949807872 | 0.285 | 21.1 |
我最后的目的是将传感器1推向感官2线,把每个传感器的排位推到两行:
timestamp_0 | timestamp_1 | total_current | total_apparent_power |
---|---|---|---|
2024-02-01 00:00:00.944369920 | 2024-02-01 00:00:00.959425024 | 1.834 | 121.9 |
2024-02-01 00:00:01.945563136 | 2024-02-01 00:00:01.950393856 | 1.835 | 121.6 |
2024-02-01 00:00:02.944880896 | 2024-02-01 00:00:02.949997056 | xxx | xxx |
2024-02-01 00:00:03.944605184 | 2024-02-01 00:00:03.949342976 | xxx | xxx |
2024-02-01 00:00:04.944967936 | 2024-02-01 00:00:04.950715136 | xxx | xxx |
2024-02-01 00:00:05.944571904 | 2024-02-01 00:00:05.952841984 | xxx | xxx |
2024-02-01 00:00:06.945287936 | 2024-02-01 00:00:06.950289920 | xxx | xxx |
2024-02-01 00:00:07.944841984 | 2024-02-01 00:00:07.962969088 | xxx | xxx |
2024-02-01 00:00:08.945434112 | 2024-02-01 00:00:08.950361088 | xxx | xxx |
2024-02-01 00:00:09.944396032 | 2024-02-01 00:00:09.949807872 | xxx | xxx |
我花了时间,按次和小时将其分类:
sensor_id | timestamp | current | apparent_power | seconds | hour |
---|---|---|---|---|---|
0 | 2024-02-01 00:00:00.944369920 | 1.550 | 101.5 | 0 | 0 |
1 | 2024-02-01 00:00:00.959425024 | 0.284 | 20.4 | 0 | 0 |
0 | 2024-02-01 00:00:01.945563136 | 1.549 | 101.6 | 1 | 0 |
1 | 2024-02-01 00:00:01.950393856 | 0.286 | 20.2 | 1 | 0 |
0 | 2024-02-01 00:00:02.944880896 | 1.547 | 100.6 | 2 | 0 |
1 | 2024-02-01 00:00:02.949997056 | 0.290 | 21.6 | 2 | 0 |
0 | 2024-02-01 00:00:03.944605184 | 1.547 | 100.9 | 3 | 0 |
1 | 2024-02-01 00:00:03.949342976 | 0.290 | 21.8 | 3 | 0 |
0 | 2024-02-01 00:00:04.944967936 | 1.548 | 100.9 | 4 | 0 |
1 | 2024-02-01 00:00:04.950715136 | 0.285 | 20.5 | 4 | 0 |
0 | 2024-02-01 00:00:05.944571904 | 1.547 | 100.7 | 5 | 0 |
1 | 2024-02-01 00:00:05.952841984 | 0.284 | 20.2 | 5 | 0 |
0 | 2024-02-01 00:00:06.945287936 | 1.548 | 100.8 | 6 | 0 |
1 | 2024-02-01 00:00:06.950289920 | 0.287 | 20.8 | 6 | 0 |
0 | 2024-02-01 00:00:07.944841984 | 1.547 | 100.7 | 7 | 0 |
1 | 2024-02-01 00:00:07.962969088 | 0.290 | 21.8 | 7 | 0 |
0 | 2024-02-01 00:00:08.945434112 | 1.550 | 103.1 | 8 | 0 |
1 | 2024-02-01 00:00:08.950361088 | 0.281 | 20.1 | 8 | 0 |
0 | 2024-02-01 00:00:09.944396032 | 1.551 | 103.8 | 9 | 0 |
1 | 2024-02-01 00:00:09.949807872 | 0.285 | 21.1 | 9 | 0 |
我有一些事情可以开始做,但如果数据缺乏传感器读物,则落在后面:
second_tracker = 0
a_row = 0
b_row = 0
for i, row in test_file_df.iterrows():
if row[ seconds ] != second_tracker:
second_tracker += 1
# store totals
a_cur = a_row[ current ]
b_cur = b_row[ current ]
total_current = a_cur + b_cur
a_app_power = a_row[ apparent_power ]
b_app_power = b_row[ apparent_power ]
total_app_power = a_app_power + b_app_power
new_row = { timestamp_0 : a_row[ timestamp ], timestamp_1 : b_row[ timestamp ], total_current : total_current, total_app_power : total_app_power}
print(new_row)
power_sum_df.loc[len(power_sum_df)] = new_row
a_row = 0
b_row = 0
if row[ sensor_id ] == 0:
a_row = row
else:
b_row = row
当由于传感器缺失而没有配对传感器电流时,该电网就停止了,即,下行是传感器0,下游也是传感器0,因为以前的传感器1丢失。
我试图分成两个编组,并试图将第1栏+第1行加起来,但可以理解,如何将这两行加进新的数据框架,只加一栏,同时抄送下一个浏览时间。
Can anyone help - even a better way of pulling out the alternate rows into one row. I tried to use unstack, group and agg to no avail.
更新***** 的数据显示,在单一时间段内记录不正规:
line no | sensor_id | timestamp | current | apparent_power |
---|---|---|---|---|
4 | 0 | 2024-01-25 00:00:02.977937920 | 1.502 | 370.9 |
5 | 1 | 2024-01-25 00:00:02.983158016 | 0.293 | 72.4 |
6 | 0 | 2024-01-25 00:00:03.977388032 | 1.503 | 371.1 |
7 | 1 | 2024-01-25 00:00:03.995464960 | 0.284 | 70.3 |
8 | 0 | 2024-01-25 00:00:04.978688000 | 1.500 | 370.6 |
9 | 1 | 2024-01-25 00:00:05.025767168 | 0.287 | 70.9 |
10 | 0 | 2024-01-25 00:00:05.980681984 | 1.495 | 369.0 |
11 | 1 | 2024-01-25 00:00:06.031785984 | 0.286 | 70.7 |
12 | 0 | 2024-01-25 00:00:06.977396992 | 1.497 | 369.7 |
13 | 1 | 2024-01-25 00:00:06.984870912 | 0.280 | 69.3 |
14 | 0 | 2024-01-25 00:00:07.979064832 | 1.495 | 369.1 |
15 | 1 | 2024-01-25 00:00:07.983921920 | 0.282 | 69.6 |
Desired Result: There is logic applied where there are more than 2 records in a single second - always taking one of sensor0 and one of sensor 1. If there is only a single sensor reading within the second, then use the previous second corresponding sensor value for the sum.
lines joined | sensor_0 | sensor_1 | timestamp_0 | timestamp_1 | total_current | total_apparent_power | Comment |
---|---|---|---|---|---|---|---|
4 & 5 | 0 | 1 | 2024-01-25 00:00:02.977937920 | 2024-01-25 00:00:02.983158016 | 1.795 | 443.2 | Normal |
6 & 7 | 0 | 1 | 2024-01-25 00:00:03.977388032 | 2024-01-25 00:00:03.995464960 | 1.787 | 441.4 | Normal |
8 & 7 | 0 | 1 | 2024-01-25 00:00:04.978688000 | use line 7 sensor 1 as no sensor 1 in this second | 1.784 | 440.9 | use previous sensor 1 |
10 & 9 | 0 | 1 | 2024-01-25 00:00:05.980681984 | 2024-01-25 00:00:05.025767168 | 1.782 | 439.9 | swap so ts_0 is sensor_0 |
12 & 13 | 0 | 1 | 2024-01-25 00:00:06.977396992 | 2024-01-25 00:00:06.984870912 | 1.777 | 439.0 | use last sens0, and last sens1 of second |
14 & 15 | 0 | 1 | 2024-01-25 00:00:07.979064832 | 2024-01-25 00:00:07.983921920 | 1.777 | 438.7 | Normal |
我也修改了这个小组,时间为小时、分钟和整段;其次,将其归入我之后的一组。
power_agg_df = (test_file_df.groupby([ts.dt.hour, ts.dt.minute, ts.dt.second], sort=False)
.agg(** {
"sensor_0" : ("sensor_id", "first"),
"sensor_1" : ("sensor_id", "last"),
"timestamp_0" : ("timestamp", "first"),
"timestamp_1" : ("timestamp", "last"),
"total_current" : ("current", "sum"),
"total_apparent_power" : ("apparent_power", "sum"),
}))