How to reduce a DataFrame to two DataFrames then sum rows of each data frame into new data frame

I have a DataFrame:

sensor_id timestamp current apparent_power
0 2024-02-01 00:00:00.944369920 1.550 101.5
1 2024-02-01 00:00:00.959425024 0.284 20.4
0 2024-02-01 00:00:01.945563136 1.549 101.6
1 2024-02-01 00:00:01.950393856 0.286 20.2
0 2024-02-01 00:00:02.944880896 1.547 100.6
1 2024-02-01 00:00:02.949997056 0.290 21.6
0 2024-02-01 00:00:03.944605184 1.547 100.9
1 2024-02-01 00:00:03.949342976 0.290 21.8
0 2024-02-01 00:00:04.944967936 1.548 100.9
1 2024-02-01 00:00:04.950715136 0.285 20.5
0 2024-02-01 00:00:05.944571904 1.547 100.7
1 2024-02-01 00:00:05.952841984 0.284 20.2
0 2024-02-01 00:00:06.945287936 1.548 100.8
1 2024-02-01 00:00:06.950289920 0.287 20.8
0 2024-02-01 00:00:07.944841984 1.547 100.7
1 2024-02-01 00:00:07.962969088 0.290 21.8
0 2024-02-01 00:00:08.945434112 1.550 103.1
1 2024-02-01 00:00:08.950361088 0.281 20.1
0 2024-02-01 00:00:09.944396032 1.551 103.8
1 2024-02-01 00:00:09.949807872 0.285 21.1
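(For reference, a minimal, hypothetical reconstruction of the first few rows, so the snippets below can be run; the column names are taken from the table above:)

import io
import pandas as pd

# first four rows of the sample above, as a runnable frame
data = io.StringIO("""sensor_id,timestamp,current,apparent_power
0,2024-02-01 00:00:00.944369920,1.550,101.5
1,2024-02-01 00:00:00.959425024,0.284,20.4
0,2024-02-01 00:00:01.945563136,1.549,101.6
1,2024-02-01 00:00:01.950393856,0.286,20.2
""")
test_file_df = pd.read_csv(data, parse_dates=["timestamp"])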

My end goal is to bring each sensor 1 reading up onto the line of its matching sensor 0 reading, summing the two sensors' values of each pair into a single row:

timestamp_0 timestamp_1 total_current total_apparent_power
2024-02-01 00:00:00.944369920 2024-02-01 00:00:00.959425024 1.834 121.9
2024-02-01 00:00:01.945563136 2024-02-01 00:00:01.950393856 1.835 121.8
2024-02-01 00:00:02.944880896 2024-02-01 00:00:02.949997056 xxx xxx
2024-02-01 00:00:03.944605184 2024-02-01 00:00:03.949342976 xxx xxx
2024-02-01 00:00:04.944967936 2024-02-01 00:00:04.950715136 xxx xxx
2024-02-01 00:00:05.944571904 2024-02-01 00:00:05.952841984 xxx xxx
2024-02-01 00:00:06.945287936 2024-02-01 00:00:06.950289920 xxx xxx
2024-02-01 00:00:07.944841984 2024-02-01 00:00:07.962969088 xxx xxx
2024-02-01 00:00:08.945434112 2024-02-01 00:00:08.950361088 xxx xxx
2024-02-01 00:00:09.944396032 2024-02-01 00:00:09.949807872 xxx xxx

I have split the timestamp out into seconds and hour columns:

sensor_id timestamp current apparent_power seconds hour
0 2024-02-01 00:00:00.944369920 1.550 101.5 0 0
1 2024-02-01 00:00:00.959425024 0.284 20.4 0 0
0 2024-02-01 00:00:01.945563136 1.549 101.6 1 0
1 2024-02-01 00:00:01.950393856 0.286 20.2 1 0
0 2024-02-01 00:00:02.944880896 1.547 100.6 2 0
1 2024-02-01 00:00:02.949997056 0.290 21.6 2 0
0 2024-02-01 00:00:03.944605184 1.547 100.9 3 0
1 2024-02-01 00:00:03.949342976 0.290 21.8 3 0
0 2024-02-01 00:00:04.944967936 1.548 100.9 4 0
1 2024-02-01 00:00:04.950715136 0.285 20.5 4 0
0 2024-02-01 00:00:05.944571904 1.547 100.7 5 0
1 2024-02-01 00:00:05.952841984 0.284 20.2 5 0
0 2024-02-01 00:00:06.945287936 1.548 100.8 6 0
1 2024-02-01 00:00:06.950289920 0.287 20.8 6 0
0 2024-02-01 00:00:07.944841984 1.547 100.7 7 0
1 2024-02-01 00:00:07.962969088 0.290 21.8 7 0
0 2024-02-01 00:00:08.945434112 1.550 103.1 8 0
1 2024-02-01 00:00:08.950361088 0.281 20.1 8 0
0 2024-02-01 00:00:09.944396032 1.551 103.8 9 0
1 2024-02-01 00:00:09.949807872 0.285 21.1 9 0
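(These columns were presumably derived along these lines; a sketch, assuming the timestamp column is parsed to datetimes first:)

ts = pd.to_datetime(test_file_df["timestamp"])
test_file_df["seconds"] = ts.dt.second   # second within the minute (0-59)
test_file_df["hour"] = ts.dt.hour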

I have something that works as a starting point, but it falls over if the data is missing a sensor reading:

second_tracker = 0
a_row = 0
b_row = 0

for i, row in test_file_df.iterrows():

    if row['seconds'] != second_tracker:
        second_tracker += 1
        # store totals
        a_cur = a_row['current']
        b_cur = b_row['current']
        total_current = a_cur + b_cur

        a_app_power = a_row['apparent_power']
        b_app_power = b_row['apparent_power']
        total_app_power = a_app_power + b_app_power

        new_row = {'timestamp_0': a_row['timestamp'], 'timestamp_1': b_row['timestamp'], 'total_current': total_current, 'total_app_power': total_app_power}
        print(new_row)
        power_sum_df.loc[len(power_sum_df)] = new_row
        a_row = 0
        b_row = 0

    if row['sensor_id'] == 0:
        a_row = row
    else:
        b_row = row

This falls over when there is no paired sensor reading because a reading is missing, i.e. the current row is sensor 0 and the next row is also sensor 0, because the preceding sensor 1 reading was lost.
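(A quick way to locate those breaks is to look for consecutive rows carrying the same sensor_id; a small diagnostic sketch over the frame above:)

# True where a row repeats the previous row's sensor_id, i.e. where the
# expected 0/1 alternation is broken by a missing reading
breaks = test_file_df["sensor_id"].diff().eq(0)
print(test_file_df[breaks])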

I have tried splitting the data into two DataFrames and adding row 1 + row 1, and so on, but I could not work out how to sum the two rows into a new DataFrame, summing only some of the columns while copying both timestamps across.

Can anyone help? Even a better way of pulling the alternate rows out into one row would be welcome. I tried to use unstack, groupby and agg to no avail.
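(For what it is worth, a sketch of the pivot/unstack route mentioned above; pivot_table stands in for unstack here, and it assumes the hour/seconds columns from earlier and at most one row per sensor per second:)

# spread sensor 0 and sensor 1 into side-by-side columns, one row per second
wide = test_file_df.pivot_table(index=["hour", "seconds"],
                                columns="sensor_id",
                                values=["timestamp", "current", "apparent_power"],
                                aggfunc="first")

result = pd.DataFrame({
    "timestamp_0": wide[("timestamp", 0)],
    "timestamp_1": wide[("timestamp", 1)],
    "total_current": wide[("current", 0)] + wide[("current", 1)],
    "total_apparent_power": wide[("apparent_power", 0)] + wide[("apparent_power", 1)],
}).reset_index(drop=True)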

UPDATE ***** Data showing irregular recording within a single second:

line no sensor_id timestamp current apparent_power
4 0 2024-01-25 00:00:02.977937920 1.502 370.9
5 1 2024-01-25 00:00:02.983158016 0.293 72.4
6 0 2024-01-25 00:00:03.977388032 1.503 371.1
7 1 2024-01-25 00:00:03.995464960 0.284 70.3
8 0 2024-01-25 00:00:04.978688000 1.500 370.6
9 1 2024-01-25 00:00:05.025767168 0.287 70.9
10 0 2024-01-25 00:00:05.980681984 1.495 369.0
11 1 2024-01-25 00:00:06.031785984 0.286 70.7
12 0 2024-01-25 00:00:06.977396992 1.497 369.7
13 1 2024-01-25 00:00:06.984870912 0.280 69.3
14 0 2024-01-25 00:00:07.979064832 1.495 369.1
15 1 2024-01-25 00:00:07.983921920 0.282 69.6

Desired Result: where there are more than 2 records in a single second, logic is applied so that one reading from sensor 0 and one from sensor 1 are always taken. If there is only a single sensor reading within the second, then the corresponding sensor value from the previous second is used for the sum.

lines joined sensor_0 sensor_1 timestamp_0 timestamp_1 total_current total_apparent_power Comment
4 & 5 0 1 2024-01-25 00:00:02.977937920 2024-01-25 00:00:02.983158016 1.795 443.2 Normal
6 & 7 0 1 2024-01-25 00:00:03.977388032 2024-01-25 00:00:03.995464960 1.787 441.4 Normal
8 & 7 0 1 2024-01-25 00:00:04.978688000 use line 7 sensor 1 as no sensor 1 in this second 1.784 440.9 use previous sensor 1
10 & 9 0 1 2024-01-25 00:00:05.980681984 2024-01-25 00:00:05.025767168 1.782 439.9 swap so ts_0 is sensor_0
12 & 13 0 1 2024-01-25 00:00:06.977396992 2024-01-25 00:00:06.984870912 1.777 439.0 use last sens0, and last sens1 of second
14 & 15 0 1 2024-01-25 00:00:07.979064832 2024-01-25 00:00:07.983921920 1.777 438.7 Normal
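(As an aside, if "nearest reading in time" were an acceptable pairing rule, pd.merge_asof offers a compact sketch; note it does NOT reproduce the second-based rules above exactly, e.g. it pairs line 8 with line 9 rather than line 7:)

import pandas as pd

# pair every sensor 0 row with the sensor 1 row closest to it in time;
# a missing sensor 1 reading is naturally covered by a neighbouring one
s0 = df.loc[df["sensor_id"] == 0].sort_values("timestamp")
s1 = (df.loc[df["sensor_id"] == 1].sort_values("timestamp")
        .assign(timestamp_1=lambda x: x["timestamp"]))

paired = pd.merge_asof(s0, s1, on="timestamp",
                       direction="nearest", suffixes=("_0", "_1"))
paired = paired.rename(columns={"timestamp": "timestamp_0"})
paired["total_current"] = paired["current_0"] + paired["current_1"]
paired["total_apparent_power"] = paired["apparent_power_0"] + paired["apparent_power_1"]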

I have also adapted the groupby to group on the hour, minute and second of the timestamp, which gets me close to the grouping I am after:

ts = pd.to_datetime(test_file_df["timestamp"])  # datetime accessor for the grouping keys

power_agg_df = (test_file_df.groupby([ts.dt.hour, ts.dt.minute, ts.dt.second], sort=False)
                .agg(**{
                    "sensor_0": ("sensor_id", "first"),
                    "sensor_1": ("sensor_id", "last"),
                    "timestamp_0": ("timestamp", "first"),
                    "timestamp_1": ("timestamp", "last"),
                    "total_current": ("current", "sum"),
                    "total_apparent_power": ("apparent_power", "sum"),
                }))
Answers

Proposed solution:

import numpy as np

# np.arange(len(df)) // 2 labels consecutive rows 0,0,1,1,2,2,... so this
# pairs the rows; it assumes a strict sensor 0 / sensor 1 alternation
grouped = (df.groupby(np.arange(len(df)) // 2)
             .agg({'timestamp': 'first', 'current': 'sum', 'apparent_power': 'sum'}))

# last timestamp of each pair, taken from the original frame
grouped['timestamp2'] = df.groupby(np.arange(len(df)) // 2)['timestamp'].last()

grouped = (grouped[['timestamp', 'timestamp2', 'current', 'apparent_power']]
             .rename(columns={'current': 'total_current', 'apparent_power': 'total_apparent_power'})
             )

Output:

>>> grouped.columns
Index(['timestamp', 'timestamp2', 'total_current', 'total_apparent_power'], dtype='object')
>>> grouped
                      timestamp  ... total_apparent_power
0 2024-02-01 00:00:00.944369920  ...                121.9
1 2024-02-01 00:00:01.945563136  ...                121.8
2 2024-02-01 00:00:02.944880896  ...                122.2
3 2024-02-01 00:00:03.944605184  ...                122.7
4 2024-02-01 00:00:04.944967936  ...                121.4
5 2024-02-01 00:00:05.944571904  ...                120.9
6 2024-02-01 00:00:06.945287936  ...                121.6
7 2024-02-01 00:00:07.944841984  ...                122.5
8 2024-02-01 00:00:08.945434112  ...                123.2
9 2024-02-01 00:00:09.944396032  ...                124.9

Following the comments, an alternative that uses named aggregation to reduce the code to a single statement:

grouped = (df.groupby(np.arange(len(df)) // 2)
             .agg(**{
            "timestamp_0": ("timestamp", "first"),
            "timestamp_1": ("timestamp", "last"),
            "total_current": ("current", "sum"),
            "total_apparent_power": ("apparent_power", "sum")
        })
             )

The pandas.Series.dt accessors (http://pandas.pydata.org/docs/reference/api/pandas.Series.dt.hour.html) give the grouping operations you need:

ts = pd.to_datetime(df["timestamp"])
grp = {"hour": ts.dt.hour, "second": ts.dt.second} # +more ?
# grp = {"time": ts.dt.floor("s").dt.time} # 3-components of time

#1 reduce the groups with duplicated sensors (keep the last reading of each sensor)
uniq = (df.assign(**grp).sort_values([*grp, "sensor_id"]).loc[
        lambda x: ~x.duplicated(subset=[*grp, "sensor_id"], keep="last")])

#2 look up the previous sensor's reading to complete the groups that have only a single one
miss = df.groupby(list(grp.values()))["sensor_id"].transform("nunique").eq(1)
exte = (uniq.mask(miss | uniq["sensor_id"].eq(uniq["sensor_id"].shift(-1)))
           .ffill().loc[miss].convert_dtypes())#.assign(timestamp=None))

#3 swap the sensors in the groups that have a leading sensor 1 (so sensor 0 comes first)
cln = (pd.concat([uniq, exte if not exte.empty else None])
         .sort_values(["sensor_id", *grp]))
out = (
        cln.astype({"line no": "str"})
        .groupby(list(grp.values()))
            .agg(**{
                # feel free to comment (#) any agg
            "lines_joined": ("line no", " & ".join),
            "sensor_0": ("sensor_id", "first"),
            "sensor_1": ("sensor_id", "last"),
            "timestamp_0": ("timestamp", "first"),
            "timestamp_1": ("timestamp", "last"),
            "total_current": ("current", "sum"),
            "total_apparent_power": ("apparent_power", "sum")
        })
     .reset_index(names=[*grp]) # to show up the groups
    #.reset_index(drop=True) # or just drop them if optional
)
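(A hedged usage note: update.csv below is a hypothetical file holding the Update sample with its "line no" column; after running steps #1-#3 above, out should line up with the Desired Result table:)

df = pd.read_csv("update.csv", parse_dates=["timestamp"])  # hypothetical file name
ts = pd.to_datetime(df["timestamp"])
# ... run steps #1-#3 above to build `out`, then:
print(out[["lines_joined", "total_current", "total_apparent_power"]])
# expected first rows, per the Desired Result table:
#   4 & 5    1.795    443.2
#   6 & 7    1.787    441.4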
