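I would like to merge two data frames: the first has columns `time_1` and `time_2` (among others), and the second has a column `time_3`. In general, the second data frame is longer than the first.
I want to merge them so that each row of the second data frame is matched to the rows of the first where `time_3` lies between `time_1` and `time_2`, repeating each row of the first data frame once for every `time_3` that falls between its `time_1` and `time_2`.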
For example, if the first data frame had the following format
time_1 | time_2 | dummy_data |
---|---|---|
2023-10-01 04:02:00 | 2023-10-01 08:29:00 | -245.669907 |
2023-10-01 04:03:00 | 2023-10-01 08:49:00 | -1772.948571 |
... | ... | ... |
and the second data frame had the following format
time_3 | dummy_data2 |
---|---|
2023-10-01 06:21:13.238024 | -131.367901 |
2023-10-01 06:47:19.796628 | -236.277444 |
2023-10-01 07:37:06.438740 | 5.915493 |
2023-10-01 08:16:16.995256 | -134.032433 |
2023-10-01 08:33:53.081095 | -103.733212 |
then the expected output would be:
time_1 | time_2 | dummy_data | time_3 | dummy_data2 |
---|---|---|---|---|
2023-10-01 04:02:00 | 2023-10-01 08:29:00 | -245.669907 | 2023-10-01 06:21:13.238024 | -131.367901 |
2023-10-01 04:02:00 | 2023-10-01 08:29:00 | -245.669907 | 2023-10-01 06:47:19.796628 | -236.277444 |
2023-10-01 04:02:00 | 2023-10-01 08:29:00 | -245.669907 | 2023-10-01 07:37:06.438740 | 5.915493 |
2023-10-01 04:02:00 | 2023-10-01 08:29:00 | -245.669907 | 2023-10-01 08:16:16.995256 | -134.032433 |
2023-10-01 04:03:00 | 2023-10-01 08:49:00 | -1772.948571 | 2023-10-01 06:21:13.238024 | -131.367901 |
2023-10-01 04:03:00 | 2023-10-01 08:49:00 | -1772.948571 | 2023-10-01 06:47:19.796628 | -236.277444 |
2023-10-01 04:03:00 | 2023-10-01 08:49:00 | -1772.948571 | 2023-10-01 07:37:06.438740 | 5.915493 |
2023-10-01 04:03:00 | 2023-10-01 08:49:00 | -1772.948571 | 2023-10-01 08:16:16.995256 | -134.032433 |
2023-10-01 04:03:00 | 2023-10-01 08:49:00 | -1772.948571 | 2023-10-01 08:33:53.081095 | -103.733212 |
I can make this work by "cheating": iterating through each row and the list, then joining everything back up afterwards, as shown in the code below. But I'm wondering if there's a more "pandas-y" way to do this that doesn't require the nested loops and the dictionary of indexes?
```python
import pandas as pd

# Load the data: `df` holds the intervals, `df2` holds the timestamps
df = pd.read_csv("dataframe.csv", parse_dates=["time_1", "time_2"])
df2 = pd.read_csv("datetime_list.csv")
df2["time_3"] = pd.to_datetime(df2["time_3"])

indexes = {}
# Record which rows of `df2` fall inside the interval of each row of `df`
for i in df.index:
    s = df2["time_3"].between(df.loc[i, "time_1"],
                              df.loc[i, "time_2"],
                              inclusive="left")
    friends = list(s[s].index)
    indexes[i] = friends

rows = []
# Merge them all together, duplicating rows in df where necessary
for key in indexes.keys():
    for idx in indexes[key]:
        # DataFrame.append was removed in pandas 2.0, so collect rows in a list
        rows.append(pd.concat([df.loc[key], df2.loc[idx]]))

output_df = pd.DataFrame(rows).reset_index(drop=True)
output_df
```
As you might expect, this solution is very slow. Any suggestions would be greatly appreciated.
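
For reference, one loop-free candidate is a cross join followed by a filter. The following is only a minimal sketch, not a benchmarked solution. It assumes pandas 1.3 or newer (for `how="cross"` and `inclusive="left"`), rebuilds the sample rows from the tables above inline, and uses a helper name, `interval_join`, that is purely illustrative.

```python
import pandas as pd

# Sample data copied from the tables above
df = pd.DataFrame({
    "time_1": pd.to_datetime(["2023-10-01 04:02:00", "2023-10-01 04:03:00"]),
    "time_2": pd.to_datetime(["2023-10-01 08:29:00", "2023-10-01 08:49:00"]),
    "dummy_data": [-245.669907, -1772.948571],
})
df2 = pd.DataFrame({
    "time_3": pd.to_datetime([
        "2023-10-01 06:21:13.238024",
        "2023-10-01 06:47:19.796628",
        "2023-10-01 07:37:06.438740",
        "2023-10-01 08:16:16.995256",
        "2023-10-01 08:33:53.081095",
    ]),
    "dummy_data2": [-131.367901, -236.277444, 5.915493, -134.032433, -103.733212],
})


def interval_join(intervals, stamps):
    """Hypothetical helper: pair every interval with every timestamp,
    then keep only the pairs where time_1 <= time_3 < time_2."""
    pairs = intervals.merge(stamps, how="cross")
    inside = pairs["time_3"].between(pairs["time_1"], pairs["time_2"],
                                     inclusive="left")
    return pairs[inside].reset_index(drop=True)


output_df = interval_join(df, df2)
print(output_df)
```

The cross join materializes `len(df) * len(df2)` rows before filtering, so this trades memory for the removal of the Python-level loops; whether that is acceptable depends on the sizes of the two frames.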