Aggregating hourly data into daily aggregates

I have hourly weather data in the following format:

Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
...
...
12/31/2000 23:00,25

What I need is a daily aggregate of the maximum, minimum, and average, like this:

Date,MaxDBT,MinDBT,AveDBT
01/01/2000,36,23,28
01/02/2000,34,22,29
01/03/2000,32,25,30
...
...
12/31/2000,35,9,20

How can I do this in R?

Best answer

1) This can be done using the zoo package:

L <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"

library(zoo)
stat <- function(x) c(min = min(x), max = max(x), mean = mean(x))
z <- read.zoo(text = L, header = TRUE, sep = ",", format = "%m/%d/%Y", aggregate = stat)

giving:

> z
           min max     mean
2000-01-01  30  33 31.33333
2000-12-31  25  25 25.00000

2) Here is a solution that uses only base R:

DF <- read.csv(text = L)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")
ag <- aggregate(DBT ~ Date, DF, stat) # same stat as in zoo solution 

The last line gives:

> ag
        Date  DBT.min  DBT.max DBT.mean
1 2000-01-01 30.00000 33.00000 31.33333
2 2000-12-31 25.00000 25.00000 25.00000

EDIT: (1) Since this first appeared the text= argument to read.zoo was added in the zoo package. (2) minor improvements.
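
One detail of the base R result is easy to miss: when the function passed to aggregate returns a vector, the statistics are stored in a single matrix column (DBT), even though they print as DBT.min, DBT.max and DBT.mean. The sketch below rebuilds the same data and uses the common do.call(data.frame, ...) idiom to split that matrix into ordinary columns. The name flat is chosen here for illustration only.

```r
L <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"

DF <- read.csv(text = L)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")  # the time part after the date is ignored
stat <- function(x) c(min = min(x), max = max(x), mean = mean(x))
ag <- aggregate(DBT ~ Date, DF, stat)

# ag has two columns: Date and a three-column matrix called DBT
ncol(ag)

# Split the matrix column into separate columns
# (flat is a hypothetical name used only in this sketch)
flat <- do.call(data.frame, ag)
names(flat)
```

After this step the statistics can be accessed directly, for example as flat$DBT.max.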

Other answers

Use strptime(), trunc() and ddply() from the plyr package:

#Make the data
ZZ <- textConnection("Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25")
dataframe <- read.csv(ZZ, header = TRUE)
close(ZZ)

# Do the calculations
dataframe$Date <- strptime(dataframe$Date,format="%m/%d/%Y %H:%M")
dataframe$day <- trunc(dataframe$Date,"day")

require(plyr)

ddply(dataframe,.(day),
      summarize,
      aveDBT=mean(DBT),
      maxDBT=max(DBT),
      minDBT=min(DBT)
)

gives

         day   aveDBT maxDBT minDBT
1 2000-01-01 31.33333     33     30
2 2000-12-31 25.00000     25     25

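Explanation:

strptime converts the character strings to date-times according to the given format. To see how to specify the format, see ?strptime. trunc then truncates these date-times to the specified unit, which is the day in this case.

ddply splits the data frame according to day and then evaluates the function summarize within each piece.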
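
To make the two conversion steps concrete, here is a minimal base R sketch on two of the timestamps from the question. It assumes a UTC time zone (not stated in the original) so the result does not depend on the local machine, and it wraps the strptime result in as.POSIXct, which is a choice made for this sketch rather than part of the answer above.

```r
x <- c("01/01/2000 01:00", "12/31/2000 23:00")

# Parse the character strings into date-times using the question's format
# (tz = "UTC" is an assumption made for reproducibility)
dt <- as.POSIXct(strptime(x, format = "%m/%d/%Y %H:%M", tz = "UTC"))

# Truncate each date-time to the start of its day
day <- trunc(dt, "days")

format(day, "%Y-%m-%d %H:%M")
# [1] "2000-01-01 00:00" "2000-12-31 00:00"
```

All observations from the same calendar day therefore share one value of day, which is what allows them to be grouped.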

There is also a nice package called hydroTSM. It uses zoo objects and can convert to other aggregates in time

The function in your case is subdaily2daily. You can choose if the aggregation should be based on min / max / mean...

A couple of options:

1. Timetk

If you have a data frame (or tibble), the summarise_by_time() function from timetk can be used:

library(tidyverse)
library(timetk)

# Collect Data
text <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"

df <- read_csv(text, col_types = cols(Date = col_datetime("%m/%d/%Y %H:%M")))
df
#> # A tibble: 4 x 2
#>   Date                  DBT
#>   <dttm>              <dbl>
#> 1 2000-01-01 01:00:00    30
#> 2 2000-01-01 02:00:00    31
#> 3 2000-01-01 03:00:00    33
#> 4 2000-12-31 23:00:00    25

# Summarize
df %>%
  summarise_by_time(
    .date_var = Date, 
    .by       = "day",
    min       = min(DBT),
    max       = max(DBT),
    mean      = mean(DBT)
  )
#> # A tibble: 2 x 4
#>   Date                  min   max  mean
#>   <dttm>              <dbl> <dbl> <dbl>
#> 1 2000-01-01 00:00:00    30    33  31.3
#> 2 2000-12-31 00:00:00    25    25  25

Created on 2021-05-21 by the reprex package (v2.0.0)

2. Tidyquant

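You can use the tidyquant package for this. The process involves using the tq_transmute function to return a data frame that is modified using the xts aggregation function apply.daily. We apply a custom function, stat_fun, which returns the min, max, and mean. However, you can apply any other vector function in its place.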


library(tidyquant)

df
#> # A tibble: 4 x 2
#>                  Date   DBT
#>                <dttm> <dbl>
#> 1 2000-01-01 01:00:00    30
#> 2 2000-01-01 02:00:00    31
#> 3 2000-01-01 03:00:00    33
#> 4 2000-12-31 23:00:00    25

stat_fun <- function(x) c(min = min(x), max = max(x), mean = mean(x))

df %>%
    tq_transmute(select     = DBT,
                 mutate_fun = apply.daily,
                 FUN        = stat_fun)
#> # A tibble: 2 x 4
#>                 Date   min   max     mean
#>                <dttm> <dbl> <dbl>    <dbl>
#> 1 2000-01-01 03:00:00    30    33 31.33333
#> 2 2000-12-31 23:00:00    25    25 25.00000

If your time column can be converted with as.POSIXct(time), all you need is cut() and aggregate().

Try:

split_hour = cut(as.POSIXct(temp$time), breaks = "60 mins") # bin the timestamps into 60-minute intervals
temp$hour = split_hour # add the hourly bin as a variable
ag = aggregate(. ~ hour, temp, mean)

In this case, temp looks like this:

1  0.6 0.6 0.0 0.350 0.382 0.000 2020-04-13 18:30:42
2  0.0 0.5 0.5 0.000 0.304 0.292 2020-04-13 19:56:02
3  0.0 0.2 0.2 0.000 0.107 0.113 2020-04-13 20:09:10
4  0.6 0.0 0.6 0.356 0.000 0.376 2020-04-13 20:11:57
5  0.0 0.3 0.2 0.000 0.156 0.148 2020-04-13 20:12:07
6  0.0 0.4 0.4 0.000 0.218 0.210 2020-04-13 22:02:49
7  0.2 0.2 0.0 0.112 0.113 0.000 2020-04-13 22:31:43
8  0.3 0.0 0.3 0.155 0.000 0.168 2020-04-14 03:19:03
9  0.4 0.0 0.4 0.219 0.000 0.258 2020-04-14 03:55:58
10 0.2 0.0 0.0 0.118 0.000 0.000 2020-04-14 04:25:25
11 0.3 0.3 0.0 0.153 0.160 0.000 2020-04-14 05:38:20
12 0.0 0.7 0.8 0.000 0.436 0.493 2020-04-14 05:40:02
13 0.0 0.0 0.2 0.000 0.000 0.101 2020-04-14 05:40:44
14 0.3 0.0 0.3 0.195 0.000 0.198 2020-04-14 06:09:26
15 0.2 0.2 0.0 0.130 0.128 0.000 2020-04-14 06:17:15
16 0.2 0.0 0.0 0.144 0.000 0.000 2020-04-14 06:19:36
17 0.3 0.0 0.4 0.177 0.000 0.220 2020-04-14 06:23:43
18 0.2 0.0 0.0 0.110 0.000 0.000 2020-04-14 06:25:19
19 0.0 0.0 0.0 1.199 1.035 0.251 2020-04-14 07:05:24
20 0.2 0.2 0.0 0.125 0.107 0.000 2020-04-14 07:21:46

ag

1  2020-04-13 18:30:00 0.60000000 0.6000000 0.0000000 0.3500000 0.38200000 0.00000000
2  2020-04-13 19:30:00 0.15000000 0.2500000 0.3750000 0.0890000 0.14175000 0.23225000
3  2020-04-13 21:30:00 0.00000000 0.4000000 0.4000000 0.0000000 0.21800000 0.21000000
4  2020-04-13 22:30:00 0.20000000 0.2000000 0.0000000 0.1120000 0.11300000 0.00000000
5  2020-04-14 02:30:00 0.30000000 0.0000000 0.3000000 0.1550000 0.00000000 0.16800000
6  2020-04-14 03:30:00 0.30000000 0.0000000 0.2000000 0.1685000 0.00000000 0.12900000
7  2020-04-14 05:30:00 0.18750000 0.1500000 0.2125000 0.1136250 0.09050000 0.12650000
8  2020-04-14 06:30:00 0.10000000 0.1000000 0.0000000 0.6620000 0.57100000 0.12550000
9  2020-04-14 07:30:00 0.00000000 0.3000000 0.2000000 0.0000000 0.16200000 0.11800000
10 2020-04-14 19:30:00 0.20000000 0.3000000 0.0000000 0.1460000 0.19000000 0.00000000
11 2020-04-14 20:30:00 0.06666667 0.2000000 0.2666667 0.0380000 0.11766667 0.17366667
12 2020-04-14 22:30:00 0.20000000 0.3000000 0.0000000 0.1353333 0.18533333 0.00000000
13 2020-04-14 23:30:00 0.00000000 0.5000000 0.5000000 0.0000000 0.28000000 0.32100000
14 2020-04-15 01:30:00 0.25000000 0.2000000 0.4500000 0.1355000 0.11450000 0.26100000



