Filter log file entries based on date range

My server is having unusually high CPU usage, and I can see Apache is using way too much memory. I have a feeling I'm being DoS'd by a single IP. Maybe you can help me find the attacker?

I've used the following line to find the 10 most "active" IPs:

cat access.log | awk '{print $1}' | sort | uniq -c | sort -n | tail

The top 5 IPs make about 200 times as many requests to the server as the "average" user. However, I can't tell whether these 5 are just very frequent visitors or whether they are attacking the server.

Is there a way to restrict the above search to a time interval, e.g. the last two hours or between 10 and 12 today?

Cheers!

UPDATED 23 OCT 2011 - The commands I needed:

Get entries within the last X hours [here two hours]

awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '{ if ($4 > Date) print Date FS $4}' access.log

Get most active IPs within the last X hours [here two hours]

awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '{ if ($4 > Date) print $1}' access.log | sort | uniq -c | sort -n | tail

Get entries within a relative timespan

awk -vDate=`date -d'now-4 hours' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '{ if ($4 > Date && $4 < Date2) print Date FS Date2 FS $4}' access.log

Get entries within an absolute timespan

awk -vDate=`date -d'13:20' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'13:30' +[%d/%b/%Y:%H:%M:%S` '{ if ($4 > Date && $4 < Date2) print $0}' access.log

Get most active IPs within an absolute timespan

awk -vDate=`date -d'13:20' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'13:30' +[%d/%b/%Y:%H:%M:%S` '{ if ($4 > Date && $4 < Date2) print $1}' access.log | sort | uniq -c | sort -n | tail

Best answer

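Yes, there are multiple ways to do this. Here is how I would go about it. For starters, there is no need to pipe the output of cat; just open the log file with awk.

awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date {print Date, $0}' access_log

Assuming your log looks like mine (the format is configurable), the date is stored in field 4 and is enclosed in brackets. The command above finds everything within the last 2 hours.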

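So I am storing the formatted value of the time two hours ago and comparing it against field 4. The conditional expression should be straightforward. I then print the Date, followed by the Output Field Separator (OFS, a space in this case), followed by the whole line, $0. You could use your previous expression and print just $1 (the IP addresses).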
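For comparison, here is a minimal Python sketch of the same idea: keep the entries newer than a cutoff and count requests per IP. Unlike the awk command, it parses field 4 into a datetime instead of comparing the raw strings. The names parse_field4 and top_ips, the sample lines, and the fixed "now" are made up for illustration. The sketch assumes the default Apache timestamp layout and an English locale for %b, and it ignores the time zone offset in field 5.

```python
from collections import Counter
from datetime import datetime, timedelta


def parse_field4(field):
    """Parse field 4 of an Apache log line, e.g. '[17/Feb/2023:17:48:41'."""
    return datetime.strptime(field, "[%d/%b/%Y:%H:%M:%S")


def top_ips(lines, since, until=None, n=10):
    """Count requests per IP for entries with since < time (< until)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue                      # not a log entry
        try:
            when = parse_field4(fields[3])
        except ValueError:
            continue                      # unexpected timestamp layout
        if when > since and (until is None or when < until):
            counts[fields[0]] += 1
    return counts.most_common(n)


# Hypothetical sample data and a fixed "now" so the result is reproducible.
sample = [
    '10.0.0.1 - - [17/Feb/2023:17:48:41 +0200] "GET / HTTP/1.1" 200 123',
    '10.0.0.1 - - [17/Feb/2023:17:50:00 +0200] "GET / HTTP/1.1" 200 123',
    '10.0.0.2 - - [17/Feb/2023:12:00:00 +0200] "GET / HTTP/1.1" 200 123',
]
now = datetime(2023, 2, 17, 18, 0, 0)
print(top_ips(sample, since=now - timedelta(hours=2)))
```

Passing both since and until gives the range variant (for example 2 to 4 hours ago).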

awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date {print $1}' access_log | sort | uniq -c | sort -n | tail

If you want to use a range, specify two date variables and construct your expression appropriately.

So if you wanted to find entries from 2 to 4 hours ago, your expression might look something like this:

awk -vDate=`date -d'now-4 hours' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date && $4 < Date2 {print Date, Date2, $4}' access_log

Here is a question I answered regarding dates in bash you might find helpful. Print date for the monday of the current week (in bash)

Answers

Introduction

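The accepted answer is wrong, as Antoine's comment points out: awk performs an alphabetic (string) comparison. So if your log contains events from the end of one month and the beginning of the next: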

  • [27/Feb/2023:00:00:00
  • [28/Feb/2023:00:00:00
  • [01/Mar/2023:00:00:00

awk will consider:

[01/Mar/2023:00:00:00 < [27/Feb/2023:00:00:00 < [28/Feb/2023:00:00:00

This is wrong! You have to compare the dates, not the raw date strings!
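The ordering problem is easy to reproduce. The following Python sketch sorts the three timestamps above once as plain strings (which is what the awk comparison does) and once after parsing them. The helper name parse_stamp is made up for illustration, and %b assumes an English locale.

```python
from datetime import datetime

STAMPS = [
    "[27/Feb/2023:00:00:00",
    "[28/Feb/2023:00:00:00",
    "[01/Mar/2023:00:00:00",
]


def parse_stamp(stamp):
    """Turn '[27/Feb/2023:00:00:00' into a datetime object."""
    return datetime.strptime(stamp, "[%d/%b/%Y:%H:%M:%S")


by_string = sorted(STAMPS)                  # lexicographic order
by_date = sorted(STAMPS, key=parse_stamp)   # chronological order

print("string order:", by_string)
print("date order:  ", by_date)
```

As plain strings, the 1st of March sorts before the 27th of February; once parsed, it sorts last.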

For this, you can use a date-parsing library in whichever language you use.

I will present two different ways here: one using Perl with the Date::Parse library, and another (quicker) one using bash with GNU date.

As this is a common task, as it is not exactly the same as "extract last 10 minutes from logfile" (which is about a span of time up to the end of the logfile), and as I needed it this morning, I (quickly) wrote this:

#!/usr/bin/perl -ws
# This script parses logfiles for a specific period of time

sub usage {
    printf "Usage: %s -s=<start time> [-e=<end time>] <logfile>\n", $0;
    die $_[0] if $_[0];
    exit 0;
}

use Date::Parse;

# $s and $e are set from the command line by the -s switch
usage "No start time submitted" unless $s;
my $startim=str2time($s) or die;

# End time defaults to now
my $endtim = $e ? str2time($e) : time();

usage "Logfile not submitted" unless $ARGV[0];
open my $in, "<" . $ARGV[0] or usage "Can't open '$ARGV[0]' for reading";
$_=<$in>;
exit unless $_; # empty file
# Determining regular expression, depending on log format
my $logre=qr{^(\S{3}\s+\d{1,2}\s+(\d{2}:){2}\d+)};
$logre=qr{^[^\[]*\[(\d+/\S+/(\d+:){3}\d+\s\+\d+)\]} unless /$logre/;
seek $in, 0, 0; # rewind, so the first line is tested too

while (<$in>) {
    /$logre/ && do {
        my $ltim=str2time($1);
        print if $endtim >= $ltim && $ltim >= $startim;
    };
};

This can be used like this:

./timelapsinlog.pl -s=09:18 -e=09:24 /path/to/logfile

to print the log entries between 09h18 and 09h24.

./timelapsinlog.pl -s='2017/01/23 09:18:12' /path/to/logfile

to print from January 23rd, 9h18'12" up to now.

In order to reduce the Perl code, I've used the -s switch to permit auto-assignment of variables from the command line: -s=09:18 will populate a variable $s which will contain 09:18. Take care not to miss the equal sign =, and use no spaces!

Nota: This holds two different kinds of regex for two different log standards. If you require different date/time format parsing, either post your own regex or post a sample of a formatted date from your logfile.

^(\S{3}\s+\d{1,2}\s+(\d{2}:){2}\d+)               # ^Jan  1 01:23:45
^[^\[]*\[(\d+/\S+/(\d+:){3}\d+\s\+\d+)\]          # ^... [01/Jan/2017:01:23:45 +0000]
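To see how the two patterns tell the formats apart, here is a small Python sketch that tries them in the same order as the script. The patterns are ported to Python's re module with non-capturing inner groups; the function name extract_stamp, the labels, and the sample lines are made up for illustration.

```python
import re

# Syslog style: "Jan  1 01:23:45 ..."
SYSLOG_RE = re.compile(r"^(\S{3}\s+\d{1,2}\s+(?:\d{2}:){2}\d+)")
# Apache style: "... [01/Jan/2017:01:23:45 +0000] ..."
APACHE_RE = re.compile(r"^[^\[]*\[(\d+/\S+/(?:\d+:){3}\d+\s\+\d+)\]")


def extract_stamp(line):
    """Return (format_name, timestamp_text), or None if no pattern matches."""
    for name, pattern in (("syslog", SYSLOG_RE), ("apache", APACHE_RE)):
        match = pattern.match(line)
        if match:
            return name, match.group(1)
    return None


print(extract_stamp("Jan  1 01:23:45 myhost sshd[123]: session opened"))
print(extract_stamp('172.16.0.3 - - [01/Jan/2017:01:23:45 +0000] "GET / HTTP/1.1" 200 123'))
print(extract_stamp("no timestamp here"))
```

Note that the Apache pattern, as written, only accepts a positive UTC offset (+0000); a log with negative offsets would need the pattern adjusted.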

Quicker** bash version:

In answer to Gilles Quénot's comment, I've tried to create a bash version.

As this version seems quicker than the Perl version, you may find a full version of grepByDates.sh, with comments, on my website (not on gith...). I post a shorter version here:

#!/bin/bash

prog=${0##*/}
usage() {
    cat <<EOUsage
        Usage: $prog <start date> <end date> <logfile>
            All arguments are required. End date could be \`now\`.
EOUsage
}

die() {
    echo >&2 "ERROR $prog: $*"
    exit 1
}

(($#==3))|| { usage; die 'Wrong number of arguments.';}

[[ -f $3 ]] || die "File not found."
# Conversion of argument to EPOCHSECONDS by asking `date` for the two conversions
{
    read -r start
    read -r end
} < <(
    date -f - +%s <<<"$1"$'\n'"$2"
)

# Determine which kind of log format, between "apache logs" and "system logs":
read -r oline <"$3"   # read one log line
if [[ $oline =~ ^[^\ ]{3}\ +[0-9]{1,2}\ +([0-9]{2}:){2}[0-9]+ ]]; then
    # Looks like syslog format
    sedcmd='s/^\([^ ]\{3\} \+[0-9]\{1,2\} \+\([0-9]\{2\}:\)\{2\}[0-9]\+\).*/\1/'
elif [[ $oline =~ ^[^\[]+\[[0-9]+/[^\ ]+/([0-9]+:){3}[0-9]+\ \+[0-9]+\] ]]; then
    # Looks like apache logs
    sedcmd='s/^[0-9.]\+ \+[^ ]\+ \+[^ ]\+ \[\([^]]\+\)\].*$/\1/;s/:/ /;y|/|-|'
else
    die 'Log format not recognized'
fi
# Print lines beginning with `1<tabulation>`
sed -ne 's/^1\o11//p' <(
    # paste `bc` tests with log file
    paste <(
        # bc will do comparison against EPOCHSECONDS returned by date and $start - $end
        bc < <(
            # Create a bc function for testing against $start - $end.
            cat <<EOInitBc
                define void f(x) {
                    if ((x>$start) && (x<$end)) { 1;return ;};
                    0;}
EOInitBc
            # Run sed to extract date strings from logfile, then
                # run date to convert string to EPOCHSECONDS
            sed "$sedcmd" <"$3" |
                date -f - +'f(%s)'
        )
    ) "$3"
)

Explanation

  • The script runs sed to extract the date strings from the logfile.
  • It passes the date strings to date -f - +%s to convert all of them to EPOCH (Unix timestamp) in one run.
  • It runs bc for the tests: print 1 if start < date < end, or else print 0.
  • It runs paste to merge the bc output with the logfile.
  • Finally, it runs sed to find the lines that begin with 1<tab>, replaces the match with nothing, then prints.

So this script forks 5 subprocesses, each doing a dedicated job with a specialised tool, but it doesn't run a shell loop over each line of the logfile!
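The same five steps can be sketched in a single Python process, which may make the data flow easier to follow. This is only an illustration of the logic, not a port of grepByDates.sh: the names to_epoch and grep_by_dates, the sample lines, and the chosen bounds are assumptions, and only the Apache layout is handled (with %b assuming an English locale).

```python
import re
from datetime import datetime, timezone

STAMP_RE = re.compile(r"\[([^\]]+)\]")   # text between the first [ and ]


def to_epoch(line):
    """Extract the timestamp and convert it to Unix seconds (None on failure)."""
    match = STAMP_RE.search(line)
    if match is None:
        return None
    try:
        when = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None
    return when.timestamp()


def grep_by_dates(lines, start, end):
    """Keep the lines whose timestamp lies strictly between start and end."""
    epochs = [to_epoch(line) for line in lines]               # sed + date steps
    flags = [1 if e is not None and start < e < end else 0    # bc step
             for e in epochs]
    return [line for flag, line in zip(flags, lines) if flag]  # paste + final sed


log = [
    '172.16.0.3 - - [17/Feb/2023:17:48:41 +0200] "GET / HTTP/1.1" 200 123',
    '172.16.0.4 - - [17/Feb/2023:17:25:41 +0200] "GET / HTTP/1.1" 200 123',
    '172.16.0.5 - - [17/Feb/2023:17:15:41 +0200] "GET / HTTP/1.1" 200 123',
]
start = datetime(2023, 2, 17, 15, 40, tzinfo=timezone.utc).timestamp()
end = datetime(2023, 2, 17, 16, 0, tzinfo=timezone.utc).timestamp()
for line in grep_by_dates(log, start, end):
    print(line)
```

The bounds are exclusive, matching the (x>$start) && (x<$end) test in the bc function.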

** Note:

Of course, this is quicker on my host because I run on a multicore processor, so the tasks run in parallel!

Conclusion:

This is not a program! This is an aggregation script!

If you consider bash not as a programming language, but as a super language or a tools aggregator, you can harness the full power of all your tools!

If someone encounters the "awk: invalid -v option" error, here's a script to get the most active IPs in a predefined time range:

cat <FILE_NAME> | awk '$4 >= "[04/Jul/2017:07:00:00" && $4 < "[04/Jul/2017:08:00:00"' | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -20

Here is a fast and readable way to do this in Python. It seems to be faster than the bash version. (The timing is displayed using an internal module, which has been removed from this code.)

./ext_lines.py -v -s 'Feb 12 00:23:00' -e 'Feb 15 00:23:00' -i /var/log/syslog.1

Total time                : 445 ms 187 musec
Time per line             : 7 musec 58 ns
Number of lines           : 63,072
Number of extracted lines : 29,265

I can't compare this code with the daemon.log file used by others... But here is my config:

Operating System: Kubuntu 22.10
KDE Plasma Version: 5.25.5
KDE Frameworks Version: 5.98.0
Qt Version: 5.15.6
Kernel Version: 6.2.0-060200rc8-generic (64-bit)
Graphics Platform: X11
Processors: 16 × AMD Ryzen 7 5700U with Radeon Graphics
Memory: 14.9 GiB of RAM

The essential code could fit in just one line (dts = ...), but to make it more readable it's "split" into three. It's not only rather fast, it's also very compact :-)

from argparse import ArgumentParser, FileType
from datetime import datetime
from os.path import basename
from sys import argv, float_info
from time import mktime, localtime, strptime

__version__ = '1.0.0'                      # Workaround (internal use)

now = datetime.now

progname = basename(argv[0])

parser = ArgumentParser(description = 'Is Python strptime faster than sed and Perl ?',
                        prog = progname)

parser.add_argument('--version',
                    dest = 'version',
                    action = 'version',
                    version = '{} : {}'.format(progname,
                                               str(__version__)))
parser.add_argument('-i',
                    '--input',
                    dest = 'infile',
                    default = '/var/log/syslog.1',
                    type = FileType('r',
                                    encoding = 'UTF-8'),
                    help = 'Input file (stdin not yet supported)')
parser.add_argument('-f',
                    '--format',
                    dest = 'fmt',
                    default = '%b %d %H:%M:%S',
                    help = 'Date input format')
parser.add_argument('-s',
                    '--start',
                    dest = 'start',
                    default = None,
                    help = 'Starting date : >=')
parser.add_argument('-e',
                    '--end',
                    dest = 'end',
                    default = None,
                    help = 'Ending date : <=')
parser.add_argument('-v',
                    dest = 'verbose',
                    action = 'store_true',
                    default = False,
                    help = 'Verbose mode')

args = parser.parse_args()
verbose = args.verbose
start = args.start
end = args.end
infile = args.infile
fmt = args.fmt

############### Start code ################

lines = tuple(infile)

# Use default values if start or end are undefined.
# A syslog timestamp such as 'Feb 12 00:23:00' is 15 characters long.
if not start :
    start = lines[0][:15]

if not end :
    end = lines[-1][:15]

# Convert start and end to timestamp
start = mktime(strptime(start,
                        fmt))
end = mktime(strptime(end,
                      fmt))

# Extract matching lines
t1 = now()
dts = [(x, line) for x, line in [(mktime(strptime(line[:15],
                                                  fmt)),
                                  line) for line in lines] if start <= x <= end]
t2 = now()

# Print stats
if verbose :
    total_time = 'Total time'
    time_p_line = 'Time per line'
    n_lines = 'Number of lines'
    n_ext_lines = 'Number of extracted lines'

    # t2 - t1 is a timedelta; convert it to milliseconds before printing
    elapsed_ms = (t2 - t1).total_seconds() * 1000

    print(f'{total_time:<25} : {elapsed_ms:.3f} ms')
    print(f'{time_p_line:<25} : {elapsed_ms / len(lines):.6f} ms')
    print(f'{n_lines:<25} : {len(lines):,}')
    print(f'{n_ext_lines:<25} : {len(dts):,}')

# Print extracted lines (each line keeps its own trailing newline)
print(''.join([x[1] for x in dts]), end='')

To parse the access.log precisely within a specified range, in this case only the last 10 minutes (based on EPOCH, i.e. the number of seconds since 1970/01/01):

Input file:

172.16.0.3 - - [17/Feb/2023:17:48:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
172.16.0.4 - - [17/Feb/2023:17:25:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
172.16.0.5 - - [17/Feb/2023:17:15:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"

Perl one-liner:

This uses the reliable Time::Piece time parser, with strptime() to parse the date and strftime() to format the new one. This module is in the Perl core (installed by default), which is not the case for the less reliable Date::Parse.

$ perl -MTime::Piece -sne '
    BEGIN{
        my $t = localtime;
        our $now = $t->epoch;
        our $monthsRe = join "|", $t->mon_list;
    }
    m!\[(\d{2}/(?:$monthsRe)/\d{4}:\d{2}:\d{2}:\d{2})\s!;
    my $d = Time::Piece->strptime("$1", "%d/%b/%Y:%H:%M:%S");
    my $old = $d->strftime("%s");
    my $diff = (($now - $old) + $gap);
    if ($diff > $min and $diff < $max) {print}
' -- -gap=$({ echo -n "0"; date "+%:::z*3600"; } | bc) \
     -min=0 \
     -max=600 access.log

Explanations of arguments: -gap, -min, -max switches

  • -gap: the gap with UTC, in seconds. In my current case (Thai TZ) ¹ this is +7 hours, i.e. $((7*3600)) aka 25200 seconds. It is written here as { echo -n "0"; date "+%:::z*3600"; } | bc, which works if you have GNU date. If not, use another way to set the gap.
  • -min: the minimum age, in seconds, of the log lines to print
  • -max: the maximum age, in seconds, of the log lines to print
  • To find out your gap from UTC, take a look at:

¹

$ LANG=C date
Fri Feb 17 15:50:13 +07 2023

The +07 is the gap.

This way, you can filter to an exact range of seconds with this snippet.

Sample output

172.16.0.3 - - [17/Feb/2023:17:48:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
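As a point of comparison with the manual -gap offset, here is a hedged Python sketch of the same age filter. It reads the UTC offset directly from the log line with %z, so no gap has to be supplied by hand. The names age_seconds and recent and the fixed value of now are made up for illustration, and %b assumes an English locale.

```python
from datetime import datetime, timezone


def age_seconds(line, now):
    """Seconds elapsed between the bracketed timestamp of a log line and now."""
    begin = line.index("[") + 1
    stamp = line[begin:line.index("]", begin)]
    # %z parses the "+0200" offset, giving a timezone-aware datetime
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return (now - when).total_seconds()


def recent(lines, now, min_age=0, max_age=600):
    """Keep the lines whose age lies strictly between min_age and max_age."""
    return [line for line in lines if min_age < age_seconds(line, now) < max_age]


log = [
    '172.16.0.3 - - [17/Feb/2023:17:48:41 +0200] "GET / HTTP/1.1" 200 123',
    '172.16.0.4 - - [17/Feb/2023:17:25:41 +0200] "GET / HTTP/1.1" 200 123',
    '172.16.0.5 - - [17/Feb/2023:17:15:41 +0200] "GET / HTTP/1.1" 200 123',
]
# Hypothetical "current time"; in real use: datetime.now(timezone.utc)
now = datetime(2023, 2, 17, 15, 50, 13, tzinfo=timezone.utc)
for line in recent(log, now):
    print(line)
```

Because both datetimes are timezone-aware, the subtraction already accounts for the difference between the log's offset and UTC.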



