Question

I have to parse some information out of the lines of a big log file. A line looks something like this:

abc.log:2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100

There are many lines like the one above in the log files. I need to extract the datetime (2012-03-03 11:12:12,457), the job details (123.RPH.-101), the query name without its parameters (get_data), the row count (10) and the time (100).

So the output should look like this:

2012-03-03 11:12:12,457|123|-101|get_data|10|100

I have tried various permutations with command-line tools, but have not gotten it right.

Answer 1

This is really ugly, but since sed was mentioned and there is no answer yet:

sed -e 's/[^0-9]*//' \
    -re 's/[^ ]*\[([^.]*)\.[^.]*\.([^]]*)\]/| \1 | \2/' \
    -e 's/[^ ]* Query=/| /' \
    -e 's/ [^ ]* Rows=/ | /' \
    -e 's/Time=/ | /' my_logfile

Answer 2

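My solution in gawk: it uses the gawk extension to match() (the third, array argument).

You did not specify the file format precisely, so you may have to adjust the patterns accordingly.

Script invocation: gawk -v OFS='|' -f script.awk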

{
    # Timestamp: plain match() sets RSTART and RLENGTH
    match($0, /[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+,[0-9]+/)
    date_time = substr($0, RSTART, RLENGTH)

    # Job details: the array argument receives the parenthesized groups (gawk only)
    match($0, /\[([0-9]+)\.RPH\.(-?[0-9]+)\]/, matches)
    job_detail_1 = matches[1]
    job_detail_2 = matches[2]

    # Query name only; \w is a GNU regex operator
    match($0, /Query=(\w+)/, matches)
    query = matches[1]

    match($0, /Rows=([0-9]+)/, matches)
    rows = matches[1]

    match($0, /Time=([0-9]+)/, matches)
    time = matches[1]

    print date_time, job_detail_1, job_detail_2, query, rows, time
}

Answer 3

Here is an AWK solution that is a little shorter (and also works in mawk):

BEGIN { OFS="|" }

{
    # Locate the bracketed job details in the third field, then drop the brackets
    i = match($3, /\[[^]]+\]/)
    job = substr($3, i + 1, RLENGTH - 2)
    split($5, X, "=")
    query = X[2]
    split($7, X, "=")
    rows = X[2]
    split($8, X, "=")
    time = X[2]

    print $1 " " $2, job, query, rows, time
}

Note that this assumes the Rows=10 and Time=100 strings are separated by a space, i.e. that the example in the question contains a typo.

Answer 4

TXR:

@(collect :vars ())
@file:@year-@mon-@day @hh:@mm:@ss,@ms @jobname[@job1.RPH.@job2] @queryname: Query=@query @params Rows=@{rows /[0-9]+/}Time=@time
@(output)
@year-@mon-@day @hh-@mm-@ss,@ms|@job1|@job2|@query|@rows|@time
@(end)
@(end)

Run:

$ txr data.txr data.log
2012-03-03 11-12-12,457|123|-101|get_data|10|100

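Here is one way to make the program assert that every line in the log must match the pattern. First, do not allow gaps in the collect. That is, the collect must not skip over non-matching material to find lines that match: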

@(collect :gap 0 :vars ())

Second, at the end of the script, we add:

@(eof)

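This specifies a match at the end of the file. If the @(collect) bails out early because of a non-matching line (due to the :gap 0 constraint), the @(eof) will fail, and so the script will terminate with a failed status.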

In this type of task, field-splitting regex hacks will backfire because they can blindly produce incorrect results for some subset of the input being processed. If the input contains a vast number of lines, there is no easy way to check for mistakes. It's best to have a very specific match that is likely to reject anything which doesn't resemble the examples on which the pattern is based.
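The same strictness can be sketched outside TXR. The Python version below is only an illustration and not part of the TXR solution: the pattern, the parse_line name and the choice to raise ValueError are all assumptions derived from the single sample line in the question. It accepts the line with or without a space before Time=, and rejects anything else instead of silently emitting wrong fields.

```python
import re

# Assumed layout, derived from the one sample line in the question.
LINE_RE = re.compile(
    r"(?:[^:\s]+:)?"                                  # optional "abc.log:" file prefix
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "   # datetime
    r"\w+\[(\d+)\.\w+\.(-?\d+)\] "                    # job details, e.g. ABC[123.RPH.-101]
    r"\w+: Query=(\w+)(?: \S+)* "                     # query name; parameters are skipped
    r"Rows=(\d+) ?Time=(\d+)"                         # Rows and Time, space optional
)


def parse_line(line):
    """Return the pipe-separated record, or raise ValueError if the line deviates."""
    m = LINE_RE.fullmatch(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"line does not match the expected layout: {line!r}")
    return "|".join(m.groups())


if __name__ == "__main__":
    sample = ("abc.log:2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: "
              "Query=get_data @a=0,@b=1 Rows=10Time=100")
    print(parse_line(sample))
```

Because fullmatch must consume the whole line, a line with trailing junk or a missing field raises an error rather than producing a plausible-looking but wrong record.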

Answer 5

You just need the right field separators:

awk -F '[][ =.]' -v OFS='|' '{print $1 " " $2, $4, $6, $10, $15, $17}'

I'm assuming that "abc.log:" is not actually in the log.
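To see which field numbers that separator set produces, the split can be reproduced in Python. This is only an illustration of the indexing, assuming a line without the "abc.log:" prefix and with a space between Rows=10 and Time=100; split_fields is a hypothetical helper name. awk treats a multi-character -F value as a regular expression, and a bracket expression matches one separator character at a time, which re.split mirrors here.

```python
import re

# Assumed input: no "abc.log:" prefix, and a space between Rows=10 and Time=100.
line = ("2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: "
        "Query=get_data @a=0,@b=1 Rows=10 Time=100")


def split_fields(text):
    """Split on the same characters as awk -F '[][ =.]', indexed from 1 like $1, $2, ..."""
    # Separator set: ']', '[', ' ', '=' and '.'; a placeholder at index 0 keeps
    # the list positions aligned with awk's field numbers.
    return [None] + re.split(r"[\]\[ =.]", text)


fields = split_fields(line)
picked = [fields[1] + " " + fields[2]] + [fields[n] for n in (4, 6, 10, 15, 17)]
print("|".join(picked))
```

With the "abc.log:" prefix present, the extra "." adds a field and every index shifts by one; with Rows=10Time=100 run together, field 15 becomes "10Time". The field numbers only hold under the assumptions stated above.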
