How can I remove responses from LiveHTTPHeaders output using awk, perl or sed?

Let's say I have something like this (this is only an example; the actual requests will be different. I loaded StackOverflow with LiveHTTPHeaders enabled to have some samples to work on):

http://stackoverflow.com/

GET / HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Sat, 28 Nov 2009 16:04:24 GMT
Vary: Accept-Encoding
Server: Microsoft-IIS/7.0
Date: Sat, 28 Nov 2009 16:04:23 GMT
Content-Length: 19015
----------------------------------------------------------
...

Full log of requests and responses is available on pastebin

I want to remove all responses (for example, HTTP/1.x 200 OK and everything in that response) and all the one-line entries showing the page address. I would like only the requests to be left in the text file with the saved LiveHTTPHeaders output.

So, the output would be:

GET / HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

GET /so/all.css?v=5290 HTTP/1.1
Host: sstatic.net
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/css,*/*;q=0.1
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://stackoverflow.com/

...

Again, the full text of what I want to keep is available on pastebin.

If I save a LiveHTTPHeaders captured session to a text file and want a result like the second listing in this question, how do I do this? Maybe with awk, sed or perl? Or something else? I'm on Linux.


Edit: I'm trying to run Sinan's script. The script is this:

#!/usr/bin/perl
local $/ = "\n\n";
while (<>) {
    print if /^GET|POST/; # Add more request types as needed
}

I tried running it this way:

./cleanup-headers.pl livehttp.txt > filtered.txt

And this way:

perl cleanup-headers.pl < livehttp.txt > filtered.txt

... the file filtered.txt was created, but it's totally empty.

Has anyone tried it on the FULL headers I pasted into pastebin? Did it work?

Full headers

Best answer

Looks like you're having trailing whitespace issues.

$ sed -e 's/^\s*$//' livehttp.txt | \
  perl -e '$/ = ""; while (<>) { print if /^(GET|POST)/ }'

This works by putting Perl's readline operator into paragraph mode (via $/ = ""), which reads the input one record at a time, with records separated by two or more consecutive newlines.

It's nice when it works, but it's a bit brittle. Lines that look blank but contain whitespace are not treated as separators and will gum up the works, so sed is used first to turn them into truly empty lines.

Equivalent and more concise command:

$ sed -e 's/^\s*$//' livehttp.txt | perl -000 -ne 'print if /^(GET|POST)/'
Answers

In Perl:

local $/ = "\n\n";
while (<>) {
    print if /^(?:GET|POST)/; # Add more request types as needed
}

Notes: Looking at the output generated by LiveHTTPHeaders, entries are quite clearly separated by two newlines, so I think setting $/ = "\n\n" is more appropriate than setting $/ = "". I believe your problems were due to the fact that the lines in your input file were actually indented.

I did originally download the file from pastebin and use the full file to test my script. I do not believe the file you were using to test on your computer was identical to the one you put on pastebin.

If you want to robustly deal with possibly indented lines while remaining consistent with the format of the output of LiveHTTPHeaders, you should use something like the following:

#!/usr/bin/perl

use strict; use warnings;

local $/ = "\n\n";
while (<>) {
    next unless /^\s*(?:GET|POST)/;
    s!^\s+!!gm;
    print;
}

I consider using sed and perl in the same pipeline to be a little bit of an abomination.

Just one gawk command:

awk -vRS= '/^(GET|POST)/' ORS="\n\n" file

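You can also use the bash shell:

flag=0  # start outside a request so the numeric test below is always valid
while read -r line
do
    case "$line" in
        GET*|POST*) flag=1;;   # a request starts here
        "") flag=0;;           # an empty line ends the current block
    esac
    [ "$flag" -eq 1 ] && echo "$line"
done < "file"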

Run Sinan's code as:

perl test.pl < infile.txt > outfile.txt



