Text manipulation and removal

I have text files generated by one of my tools, with the structure shown below.

1 line text
(space)
multiple
lines
text
(space)
multiple
lines
text
nr 2
---------------------------------------------------------- (58  -  characters)
different 1 line text
(space)
different
multiple
lines
text
(space)
different
multiple
lines
text
nr 2
----------------------------------------------------------
different 1 line text
(space)
different
multiple
lines
text
(space)
different
multiple
lines
text
nr 2
----------------------------------------------------------
(space)

Each file begins with a one-line text and ends with a separator line of - signs followed by a (space) line. The number of sections differs from file to file, and each section in the middle starts and ends with a line of - signs. Below is what I would like to achieve.

multiple
lines
text
(space)
different
multiple
lines
text
(space)
different
multiple
lines
text

I would like to remove all the one-liners, all the 58-character - dividers, and all the second multi-line blocks, keeping only the first multi-line block from each section, one under another, separated by (space) lines. Could someone recommend how to do this on Linux? Any suggestions will help.

Answers
perl -00 -ne 'print if $. % 2 == 0' file.txt

The -00 flag puts Perl in paragraph mode, so records are separated by blank lines; the one-liner then prints every second record.
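For comparison, awk offers the same paragraph mode when RS is set to the empty string. The following is only a minimal sketch, assuming the (space) lines are completely empty; the function name first_groups is made up for illustration.

```shell
# Hypothetical helper: print every even-numbered paragraph of the given file.
# Assumption: the "(space)" separator lines are truly empty lines.
first_groups() {
    # RS = ""       -> paragraph mode: records are separated by blank lines
    # ORS = "\n\n"  -> put one blank line after each printed paragraph
    awk 'BEGIN { RS = ""; ORS = "\n\n" } NR % 2 == 0' "$1"
}
```

Called as first_groups file.txt, it should behave like the perl one-liner, including the blank line after the last group.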

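Edit: to print the first multiline group (replace the (space) placeholder in the pattern with whatever your separator lines really contain, e.g. /^[[:space:]]*$/ for empty or whitespace-only lines):

awk 'BEGIN {toggle=1} /^(space)$/ {if (!toggle) print ""; toggle=!toggle; next} {if (!toggle) print}' file.txt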

Original: to print the second multiline group:

awk '/^(space)$/ {accum=""; next} /^-+$/ {print accum; accum=""; next} {accum=accum"\n"$0}' file.txt

The following perl script will do what you want (I find that sed is not that well suited to tasks spanning multiple lines).

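#!/usr/bin/perl
use strict;
use warnings;

my $first = 1;   # true until the first group has been reached
my $skip  = 2;   # >0: lines still to skip; 0: echoing; -1: ignore until next divider
while (my $ln = <>) {
    chomp $ln;
    # A 58-dash divider starts a new section: skip its one-liner and blank line.
    if ($ln =~ /^-{58}$/) {
        $skip = 2;
        next;
    }
    if ($skip > 0) {
        $skip--;
        if ($skip == 0) {
            if ($first) {
                $first = 0;
            } else {
                print "\n";   # blank line between groups, none after the last
            }
        }
        next;
    }
    if ($skip == 0) {
        # The empty line that ends the first group stops the echoing;
        # it is not printed itself, so groups are separated by one blank line.
        if ($ln =~ /^$/) {
            $skip = -1;
            next;
        }
        print "$ln\n";
    }
}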

This is based on the assumption that your (space) lines are just empty lines. If they're not, you will need to adjust the /^$/ pattern near the bottom to match what they actually are.
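If you are unsure what the separator lines contain, one way to check is the l command of sed, which prints each line unambiguously and marks its end with $. This is only a small sketch; the function name show_line_ends is made up for illustration.

```shell
# Hypothetical helper: show each line with a "$" at its end, so trailing or
# stand-alone whitespace becomes visible (tabs are shown as \t).
show_line_ends() {
    sed -n 'l' "$1"
}
```

An empty line shows up as a lone $, while a line holding a single space shows up as a space followed by $.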

It is basically a simplified state machine controlled by the $skip variable. When it is positive, you're skipping that many lines (it starts at 2 and is reset to 2 at every --- line).

When $skip reaches zero, it stays there until you get an empty line (you're echoing these lines as you go). When you get an empty line, you set it to -1 and stop echoing the lines.

The $first variable is a bit of a hack to ensure there's no trailing blank line in your output.

Here's the output I got from your input file:

multiple
lines
text
(space)
different
multiple
lines
text
(space)
different
multiple
lines
text

which I believe is what you were after.
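The three states of $skip described above (positive: skipping, zero: echoing, -1: ignoring) can also be sketched in awk. This is a rough illustration rather than a drop-in replacement: it assumes the (space) lines are empty, it treats any line made only of dashes as a divider instead of checking for exactly 58, and the function name skip_machine is made up.

```shell
# Hypothetical helper mirroring the skip-counter idea in awk.
# skip > 0 : that many lines are still to be skipped
# skip == 0: lines are echoed
# skip < 0 : lines are ignored until the next divider
skip_machine() {
    awk '
        BEGIN { skip = 2; first = 1 }
        /^-+$/ { skip = 2; next }            # divider: skip one-liner and blank line
        skip > 0 {
            if (--skip == 0) {
                if (first) { first = 0 } else { print "" }   # blank line between groups
            }
            next
        }
        skip == 0 {
            if ($0 == "") { skip = -1; next } # empty line ends the first group
            print
        }
    ' "$1"
}
```

Run as skip_machine file.txt; the groups come out separated by single blank lines with no trailing blank line.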

I would go awk over sed. Build a list until you hit /-+$/ and then output the multiple lines section that you stored up until each dashed line.

EDIT: I would go perl before that, but awk is fun, too.
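A minimal sketch of that buffering idea, assuming the (space) lines are empty and the dividers consist only of dashes; the function name buffered_groups and the variable names are made up for illustration.

```shell
# Hypothetical helper: remember the first multi-line group of each section
# and emit it once the dashed divider is reached.
buffered_groups() {
    awk '
        /^-+$/ {
            # Divider: output the stored group, then reset for the next section.
            if (group != "") { printf "%s%s", sep, group; sep = "\n" }
            group = ""; blanks = 0
            next
        }
        $0 == "" { blanks++; next }            # count the empty lines seen so far
        blanks == 1 { group = group $0 "\n" }  # between 1st and 2nd empty line
    ' "$1"
}
```

Because the separator is printed before every group except the first, the output has one blank line between groups and none at the end.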

gawk

awk '{ print $2 }' RS="-\n" FS="\n\n" file

output

$ ./shell.sh
multiple
lines
text
different
multiple
lines
text
different
multiple
lines 
text

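The equivalent in Perl:

$\ = "\n";    # output record separator: newline after every print
$/ = "-\n";   # input record separator: end of a dashed divider line
while (<>) {
    chomp;
    ($f1, $f2) = split "\n\n", $_;   # fields are separated by an empty line
    print $f2;
}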



