How to handle large data files using Perl?
  • Time: 2012-05-23 16:11:13
  • Tags:
  • perl
  • bigdata

I have a 300GB file that contains lines like the ones shown below. Of those lines, I only need the ones that start with ">miR".

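I wrote a Perl program that prints the output I want, but when I run the same code on the large file (up to 300GB of data, with lines like those shown below), the process gets killed. How can I make this program work on the full file? Is there another way to do this in the code?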

#!/usr/bin/perl -w
$len=@ARGV;
if($len == 0){
    print "Give file\n";
    exit;
}
$file=$ARGV[0];
open(FH,$file) || die "cant open file\n";
@lines=<FH>;   # reads every line of the file into memory at once
close FH;
while ($line=<FH>){
    chomp $line;
    if ($line =~ /^>miR/){
        $_=$line;
        s/>//g && s/,//g;
        print "$_\n";
        if($_=~ /(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/){
            print $1,"\t",$2,"\t",$7,"\t",$3,"\n";
        }

..

Forward:    Score: 124.000000  Q:2 to 18  R:1 to 20 Align Len (17) (64.71%) (82.35%)

   Query:    3  gaauAUUCGUUAG-AAUGGUAa 5 
                    |:: :|||| || |||| 
   Ref:      5  --ctTGGTTAATCATTCCCATt 3 

   Energy:  -10.480000 kCal/Mol

Scores for this hit:
>miR844a    AT2G33810,  124.00  -10.48  2 18    1 20    17  64.71%  82.35%


   Forward: Score: 120.000000  Q:2 to 19  R:289 to 308 Align Len (17) (64.71%) (76.47%)

   Query:    3  gaaUAUUCGUUAGAAUGGUAa 5 
                   ||::| ||  || |||| 
   Ref:      5  ttgATGGG-AAAATTTCCATt 3 

   Energy:  -9.850000 kCal/Mol

Scores for this hit:
>miR844a    AT2G33810,  120.00  -9.85   2 19    289 308 17  64.71%  76.47%


   Forward: Score: 118.000000  Q:2 to 19  R:483 to 503 Align Len (17) (64.71%) (82.35%)

   Query:    3  gaaUAUUCGUUAGAAUGGUAa 5 
                   :||:  |||| ||:||| 
   Ref:      5  gggGTAGAAAATCATATCATa 3 
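The script is most likely killed because @lines=<FH> tries to read the entire 300GB file into memory before any processing starts. As a sketch (not the asker's code), the same filtering can be done while reading one line at a time, so only the current line is held in memory. The sub name extract_mir is hypothetical, and the output field positions (0, 1, 6, 2) are taken from the regex captures $1, $2, $7, $3 in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream the input one line at a time: only the current line is held in
# memory, so the file size does not matter (unlike @lines = <FH>, which
# loads every line at once).
sub extract_mir {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        next unless $line =~ /^>miR/;   # keep only score lines for miR hits
        chomp $line;
        $line =~ s/^>//;                # drop the leading '>'
        $line =~ tr/,//d;               # drop commas, e.g. "AT2G33810,"
        my @f = split ' ', $line;       # split on runs of whitespace
        next unless @f >= 7;            # skip malformed lines
        # fields: 0 = miRNA name, 1 = target, 2 = score, 6 = reference start
        print {$out} join("\t", @f[0, 1, 6, 2]), "\n";
    }
    return;
}

# Run only when a file name is given, so the sub can be loaded and tested
# without side effects.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Cannot open '$file': $!\n";
    extract_mir($fh, \*STDOUT);
    close $fh;
}
```

Run it as, for example, perl extract_mir.pl hits.txt > mir_hits.tsv (file names hypothetical). Unlike the original, it prints only the four tab-separated fields, not the whole matched line, and it does nothing when no file is given.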
Answers

You can set local $/ = '>' (making '>' the input record separator) and then process the file like this:

use Modern::Perl;

{
    local $/ = '>';    # read one '>'-delimited record at a time
    while (<DATA>){
        next if !/^miR/;
        s/,//g;
        my($var0, $var1, $var2, $var6) = (split ' ', $_, 8)[0..2, 6];
        say "$var0,\t$var1,\t$var6,\t$var2";
    }
}


__DATA__
>miR844a    AT2G33810,  124.00  -10.48  2 18    1 20    17  64.71%  82.35%


   Forward: Score: 120.000000  Q:2 to 19  R:289 to 308 Align Len (17) (64.71%) (76.47%)

   Query:    3  gaaUAUUCGUUAGAAUGGUAa 5 
                   ||::| ||  || |||| 
   Ref:      5  ttgATGGG-AAAATTTCCATt 3 

   Energy:  -9.850000 kCal/Mol

Scores for this hit:
>moR844a    AT2G33810,  120.00  -9.85   2 19    289 308 17  64.71%  76.47%


   Forward: Score: 118.000000  Q:2 to 19  R:483 to 503 Align Len (17) (64.71%) (82.35%)

   Query:    3  gaaUAUUCGUUAGAAUGGUAa 5 
                   :||:  |||| ||:||| 
   Ref:      5  gggGTAGAAAATCATATCATa 3 
>miR844a    AT2G33810,  120.00  -9.85   2 19    289 308 17  64.71%  76.47%


   Forward: Score: 118.000000  Q:2 to 19  R:483 to 503 Align Len (17) (64.71%) (82.35%)

   Query:    3  gaaUAUUCGUUAGAAUGGUAa 5 
                   :||:  |||| ||:||| 
   Ref:      5  gggGTAGAAAATCATATCATa 3 

Output:

miR844a,    AT2G33810,  1,  124.00
miR844a,    AT2G33810,  289,    120.00

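If the current record does not start with "miR", skip to the next record (records are delimited by ">"); otherwise, remove any commas, then split the record to get the values you are after (the ones your regex was capturing).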
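The record-based approach can be packaged the same way. A minimal sketch, assuming ">" never appears inside a hit block except at the start of a score line: the hypothetical sub each_mir_record sets $/ = '>' locally, so each read returns one record, and hands the four fields to a callback instead of collecting them, so memory use stays flat even when a huge file yields many matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Call $callback->($name, $target, $ref_start, $score) for every
# '>'-delimited record whose first word starts with "miR".
sub each_mir_record {
    my ($fh, $callback) = @_;
    local $/ = '>';                  # each <$fh> read now ends at a '>'
    while (my $rec = <$fh>) {
        chomp $rec;                  # removes the trailing '>' (the current $/), not "\n"
        next unless $rec =~ /^miR/;  # the '>' before the name ended the previous record
        $rec =~ tr/,//d;             # drop commas, e.g. "AT2G33810,"
        # Limit 8: fields 0..6 are split out; the rest of the record stays in field 7.
        my ($name, $target, $score, $start) = (split ' ', $rec, 8)[0 .. 2, 6];
        next unless defined $start;  # skip malformed records
        $callback->($name, $target, $start, $score);
    }
    return;
}

# Run only when a file name is given; print one tab-separated line per hit.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open '$ARGV[0]': $!\n";
    each_mir_record($fh, sub { print join("\t", @_), "\n" });
    close $fh;
}
```

The split limit stops splitting once the first seven fields have been separated, so the remainder of each multi-line record is left in one trailing field that is simply ignored.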

Hope this helps!




