English 中文(简体)
pperl 文件处理
原标题:perl file manipulation

我有一个文件 File1 ,其中载有这些数据:

NC_009066   5239    5308    trnA(tgc)   2.10899859667e-09   -
NC_009066   5309    5382    trnN(gtt)   7.03000463545e-10   -
NC_009066   5422    5487    trnC(gca)   7.09999799728e-08   -
NC_009066   5487    5557    trnY(gta)   3.72200156562e-11   -
NC_009066   5549    7097    cox1    291081744.81    +
NC_009066   7109    7180    trnS2(tga)  1.83000043035e-09   -
NC_009066   7183    7256    trnD(gtc)   2.5720000267e-09    +

和另一个快文件 < 坚固 > File2

> NC_009066,1,0-17045,
GCTATCGTAGCTTAATTAAAGCATAACACTGAAGATGTTAAGATGAACCCTAGAAA

我将文件 1 放在数列行中, 逐行排列, 然后我就可以在 < code>/ s+/ 上分割每行, 从而访问每列 。

for $line(@array){
    @column= split(/s+/,$line);
    # print $column[5]."
";

$gene=substr($seq,$column[1],$column[2]);#$seq extracted from File2....}

but I want to do is to take the second column from the 1st line with the 3rd column from the 2nd line (substr($seq,5239,5382)) and then 2nd column from 2nd line and 3rd column from 3rd line (substr($seq,5309,5487))..... what is the best way to do it ??

最佳回答

首先,请注意, split 的默认效果是将 $/code> 分割成白空格,丢弃引导和尾随空字段。 最常见的是, 这是您想要的, 以及 split /s+/ 是不必要的。 如果您想要在除 $/code > 以外的变量上援引默认的分割, 您必须传递一个单一的字面空间, 即 < em>not regex, 作为模式参数, 例如 split, $line

我建议你首先使用 map 来创建仅包含第二和第三栏中数据的阵列。

然后,你可以环绕数据,提取开始值和结束值,并将基因从序列中提取出来。

代码看起来是这样的

use strict;
use warnings;

open my $fh,  < ,  f1.txt  or die $!;

my @data = map [ (split)[1,2] ], <$fh>;

my $seq =  GCTATCGTAGCTTAATTAAAGCATAACACTGAAGATGTTAAGATGAACCCTAGAAA ;

for my $i (1 .. $#data) {
  my ($start, $end) = ( $data[$i-1][0], $data[$i][1] );
  my $gene = substr($seq, $start, $end - $start);
  print "$gene
";
}

请注意,循环的值高于指数 < code>1 (数组中 < em> second < /em > 元素)至 $#data (最后一个元素)。这是因为循环的体将 先前的 元素的第一列和当前元素的第二列作为一对,而第一个元素之前没有元素。

请注意,您可能需要将参数调整到 substr,因为我不知道您的指数是从零开始还是从一开始,或者是否包括该指数中的字符。

例如,在 $start = 1; $end = 2 , substr (AT TC ) 实际指 A AT TC 时, 将返回 T

问题回答

您已经了解了一切, 您只是错误地使用 < code> substr 。 < a href=" http:// p3rl. org/substr" rel= "nofollow"\\ code>perldoc -f substr 中的概要写着 :

子增量,FFFSET, 增加

但您却给它两个偏移。 相反, 从另一个偏移中减去一个偏移, 以计算正确的长度参数 。

使用二维数组:

for (my $i = 0; $i < scalar(@array); ++$i) {
    $$table[$i] = [ split(/s+/,$array[$i]) ];
}

# you may put this into a loop
$start = $$table[0][1];
$end = $$table[1][2] - $$table[0][1];
$gene = substr($seq, $start, $end);

另见"http://perldoc.perl.org/perllol.html" rel="no follow">perllol





相关问题
How to add/merge several Big O s into one

If I have an algorithm which is comprised of (let s say) three sub-algorithms, all with different O() characteristics, e.g.: algorithm A: O(n) algorithm B: O(log(n)) algorithm C: O(n log(n)) How do ...

Grokking Timsort

There s a (relatively) new sort on the block called Timsort. It s been used as Python s list.sort, and is now going to be the new Array.sort in Java 7. There s some documentation and a tiny Wikipedia ...

Manually implementing high performance algorithms in .NET

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#. Apart from needing to add an extra range check on the left/right variables before the recursive call, ...

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

Enumerating All Minimal Directed Cycles Of A Directed Graph

I have a directed graph and my problem is to enumerate all the minimal (cycles that cannot be constructed as the union of other cycles) directed cycles of this graph. This is different from what the ...

Quick padding of a string in Delphi

I was trying to speed up a certain routine in an application, and my profiler, AQTime, identified one method in particular as a bottleneck. The method has been with us for years, and is part of a "...

热门标签