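I am writing a Perl script in which I need to loop over every character of a string. There are a lot of strings, and each is 100 characters long (they are short DNA sequences, in case you are wondering).
So, is it faster to use substr to extract one character at a time, or is it faster to split the string into an array and then iterate over the array?
While I wait for an answer, I suppose I'll go read up on how to benchmark things in Perl.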
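It really depends on what you are doing with the data. But hey, your last question is the right one! Don't guess, benchmark.
Perl provides the Benchmark module for exactly this kind of thing, and it is very simple to use. Here is a small code sample: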
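    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Build a random 100-character DNA string.
    my $dna;
    $dna .= [qw(G A T C)]->[rand 4] for 1 .. 100;

    # Walk the string by offset, extracting one character at a time.
    sub frequency_substr {
        my $length = length $dna;
        my %hist;
        for my $pos (0 .. $length - 1) {    # the last valid offset is $length - 1
            $hist{$pos}{substr $dna, $pos, 1} ++;
        }
        \%hist;
    }

    # Split the string into a list of characters and iterate over it.
    sub frequency_split {
        my %hist;
        my $pos = 0;
        for my $char (split //, $dna) {
            $hist{$pos ++}{$char} ++;
        }
        \%hist;
    }

    # Match one character at a time with a global regex.
    sub frequency_regmatch {
        my %hist;
        while ($dna =~ /(.)/g) {
            $hist{pos($dna)}{$1} ++;    # pos() is the offset just past the match, so keys run 1 .. 100
        }
        \%hist;
    }

    cmpthese(-5, # Run each for at least 5 seconds
        {
            substr => \&frequency_substr,    # pass code references, do not call the subs
            split  => \&frequency_split,
            regex  => \&frequency_regmatch
        }
    );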
And a sample result:
               Rate  regex  split substr
    regex    6254/s     --   -26%   -32%
    split    8421/s    35%     --    -9%
    substr   9240/s    48%    10%     --
As it turns out, substr is surprisingly fast. :)
Here is what I would do, instead of first trying to choose between substr and split:
    #!/usr/bin/perl
    use strict; use warnings;

    my %dist;
    while ( my $s = <> ) {
        while ( $s =~ /(.)/g ) {
            ++ $dist{ pos($s) }{ $1 };
        }
    }
My curiosity got the better of me. Here is a benchmark:
    #!/usr/bin/perl
    use strict; use warnings;
    use Benchmark qw( cmpthese );

    my @chars = qw(A C G T);

    # 1,000 random 100-character strings, with one copy per approach.
    my @to_split = my @to_substr = my @to_match = map {
        join '', map $chars[rand @chars], 1 .. 100
    } 1 .. 1_000;

    cmpthese -1, {
        split  => \&bench_split,    # pass code references, do not call the subs
        substr => \&bench_substr,
        match  => \&bench_match,
    };

    sub bench_split {
        my %dist;
        for my $s ( @to_split ) {
            my @s = split //, $s;
            for my $i ( 0 .. $#s ) {
                ++ $dist{ $i }{ $s[$i] };
            }
        }
    }

    sub bench_substr {
        my %dist;
        for my $s ( @to_substr ) {
            my $u = length($s) - 1;
            for my $i (0 .. $u) {
                ++ $dist{ $i }{ substr($s, $i, 1) };
            }
        }
    }

    sub bench_match {
        my %dist;
        for my $s ( @to_match ) {
            while ( $s =~ /(.)/g ) {
                ++ $dist{ pos($s) }{ $1 };
            }
        }
    }
Output:
             Rate  split  match substr
    split  4.93/s     --   -31%   -65%
    match  7.11/s    44%     --   -49%
    substr 14.0/s   184%    97%     --
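A benchmark like this only means something if the candidates compute the same result. One detail worth checking: pos() returns the offset just past the last match, so the regex variants key their counts by 1 .. 100 while substr and split use 0 .. 99. Below is a minimal sketch of such a sanity check. The helper names (by_substr, by_split, by_regex, flatten) and the short test string are made up for illustration and are not part of the answers above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative input; any string without newlines works.
my $dna = 'GATTACA';

sub by_substr {
    my ($s) = @_;
    my %hist;
    $hist{$_}{ substr( $s, $_, 1 ) }++ for 0 .. length($s) - 1;
    return \%hist;
}

sub by_split {
    my ($s) = @_;
    my %hist;
    my $pos = 0;
    $hist{ $pos++ }{$_}++ for split //, $s;
    return \%hist;
}

sub by_regex {
    my ($s) = @_;
    my %hist;
    while ( $s =~ /(.)/g ) {
        # pos() points just past the match; subtract 1 for a 0-based key.
        $hist{ pos($s) - 1 }{$1}++;
    }
    return \%hist;
}

# Turn a position => { char => count } histogram into a canonical string
# so that two histograms can be compared with eq.
sub flatten {
    my ($h) = @_;
    return join ',', map {
        my $p = $_;
        map { sprintf '%d:%s=%d', $p, $_, $h->{$p}{$_} } sort keys %{ $h->{$p} }
    } sort { $a <=> $b } keys %$h;
}

my $expected = flatten( by_substr($dna) );
my %impl = ( split => \&by_split, regex => \&by_regex );
for my $name ( sort keys %impl ) {
    my $got = flatten( $impl{$name}->($dna) );
    print "$name ", ( $got eq $expected ? 'matches' : 'differs from' ), " substr\n";
}
```

With the `- 1` adjustment in by_regex, all three approaches should agree; without it, the regex histogram is shifted by one position.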
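I have an example in Mastering Perl that deals with this problem. Do you want to create a bunch of separate scalars, each carrying the memory overhead of a Perl scalar, or do you want to store everything in a single string to reduce memory, at the cost of possibly doing more work? You say you have a lot of these strings, so if you are worried about memory, keeping each one as a single string may serve you better.
Mastering Perl also has a couple of chapters on benchmarking and profiling, if you are interested in that.
Ether says to get it working first and worry about the rest later. Part of that is hiding the operations behind a task-oriented interface. A good object-oriented module can do that for you. If you don't like the implementation, you can change it. The higher-level program does not have to change, though, because the interface stays the same.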
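To make the point about a task-oriented interface concrete, here is a rough sketch. The DNA::Sequence class and its each_char method are hypothetical names invented for this example; they are not from Mastering Perl or any existing module. The caller only supplies a callback, so the substr-based loop inside each_char could later be replaced with split or a regex without changing the calling code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class, for illustration only.
package DNA::Sequence;

sub new {
    my ( $class, $string ) = @_;
    return bless { seq => $string }, $class;
}

# Call $callback->( $position, $character ) once per character.
# The implementation detail (substr here) stays hidden from callers.
sub each_char {
    my ( $self, $callback ) = @_;
    my $seq = $self->{seq};
    for my $pos ( 0 .. length($seq) - 1 ) {
        $callback->( $pos, substr( $seq, $pos, 1 ) );
    }
    return;
}

package main;

# Higher-level code: count characters per position through the interface.
my %dist;
my $sequence = DNA::Sequence->new('GATTACA');
$sequence->each_char(
    sub {
        my ( $pos, $char ) = @_;
        $dist{$pos}{$char}++;
    }
);

print "position 0: ", join( ',', sort keys %{ $dist{0} } ), "\n";
```

If a benchmark later shows that another approach is faster for your data, only the body of each_char needs to change.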