Why does my Perl script decompress files more slowly when I use threads?

I'm running Perl 5.10 on a Core 2 Duo MacBook Pro, compiled with threading support: usethreads=define, useithreads=define. I've got a simple script to read 4 gzipped files containing around 750,000 lines each. I'm using Compress::Zlib to do the uncompressing and reading of the files. I've got 2 implementations, the only difference between them being that one includes use threads. Other than that, both scripts run the same subroutine to do the reading. Hence, in pseudocode, the non-threaded program does this:

read_gzipped($file1);
read_gzipped($file2);
read_gzipped($file3);
read_gzipped($file4);

The threaded version goes like this:

my $thr0 = threads->new(\&read_gzipped, $file1);
my $thr1 = threads->new(\&read_gzipped, $file2);
my $thr2 = threads->new(\&read_gzipped, $file3);
my $thr3 = threads->new(\&read_gzipped, $file4);

$thr0->join();
$thr1->join();
$thr2->join();
$thr3->join();

Now the threaded version is actually running almost 2 times slower than the non-threaded script. This obviously was not the result I was hoping for. Can anyone explain what I'm doing wrong here?

Best answer

My guess is that the bottleneck for gzip operations is disk access. If you have four threads competing for disk access on a platter hard disk, that slows things down considerably. The disk head has to move between different files in rapid succession. If you process one file at a time, the head can stay near that file, and the disk cache will be more effective.

Other answers

You're using threads to try to speed up something that's I/O-bound, not CPU-bound. That just introduces more I/O contention, which slows down the script.

ithreads work well if you're dealing with something that is mostly not CPU-bound. Decompression is CPU-bound.

You can easily alleviate the problem by using the Parallel::ForkManager module.

Generally, threads in Perl are not really good.
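
To make the process-based suggestion concrete, here is a minimal sketch that uses only core fork, pipe, and waitpid, which is the pattern Parallel::ForkManager wraps. The body of read_gzipped (it just counts lines) and the generated sample files are assumptions for illustration, not the asker's actual code.

```perl
use strict;
use warnings;
use Compress::Zlib;              # same module the question uses
use File::Temp qw(tempdir);
use POSIX ();

# Hypothetical stand-in for the question's subroutine: count lines in one file.
sub read_gzipped {
    my ($path) = @_;
    my $gz = gzopen($path, 'rb') or die "Cannot open $path: $gzerrno";
    my ($line, $count) = ('', 0);
    $count++ while $gz->gzreadline($line) > 0;
    $gz->gzclose;
    return $count;
}

# Build four small sample files so the sketch runs on its own.
my $dir = tempdir(CLEANUP => 1);
my @files;
for my $n (1 .. 4) {
    my $path = "$dir/file$n.gz";
    my $gz = gzopen($path, 'wb') or die "Cannot write $path: $gzerrno";
    $gz->gzwrite("line $_\n") for 1 .. 1000;
    $gz->gzclose;
    push @files, $path;
}

# Start one child process per file; each child sends its count back on a pipe.
my %reader_for;                  # pid => read end of that child's pipe
for my $path (@files) {
    pipe(my $rd, my $wr) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {             # child
        close $rd;
        print {$wr} read_gzipped($path), "\n";
        close $wr;               # flushes the result to the parent
        POSIX::_exit(0);         # leave without running the parent's cleanup
    }
    close $wr;                   # parent keeps only the read end
    $reader_for{$pid} = $rd;
}

# Collect each child's result and reap it.
my $total = 0;
for my $pid (keys %reader_for) {
    my $fh = $reader_for{$pid};
    chomp(my $count = <$fh>);
    close $fh;
    waitpid($pid, 0);
    $total += $count;
}
print "total lines: $total\n";
```

Each child is a separate process with its own copy of the interpreter, so nothing is shared between workers and results have to be passed back explicitly, here through a pipe.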

I'm not prepared to assume that you're I/O-bound without seeing the output of top while this is running. Like depesz, I tend to assume that compression/decompression operations (which are math-heavy) are more likely to be CPU-bound.
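
Besides watching top, one rough way to check this from inside the script is to compare wall-clock time with the CPU time the process consumed. The sketch below is only an illustration: the measure helper and both workloads are hypothetical, and in practice you would pass in the call that reads the gzipped file.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run a code ref and return (wall-clock seconds, CPU seconds) it used.
sub measure {
    my ($code) = @_;
    my $wall0 = time;
    my ($u0, $s0) = times;       # user and system CPU time so far
    $code->();
    my ($u1, $s1) = times;
    my $wall = time - $wall0;
    return ($wall, ($u1 - $u0) + ($s1 - $s0));
}

# Hypothetical workloads: one that computes, one that mostly waits.
my ($cpu_wall, $cpu_used) = measure(sub {
    my $x = 0;
    $x += sqrt($_) for 1 .. 2_000_000;
});
my ($wait_wall, $wait_used) = measure(sub {
    select(undef, undef, undef, 0.5);   # sleep for half a second
});

printf "compute: wall %.2fs, cpu %.2fs\n", $cpu_wall,  $cpu_used;
printf "waiting: wall %.2fs, cpu %.2fs\n", $wait_wall, $wait_used;
```

If CPU time is close to wall-clock time, the work is CPU-bound. If CPU time is much smaller, the process spent most of its time waiting, which for this script would point at disk I/O.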

When you're dealing with a CPU-bound operation, using more threads/processes than you have processors will almost never[1] improve matters. If the CPU utilization is already at 100%, more threads/processes won't magically increase its capacity, and will most likely make things worse by adding more context-switching overhead.

[1] I've heard it suggested that heavy compilations, such as building a new kernel, benefit from telling make to use twice as many processes as the machine has processors, and my personal experience has been that this seems to be accurate. The explanation I've heard for it is that this allows each CPU to be kept busy compiling in one process while the other process is waiting for data to be fetched from main memory. If you view compiling as a CPU-bound process, this is an exception to the normal rule. If you view it as an I/O-bound case (where the I/O is between the CPU and main memory rather than disk/network/user I/O), it is not.
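
The rule of thumb above can be sketched as a worker pool that never runs more children than there are processors. The limit of 2 is an assumption matching the Core 2 Duo in the question, and the job list and the arithmetic inside each child are placeholders for real work.

```perl
use strict;
use warnings;
use POSIX ();

my $max_workers = 2;             # assumption: two cores, as on a Core 2 Duo
my @jobs        = (1 .. 6);      # hypothetical job IDs standing in for files
my ($running, $peak, $finished) = (0, 0, 0);

for my $job (@jobs) {
    # At the cap: block until one child exits before starting another.
    if ($running >= $max_workers) {
        wait();
        $running--;
        $finished++;
    }
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: placeholder CPU work, then exit immediately.
        my $x = 0;
        $x += sqrt($_) for 1 .. 200_000;
        POSIX::_exit(0);
    }
    $running++;
    $peak = $running if $running > $peak;
}

# Reap the children that are still running.
while ($running > 0) {
    wait();
    $running--;
    $finished++;
}
print "finished $finished jobs, at most $peak at once\n";
```

Raising $max_workers beyond the number of cores would not make CPU-bound jobs finish sooner. It would only add processes for the scheduler to switch between.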




