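I am interested in learning Perl. I am using the Learning Perl book and the CPAN website for reference.
I would like to write a web/text scraping application in Perl to apply what I have learned.
Please suggest some good options to begin with.
(This is not homework.) I want to build something in Perl that helps me exercise the basic language features.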
If the web pages you want to scrape require JavaScript to function properly, you are going to need more than what WWW::Mechanize can provide you. You might even have to resort to controlling a specific browser via Perl (e.g. using Win32::IE::Mechanize or WWW::Mechanize::Firefox).
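As others have said, WWW::Mechanize is well worth learning for web scraping.
Scrappy (http://search.cpan.org/dist/Scrappy) is also worth a look: it lets you do a lot with very little code, as the example from its documentation shows. Scrappy is built on another module, which you may want to look at as an option in its own right. Also, if you need to extract data from HTML tables, HTML::TableExtract makes this dead easy: you can find the table you are interested in by naming its headings and extract the data with very little effort. An example of each follows: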
# Scrappy: print the text of every link in the "recent uploads" list
use Scrappy;
my $spidy = Scrappy->new;
$spidy->crawl('http://search.cpan.org/recent', {
    '#cpansearch li a' => sub {
        print shift->text, "\n";
    }
});
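# HTML::TableExtract: pick the table by its column headings
# ($html_string is assumed to already hold the page's HTML)
use HTML::TableExtract;
my $te = HTML::TableExtract->new( headers => [qw(Date Price Cost)] );
$te->parse($html_string) or die "Didn't find table";
foreach my $row ($te->rows) {
    print join(',', @$row), "\n";
}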
The most popular web scraping module for Perl is WWW::Mechanize, which is excellent if you can't just retrieve your destination page but need to navigate to it using links or forms, for instance, to log in. Have a look at its documentation for inspiration. If your needs are simple, you can extract the information you need from the HTML using regular expressions (but beware your sanity); otherwise it might be better to use a module such as HTML::TreeBuilder to do the job.
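To make the regex-versus-parser trade-off concrete, here is a minimal sketch that pulls links out of the same HTML both ways. The sample markup and the helper names `links_by_regex` and `links_by_tree` are made up for illustration; the HTML is inlined so nothing is fetched from the network, and HTML::TreeBuilder is assumed to be installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;    # from the HTML-Tree distribution on CPAN

# Hypothetical sample page, inlined so the sketch runs offline.
my $html = <<'HTML';
<html><body>
<ul id="recent">
  <li><a href="/dist/Foo-Bar">Foo-Bar 0.01</a></li>
  <li><a href='/dist/Baz'>Baz <b>1.2</b></a></li>
</ul>
</body></html>
HTML

# Regex approach: workable for rigid markup, but this pattern only matches
# double-quoted href values and link text that contains no nested tags.
sub links_by_regex {
    my ($content) = @_;
    my @links;
    while ($content =~ m{<a\s+href="([^"]*)"[^>]*>([^<]*)</a>}g) {
        push @links, "$1 => $2";
    }
    return @links;
}

# Parser approach: build a tree, then ask for every <a> element.
sub links_by_tree {
    my ($content) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($content);
    my @links = map { $_->attr('href') . ' => ' . $_->as_text }
                $tree->look_down(_tag => 'a');
    $tree->delete;    # free the tree explicitly once we are done with it
    return @links;
}

print "regex: $_\n" for links_by_regex($html);
print "tree:  $_\n" for links_by_tree($html);
```

On this sample the regex misses the second link, because its href is single-quoted and its text contains a nested tag, while the parser finds both. That is roughly the point at which a regular expression stops being the simple option.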
One module that looks interesting, though I have not really tried it, is a subclass of WWW::Mechanize that adds support for JavaScript and AJAX.
Take a look at the Perl module for web access. A beginners tutorial can be found here: http://teusje.wordpress.com/2010/05/02/web-scraping-with-perl/
It's safe, easy to use and fast.