Scraping digit values from a webpage?

I want to scrape 17 values from a website.

This is the URL of the page that contains the data: http://www.bungie.net/stats/reach/online.aspx

On the lower left side of the page there is an unordered list titled "ONLINE PLAYLIST". I want to scrape the number of players from each list item that contains such information. The number must consist of digits only, i.e. no commas.

Best answer
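// Fetch the page source with cURL.
$c = curl_init();
curl_setopt_array($c, array(
    CURLOPT_URL => 'http://www.bungie.net/stats/reach/online.aspx',
    CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
));
$r = curl_exec($c);
curl_close($c);

// Group 1: playlist name; group 2: player count (digits and commas).
preg_match_all('|([^<>]+)</a> </h4>\s*([0-9,]+) Players|s', $r, $m);
$teams = array_combine($m[1], $m[2]);
// Strip the thousands separators so only digits remain.
foreach ($teams as &$v) $v = str_replace(',', '', $v);
unset($v); // break the reference left over from the loop
echo '<pre>'.print_r($teams,1).'</pre>';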

Current output:

Array
(
    [NOBLE MAP PACK] => 997
    [RUMBLE PIT] => 4117
    [LIVING DEAD] => 6638
    [TEAM SLAYER] => 7730
    [MLG] => 586
    [TEAM SWAT] => 6358
    [TEAM SNIPERS] => 2145
    [TEAM OBJECTIVE] => 758
    [MULTI TEAM] => 1707
    [BIG TEAM BATTLE] => 5706
    [INVASION] => 2881
    [FIREFIGHT] => 2780
    [SCORE ATTACK] => 1121
    [CO-OP CAMPAIGN] => 695
    [TEAM ARENA] => 393
    [DOUBLES ARENA] => 680
    [FFA ARENA] => 120
)

Edit: Fixed the name capture group so that it captures "CO-OP" rather than just "OP".
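To illustrate the fix described in that edit, here is a minimal sketch run against a hypothetical HTML fragment. The real page markup is not shown in this thread, so `$html` is an assumption shaped to fit the pattern above. The earlier version of the pattern is not shown either, so `[\w ]+` is only an illustrative stand-in for a name group that cannot cross a hyphen.

```php
<?php
// Hypothetical fragment; the real markup of the page may differ.
$html = '<h4><a href="#">CO-OP CAMPAIGN</a> </h4> 695 Players';

// Stand-in for a too-narrow name group: word characters and spaces only,
// so the match cannot include the hyphen and starts after it.
preg_match('|([\w ]+)</a> </h4>\s*([0-9,]+) Players|s', $html, $narrow);

// Name group from the answer: any run of characters except angle brackets.
preg_match('|([^<>]+)</a> </h4>\s*([0-9,]+) Players|s', $html, $wide);

echo $narrow[1], "\n"; // name as seen by the narrow group
echo $wide[1], "\n";   // name as seen by the answer's group
```

The narrow group yields "OP CAMPAIGN", while `[^<>]+` yields the full "CO-OP CAMPAIGN", because it accepts everything between the closing `>` of the link tag and the next `<`.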

Other answers

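It seems to me that all you need here is a bit of regex. I did something like this in Perl recently; it isn't very tricky, and there are plenty of helpful threads and tutorials online.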

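Looking at the page, each list item appears to be assigned a class named "glowBox". I would grab the full text/source of the page and then filter it down to the sections that begin with this class. Alternatively, you could use a lookahead or lookbehind to check what comes immediately before or after the digits. Once you have narrowed things down, you need a capture group to pull out the digits for later use. In Perl, captured strings are automatically assigned to the variables $1, $2, $3, and so on. If you simply loop over each line of the unordered list and run the regex against it, $1 alone should be enough to capture the number.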
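The lookahead-plus-loop idea can be sketched as follows, in PHP (the language of the best answer) rather than Perl. The sample lines are hypothetical, and using the word "Players" as the text that must follow the digits is an assumption borrowed from the pattern in the best answer.

```php
<?php
// Hypothetical list-item text, one playlist per line; the real page may differ.
$lines = [
    'RUMBLE PIT 4,117 Players',
    'MLG 586 Players',
    'No player count on this line',
];

$counts = [];
foreach ($lines as $line) {
    // Match digits and commas only when " Players" follows. The lookahead
    // (?= Players) checks the text without making it part of the match.
    if (preg_match('/([0-9,]+)(?= Players)/', $line, $m)) {
        // $m[1] is PHP's counterpart to Perl's $1.
        $counts[] = (int) str_replace(',', '', $m[1]);
    }
}

print_r($counts);
```

Lines without a player count are skipped, and each captured value is reduced to plain digits before being stored.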

Your capture group might look like this: (\d+)

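The parentheses make it a capture group, \d matches only digit characters, and + means that \d must match at least once. I'm not sure what your requirements are, but if you need both the names and the numbers, Perl can easily grab the necessary data from the page and turn it into a hash of key/value pairs.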
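As a rough sketch of the name/number hash, here is the same idea in PHP, where an associative array plays the role of a Perl hash. The input lines are hypothetical, and the "name, count, Players" layout is an assumption. Note that `(\d+) Players` applied directly to "7,730 Players" would capture only "730", so the commas are removed first.

```php
<?php
// Hypothetical input lines; the real page text may be laid out differently.
$lines = ['TEAM SLAYER 7,730 Players', 'FFA ARENA 120 Players'];

$players = [];
foreach ($lines as $line) {
    // Remove thousands separators so that \d+ can match the whole number.
    $clean = str_replace(',', '', $line);
    // Group 1: playlist name (lazy); group 2: one or more digits.
    if (preg_match('/^(.+?)\s+(\d+) Players$/', $clean, $m)) {
        $players[$m[1]] = (int) $m[2];
    }
}

print_r($players);
```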

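Be sure to check out http://www.regexr.com, which is something like the CSS Zen Garden of regex. You can paste the whole page source into it and play with your regular expression until it finds what you want, and only what you want. For more information and explanations of regex's odd syntax, start here and, obviously, use Google.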

Edit: Looks like I was too late.




