<tbody> glitch in PHP Simple HTML DOM parser

I'm using PHP Simple HTML DOM Parser to scrape some data from a webshop (running XAMPP 1.7.2 with PHP 5.3.0), and I'm running into problems with the <tbody> tag. The structure of the table is essentially this (the details aren't really that important):

<table>
  <thead>
    <!--text here-->
  </thead>
  <tbody>
    <!--text here-->
  </tbody>
</table>

Now, I'm trying to get to the <tbody> section with this code:

$element = $html->find('tbody', 0)->innertext;

It doesn't throw any errors; it just prints nothing when I try to echo it. I've tested the code on other elements, <thead>, <table>, even something like <span class="price">, and they all work fine (of course, removing ",0" breaks the code). They all return their correct sections. The same goes for outertext. But it all fails on <tbody>.

Now, I've skimmed through the parser, but I'm not sure I can figure it out. I've noticed that <thead> isn't even mentioned, yet it works fine. *shrug*

I guess I could try child navigation, but that seems to glitch as well. I've just tried running:

$el = $html->find('table', 0);
$el2 = $el->children(2);
echo $el2->outertext;

and no dice. I tried replacing children with first_child, and 2 with 1, and still no dice. Funnily enough, if I use ->find instead of children, it works perfectly.

I'm pretty confident I could find a workaround for the whole thing, but this behaviour seems odd enough to post here. My curious mind is happy for all the help it can get.

Answers

In the simple_html_dom.php file, comment out or remove line 396:

// if ($m[1] === 'tbody') continue;

There is a bug report for this issue here: http://sourceforge.net/p/simplehtmldom/bugs/79/

It is still open at the time of this writing. If you do not wish to modify the source code, there is an alternative fix. Take, for example, a loop that finds the <tr>s:

<?php
  // The *BROKEN* way to find the <tr>s
  // below the <tbody> below the <table id="foo">
  foreach ($dom->find('table#foo tbody tr') as $tr) {
    /* you will get nothing */
  }

Instead, you can iterate over all the <tr>s and check each one's parent tag name, like so:

<?php
  // A workaround to find the <tr>s
  // below the <tbody> below the <table id="foo">
  foreach ($dom->find('table#foo tr') as $tr) { // note the lack of a tbody selector
    /* you will get all <tr>s, but let's only work with the ones whose
       parent is a tbody! */
    if ($tr->parent->tag == 'tbody') { // our workaround
      /* this part will work as you would expect the above broken code to work */
    }
  }

Also note a slightly unrelated issue that I ran into: the Chrome and Firefox inspectors will correct tag soup regarding <tbody> and <thead>. Be careful: look only at the actual source, and stay away from the DOM inspectors if you run into unexplainable issues.

Make sure your tbody is not being generated by JavaScript. I was facing the same problem with a span tag. I later found that if any HTML is injected into the page via jQuery or other JavaScript, simple_html_dom fails to find it, because it parses only the markup returned by the server.

Make sure that the tbody is really there. Many browsers add a tbody to tables in the inspect panel even though it is not present in the response.
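
One way to check this without a browser inspector is to search the raw markup itself. The snippet below is only a minimal sketch: sourceHasTbody is a hypothetical helper name (it is not part of simple_html_dom), and the two sample strings stand in for the HTML you would actually download.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns true when the
// raw markup contains a literal <tbody> opening tag.
function sourceHasTbody($rawHtml)
{
    // Match "<tbody" followed by whitespace, ">" or "/", case-insensitively,
    // so that longer tag names starting with "tbody" are not matched.
    return preg_match('/<tbody[\s>\/]/i', $rawHtml) === 1;
}

// Sample markup standing in for the server response (assumed for illustration).
$withTbody    = '<table><thead><tr><th>Item</th></tr></thead><tbody><tr><td>1</td></tr></tbody></table>';
$withoutTbody = '<table><tr><td>1</td></tr></table>';

var_dump(sourceHasTbody($withTbody));    // bool(true)
var_dump(sourceHasTbody($withoutTbody)); // bool(false)
```

If this returns false for the page you are scraping, the <tbody> you see in the inspector was added by the browser (or by JavaScript), and no selector fix in the parser will find it.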




