Craigslist, CURL, Simple PHP DOM Issues

I am logging into Craigslist with cURL to scrape the status of my posted listings. The problem is getting the HTML that cURL returns in $output into file_get_html. The Craigslist statuses are actually nested inside TR elements, but I first wanted to test the most basic function (link scraping) to see whether anything was being passed through. Nothing is.

For example, this doesn't work:

$cookie_file_path = getcwd()."/cookie.txt";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://accounts.craigslist.org/login?LoginType=L&step=confirmation&originalURI=%2Flogin&rt=&rp=&inputEmailHandle='.$email.'&inputPassword='.$password.'&submit=Log%20In');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.craigslist.org');

$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);

$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
echo $output;

//

include_once('simple_html_dom.php');
$html = file_get_html($output);
//find all links
foreach($html->find('a') as $element)
       echo $element->href . '<br>';

I know the find expression works because it returns links if I pass file_get_html a URL such as http://google.com instead.

Answers

This is how it should be done:

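include_once('simple_html_dom.php');

// Fetch the page and return the body as a string instead of printing it
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://www.sitename.com');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
$str = curl_exec($curl);
curl_close($curl);

// Parse the HTML string (str_get_html), not a URL or file (file_get_html)
$html = str_get_html($str);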

Shouldn't you be using str_get_html instead of file_get_html? $output is a string, while file_get_html expects a URL or file path!
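
The same string-versus-file distinction can be tried without simple_html_dom.php. The sketch below swaps in PHP's built-in DOMDocument, whose loadHTML() parses markup held in a string (loadHTMLFile() is the variant that takes a path). The helper name extract_links and the sample markup are made up for illustration; in the question's script the string would be the $output returned by curl_exec().

```php
<?php
// Hypothetical helper: collect the href of every <a> in an HTML string.
function extract_links(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely strict, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html); // string input; loadHTMLFile() would expect a path
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Stand-in for the $output string that curl_exec() returns (assumed markup).
$output = '<html><body><a href="/one">One</a> <a href="/two">Two</a></body></html>';

foreach (extract_links($output) as $href) {
    echo $href, "\n";
}
```

Note that curl_exec() returns false when the request fails, so it is probably worth checking $output before handing it to any parser.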




