Question

I am logging into Craigslist with CURL to scrape the status of my posted listings. The problem I encounter is the transfer of HTML from CURL $output to file_get_html. While Craigslist statuses are actually nested inside TR elements, I just wanted to test the most basic functions to see if things were getting passed through (i.e. link scraping). They are not.

For example, this doesn t work:

$cookie_file_path = getcwd()."/cookie.txt";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,  https://accounts.craigslist.org/login?LoginType=L&step=confirmation&originalURI=%2Flogin&rt=&rp=&inputEmailHandle= .$email. &inputPassword= .$password. &submit=Log%20In );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER,  http://www.craigslist.org );

$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);

$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
echo $output;

//

include_once( simple_html_dom.php );
$html = file_get_html($output);
//find all links
foreach($html->find( a ) as $element)
       echo $element->href .  <br> ;

I know the expression works because it returns links if I put in http://google.com , or something or other.

Answer 1

This is how it should be done

$curl = curl_init(); 
curl_setopt($curl, CURLOPT_URL,  http://www.sitename.com );  
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);  
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);  
$str = curl_exec($curl);  
curl_close($curl);  

$html= str_get_html($str);

Answer 2

Shouldn t you be using str_get_html instead of file_get_html? Since $ouput is a string!

友情链接