I am logging into Craigslist with cURL to scrape the status of my posted listings. The problem is passing the HTML that cURL returns in $output to file_get_html. The Craigslist statuses are actually nested inside TR elements, but I first wanted to test the most basic function (link scraping) to see whether anything was getting passed through. Nothing is.
For example, this doesn't work:
$cookie_file_path = getcwd()."/cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://accounts.craigslist.org/login?LoginType=L&step=confirmation&originalURI=%2Flogin&rt=&rp=&inputEmailHandle='.$email.'&inputPassword='.$password.'&submit=Log%20In');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.craigslist.org');
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
echo $output;
//
include_once('simple_html_dom.php');
$html = file_get_html($output);
//find all links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';
I know the find() expression works because it returns links if I pass a URL such as http://google.com to file_get_html instead of $output.
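To narrow this down, here is a minimal sketch that parses an HTML string directly instead of a URL. It assumes the difference matters because $output holds markup, not an address, while file_get_html is the call that works when given a URL. It uses PHP's built-in DOMDocument rather than simple_html_dom so it runs on its own; the sample markup in $output is a made-up stand-in for what curl_exec() returns. simple_html_dom also provides str_get_html() for string input, which may be the closer equivalent, though I have not confirmed that it fixes the Craigslist case.

```php
<?php
// Hypothetical stand-in for the HTML string returned by curl_exec().
$output = '<html><body><a href="/one">One</a><a href="/two">Two</a></body></html>';

$doc = new DOMDocument();
// Collect libxml warnings internally; real pages are rarely well-formed.
libxml_use_internal_errors(true);
// loadHTML() takes the markup itself, not a URL or file path.
$doc->loadHTML($output);
libxml_clear_errors();

// Find all links in the parsed string.
foreach ($doc->getElementsByTagName('a') as $element) {
    echo $element->getAttribute('href') . "<br>\n";
}
```

If this prints the links from the string, the cURL transfer itself is fine and the issue is only in how the markup is handed to the parser.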