CURL / screen scrape delivery tracking details from Canada Post

I need to obtain delivery tracking details from the Canada Post website, which does not offer an API.

I've formulated a URL that correctly returns the tracking information when entered into a browser, but I can't get the same request to work with cURL (it returns a 500 "We're Sorry" page).


class cURL {
    var $headers;
    var $user_agent;
    var $compression;
    var $cookie_file;
    var $cookies;
    var $proxy;

    // PHP 7+ no longer treats a method named after the class as the constructor.
    function __construct($cookies = TRUE, $cookie = 'cookies.txt', $compression = 'gzip', $proxy = '') {
        $this->headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
        $this->headers[] = 'Connection: Keep-Alive';
        $this->headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';
        $this->user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
        $this->compression = $compression;
        $this->proxy = $proxy;
        $this->cookies = $cookies;
        if ($this->cookies == TRUE) $this->cookie($cookie);
    }

    function cookie($cookie_file) {
        if (!file_exists($cookie_file)) {
            // Create the file; fclose() needs the handle, not the file name.
            $handle = fopen($cookie_file, 'w') or $this->error('The cookie file could not be opened. Make sure this directory has the correct permissions');
            fclose($handle);
        }
        $this->cookie_file = $cookie_file;
    }

    function get($url) {
        $process = curl_init($url);
        curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers);
        curl_setopt($process, CURLOPT_HEADER, 0);
        curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
        curl_setopt($process, CURLOPT_ENCODING, $this->compression);
        curl_setopt($process, CURLOPT_TIMEOUT, 30);
        // Was curl_setopt($cUrl, ...) with a placeholder string; use the real handle and proxy.
        if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy);
        curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
        $return = curl_exec($process);
        curl_close($process);
        return $return;
    }

    function post($url, $data) {
        $process = curl_init($url);
        curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers);
        curl_setopt($process, CURLOPT_HEADER, 1);
        curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
        if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
        curl_setopt($process, CURLOPT_ENCODING, $this->compression);
        curl_setopt($process, CURLOPT_TIMEOUT, 30);
        if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy);
        curl_setopt($process, CURLOPT_POSTFIELDS, $data);
        curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($process, CURLOPT_POST, 1);
        $return = curl_exec($process);
        curl_close($process);
        return $return;
    }

    function error($error) {
        echo "cURL Error\n$error";
        die;
    }
}

$cc = new cURL();
$test = $cc->get('http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=x0x0x0x0x0x0x0&trackingType=trackPersonal');
echo $test;
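When chasing a 500 like this, it can help to build the query string programmatically and look at the HTTP status code alongside the body. The following is only a sketch: `build_tracking_url()` and `fetch_with_status()` are hypothetical helper names, and the URL and parameters are simply the ones from the question.

```php
<?php
// Hypothetical helper: builds the tracking URL from the question.
// http_build_query() takes care of escaping the parameter values.
function build_tracking_url(string $trackingNumber): string
{
    $base = 'http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber';
    $query = http_build_query(
        [
            'trackingNumber' => $trackingNumber,
            'trackingType'   => 'trackPersonal',
        ],
        '',
        '&' // explicit separator, independent of the arg_separator.output ini setting
    );
    return $base . '?' . $query;
}

// Hypothetical helper: returns [HTTP status code, response body].
// Requires the PHP curl extension; the body is false if the transfer failed.
function fetch_with_status(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [$status, $body];
}

echo build_tracking_url('x0x0x0x0x0x0x0'), "\n";
```

Calling `fetch_with_status(build_tracking_url('x0x0x0x0x0x0x0'))` would then show whether the server answers with 200, 500, or a redirect, which is easier to reason about than the raw page alone.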

[UPDATE] After removing the Accept header line as per Tim's reply, I now get a page with the following message: "You are currently visiting our Basic Site. This site is used for low-bandwidth connections, mobile devices and alternative browsers." But, again, there is no tracking information.

Best answer

I believe the problem is with this line:

$this->headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';

Add text/html and you should be good. Or just drop that header.
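As a rough illustration of the first option, the header list could be built so that it advertises HTML rather than only image types. This is a sketch, not a confirmed fix: `browser_like_headers()` is a hypothetical helper, and the exact Accept value is an assumption modelled on what desktop browsers typically send.

```php
<?php
// Hypothetical helper: request headers that tell the server HTML is acceptable.
// The Accept value below is an assumption, not something the site documents.
function browser_like_headers(): array
{
    return [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Connection: Keep-Alive',
    ];
}

// Intended use with an existing cURL handle, e.g. inside get():
//   curl_setopt($process, CURLOPT_HTTPHEADER, browser_like_headers());
print_r(browser_like_headers());
```

Dropping the header entirely is the simpler route; cURL then sends its default `Accept: */*`.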

Answers

I used Snoopy for screen scrapes. Totally recommended.

UPDATE: I was able to fetch that content using Snoopy, but I had to modify a single line (line 809).

Here is my code:

<?php
    include('Snoopy.class.php');

    $http = new Snoopy();
    $http->fetch('http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=x0x0x0x0x0x0x0&trackingType=trackPersonal');

    echo $http->results;
?>

You need to download the Snoopy library and replace line 809:

$cookie_headers .= $cookieKey."=".urlencode($cookieVal)."; ";

with:

$cookie_headers .= $cookieKey."=".$cookieVal."; ";

And voilà!
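To see why that one-line change can matter, here is a small standalone sketch of the two behaviours. `build_cookie_header()` is a hypothetical function that only mimics the shape of the Snoopy line quoted above, and the cookie value is made up; the point is that `urlencode()` rewrites characters such as `/` and `=`, so the server may receive a cookie different from the one it issued.

```php
<?php
// Hypothetical helper mimicking the quoted Snoopy line: joins cookies into
// a "key=value; " string, optionally URL-encoding each value.
function build_cookie_header(array $cookies, bool $encode): string
{
    $header = '';
    foreach ($cookies as $key => $value) {
        $header .= $key . '=' . ($encode ? urlencode($value) : $value) . '; ';
    }
    return $header;
}

$cookies = ['session' => 'abc/123==']; // made-up value for illustration

echo build_cookie_header($cookies, true), "\n";  // session=abc%2F123%3D%3D;
echo build_cookie_header($cookies, false), "\n"; // session=abc/123==;
```

Whether Canada Post's cookies actually contain such characters is not stated in the thread; this only shows how the encoded and raw forms can diverge.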

How old is this thread? Canadapost certainly does offer an API. http://sellonline.canadapost.ca/DevelopersResources/




