Screen Scraping in PHP with login

While looking around for a solution to this, I have found different methods: some use regex, others use DOM scripting or something similar.

I want to go to a site, log in, fill out a form, and then check whether the form was submitted. The logging-in part is the part I can't find anything on.

Anyone know of an easy way to do this?

Best answer

You may want to take a look at Perl's LWP library (I know it isn't PHP, but it's very useful for screen scraping, web unit testing, and the like).

Other answers

I'd agree with Les. curl + Charles (or Fiddler, Firefox's Tamper Data extension, Wireshark, etc.) is the way I've always done this. The one trick I've found is that some sites require a three-step process:

  1. Hit the login page with a GET request first to get any session IDs, cookies, and/or required fields (e.g. ASP.NET sites have __VIEWSTATE and __EVENTVALIDATION).
  2. Once you have these values, POST them, along with your credentials, to the login page.
  3. Finally, request whatever resource you're after.
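
A minimal PHP sketch of those three steps, assuming the form posts back to the same login URL. `extractHiddenFields` and `loginAndFetch` are hypothetical helper names, not library functions. For brevity the sketch leans on curl's in-memory cookie engine (`CURLOPT_COOKIEFILE` set to an empty string); the advice that follows in this answer is to parse cookies out of the headers yourself if that turns out to be unreliable.

```php
<?php

/**
 * Collect name => value pairs from every <input type="hidden" name="...">,
 * e.g. ASP.NET's __VIEWSTATE/__EVENTVALIDATION or a CSRF token.
 * Note: the XPath match on type="hidden" is case-sensitive.
 */
function extractHiddenFields(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

/**
 * Hypothetical helper: GET the login form, POST the credentials plus the
 * hidden fields back to the same URL, then GET the protected page.
 */
function loginAndFetch(string $loginUrl, string $targetUrl, array $credentials): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE => '', // empty string: in-memory cookies for this handle only
    ]);

    // Run the current request, turning curl failures into exceptions.
    $exec = function () use ($ch): string {
        $result = curl_exec($ch);
        if ($result === false) {
            throw new RuntimeException(curl_error($ch));
        }
        return $result;
    };

    // Step 1: GET the login page for session cookies and hidden fields.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    $form = $exec();

    // Step 2: POST the hidden fields together with the credentials.
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => http_build_query(array_merge(extractHiddenFields($form), $credentials)),
    ]);
    $exec();

    // Step 3: switch back to GET and fetch the page you are after.
    curl_setopt_array($ch, [CURLOPT_HTTPGET => true, CURLOPT_URL => $targetUrl]);
    $page = $exec();
    curl_close($ch);
    return $page;
}
```

Calling `loginAndFetch('https://example.com/login', 'https://example.com/account', ['username' => 'me', 'password' => 'secret'])` would return the HTML of the target page; the URLs and field names there are placeholders for whatever the real form uses. Checking that a later form submission went through usually comes down to looking for a success message or an expected element in the returned HTML.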

Don't plan on curl's cookie jar and cookie file being much help. You'll probably be best off parsing the session ID and cookies out of the response headers with a simple regex.
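
One way to do that in PHP, sketched under the assumption that you fetch pages with `CURLOPT_HEADER` enabled so the raw headers come back in front of the body: split the response at `CURLINFO_HEADER_SIZE`, pull `name=value` pairs out of each `Set-Cookie` line with a regex, and send them back on the next request via `CURLOPT_COOKIE`. The helper names are hypothetical, and the regex deliberately ignores cookie attributes (path, expiry, domain), so it only suits a single site's session.

```php
<?php

/** Collect name => value from every Set-Cookie line; attributes are ignored. */
function parseSetCookies(string $rawHeaders): array
{
    $cookies = [];
    preg_match_all('/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/mi', $rawHeaders, $matches, PREG_SET_ORDER);
    foreach ($matches as [, $name, $value]) {
        $cookies[$name] = $value; // a later Set-Cookie for the same name wins
    }
    return $cookies;
}

/** Format cookies as a Cookie request-header value: "a=1; b=2". */
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

/** Hypothetical helper: fetch $url with optional cookies, returning [rawHeaders, body]. */
function fetchWithHeaders(string $url, array $cookies = []): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER => true, // keep the response headers in the output
    ]);
    if ($cookies !== []) {
        curl_setopt($ch, CURLOPT_COOKIE, buildCookieHeader($cookies));
    }
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException(curl_error($ch));
    }
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);
    return [substr($response, 0, $headerSize), substr($response, $headerSize)];
}
```

For example, `[$headers] = fetchWithHeaders($loginUrl);` followed by `parseSetCookies($headers)` gives you the session cookies to pass into the next `fetchWithHeaders` call (`$loginUrl` is a placeholder). If redirects are involved, the header block can contain several responses; the regex simply collects cookies from all of them.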

Hope this helps!

You might be better off with some sort of scriptable browser if you need to do a lot of GUI stuff. If you need to use PHP, check out curl: http://us2.php.net/curl

What I usually do is fire up Charles, go through the login process in a browser, and record the raw requests. Then I copy and paste those requests and send them through fopen or curl (with some small adjustments according to the responses).
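
That record-and-replay workflow can be partly automated. The sketch below is a hypothetical pair of helpers, not any tool's API. It assumes you paste a captured request (request line with a relative path such as `/login`, headers, blank line, body) into a string, splits it into its parts, and replays it through PHP's `http` stream wrapper, which is what `fopen` and `file_get_contents` use. The `$overrides` argument is where the "small adjustments" go, such as a session cookie that changed since the recording.

```php
<?php

/**
 * Split a raw HTTP/1.x request copied from a proxy such as Charles into
 * method, path, headers and body. Assumes an origin-form path ("/login").
 */
function parseRawRequest(string $raw): array
{
    $raw = str_replace("\r\n", "\n", $raw); // treat CRLF and LF captures alike
    [$head, $body] = array_pad(explode("\n\n", $raw, 2), 2, '');
    $lines = explode("\n", $head);
    [$method, $path] = explode(' ', (string) array_shift($lines), 3);

    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') !== false) {
            [$name, $value] = explode(':', $line, 2);
            $headers[trim($name)] = trim($value);
        }
    }
    return ['method' => $method, 'path' => $path, 'headers' => $headers, 'body' => $body];
}

/**
 * Hypothetical helper: replay a parsed request against $baseUrl.
 * $overrides replaces recorded headers, e.g. a fresh Cookie value.
 */
function replayRequest(string $baseUrl, array $request, array $overrides = []): string
{
    $headers = array_merge($request['headers'], $overrides);
    // Let PHP set Host and Content-Length itself; a recorded gzip
    // Accept-Encoding would leave you with a compressed body.
    unset($headers['Host'], $headers['Content-Length'], $headers['Accept-Encoding']);

    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    $http = [
        'method' => $request['method'],
        'header' => implode("\r\n", $lines),
        'ignore_errors' => true, // return the body even for 4xx/5xx responses
    ];
    if ($request['body'] !== '') {
        $http['content'] = $request['body'];
    }
    $url = rtrim($baseUrl, '/') . $request['path'];
    $response = file_get_contents($url, false, stream_context_create(['http' => $http]));
    if ($response === false) {
        throw new RuntimeException('Replay failed: ' . $request['method'] . ' ' . $url);
    }
    return $response;
}
```

A call such as `replayRequest('https://example.com', parseRawRequest($captured), ['Cookie' => 'PHPSESSID=new-value'])` would resend the captured login POST with an updated session cookie; `$captured` and the cookie value are placeholders. Header names are matched case-sensitively in `unset` and `$overrides`, so use the same spelling as the capture.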

I have a fair bit of experience with this. I used to use curl, but it is no fun to use. In particular, sites often exchange XSRF tokens, pass hidden variables, or set all kinds of cookies, and tracking all of this with curl becomes difficult, at least for me.

I then explored Selenium, and I love it. There are two things to set up: 1) install Selenium IDE (works only in Firefox); 2) install the Selenium RC server.

After starting Selenium IDE, go to the site you are trying to automate and start recording the actions you perform on it. Think of it as recording a macro in the browser. Afterwards, you can export the recording as code in the language you want.

Just so you know, BrowserMob uses Selenium for load testing and for automating tasks in the browser.

I've uploaded a PPT I made a while back that should save you a good amount of time: http://www.4shared.com/get/tlwT3qb_/SeleniumInstructions.html

At the link above, choose the regular download option.





Related questions
Brute-force/DoS prevention in PHP [closed]

I am trying to write a script to prevent brute-force login attempts in a website I m building. The logic goes something like this: User sends login information. Check if username and password is ...

please can anyone check this while loop and if condition

<?php $con=mysql_connect("localhost","mts","mts"); if(!con) { die('unable to connect'. mysql_error()); } mysql_select_db("mts",$con); /* date_default_timezone_set ("Asia/Calcutta"); $date = ...

Fixed-value dollars

How to verify that numbers come from the correct source.

Generating a drop down list of timezones with PHP

Most sites need some way to show the dates on the site in the users preferred timezone. Below are two lists that I found and then one method using the built in PHP DateTime class in PHP 5. I need ...

Text as watermarking in PHP

I want to create text as a watermark for an image. The watermark should have the following properties: font: Impact color: white opacity: 31% Font style: regular, bold Bevel and Emboss size: 30 ...

How does php cast boolean variables?

How does php cast boolean variables? I was trying to save a boolean value to an array: $result["Users"]["is_login"] = true; but when I use debug the is_login value is blank. and when I do ...
