Page scraping to get prices from Google Finance

I am trying to get stock prices by scraping Google Finance pages. I am doing this in Python, using the urllib2 package to fetch the page and a regex to extract the price data.

When I leave my Python script running, it works for a few minutes and then starts throwing an exception: [HTTP Error 503: Service Unavailable]

I guess this happens because the web server treats the frequent page requests as coming from a robot and starts returning this error after a while.

Is there a way around this, e.g. deleting or creating some cookie?

Even better would be an API from Google. I want to do this in Python because the complete app is in Python, but if nothing is available in Python, I can consider alternatives. This is the Python method I call in a loop to get the data, with a few seconds of sleep between calls:

import re
import urllib2

def getPriceFromGOOGLE(self, symbol):
    """
    gets last traded price from google for given security
    """
    toReturn = 0.0
    try:
        base_url = 'http://google.com/finance?q='
        req = urllib2.Request(base_url + symbol)
        content = urllib2.urlopen(req).read()
        # group(2) captures the value of the p: field (the price)
        namestr = 'name:"' + symbol + '",cp:(.*),p:(.*),cid(.*)}'
        m = re.search(namestr, content)
        if m:
            data = str(m.group(2).strip().strip('"'))
            price = data.replace(',', '')  # drop thousands separators
            toReturn = float(price)
        else:
            print 'ERROR ' + str(symbol) + ' --- ' + str(content)
    except Exception, exc:
        print 'Exc: ' + str(exc)
    finally:
        return toReturn
Answers

The question is quite old, but the selected answer is no longer valid: the API has been deprecated.

There is an open source project at http://scrape-google-finance.compunect.com/ that scrapes all companies from Google Finance and matches them with their current prices.
The project solves most of these issues: it includes caching and IP management, and it runs stably without getting blocked.
It uses the internal finance company-matching API to scrape companies and the chart API to get prices. However, it is PHP code, not Python. You can still learn how it solves these tasks and adapt it.
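
The caching idea carries over to Python regardless of the data source. Below is a minimal sketch, assuming a hypothetical `fetch` callable that performs the actual HTTP request. The names `CachedQuotes`, `fetch`, `ttl_seconds`, and `clock` are made up for illustration and are not taken from the PHP project.

```python
import time


class CachedQuotes:
    """Remember each symbol's last result for ttl_seconds to avoid refetching."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callable: symbol -> price
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested
        self._store = {}         # symbol -> (timestamp, price)

    def get(self, symbol):
        now = self._clock()
        hit = self._store.get(symbol)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # still fresh: no network request is made
        price = self._fetch(symbol)
        self._store[symbol] = (now, price)
        return price
```

With something like this wrapped around the scraping method, a loop that polls every few seconds only hits the server once per `ttl_seconds` for each symbol.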

To get around most rate limiting or bot detection from the likes of Google, Wikipedia, or Yahoo, spoof your User-Agent header.

This will make your script's requests appear to come from a Google Chrome browser (version 11 in this example).

headers = {'User-Agent': "Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.16 Safari/534.24"}
req = urllib2.Request(url, None, headers)
content = urllib2.urlopen(req).read()
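
In Python 3, `urllib2` became `urllib.request`. Here is a rough sketch of the same header trick combined with backing off when the server answers 503. The function names, retry count, and delays are illustrative assumptions, and nothing here guarantees that requests will not be blocked.

```python
import time
import urllib.error
import urllib.request

USER_AGENT = ("Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/534.24 "
              "(KHTML, like Gecko) Chrome/11.0.696.16 Safari/534.24")


def build_request(url):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_with_backoff(url, opener=urllib.request.urlopen,
                       retries=4, base_delay=2.0, sleep=time.sleep):
    """Fetch url, doubling the wait after each HTTP 503 response."""
    for attempt in range(retries):
        try:
            with opener(build_request(url)) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            # Re-raise anything that is not a 503, and the final 503 too.
            if exc.code != 503 or attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `opener` and `sleep` parameters exist only so the function can be exercised without real network access or real waiting.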

Yahoo Finance is also a good place to get financial information, and it covers more countries and stocks.

For Python 2, you can use ystockquote. For Python 3, you can use yfq, which I rewrote from ystockquote.

To get the current quotes for the symbols GOOG and INTL:

>>> import yfq
>>> yfq.get_price('GOOG+INTL')
{'GOOG': '600.25', 'INTL': '22.25'}

To get historical quotes of Yahoo from March 1, 2012 to March 3, 2012:

>>> yfq.get_historical_prices('YHOO', '20120301', '20120303')
[['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'], ['2012-03-02', '14.89', '14.92', '14.66', '14.72', '9164900', '14.72'], ['2012-03-01', '14.89', '14.96', '14.79', '14.93', '12283300', '14.93']]
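
The historical result is a header row followed by data rows, with every value a string. This small sketch is independent of `yfq` itself and turns output of that shape into dictionaries with numeric fields. The helper name `rows_to_records` is hypothetical, and the sample data is copied from the output above.

```python
def rows_to_records(rows):
    """Convert [header, row, row, ...] of strings into a list of dicts."""
    header, *data = rows
    records = []
    for row in data:
        record = {}
        for key, value in zip(header, row):
            if key == "Date":
                record[key] = value          # keep the date as a string
            elif key == "Volume":
                record[key] = int(value)     # share counts are whole numbers
            else:
                record[key] = float(value)   # prices
        records.append(record)
    return records


sample = [
    ["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"],
    ["2012-03-02", "14.89", "14.92", "14.66", "14.72", "9164900", "14.72"],
    ["2012-03-01", "14.89", "14.96", "14.79", "14.93", "12283300", "14.93"],
]
records = rows_to_records(sample)
```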



