I am trying to get stock prices by scraping Google Finance pages. I am doing this in Python, using the urllib2 package to fetch each page and a regex to extract the price data.
When I leave my Python script running, it works for the first few minutes and then starts throwing an exception: [HTTP Error 503: Service Unavailable].
I guess this happens because the web server detects the frequent page requests as coming from a robot and starts rejecting them after a while.
Is there a way around this, e.g. deleting or creating some cookie?
Or, even better, does Google provide an API for this? I want to do this in Python because the rest of the app is in Python, but if nothing is available in Python I can consider alternatives. This is the Python method I call in a loop to get the data, with a few seconds of sleep between calls:
import re
import urllib2

def getPriceFromGOOGLE(self, symbol):
    """
    gets last traded price from google for given security
    """
    toReturn = 0.0
    try:
        base_url = 'http://google.com/finance?q='
        req = urllib2.Request(base_url + symbol)
        content = urllib2.urlopen(req).read()
        # the price is the second captured group (the p: field)
        namestr = 'name:"' + symbol + '",cp:(.*),p:(.*),cid(.*)}'
        m = re.search(namestr, content)
        if m:
            data = str(m.group(2).strip().strip('"'))
            price = data.replace(',', '')
            toReturn = float(price)
        else:
            print 'ERROR ' + str(symbol) + ' --- ' + str(content)
    except Exception, exc:
        print 'Exc: ' + str(exc)
    finally:
        return toReturn
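
For context, here is a minimal sketch of the kind of calling loop described above (call the method, sleep a few seconds, repeat). The names `poll_prices`, `interval_seconds`, `max_polls` and the injectable `sleep` argument are hypothetical and exist only for illustration; the real loop may differ. The fetch function is passed in as `get_price` so the sketch can run without any network access.

```python
import time


def poll_prices(get_price, symbol, interval_seconds=5, max_polls=3, sleep=time.sleep):
    """Call get_price(symbol) repeatedly, pausing between calls.

    get_price stands in for a bound getPriceFromGOOGLE method; every other
    name here is an assumption made for this sketch.
    """
    prices = []
    for attempt in range(max_polls):
        # The scraper returns 0.0 on failure, so failed polls show up as 0.0.
        prices.append(get_price(symbol))
        # Sleep between polls, but not after the final one.
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    return prices
```

With a scraper instance at hand, the loop would be started with something like `poll_prices(scraper.getPriceFromGOOGLE, "GOOG")`, where `scraper` is again a placeholder name.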