I'm trying to save files to a directory after scraping them from the web using Scrapy. I'm extracting a date from each file and using that as the file name. The problem I'm running into is that some files have the same date, i.e. there are two files that would both take the name "June 2, 2009". What I'm looking to do is check whether a file with that name already exists, and if so, name the new one something like "June 2, 2009.1".
The code I'm using is as follows:
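    import codecs
    import re

    from scrapy.selector import HtmlXPathSelector

    def parse_item(self, response):
        self.log('Hi, this is an item page! %s' % response.url)
        # Turn <br /> tags into newlines before extracting the text
        response = response.replace(body=response.body.replace('<br />', '\n'))
        hxs = HtmlXPathSelector(response)
        # Pull a date such as "June 2, 2009" out of the content div
        date = hxs.select("//div[@id='content']").extract()[0]
        dateStrip = re.search(r"([A-Z]*|[A-z][a-z]+)\s\d*\d,\s[0-9]+", date)
        newDate = dateStrip.group()
        # Extract the plain text of the content div
        content = hxs.select("//div[@id='content']")
        content = content.select('string()').extract()[0]
        # Use the date as the file name; this is where duplicates collide
        filename = ("/path/to/a/folder/ %s.txt") % (newDate)
        with codecs.open(filename, 'w', encoding='utf-8') as output:
            output.write(content)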
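
To make the goal concrete, here is a rough sketch of the kind of check I have in mind. It is only an illustration, not something I know to be the right approach: `unique_path` is a hypothetical helper name (it is not part of Scrapy), and putting the counter before the `.txt` extension is my own assumption about how the suffix should look.

```python
import os


def unique_path(folder, stem, ext=".txt"):
    """Return a path inside `folder` that does not exist yet.

    The first candidate is "<stem><ext>". If that is taken, ".1", ".2", ...
    is appended to the stem until an unused name is found.
    """
    candidate = os.path.join(folder, stem + ext)
    counter = 1
    while os.path.exists(candidate):
        # Name collision: try the next numbered variant of the same stem
        candidate = os.path.join(folder, "%s.%d%s" % (stem, counter, ext))
        counter += 1
    return candidate
```

In `parse_item` this would replace the line that builds `filename`, e.g. `filename = unique_path("/path/to/a/folder", newDate)`. The check and the later write are separate steps, so this presumably only holds up if nothing else is writing to the same folder at the same time.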