So I parsed an HTML page with .findAll (BeautifulSoup) into a variable named result. If I type result in the Python shell and press Enter, I see normal text as expected, but when I wanted to postprocess this result as a string object, I noticed that str(result) returns garbage, like this sample:
\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0</a><br />
<hr />
</div>
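To make the symptom concrete, here is a minimal sketch of what the escaped part of the sample seems to be. It assumes the sample is a run of `\x`-escaped bytes and that those bytes are UTF-8; the names `raw` and `text` are only illustrative and are not part of my actual script.

```python
# -*- coding: utf-8 -*-
# Assumption: the "garbage" is the escaped form of a UTF-8 byte string.
raw = b"\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0"

# Decoding the bytes as UTF-8 yields seven Cyrillic characters.
text = raw.decode("utf-8")

# 14 bytes become 7 characters, since each letter takes two bytes in UTF-8.
byte_count = len(raw)
char_count = len(text)
```

If that assumption holds, the data itself is intact and only its string representation is escaped.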
The HTML page source is UTF-8 encoded.
How can I handle this?
The code is basically this, in case it matters:
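import urllib
from BeautifulSoup import BeautifulSoup
# Python 2 / BeautifulSoup 3: fetch the page and parse it
soup = BeautifulSoup(urllib.urlopen(url).read())
result = soup.findAll(something)  # "something" is a placeholder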