UnicodeEncodeError on MySQL insert in Python

I used lxml to parse a web page, as shown below:

>>> import lxml.html
>>> doc = lxml.html.fromstring(htmldata)
>>> element = doc.cssselect(sometag)[0]
>>> text = element.text_content()
>>> print text
u'Waldenstr\xf6m'

Why does it print u'Waldenstr\xf6m' rather than "Waldenström" here?

After that, I tried to add this text to a MySQL table with the UTF-8 character set and utf8_general_ci collation (Users is a Django model):

>>> Users.objects.create(last_name=text)
'ascii' codec can't encode character u'\xf6' in position 9: ordinal not in range(128)

What am I doing wrong here? How can I get the correct data, "Waldenström", and write it to the database?

Best answer

You want text.encode('utf8').
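
As a rough illustration of what that call does, here is a minimal sketch written for Python 3 (the question uses Python 2, where the literal would be written u'Waldenstr\xf6m'). The variable names are only for illustration. It reproduces the ASCII failure from the question and shows the UTF-8 bytes that encode() returns:

```python
# Python 3 sketch; variable names are illustrative, not from the question.
text = 'Waldenstr\xf6m'  # the same characters as "Waldenström"

# Encoding with the ASCII codec fails, which is the error in the question.
try:
    text.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding as UTF-8 succeeds; the ö becomes the two bytes 0xC3 0xB6.
utf8_bytes = text.encode('utf8')

print(ascii_error)
print(utf8_bytes)  # b'Waldenstr\xc3\xb6m'
```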

Other answers
>>> print text
u'Waldenstr\xf6m'

There is a difference between displaying something in the shell (which uses the repr) and printing it (which just spits out the string):

>>> u'Waldenstr\xf6m'
u'Waldenstr\xf6m'

>>> print u'Waldenstr\xf6m'
Waldenström

So, I'm not sure your snippet above is really what happened. If it definitely is, then your XHTML must contain exactly that string:

<div class="something">u'Waldenstr\xf6m'</div>

(Maybe it was incorrectly generated by Python using a string's repr() instead of its str()?)

If this is right and intentional, you would need to parse that Python string literal into a simple string. One way of doing that would be:

>>> r = r"u'Waldenstr\xf6m'"
>>> print r[2:-1].decode('unicode-escape')
Waldenström
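
The session above is Python 2. A possible Python 3 equivalent is sketched below, on the assumption that the page really does contain the literal text u'Waldenstr\xf6m'. The ast.literal_eval variant is an alternative added here for comparison and is not part of the original answer:

```python
import ast

# The raw text as it would appear in the page: a Python literal, not the name itself.
raw = r"u'Waldenstr\xf6m'"

# Option 1: strip the u'...' wrapper, then interpret the backslash escapes.
# In Python 3, str has no decode(), so go through bytes first.
inner = raw[2:-1]
decoded = inner.encode('ascii').decode('unicode_escape')

# Option 2 (an alternative, not from the answer): let Python parse the literal.
parsed = ast.literal_eval(raw)

# Both give the same 11-character string ending in "ström".
print(decoded == parsed, len(decoded))
```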

If the snippet at the top is actually not quite right and you are simply asking why Python's repr escapes all non-ASCII characters, the answer is that printing non-ASCII to the console is unreliable across various environments so the escape is safer. In the above examples you might have received ?s or worse instead of the ö if you were unlucky.

In Python 3 this changes:

>>> 'Waldenstr\xf6m'
'Waldenström'
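
To see both behaviours side by side in Python 3, a small sketch: repr() keeps printable non-ASCII characters, while the built-in ascii() escapes them much as Python 2's repr did (minus the u prefix). The variable names are illustrative only:

```python
text = 'Waldenstr\xf6m'

# Python 3 repr keeps printable non-ASCII characters as they are.
py3_repr = repr(text)

# ascii() escapes non-ASCII characters, similar to what Python 2's repr showed.
escaped = ascii(text)

print(escaped)  # 'Waldenstr\xf6m'
```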



