Question

This question already has answers here:

Possible Duplicate:
Converting a latin string to unicode in python

After saving to a file, I have a list in the following format:

list_example = [
         u"\u00cdndia, Tail\u00e2ndia &amp; Cingapura",
         u"Lines through the days 1 (Arabic) \u0633\u0637\u0648\u0631 \u0639\u0628\u0631 \u0627\u0644\u0623\u064a\u0627\u0645 1",
]

But the actual format of the strings in the list is:

actual_format = [
         "Índia, Tailândia & Cingapura ",
         "Lines through the days 1 (Arabic) سطور عبر الأيام 1 | شمس الدين خ "
]

How can I convert the strings in list_example into the strings in the actual_format list?

Answer 1

Your question is a bit unclear to me. In any case, the following guidelines should help you solve your problem.

If you define these strings in Python source code, then you should

know in which character encoding your editor saves the source code file (e.g. utf-8)
declare that encoding in the first line of your source file, via e.g. # -*- coding: utf-8 -*-
define those strings as unicode objects:

strings = [u"Índia, Tailândia & Cingapura", u"Lines through the days 1 (Arabic) سطور عبر الأيام 1"]

(Note: in Python 3, strings are Unicode objects by default, i.e. you don't need the u prefix. In Python 2, Unicode strings are of type unicode; in Python 3, Unicode strings are of type str.)
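To make the note about string types concrete, here is a minimal Python 3 sketch. The variable names text and data are only illustrative.

```python
# Python 3: a plain string literal is already Unicode (type str).
text = "Índia, Tailândia & Cingapura"
assert isinstance(text, str)

# The u prefix is still accepted (Python 3.3+) but changes nothing.
assert u"Índia" == "Índia"

# Encoding produces a separate bytes object; decoding with the same
# codec restores the original text.
data = text.encode("utf-8")
assert isinstance(data, bytes)
assert data.decode("utf-8") == text

# len() counts code points for str and bytes for the encoded form,
# so the two numbers differ when non-ASCII characters are present.
print(len(text), len(data))
```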

When you want to save these strings to a file, you should explicitly define the character encoding:

with open("filename", "wb") as f:
    # Binary mode, because explicitly encoded bytes are written.
    s = "\n".join(strings)
    f.write(s.encode("utf-8"))

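When you want to read these strings back from that file, you must again explicitly define the character encoding so that the file content is decoded correctly: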

with open("filename", "rb") as f:
    # Decode each raw line with the same encoding used when writing.
    strings = [line.decode("utf-8") for line in f]
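The write and read steps above are written for Python 2. In Python 3 the same round trip could look roughly like the sketch below, where open() takes the encoding directly. The temporary path and the name restored are assumptions made for illustration, not part of the answer.

```python
import os
import tempfile

strings = [
    "Índia, Tailândia & Cingapura",
    "Lines through the days 1 (Arabic) سطور عبر الأيام 1",
]

# Hypothetical file location; a temporary directory keeps the sketch
# self-contained.
path = os.path.join(tempfile.mkdtemp(), "strings.txt")

# With an explicit encoding, open() encodes str to bytes on write.
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(strings))

# Reading with the same encoding decodes each line back to str;
# the trailing newline is stripped so the lists compare equal.
with open(path, encoding="utf-8") as f:
    restored = [line.rstrip("\n") for line in f]

assert restored == strings
```

Passing the encoding to open() avoids depending on the platform default, which is the same point the answer makes about declaring the encoding explicitly.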

Answer 2

actual_format = [x.decode("unicode-escape") for x in list_example]
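This one-liner relies on the decode method of Python 2 strings, which str does not have in Python 3. A rough Python 3 equivalent might look like the sketch below. It assumes the strings are plain ASCII containing literal backslash-u escape sequences, and that the &amp; entity should become & as in the asker's actual_format; the helper name unescape and the shortened sample data are hypothetical.

```python
import html

# Assumed input: ASCII text with literal backslash-u escapes and HTML
# entities, as it might look after being read back from a file.
list_example = [
    r"\u00cdndia, Tail\u00e2ndia &amp; Cingapura",
    r"(Arabic) \u0633\u0637\u0648\u0631 \u0639\u0628\u0631",
]

def unescape(s):
    # The unicode_escape codec decodes bytes, so encode the ASCII
    # text first; this raises UnicodeEncodeError on non-ASCII input.
    decoded = s.encode("ascii").decode("unicode_escape")
    # html.unescape turns entities such as &amp; back into characters.
    return html.unescape(decoded)

actual_format = [unescape(x) for x in list_example]
print(actual_format)
```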
