How to fix double encoding in PostgreSQL?

I have a table of words in PostgreSQL, but some of the words contain invalid UTF-8 characters such as 0xe7e36f and 0xefbfd.

How can I identify all the words that contain invalid characters, for example by marking them with some symbol?

EDIT: My database is in UTF-8, but I think some text was double-encoded from various other encodings. I think this because when I tried to convert everything to a single encoding such as LATIN1, I got an error saying that a character does not exist in that encoding. When I switched to LATIN2 I got the same error, but for a different character.

So, what can I do to solve this problem?

Best answer

This is a solution for my specific case, but with some modifications it may help other people.

Usage

SELECT fix_wrong_encoding('LATIN1');
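
To see what this call does to a single value, here is a minimal sketch of the round trip it relies on. It assumes the database and client encodings are both UTF8; the literal 'Ã§' is an illustrative example, not data from the question.

```sql
-- Assumption: server_encoding and client_encoding are both UTF8.
-- 'Ã§' is what 'ç' looks like after its UTF-8 bytes (0xC3 0xA7)
-- were misread as LATIN1 and then stored again as UTF-8.
-- convert_to() turns the text back into the original bytes,
-- and convert_from() reinterprets those bytes as UTF-8.
SELECT convert_from(convert_to('Ã§', 'LATIN1'), 'UTF8') AS repaired;
-- repaired: ç
```

If the value contains a character that has no LATIN1 equivalent, convert_to raises an error instead, which matches the behavior described in the question.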

Function

-- Convert words with wrong encoding
CREATE OR REPLACE FUNCTION fix_wrong_encoding(encoding_name VARCHAR)
RETURNS VOID
AS $$
DECLARE
    r RECORD;
    counter INTEGER;
    token_id INTEGER;
BEGIN
    counter := 0;
    FOR r IN SELECT t.id, t.text FROM token t
    LOOP
        BEGIN
            RAISE NOTICE 'Converting %', r.text;
            -- Re-encode the text to the suspected encoding, then read the bytes back as UTF-8
            r.text := convert_from(convert_to(r.text, encoding_name), 'UTF8');
            RAISE NOTICE 'Converted to %', r.text;
            RAISE NOTICE 'Checking existence.';
            SELECT id INTO token_id FROM token WHERE text = r.text;
            IF (token_id IS NOT NULL) THEN
                BEGIN
                    RAISE NOTICE 'Token already exists. Updating ids in textblockhastoken';
                    IF (token_id = r.id) THEN
                        RAISE NOTICE 'Token is the same.';
                        CONTINUE;
                    END IF;
                    -- Point references at the existing token, then drop the duplicate
                    UPDATE textblockhastoken SET tokenid = token_id
                    WHERE tokenid = r.id;
                    RAISE NOTICE 'Removing current token.';
                    DELETE FROM token WHERE id = r.id;
                END;
            ELSE
                BEGIN
                    RAISE NOTICE 'Token does not exist. Updating text in token';
                    UPDATE token SET text = r.text WHERE id = r.id;
                END;
            END IF;
        EXCEPTION
            WHEN untranslatable_character THEN
                NULL; -- no equivalent in encoding_name: leave the token unchanged
            WHEN character_not_in_repertoire THEN
                NULL; -- resulting bytes are not valid UTF-8: leave the token unchanged
        END;
        -- Counts every token that reached this point, including skipped ones
        counter := counter + 1;
        RAISE NOTICE '% tokens processed', counter;
    END LOOP;
END
$$
LANGUAGE plpgsql;
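
Because the function above updates and deletes rows, it may be worth previewing the effect first. The following is a sketch only: try_fix_encoding is a hypothetical helper that is not part of the original answer, and it assumes the same token table with id and text columns and a UTF8 database.

```sql
-- Hypothetical helper (an assumption, not from the original answer):
-- returns the repaired text, or NULL when the value cannot be
-- round-tripped through the given encoding.
CREATE OR REPLACE FUNCTION try_fix_encoding(input_text TEXT, encoding_name TEXT)
RETURNS TEXT
AS $$
BEGIN
    RETURN convert_from(convert_to(input_text, encoding_name::name), 'UTF8');
EXCEPTION
    WHEN untranslatable_character OR character_not_in_repertoire THEN
        RETURN NULL;
END
$$
LANGUAGE plpgsql;

-- Preview only the rows whose text would change; nothing is modified.
SELECT t.id,
       t.text AS current_text,
       try_fix_encoding(t.text, 'LATIN1') AS repaired_text
FROM token t
WHERE try_fix_encoding(t.text, 'LATIN1') IS NOT NULL
  AND try_fix_encoding(t.text, 'LATIN1') <> t.text;
```

Running the preview with several candidate encodings (for example 'LATIN1' and 'LATIN2') shows which rows each one can repair before anything is changed.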