Detect character position in a UTF NSString from a byte offset (was: SQLite offsets() and encoding problem)

Short story: I have a UTF NSString and a byte offset. I want to know which character is at that byte offset. How can I do that?

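Long story, if you dare:

I have indexed some text, and I use the offsets to mark a particular part of the text when I display the results.

The key problem is that, using these offsets, I cannot point to the correct position of the word. Sometimes the position is correct, and sometimes it is off by 3 or 4 characters.

My table definition is very simple: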

CREATE VIRTUAL TABLE t1 USING fts4(file, body, page);

If I query:

SELECT page, body, offsets(t1) from t1 where body match  and ;

I get:

...........
502|1 0 427 3
505|1 0 370 3 1 0 1307 3 1 0 1768 3
506|1 0 10 3 1 0 1861 3 1 0 2521 3

...........

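For example, if I look at character 427 of the body, I do not land on the correct position; I am off by 2 or 3 characters. The same happens if I go to 370, whereas if I go to 10 I get the correct position.

Where am I going wrong?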

Best answer

According to the SQLite FTS3 docs, and as you noticed, the offsets and lengths are in bytes.
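As a minimal sketch of how the offsets() result can be read: the FTS3 documentation describes it as space-separated integers in groups of four (column number, term number, byte offset, size in bytes). The `fts_match` struct and `parse_offsets` helper below are hypothetical names used only for illustration; they are not part of SQLite.

```c
#include <stdlib.h>

/* One term match as reported by offsets(): four integers per match. */
typedef struct {
    int column;      /* column the match occurred in (0 = leftmost column) */
    int term;        /* index of the matching term within the query */
    int byte_offset; /* start of the match, counted in BYTES */
    int byte_length; /* size of the match, counted in BYTES */
} fts_match;

/* Hypothetical helper: parses up to max_matches complete groups of four
 * integers from the text returned by offsets(). Returns the number of
 * complete groups stored in out; a trailing partial group is ignored. */
size_t parse_offsets(const char *text, fts_match *out, size_t max_matches) {
    size_t count = 0;
    const char *p = text;

    while (count < max_matches) {
        long v[4];
        int i;
        for (i = 0; i < 4; i++) {
            char *end;
            v[i] = strtol(p, &end, 10);
            if (end == p) {
                break; /* no further number could be read */
            }
            p = end;
        }
        if (i < 4) {
            break; /* end of input or incomplete group */
        }
        out[count].column = (int)v[0];
        out[count].term = (int)v[1];
        out[count].byte_offset = (int)v[2];
        out[count].byte_length = (int)v[3];
        count++;
    }
    return count;
}
```

For the row `505|1 0 370 3 1 0 1307 3 1 0 1768 3` from the question, this would yield three matches in column 1 (body), each 3 bytes long, at byte offsets 370, 1307 and 1768.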

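To display the correct position, you have to apply the offset and length before decoding the bytes into characters. The offsets come from SQLite, which counts each byte of a multibyte character separately, whereas you are using that offset to count characters.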

Your indexed text probably has 3 or 4 characters that are two bytes. Hence the off-by-3-or-4 problem.
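A minimal sketch of that idea in C, assuming the column text is available as a NUL-terminated UTF-8 buffer: the byte offset and length are applied to the raw bytes, and only the extracted slice is decoded afterwards. `copy_match_bytes` is a hypothetical helper name, not an SQLite or Foundation function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies byte_length bytes starting at byte_offset out
 * of a NUL-terminated UTF-8 buffer. Returns a newly allocated NUL-terminated
 * string that the caller must free, or NULL if the range lies outside the
 * buffer or the allocation fails. */
char *copy_match_bytes(const char *utf8, size_t byte_offset, size_t byte_length) {
    size_t total = strlen(utf8);
    char *out;

    if (byte_offset > total || byte_length > total - byte_offset) {
        return NULL; /* requested range is out of bounds */
    }
    out = malloc(byte_length + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, utf8 + byte_offset, byte_length);
    out[byte_length] = '\0';
    return out;
}
```

In the UTF-8 text "café naïve fox", the word "fox" starts at byte 13 but at character 11, because "é" and "ï" each occupy two bytes. Slicing the bytes at offset 13 gives "fox"; using 13 as a character index would overshoot by two.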

Other answers

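As per @metatation's answer, the offsets are in bytes, not characters. The text in your database is probably UTF8-encoded Unicode, in which case any single non-ASCII character is represented by multiple bytes. Examples of non-ASCII characters include characters with accents (à, ö, etc.), smart quotes, characters from non-Latin scripts (Greek, Cyrillic, most Asian scripts, etc.) and so on.

If the bytes in the SQLite database are UTF8-encoded Unicode strings, you can determine the true Unicode character offset for a given byte offset like this: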

NSUInteger characterOffsetForByteOffsetInUTF8String(NSUInteger byteOffset, const char *string) {
    /*
     * UTF-8 represents ASCII characters in a single byte. Characters with a code
     * point from U+0080 upwards are represented as multiple bytes. The first byte
     * always has the two most significant bits set (i.e. 11xxxxxx). All subsequent
     * bytes have the most significant bit set, the next most significant bit unset
     * (i.e. 10xxxxxx).
     * 
     * We use that here to determine character offsets. We step through the first
     * `byteOffset` bytes of `string`, incrementing the character offset result
     * every time we come across a byte that doesn't match 10xxxxxx, i.e. where
     * (byte & 11000000) != 10000000
     *
     * See also: http://en.wikipedia.org/wiki/UTF-8#Description
     */
    NSUInteger characterOffset = 0;
    for (NSUInteger i = 0; i < byteOffset; i++) {
        char c = string[i];
        if ((c & 0xc0) != 0x80) {
            characterOffset++;
        }
    }
    return characterOffset;
}

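Caveat: if you use the resulting character offset to index into an NSString, bear in mind that NSString uses UTF-16 under the hood, so characters with code points above U+FFFF are represented by a pair of 16-bit values. Generally you will be fine with textual content, but if you care about particularly obscure characters, or some of the non-textual characters that Unicode can represent, such as Emoji, the algorithm above will need to be refined to accommodate them.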
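One possible refinement, sketched in plain C under the assumption that the input is valid UTF-8: count UTF-16 code units instead of code points, adding an extra unit for every 4-byte sequence (lead byte 0xF0 or above), since those code points lie above U+FFFF and become a surrogate pair. The function name is hypothetical, and `size_t` stands in for `NSUInteger`.

```c
#include <stddef.h>

/* Sketch: converts a byte offset into a valid, NUL-terminated UTF-8 string
 * to an offset counted in UTF-16 code units. Stops early if the string ends
 * before byte_offset is reached. */
size_t utf16_offset_for_byte_offset(const char *utf8, size_t byte_offset) {
    size_t units = 0;

    for (size_t i = 0; i < byte_offset && utf8[i] != '\0'; i++) {
        unsigned char b = (unsigned char)utf8[i];

        if ((b & 0xC0) != 0x80) {
            /* ASCII byte or lead byte: starts a new code point. */
            units++;
            if (b >= 0xF0) {
                /* 4-byte sequence: code point above U+FFFF, which UTF-16
                 * encodes as a surrogate pair (two code units). */
                units++;
            }
        }
    }
    return units;
}
```

For the bytes of "a", U+1F600 and "b" in sequence, byte offset 5 (the position of "b") maps to UTF-16 offset 3, whereas counting code points alone would give 2.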

(The code snippet is from a project of mine; feel free to use it.)

Inspired by Simon's solution in particular, here is how I did it in Swift.

There may be a more "Swifty" way than returning an NSRange, but I needed one for NSAttributedString.

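import Foundation

extension String {

    // Converts a byte range in the string's UTF-8 encoding into a range
    // counted in code points. A byte starts a new code point unless it is
    // a UTF-8 continuation byte (10xxxxxx).
    func charRangeForByteRange(range: NSRange) -> NSRange {
        let bytes = [UInt8](utf8)
        var charOffset = 0

        // Count the code points that start before the byte range.
        for i in 0..<range.location {
            if (bytes[i] & 0xc0) != 0x80 { charOffset += 1 }
        }
        let location = charOffset

        // Count the code points that start inside the byte range.
        for i in range.location..<(range.location + range.length) {
            if (bytes[i] & 0xc0) != 0x80 { charOffset += 1 }
        }
        let length = charOffset - location

        return NSMakeRange(location, length)
    }
}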



