how to extract characters from a Korean string in VBA

I need to extract the initial character from a Korean word in MS Excel and MS Access. When I use Left("한글", 1) it returns the first syllable, i.e. 한; what I need is the initial character, i.e. ㅎ. Is there a function to do this, or at least an idiom?

If you know how to get the Unicode value from the string, I'd be able to work it out from there, but I'm sure I'd be reinventing the wheel (yet again).

Best answer

I think what you are looking for is a Byte array. Declaring Dim aByte() As Byte and then assigning aByte = "한글" should give you two bytes for each character in the string: the character's 16-bit Unicode value, low byte first.
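
For readers without VBA at hand, the same idea can be sketched in Python. This assumes the Byte array VBA produces is the string's UTF-16 little-endian encoding; the helper name utf16_code_units is made up for illustration and is not part of any library.

```python
import struct

def utf16_code_units(text):
    """Return the 16-bit code units of text (hypothetical helper)."""
    raw = text.encode("utf-16-le")   # two bytes per code unit, low byte first
    count = len(raw) // 2
    # Reassemble each little-endian byte pair into one unsigned 16-bit value.
    return list(struct.unpack("<%dH" % count, raw))

units = utf16_code_units("한글")
print([hex(u) for u in units])   # ['0xd55c', '0xae00']
```

Each value is the code point of a whole syllable, so the byte array alone does not isolate ㅎ; it only supplies the number that a decomposition step would start from.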

Answers

Disclaimer: I know little about Access or VBA, but what you're facing is a generic Unicode problem; it's not specific to those tools. I retagged your question to add tags related to this issue.

Access is doing the right thing by returning 한: it is indeed the first character of that two-character string. What you want here is the canonical decomposition of this hangul syllable into its constituent jamo, also known as Normalization Form D (NFD), for "decomposed". The NFD form is ᄒ ᅡ ᆫ (U+1112, U+1161, U+11AB), of which the first character is what you want.
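
As a quick illustration of NFD outside VBA, here is a minimal Python sketch using the standard library's unicodedata module. The function name nfd_jamo is only illustrative.

```python
import unicodedata

def nfd_jamo(text):
    """Split text into the code points of its canonical decomposition (NFD)."""
    return list(unicodedata.normalize("NFD", text))

jamo = nfd_jamo("한글")
print(["U+%04X" % ord(j) for j in jamo])
# ['U+1112', 'U+1161', 'U+11AB', 'U+1100', 'U+1173', 'U+11AF']
```

The first element, U+1112, is the conjoining form of the initial consonant of 한.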

Note also that, as per your example, you seem to want a function that returns the equivalent standalone letter (ㅎ, U+314E) for the conjoining jamo (ᄒ, U+1112). These really are two different code points because they represent different semantic units (a standalone letter versus a part of a hangul syllable). Normalization does not give you a mapping from the conjoining jamo to the standalone letter, but you could write a small function to that effect, as the number of jamo is limited to a few dozen (the real work is done in the first function, NFD).
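
Such a small mapping function might look like the following Python sketch. It assumes only initial consonants are needed and relies on the 19 leading jamo U+1100..U+1112 being listed in the same order as the table; the names COMPAT_INITIALS and initial_letter are hypothetical.

```python
import unicodedata

# Standalone (compatibility) letters for the 19 leading consonants, in the
# same order as the conjoining jamo U+1100..U+1112.
COMPAT_INITIALS = [chr(cp) for cp in (
    0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141, 0x3142, 0x3143,
    0x3145, 0x3146, 0x3147, 0x3148, 0x3149, 0x314A, 0x314B, 0x314C, 0x314D,
    0x314E,
)]

def initial_letter(syllable):
    """Return the standalone initial consonant of one hangul syllable."""
    first = unicodedata.normalize("NFD", syllable)[0]   # conjoining jamo
    offset = ord(first) - 0x1100
    if not 0 <= offset < len(COMPAT_INITIALS):
        raise ValueError("not a hangul syllable: %r" % syllable)
    return COMPAT_INITIALS[offset]

print(initial_letter("한"))   # ㅎ
```

A lookup table is used rather than arithmetic because the standalone letters are not contiguous: the compatibility block interleaves consonant clusters that never occur as initials.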

Adding to Arthur's excellent answer, I want to point out that extracting jamo from hangeul syllables is very straightforward using the standard. While the solution isn't specific to Excel or Access (it's a Python module), it only involves arithmetic expressions, so it should be easy to translate to other languages. The formulas, as can be seen, are identical to those on page 109 of the standard. The decomposition is returned as a tuple of one-character Unicode strings, which can easily be verified to correspond to the Hangul Jamo code chart.

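# Hangul syllable decomposition with the arithmetic from the Unicode Standard.
# Written for Python 3.

SBase = 0xAC00   # first precomposed syllable, 가
LBase = 0x1100   # first leading consonant
VBase = 0x1161   # first vowel
TBase = 0x11A7   # one below the first trailing consonant
SCount = 11172
LCount = 19
VCount = 21
TCount = 28
NCount = VCount * TCount


def decompose(syllable):
    S = ord(syllable)
    SIndex = S - SBase
    if not 0 <= SIndex < SCount:
        raise ValueError("not a precomposed hangul syllable: %r" % syllable)

    # Integer division (//) is required; / would yield floats in Python 3.
    L = LBase + SIndex // NCount
    V = VBase + (SIndex % NCount) // TCount
    T = TBase + SIndex % TCount

    # T == TBase means the syllable has no trailing consonant.
    if T == TBase:
        result = (L, V)
    else:
        result = (L, V, T)

    return tuple(map(chr, result))

if __name__ == "__main__":
    test_values = "항가있닭넓짧"

    for syllable in test_values:
        print(syllable, ":", *decompose(syllable))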

This is the output in my console:

항 : ᄒ ᅡ ᆼ
가 : ᄀ ᅡ
있 : ᄋ ᅵ ᆻ
닭 : ᄃ ᅡ ᆰ
넓 : ᄂ ᅥ ᆲ
짧 : ᄍ ᅡ ᆲ
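
The claim that this arithmetic matches the standard's decomposition can be cross-checked against Python's unicodedata module. The sketch below repeats the constants so that it runs on its own; arithmetic_decompose and count_mismatches are illustrative names, not part of the answer's module.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
S_COUNT, V_COUNT, T_COUNT = 11172, 21, 28
N_COUNT = V_COUNT * T_COUNT

def arithmetic_decompose(syllable):
    """Decompose one precomposed syllable using index arithmetic only."""
    index = ord(syllable) - S_BASE
    parts = [L_BASE + index // N_COUNT,
             V_BASE + (index % N_COUNT) // T_COUNT]
    if index % T_COUNT:                      # 0 means no trailing consonant
        parts.append(T_BASE + index % T_COUNT)
    return "".join(map(chr, parts))

def count_mismatches():
    """Count syllables where the arithmetic result differs from NFD."""
    return sum(
        1
        for cp in range(S_BASE, S_BASE + S_COUNT)
        if arithmetic_decompose(chr(cp)) != unicodedata.normalize("NFD", chr(cp))
    )

print(count_mismatches())   # 0
```

Because the two agree on every precomposed syllable, a VBA port needs only the integer arithmetic and no normalization library.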

I assume you got what you needed, but it seems rather convoluted. I don't know anything about this, but I recently did some investigating into handling Unicode and looked into all the string Byte functions, such as LeftB(), RightB(), InputB(), InStrB(), LenB(), AscB(), ChrB() and MidB(); there's also StrConv(), which has a vbUnicode argument. These are all functions that I'd think would be used in any double-byte context, but then, I don't work in that environment, so I might be missing something very important.




