Is there a function in VB.NET that will tell us whether 2 strings are equivalent under UTF8 Unicode collation?

This question is similar to "How to emulate MySQL's utf8_general_ci collation in PHP string comparisons" (https://stackoverflow.com/questions/8514329/how-to-emulate-mysqls-utf8-general-ci-collation-in-php-string-comparisons), but I want a function for VB.NET rather than PHP.

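Recently I have been generating a lot of supposedly unique keys.

Some of these keys are equivalent under the UTF8 Unicode collation.

For example, look at these 2 keys: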

38.15_-79.07 i38.15_-79.07

If I paste them into FrontPage and look at the source code, you will see:

38.15_-79.07

i38.15_-79.07

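Note: on Stack Overflow they still look different.

I know they are not the same. I guess the difference does not show even on Stack Exchange. Say I have 1 million such records, and I want to test whether 2 strings would be declared the same under the MySQL UTF8 collation. I want to know this before uploading. How do I do that?

So VB.NET considers these to be different keys. When we build a MySQL query and upload it to the database, the database complains that they are the same key. A single complaint is enough to make the upload of 1 million records get stuck.

We don't even know what the hell that invisible character is.

Anyway, I want a function that, given 2 strings, tells me whether they count as the same string.

If possible, we also want a function that converts a string into its most "standard" form.

For example, the invisible character seems to encode nothing, so the function should recognize all such "nothing" characters and remove them.

Is there such a thing?

This is what I have so far. I need something more comprehensive.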

    Imports System.Collections.Generic
    Imports System.Globalization
    Imports System.Runtime.CompilerServices
    Imports System.Text

    ' Extension methods must live in a Module; the module name here is arbitrary.
    Public Module StringSanitizer

        ' Lookup table for curly quotation marks, built once on first use.
        ' Note: curly double quotes become a straight double quote,
        ' while curly single quotes are replaced with a space.
        Private Function StraightenQuotesReplacement() As Dictionary(Of String, String)
            Static replacement As Dictionary(Of String, String)
            If replacement Is Nothing Then
                replacement = New Dictionary(Of String, String)
                replacement.Add(ChrW(&H201C), """")
                replacement.Add(ChrW(&H201D), """")
                replacement.Add(ChrW(&H2018), " ")
                replacement.Add(ChrW(&H2019), " ")
            End If
            Return replacement
        End Function

        <Extension()>
        Public Function straightenQuotes(ByVal somestring As String) As String
            For Each key In StraightenQuotesReplacement().Keys
                somestring = somestring.Replace(key, StraightenQuotesReplacement().Item(key))
            Next
            Return somestring
        End Function

        ' Transliterates German umlauts and the sharp s to ASCII.
        <Extension()>
        Public Function germanCharacter(ByVal s As String) As String
            Dim t = s
            t = t.Replace("ä", "ae")
            t = t.Replace("ö", "oe")
            t = t.Replace("ü", "ue")
            t = t.Replace("Ä", "Ae")
            t = t.Replace("Ö", "Oe")
            t = t.Replace("Ü", "Ue")
            t = t.Replace("ß", "ss")
            Return t
        End Function

        <Extension()>
        Public Function japaneseCharacter(ByVal s As String) As String
            Dim t = s
            t = t.Replace("ヶ", "ケ")
            Return t
        End Function

        <Extension()>
        Public Function greekCharacter(ByVal s As String) As String
            Dim t = s
            t = t.Replace("ς", "σ")
            t = t.Replace("ι", "ί")
            Return t
        End Function

        <Extension()>
        Public Function franceCharacter(ByVal s As String) As String
            Dim t = s
            t = t.Replace("œ", "oe")
            Return t
        End Function

        ' Decomposes the string (FormD) and drops the combining marks,
        ' so that an accented letter is reduced to its base letter.
        <Extension()>
        Public Function RemoveDiacritics(ByVal s As String) As String
            Dim normalizedString As String = s.Normalize(NormalizationForm.FormD)
            Dim stringBuilder As New StringBuilder
            For Each c As Char In normalizedString
                If CharUnicodeInfo.GetUnicodeCategory(c) <> UnicodeCategory.NonSpacingMark Then
                    stringBuilder.Append(c)
                End If
            Next
            Return stringBuilder.ToString()
        End Function

        ' Removes U+200E (decimal 8206), the left-to-right mark.
        <Extension()>
        Public Function badcharacters(ByVal s As String) As String
            Dim t = s
            t = t.Replace(ChrW(8206), "")
            Return t
        End Function

        ' removeDoubleSpaces, SpacetoDash and EncodeUrlLimited are other
        ' extension methods that are not shown in this post.
        <Extension()>
        Public Function sanitizeUTF8_Unicode(ByVal str As String) As String
            Return str.ToLower.removeDoubleSpaces.SpacetoDash.EncodeUrlLimited.straightenQuotes.RemoveDiacritics.greekCharacter.germanCharacter
        End Function

    End Module

Best answer

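Perhaps different Unicode code points are being used for characters that look alike. For example, the hyphen-minus (U+002D), the en dash (U+2013) and the em dash (U+2014) are three characters that look very similar when rendered.

Use the AscW() function to check each character.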
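
As a minimal sketch of that check, the module below prints the code of every character in a string as hex, so that look-alike or invisible characters stand out. The module and function names (`CodePointDump`, `ToHexCodes`) are made up for illustration and are not part of the original answer.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CodePointDump

    ' Returns the UTF-16 code of each character in s as space-separated hex.
    ' (Hypothetical helper name, used only for this illustration.)
    Function ToHexCodes(ByVal s As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In s
            If sb.Length > 0 Then sb.Append(" ")
            sb.Append(AscW(c).ToString("X"))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Hyphen-minus (U+002D) versus en dash (U+2013).
        Console.WriteLine(ToHexCodes("a-b"))                    ' prints 61 2D 62
        Console.WriteLine(ToHexCodes("a" & ChrW(&H2013) & "b")) ' prints 61 2013 62
    End Sub

End Module
```

Comparing the two dumps side by side shows exactly which position differs, even when the strings look identical on screen.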

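Edit:

As mentioned in the comments below, use the System.Text.NormalizationForm enumeration (together with String.Normalize) to determine which Unicode code points are considered equivalent characters.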
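
A small sketch of what normalization does, assuming canonical equivalence (FormC) is the kind of equivalence you care about. The helper name `EqualAfterNormalization` is hypothetical. Note that this only tests Unicode equivalence; it is not an emulation of a MySQL collation, and it does not remove invisible characters such as U+200E.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module NormalizationDemo

    ' True when a and b are identical after both are normalized to the same form.
    ' (Hypothetical helper name, used only for this illustration.)
    Function EqualAfterNormalization(ByVal a As String, ByVal b As String,
                                     ByVal form As NormalizationForm) As Boolean
        Return String.Equals(a.Normalize(form), b.Normalize(form), StringComparison.Ordinal)
    End Function

    Sub Main()
        Dim composed As String = ChrW(&HE9)            ' e with acute as a single code point
        Dim decomposed As String = "e" & ChrW(&H301)   ' "e" followed by COMBINING ACUTE ACCENT

        ' An ordinal comparison sees two different character sequences.
        Console.WriteLine(String.Equals(composed, decomposed, StringComparison.Ordinal)) ' prints False

        ' After normalization to FormC both strings are the single code point U+00E9.
        Console.WriteLine(EqualAfterNormalization(composed, decomposed, NormalizationForm.FormC)) ' prints True
    End Sub

End Module
```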

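Answers

I used the VBA code below to investigate the odd line.

I copied the "byers-street" line into cell D18 of an Excel worksheet and typed call DsplInHex(Range("D18")) into the Immediate window. The result was: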

62 79 65 72 73 2D 73 74 72 65 65 74 2D 62 69 73 74 72 6F 5F 33 38 2E 31 35 2D 37 39 2E 30 37 20 62 79 65 72 73 2D 73 74 72 65 65 74 2D 62 69 73 74 72 6F 200E 5F 33 38 2E 31 35 2D 37 39 2E 30 37 

Adding a line break and some spaces:

62 79 65 72 73 2D 73 74 72 65 65 74 2D 62 69 73 74 72 6F      5F 33 38 2E 31 35 2D 37 39 2E 30 37 20 
62 79 65 72 73 2D 73 74 72 65 65 74 2D 62 69 73 74 72 6F 200E 5F 33 38 2E 31 35 2D 37 39 2E 30 37 

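According to my Unicode book, 200E is the left-to-right mark. I wonder how you managed to add that character to your key.

VB.NET is correct; these keys are different. Either MySQL discards such characters or your transfer process removes them. Either way, you need to check your source data for funny characters.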
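
One way to check the source data in VB.NET before uploading is sketched below. It removes every character in the Unicode "Format" category, which includes U+200E, and then compares the keys again. The names `InvisibleCharacters` and `StripFormatCharacters` are hypothetical, and it is an assumption that MySQL ignores all characters of this category; the sketch only shows how to find keys that differ by nothing but invisible characters.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text
Imports Microsoft.VisualBasic

Module InvisibleCharacters

    ' Returns s without characters in the Unicode "Format" (Cf) category,
    ' such as U+200E LEFT-TO-RIGHT MARK.
    ' (Hypothetical helper name, used only for this illustration.)
    Function StripFormatCharacters(ByVal s As String) As String
        Dim sb As New StringBuilder(s.Length)
        For Each c As Char In s
            If CharUnicodeInfo.GetUnicodeCategory(c) <> UnicodeCategory.Format Then
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' The two keys decoded from the hex dump above.
        Dim plain As String = "byers-street-bistro_38.15-79.07"
        Dim marked As String = "byers-street-bistro" & ChrW(&H200E) & "_38.15-79.07"

        Console.WriteLine(String.Equals(plain, marked, StringComparison.Ordinal)) ' prints False
        Console.WriteLine(String.Equals(plain, StripFormatCharacters(marked),
                                        StringComparison.Ordinal))              ' prints True
    End Sub

End Module
```

Running the stripped keys through a HashSet(Of String) before the upload would then flag any pair that collides only because of such characters.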

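    Option Explicit

    ' Prints the code of each character in Stg as hex to the Immediate window.
    Public Sub DsplInHex(Stg As String)

        Dim Pos As Long

        For Pos = 1 To Len(Stg)
            ' The trailing semicolon keeps all the output on one line.
            Debug.Print Hex(AscW(Mid(Stg, Pos, 1))) & " ";
        Next
        Debug.Print    ' end the line

    End Sub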



