How to replace special characters with the ones they're based on in PHP?

How can I replace:

  • "ã" with "a"
  • "é" with "e"

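in PHP? Is this possible? I read somewhere that I could do some math with the ASCII value of the base character and the ASCII value of the accent, but I can't find any reference now.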

Best answer

This answer is incorrect. I didn't understand Unicode normalization when I wrote it. Look at francadaval's comment and link.

Check out the Normalizer class to do this. The documentation is good, so I'll just link to it rather than repeat it here:

http://www.php.net/manual/en/class.normalizer.php

Specifically, the normalize member of that class:

http://www.php.net/manual/en/normalizer.normalize.php

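Note that Unicode normalization has several forms, and you seem to want a decomposition form (NFD, or the compatibility decomposition NFKD), but read the documentation to be sure.

You shouldn't try to roll your own function for this: too many things can go wrong, and using a provided function is a much better idea.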
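
As a rough illustration of why normalization on its own is not the whole story: decomposition only splits an accented letter into a base letter plus a combining mark, and the marks still have to be removed afterwards. The sketch below assumes PHP 7.1 or later with the intl extension loaded and valid UTF-8 input; `strip_marks()` is a hypothetical helper name, not something from the PHP manual.

```php
<?php
// Sketch only: assumes the intl extension (which provides Normalizer) is loaded
// and that $text is valid UTF-8. strip_marks() is a hypothetical helper name.
function strip_marks(string $text): ?string
{
    if (!class_exists('Normalizer')) {
        return null; // intl is not available
    }
    // NFD splits "é" into "e" followed by U+0301 (combining acute accent).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return null; // normalization failed, e.g. invalid UTF-8
    }
    // \p{Mn} matches nonspacing marks, i.e. the combining accents.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

var_dump(strip_marks("\u{00E3}\u{00E9}")); // input is "ãé" written as escapes
```

Letters that have no decomposition (for example "ø" or "ß") pass through unchanged and would need an explicit mapping.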

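Answers

If you don't have access to the Normalizer class, or just don't want to use it, you can use the following function to replace most (all?) of the common accented characters.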

function Unaccent($string)
{
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}
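
To make the two steps of the entity-based approach visible, here is a small walk-through. It assumes PHP 7 or later and UTF-8 input, and `unaccent_via_entities()` is a hypothetical name used only for this illustration.

```php
<?php
// Hypothetical name; mirrors the entity-based idea above. Assumes UTF-8 input.
function unaccent_via_entities(string $text): string
{
    // Step 1: "é" becomes "&eacute;", "ã" becomes "&atilde;", and so on.
    $entities = htmlentities($text, ENT_QUOTES, 'UTF-8');
    // Step 2: keep the base letter(s) of each accent entity, drop the suffix.
    return preg_replace(
        '~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i',
        '$1',
        $entities
    );
}

echo unaccent_via_entities("Cr\u{00E8}me br\u{00FB}l\u{00E9}e"), "\n"; // Creme brulee
echo unaccent_via_entities('a & b'), "\n";                             // a &amp; b
```

The second call shows a side effect worth knowing about: entities that are not accent entities (such as `&amp;`) stay in the result, so a call to `html_entity_decode()` afterwards may be needed if plain text is wanted.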

For those who don't have PHP 5.3, I found another solution that works well and seems very comprehensive: http://www.evaisse.net/2008/php-translit-remove-accent-unaccent-21001 Here is the function.

/**
 * Unaccent the input string. An example string like `ÀØėÿᾜὨζὅБю`
 * will be translated to `AOeyIOzoBY`. More complete than :
 *   strtr( (string)$str,
 *          "ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ",
 *          "aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn" );
 *
 * @param $str input string
 * @param $utf8 if null, function will detect input string encoding
 * @author http://www.evaisse.net/2008/php-translit-remove-accent-unaccent-21001
 * @return string input string without accent
 */
function remove_accents( $str, $utf8=true )
{
    $str = (string)$str;
    if( is_null($utf8) ) {
        if( function_exists( 'mb_detect_encoding' ) ) {
            // mbstring is available: let it guess the encoding
            $utf8 = (strtolower( (string)mb_detect_encoding($str) ) == 'utf-8');
        } else {
            // fallback: check byte by byte that the string is well-formed UTF-8
            $length = strlen($str);
            $utf8 = true;
            for ($i=0; $i < $length; $i++) {
                $c = ord($str[$i]);
                if ($c < 0x80) $n = 0; # 0bbbbbbb
                elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
                elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
                elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
                elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
                elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
                else { $utf8 = false; break; } # Does not match any model
                for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
                    if ((++$i == $length)
                        || ((ord($str[$i]) & 0xC0) != 0x80)) {
                        $utf8 = false;
                        break 2; # leave both loops
                    }

                }
            }
        }

    }

    if(!$utf8) {
        // treats non-UTF-8 input as ISO-8859-1; utf8_encode() is deprecated as of PHP 8.2
        $str = utf8_encode($str);
    }
    $transliteration = array(
    'Ĳ' => 'I', 'Ö' => 'O', 'Œ' => 'O', 'Ü' => 'U', 'ä' => 'a', 'æ' => 'a',
    'ĳ' => 'i', 'ö' => 'o', 'œ' => 'o', 'ü' => 'u', 'ß' => 's', 'ſ' => 's',
    'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A',
    'Æ' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Ç' => 'C', 'Ć' => 'C',
    'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D', 'È' => 'E',
    'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E', 'Ę' => 'E', 'Ě' => 'E',
    'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G', 'Ġ' => 'G', 'Ģ' => 'G',
    'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I',
    'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I', 'İ' => 'I', 'Ĵ' => 'J',
    'Ķ' => 'K', 'Ľ' => 'L', 'Ĺ' => 'L', 'Ļ' => 'L', 'Ŀ' => 'L', 'Ł' => 'L',
    'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N', 'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O',
    'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O',
    'Ŏ' => 'O', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Ş' => 'S',
    'Ŝ' => 'S', 'Ș' => 'S', 'Š' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
    'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ū' => 'U', 'Ů' => 'U',
    'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U', 'Ŵ' => 'W', 'Ŷ' => 'Y',
    'Ÿ' => 'Y', 'Ý' => 'Y', 'Ź' => 'Z', 'Ż' => 'Z', 'Ž' => 'Z', 'à' => 'a',
    'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
    'å' => 'a', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
    'ď' => 'd', 'đ' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
    'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e', 'ƒ' => 'f',
    'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h', 'ħ' => 'h',
    'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i', 'ĩ' => 'i',
    'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĵ' => 'j', 'ķ' => 'k', 'ĸ' => 'k',
    'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l', 'ŀ' => 'l', 'ñ' => 'n',
    'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ŉ' => 'n', 'ŋ' => 'n', 'ò' => 'o',
    'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o',
    'ŏ' => 'o', 'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'ś' => 's', 'š' => 's',
    'ť' => 't', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ū' => 'u', 'ů' => 'u',
    'ű' => 'u', 'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ÿ' => 'y',
    'ý' => 'y', 'ŷ' => 'y', 'ż' => 'z', 'ź' => 'z', 'ž' => 'z', 'Α' => 'A',
    'Ά' => 'A', 'Ἀ' => 'A', 'Ἁ' => 'A', 'Ἂ' => 'A', 'Ἃ' => 'A', 'Ἄ' => 'A',
    'Ἅ' => 'A', 'Ἆ' => 'A', 'Ἇ' => 'A', 'ᾈ' => 'A', 'ᾉ' => 'A', 'ᾊ' => 'A',
    'ᾋ' => 'A', 'ᾌ' => 'A', 'ᾍ' => 'A', 'ᾎ' => 'A', 'ᾏ' => 'A', 'Ᾰ' => 'A',
    'Ᾱ' => 'A', 'Ὰ' => 'A', 'ᾼ' => 'A', 'Β' => 'B', 'Γ' => 'G', 'Δ' => 'D',
    'Ε' => 'E', 'Έ' => 'E', 'Ἐ' => 'E', 'Ἑ' => 'E', 'Ἒ' => 'E', 'Ἓ' => 'E',
    'Ἔ' => 'E', 'Ἕ' => 'E', 'Ὲ' => 'E', 'Ζ' => 'Z', 'Η' => 'I', 'Ή' => 'I',
    'Ἠ' => 'I', 'Ἡ' => 'I', 'Ἢ' => 'I', 'Ἣ' => 'I', 'Ἤ' => 'I', 'Ἥ' => 'I',
    'Ἦ' => 'I', 'Ἧ' => 'I', 'ᾘ' => 'I', 'ᾙ' => 'I', 'ᾚ' => 'I', 'ᾛ' => 'I',
    'ᾜ' => 'I', 'ᾝ' => 'I', 'ᾞ' => 'I', 'ᾟ' => 'I', 'Ὴ' => 'I', 'ῌ' => 'I',
    'Θ' => 'T', 'Ι' => 'I', 'Ί' => 'I', 'Ϊ' => 'I', 'Ἰ' => 'I', 'Ἱ' => 'I',
    'Ἲ' => 'I', 'Ἳ' => 'I', 'Ἴ' => 'I', 'Ἵ' => 'I', 'Ἶ' => 'I', 'Ἷ' => 'I',
    'Ῐ' => 'I', 'Ῑ' => 'I', 'Ὶ' => 'I', 'Κ' => 'K', 'Λ' => 'L', 'Μ' => 'M',
    'Ν' => 'N', 'Ξ' => 'K', 'Ο' => 'O', 'Ό' => 'O', 'Ὀ' => 'O', 'Ὁ' => 'O',
    'Ὂ' => 'O', 'Ὃ' => 'O', 'Ὄ' => 'O', 'Ὅ' => 'O', 'Ὸ' => 'O', 'Π' => 'P',
    'Ρ' => 'R', 'Ῥ' => 'R', 'Σ' => 'S', 'Τ' => 'T', 'Υ' => 'Y', 'Ύ' => 'Y',
    'Ϋ' => 'Y', 'Ὑ' => 'Y', 'Ὓ' => 'Y', 'Ὕ' => 'Y', 'Ὗ' => 'Y', 'Ῠ' => 'Y',
    'Ῡ' => 'Y', 'Ὺ' => 'Y', 'Φ' => 'F', 'Χ' => 'X', 'Ψ' => 'P', 'Ω' => 'O',
    'Ώ' => 'O', 'Ὠ' => 'O', 'Ὡ' => 'O', 'Ὢ' => 'O', 'Ὣ' => 'O', 'Ὤ' => 'O',
    'Ὥ' => 'O', 'Ὦ' => 'O', 'Ὧ' => 'O', 'ᾨ' => 'O', 'ᾩ' => 'O', 'ᾪ' => 'O',
    'ᾫ' => 'O', 'ᾬ' => 'O', 'ᾭ' => 'O', 'ᾮ' => 'O', 'ᾯ' => 'O', 'Ὼ' => 'O',
    'ῼ' => 'O', 'α' => 'a', 'ά' => 'a', 'ἀ' => 'a', 'ἁ' => 'a', 'ἂ' => 'a',
    'ἃ' => 'a', 'ἄ' => 'a', 'ἅ' => 'a', 'ἆ' => 'a', 'ἇ' => 'a', 'ᾀ' => 'a',
    'ᾁ' => 'a', 'ᾂ' => 'a', 'ᾃ' => 'a', 'ᾄ' => 'a', 'ᾅ' => 'a', 'ᾆ' => 'a',
    'ᾇ' => 'a', 'ὰ' => 'a', 'ᾰ' => 'a', 'ᾱ' => 'a', 'ᾲ' => 'a', 'ᾳ' => 'a',
    'ᾴ' => 'a', 'ᾶ' => 'a', 'ᾷ' => 'a', 'β' => 'b', 'γ' => 'g', 'δ' => 'd',
    'ε' => 'e', 'έ' => 'e', 'ἐ' => 'e', 'ἑ' => 'e', 'ἒ' => 'e', 'ἓ' => 'e',
    'ἔ' => 'e', 'ἕ' => 'e', 'ὲ' => 'e', 'ζ' => 'z', 'η' => 'i', 'ή' => 'i',
    'ἠ' => 'i', 'ἡ' => 'i', 'ἢ' => 'i', 'ἣ' => 'i', 'ἤ' => 'i', 'ἥ' => 'i',
    'ἦ' => 'i', 'ἧ' => 'i', 'ᾐ' => 'i', 'ᾑ' => 'i', 'ᾒ' => 'i', 'ᾓ' => 'i',
    'ᾔ' => 'i', 'ᾕ' => 'i', 'ᾖ' => 'i', 'ᾗ' => 'i', 'ὴ' => 'i', 'ῂ' => 'i',
    'ῃ' => 'i', 'ῄ' => 'i', 'ῆ' => 'i', 'ῇ' => 'i', 'θ' => 't', 'ι' => 'i',
    'ί' => 'i', 'ϊ' => 'i', 'ΐ' => 'i', 'ἰ' => 'i', 'ἱ' => 'i', 'ἲ' => 'i',
    'ἳ' => 'i', 'ἴ' => 'i', 'ἵ' => 'i', 'ἶ' => 'i', 'ἷ' => 'i', 'ὶ' => 'i',
    'ῐ' => 'i', 'ῑ' => 'i', 'ῒ' => 'i', 'ῖ' => 'i', 'ῗ' => 'i', 'κ' => 'k',
    'λ' => 'l', 'μ' => 'm', 'ν' => 'n', 'ξ' => 'k', 'ο' => 'o', 'ό' => 'o',
    'ὀ' => 'o', 'ὁ' => 'o', 'ὂ' => 'o', 'ὃ' => 'o', 'ὄ' => 'o', 'ὅ' => 'o',
    'ὸ' => 'o', 'π' => 'p', 'ρ' => 'r', 'ῤ' => 'r', 'ῥ' => 'r', 'σ' => 's',
    'ς' => 's', 'τ' => 't', 'υ' => 'y', 'ύ' => 'y', 'ϋ' => 'y', 'ΰ' => 'y',
    'ὐ' => 'y', 'ὑ' => 'y', 'ὒ' => 'y', 'ὓ' => 'y', 'ὔ' => 'y', 'ὕ' => 'y',
    'ὖ' => 'y', 'ὗ' => 'y', 'ὺ' => 'y', 'ῠ' => 'y', 'ῡ' => 'y', 'ῢ' => 'y',
    'ῦ' => 'y', 'ῧ' => 'y', 'φ' => 'f', 'χ' => 'x', 'ψ' => 'p', 'ω' => 'o',
    'ώ' => 'o', 'ὠ' => 'o', 'ὡ' => 'o', 'ὢ' => 'o', 'ὣ' => 'o', 'ὤ' => 'o',
    'ὥ' => 'o', 'ὦ' => 'o', 'ὧ' => 'o', 'ᾠ' => 'o', 'ᾡ' => 'o', 'ᾢ' => 'o',
    'ᾣ' => 'o', 'ᾤ' => 'o', 'ᾥ' => 'o', 'ᾦ' => 'o', 'ᾧ' => 'o', 'ὼ' => 'o',
    'ῲ' => 'o', 'ῳ' => 'o', 'ῴ' => 'o', 'ῶ' => 'o', 'ῷ' => 'o', 'А' => 'A',
    'Б' => 'B', 'В' => 'V', 'Г' => 'G', 'Д' => 'D', 'Е' => 'E', 'Ё' => 'E',
    'Ж' => 'Z', 'З' => 'Z', 'И' => 'I', 'Й' => 'I', 'К' => 'K', 'Л' => 'L',
    'М' => 'M', 'Н' => 'N', 'О' => 'O', 'П' => 'P', 'Р' => 'R', 'С' => 'S',
    'Т' => 'T', 'У' => 'U', 'Ф' => 'F', 'Х' => 'K', 'Ц' => 'T', 'Ч' => 'C',
    'Ш' => 'S', 'Щ' => 'S', 'Ы' => 'Y', 'Э' => 'E', 'Ю' => 'Y', 'Я' => 'Y',
    'а' => 'A', 'б' => 'B', 'в' => 'V', 'г' => 'G', 'д' => 'D', 'е' => 'E',
    'ё' => 'E', 'ж' => 'Z', 'з' => 'Z', 'и' => 'I', 'й' => 'I', 'к' => 'K',
    'л' => 'L', 'м' => 'M', 'н' => 'N', 'о' => 'O', 'п' => 'P', 'р' => 'R',
    'с' => 'S', 'т' => 'T', 'у' => 'U', 'ф' => 'F', 'х' => 'K', 'ц' => 'T',
    'ч' => 'C', 'ш' => 'S', 'щ' => 'S', 'ы' => 'Y', 'э' => 'E', 'ю' => 'Y',
    'я' => 'Y', 'ð' => 'd', 'Ð' => 'D', 'þ' => 't', 'Þ' => 'T', 'ა' => 'a',
    'ბ' => 'b', 'გ' => 'g', 'დ' => 'd', 'ე' => 'e', 'ვ' => 'v', 'ზ' => 'z',
    'თ' => 't', 'ი' => 'i', 'კ' => 'k', 'ლ' => 'l', 'მ' => 'm', 'ნ' => 'n',
    'ო' => 'o', 'პ' => 'p', 'ჟ' => 'z', 'რ' => 'r', 'ს' => 's', 'ტ' => 't',
    'უ' => 'u', 'ფ' => 'p', 'ქ' => 'k', 'ღ' => 'g', 'ყ' => 'q', 'შ' => 's',
    'ჩ' => 'c', 'ც' => 't', 'ძ' => 'd', 'წ' => 't', 'ჭ' => 'c', 'ხ' => 'k',
    'ჯ' => 'j', 'ჰ' => 'h'
    );
    $str = str_replace( array_keys( $transliteration ),
                        array_values( $transliteration ),
                        $str);
    return $str;
}
//- remove_accents()
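
The byte-by-byte UTF-8 detection inside that function can also be approximated with PCRE, which validates the subject string as UTF-8 whenever the `u` modifier is used. This is a minimal sketch, not part of the linked article; `looks_like_utf8()` is a hypothetical helper name and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: true when $bytes is well-formed UTF-8.
function looks_like_utf8(string $bytes): bool
{
    // With the /u modifier PCRE validates the subject as UTF-8 first;
    // preg_match() returns false (not 0) when that validation fails.
    return @preg_match('//u', $bytes) === 1;
}

var_dump(looks_like_utf8("\xC3\xA9")); // bool(true): "é" encoded as UTF-8
var_dump(looks_like_utf8("\xE9"));     // bool(false): "é" as a single ISO-8859-1 byte
```

Where the mbstring extension is available, `mb_check_encoding($bytes, 'UTF-8')` is another option for the same check.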

Especially when matching texts against each other or against keywords, it helps to normalize the texts first. The following function removes all diacritics (marks like accents) from a given UTF-8 encoded text and returns ASCII text.

Make sure the PHP Normalizer extension (intl and ICU) is installed.

Tip: you may also want to map the text to lowercase before running any matching...

<?php

function normalizeUtf8String( $s)
{
    // keep the input so it can be returned unchanged if something fails
    $original_string = $s;

    // Normalizer-class missing!
    if (! class_exists("Normalizer", $autoload = false))
        return $original_string;


    // maps German (umlauts) and other European characters onto two characters before just removing diacritics
    $s    = preg_replace( '@\x{00c4}@u'    , "AE",    $s );    // umlaut Ä => AE
    $s    = preg_replace( '@\x{00d6}@u'    , "OE",    $s );    // umlaut Ö => OE
    $s    = preg_replace( '@\x{00dc}@u'    , "UE",    $s );    // umlaut Ü => UE
    $s    = preg_replace( '@\x{00e4}@u'    , "ae",    $s );    // umlaut ä => ae
    $s    = preg_replace( '@\x{00f6}@u'    , "oe",    $s );    // umlaut ö => oe
    $s    = preg_replace( '@\x{00fc}@u'    , "ue",    $s );    // umlaut ü => ue
    $s    = preg_replace( '@\x{00f1}@u'    , "ny",    $s );    // ñ => ny
    $s    = preg_replace( '@\x{00ff}@u'    , "yu",    $s );    // ÿ => yu


    // maps special characters (characters with diacritics) on their base-character followed by the diacritical mark
        // example:  Ú => U + ´,  á => a + ´
    $s    = Normalizer::normalize( $s, Normalizer::FORM_D );


    $s    = preg_replace( '@\pM@u'        , "",    $s );    // removes diacritics


    $s    = preg_replace( '@\x{00df}@u'    , "ss",    $s );    // maps German ß onto ss
    $s    = preg_replace( '@\x{00c6}@u'    , "AE",    $s );    // Æ => AE
    $s    = preg_replace( '@\x{00e6}@u'    , "ae",    $s );    // æ => ae
    $s    = preg_replace( '@\x{0132}@u'    , "IJ",    $s );    // Ĳ => IJ
    $s    = preg_replace( '@\x{0133}@u'    , "ij",    $s );    // ĳ => ij
    $s    = preg_replace( '@\x{0152}@u'    , "OE",    $s );    // Œ => OE
    $s    = preg_replace( '@\x{0153}@u'    , "oe",    $s );    // œ => oe

    $s    = preg_replace( '@\x{00d0}@u'    , "D",    $s );    // Ð => D
    $s    = preg_replace( '@\x{0110}@u'    , "D",    $s );    // Đ => D
    $s    = preg_replace( '@\x{00f0}@u'    , "d",    $s );    // ð => d
    $s    = preg_replace( '@\x{0111}@u'    , "d",    $s );    // đ => d
    $s    = preg_replace( '@\x{0126}@u'    , "H",    $s );    // Ħ => H
    $s    = preg_replace( '@\x{0127}@u'    , "h",    $s );    // ħ => h
    $s    = preg_replace( '@\x{0131}@u'    , "i",    $s );    // ı => i
    $s    = preg_replace( '@\x{0138}@u'    , "k",    $s );    // ĸ => k
    $s    = preg_replace( '@\x{013f}@u'    , "L",    $s );    // Ŀ => L
    $s    = preg_replace( '@\x{0141}@u'    , "L",    $s );    // Ł => L
    $s    = preg_replace( '@\x{0140}@u'    , "l",    $s );    // ŀ => l
    $s    = preg_replace( '@\x{0142}@u'    , "l",    $s );    // ł => l
    $s    = preg_replace( '@\x{014a}@u'    , "N",    $s );    // Ŋ => N
    $s    = preg_replace( '@\x{0149}@u'    , "n",    $s );    // ŉ => n
    $s    = preg_replace( '@\x{014b}@u'    , "n",    $s );    // ŋ => n
    $s    = preg_replace( '@\x{00d8}@u'    , "O",    $s );    // Ø => O
    $s    = preg_replace( '@\x{00f8}@u'    , "o",    $s );    // ø => o
    $s    = preg_replace( '@\x{017f}@u'    , "s",    $s );    // ſ => s
    $s    = preg_replace( '@\x{00de}@u'    , "T",    $s );    // Þ => T
    $s    = preg_replace( '@\x{0166}@u'    , "T",    $s );    // Ŧ => T
    $s    = preg_replace( '@\x{00fe}@u'    , "t",    $s );    // þ => t
    $s    = preg_replace( '@\x{0167}@u'    , "t",    $s );    // ŧ => t

    // remove all non-ASCII characters
    $s    = preg_replace( '@[^\x00-\x7f]@u' , "",    $s );


    // possible errors in UTF8-regular-expressions
    if (empty($s))
        return $original_string;
    else
        return $s;
}
?>

The above function is mainly based on the following article: http://ahinea.com/en/tech/accented-translate.html
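
The lowercase tip given before the function can be sketched as follows. This assumes PHP 7 or later, the mbstring extension, and UTF-8 input; `same_ignoring_case()` is a hypothetical helper name.

```php
<?php
// Sketch: assumes the mbstring extension and UTF-8 input.
// same_ignoring_case() is a hypothetical helper name.
function same_ignoring_case(string $a, string $b): bool
{
    // mb_strtolower() is aware of multibyte letters such as "É";
    // strtolower() works on bytes and is not reliable for them.
    return mb_strtolower($a, 'UTF-8') === mb_strtolower($b, 'UTF-8');
}

var_dump(same_ignoring_case("\u{00C9}COLE", "\u{00E9}cole")); // bool(true)
var_dump(same_ignoring_case('Ecole', "\u{00E9}cole"));        // bool(false): the accent still differs
```

Lowercasing only removes case differences, so it complements the removal of diacritics and does not replace it.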

I18N_UnicodeNormalizer-1.0.0

include( … );

echo preg_replace(
  '/(\P{L})/ui', // replace all except members of Unicode class "letters", case insensitive
  '', // with nothing
  I18N_UnicodeNormalizer::toNFKD( 'ÅÉÏÔÙåéïôù' ) // ù → u + `
);

————

If none of the other solutions work for you, here is what worked for me:

<?php

$string = "áéíóúÁ—whatever";

// create an array of the hex codes of the characters you want to replace (formatted as shown) and whatever you want to replace them with.
$characters = array(
  "[\xF3]" => "&oacute;", //ó
  "[\xFC]" => "&uuml;", //ü
  "[\xF1]" => "&ntilde;", //ñ
  "[\xEB]" => "&euml;", //ë
  "[\xE9]" => "&eacute;", //é
  "[\xBD]" => "&frac12;", //½
);
// note that you must use a two-digit hex code: each pattern matches a single byte,
// so this only works on strings in a single-byte encoding such as Windows-1252, not UTF-8.
// So, for example, although the hex code for an em dash is 2014, you have to use 97 instead. ("[\x97]" => "&mdash;")

// separate the key->value array into two separate arrays. Or just make two arrays from the beginning, but it's easier to read this way.
$replaceThis = array();
$replaceWith = array();
foreach ($characters as $hex => $html) {
  $replaceThis[] = $hex;
  $replaceWith[] = $html;
}

$string = preg_replace($replaceThis, $replaceWith, $string);

?>

It may not be the most elegant solution, but it works and does not require any knowledge of regular expressions.

A common solution is to use str_replace or strtr with a big list of "from" and "to" characters, even if that list looks...
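
A minimal sketch of the `strtr` variant, assuming PHP 7 or later and UTF-8 strings. The three-entry map is purely illustrative; a real one would list every character you care about.

```php
<?php
// A deliberately tiny, illustrative map.
$map = [
    "\u{00E3}" => 'a',  // ã
    "\u{00E9}" => 'e',  // é
    "\u{00DF}" => 'ss', // ß
];

// The array form of strtr() replaces whole keys, so multi-byte UTF-8 keys are safe.
// The two-string form works byte by byte and would corrupt UTF-8 input.
echo strtr("S\u{00E3}o Tom\u{00E9}", $map), "\n"; // Sao Tome
```

The array form also lets one character expand to several, as with "ß" here.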

Another solution is iconv with the //TRANSLIT option, but from what I remember...
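
For reference, a call with `//TRANSLIT` looks roughly like the sketch below (PHP 7 or later assumed). No expected output is shown on purpose: what comes back depends on the iconv implementation PHP was built against and on the current locale, and some implementations return question marks or fail outright.

```php
<?php
$input = "S\u{00E3}o Tom\u{00E9}"; // "São Tomé" written with escapes

// //TRANSLIT asks iconv to approximate characters missing from the target charset.
// The @ suppresses the notice some implementations raise when they cannot do that.
$ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $input);

var_dump($ascii); // a string on success, false on failure
```

Because of that variability, it is worth checking the result on the actual server before relying on it.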

Also, if you are using PHP 5.3, the new Normalizer class might be interesting ;-)




