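How can I replace:
- "ã" with "a"
- "é" with "e"
in PHP? Is this possible? I read somewhere that I could do some math with the ASCII value of the base character and the ASCII value of the accent, but I can't find any references now.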
This answer is incorrect. I didn't understand Unicode normalization when I wrote it. Look at francadaval's comment and link.
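The Normalizer class does this. The documentation is good, so I will just link to it rather than repeat it here:
http://www.php.net/manual/en/class.normalizer.php
Specifically, the normalize member of that class:
http://www.php.net/manual/en/normalizer.normalize.php
Note that Unicode normalization has several forms, and you seem to want Normalization Form KD (NFKD), but please read the documentation to make sure.
Don't try to roll your own: too many things can go wrong, and using the provided function is a better idea.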
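As a minimal sketch of the approach recommended above (an illustration, not code taken from the linked documentation): decompose the string with Normalizer::normalize() in form D, then delete the combining marks with a Unicode-aware regex. The helper name strip_accents_nfd is hypothetical, and the intl extension is assumed to be installed.

```php
<?php
// Hypothetical helper; relies on the Normalizer class from the intl extension.
function strip_accents_nfd(string $text): string
{
    if (!class_exists('Normalizer')) {
        return $text; // intl is not available: return the input unchanged
    }
    // NFD splits "é" into "e" followed by U+0301 (combining acute accent).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed, e.g. invalid UTF-8
    }
    // \p{Mn} matches nonspacing combining marks; the u flag enables UTF-8 mode.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);
    return $stripped === null ? $text : $stripped;
}

echo strip_accents_nfd("ã é"), "\n";
```

Letters that have no canonical decomposition, such as "ø", "ß" or "æ", pass through unchanged and would still need an explicit mapping.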
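If you don't have access to the Normalizer class, or just don't want to use it, you can use the following function to replace most (all?) of the common accents.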
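function Unaccent($string)
{
    // Convert accented characters to named HTML entities (é => &eacute;), then keep only the base letter(s) of each entity.
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}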
For those who don't have PHP 5.3, I found another solution that works well and seems very comprehensive: http://www.evaisse.net/2008/php-translit-remove-accent-unaccent-21001 Here is the function:
/**
* Unaccent the input string. An example string like `ÀØėÿᾜὨζὅБю`
* will be translated to `AOeyIOzoBY`. More complete than :
* strtr( (string)$str,
* "ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ",
* "aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn" );
*
* @param $str input string
* @param $utf8 if null, function will detect input string encoding
* @author http://www.evaisse.net/2008/php-translit-remove-accent-unaccent-21001
* @return string input string without accent
*/
function remove_accents( $str, $utf8 = true )
{
    $str = (string)$str;
    if( is_null($utf8) ) {
        if( function_exists( 'mb_detect_encoding' ) ) {
            // mbstring is available: let it detect the encoding
            $utf8 = ( strtolower( (string)mb_detect_encoding($str) ) == 'utf-8' );
        } else {
            // fallback: validate the UTF-8 byte sequences by hand
            $length = strlen($str);
            $utf8 = true;
            for ($i = 0; $utf8 && $i < $length; $i++) {
                $c = ord($str[$i]);
                if ($c < 0x80) $n = 0; # 0bbbbbbb
                elseif (($c & 0xE0) == 0xC0) $n = 1; # 110bbbbb
                elseif (($c & 0xF0) == 0xE0) $n = 2; # 1110bbbb
                elseif (($c & 0xF8) == 0xF0) $n = 3; # 11110bbb
                elseif (($c & 0xFC) == 0xF8) $n = 4; # 111110bb
                elseif (($c & 0xFE) == 0xFC) $n = 5; # 1111110b
                else { $utf8 = false; break; } # Does not match any model
                for ($j = 0; $j < $n; $j++) { # n bytes matching 10bbbbbb follow ?
                    if ((++$i == $length)
                        || ((ord($str[$i]) & 0xC0) != 0x80)) {
                        $utf8 = false;
                        break;
                    }
                }
            }
        }
    }
    if(!$utf8)
        $str = utf8_encode($str); // assumes ISO-8859-1 input; utf8_encode() is deprecated as of PHP 8.2
    $transliteration = array(
        'IJ' => 'I', 'Ö' => 'O', 'Œ' => 'O', 'Ü' => 'U', 'ä' => 'a', 'æ' => 'a',
        'ij' => 'i', 'ö' => 'o', 'œ' => 'o', 'ü' => 'u', 'ß' => 's', 'ſ' => 's',
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A',
        'Æ' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Ç' => 'C', 'Ć' => 'C',
        'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D', 'È' => 'E',
        'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E', 'Ę' => 'E', 'Ě' => 'E',
        'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G', 'Ġ' => 'G', 'Ģ' => 'G',
        'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I',
        'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I', 'İ' => 'I', 'Ĵ' => 'J',
        'Ķ' => 'K', 'Ľ' => 'L', 'Ĺ' => 'L', 'Ļ' => 'L', 'Ŀ' => 'L', 'Ł' => 'L',
        'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N', 'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O',
        'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O',
        'Ŏ' => 'O', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Ş' => 'S',
        'Ŝ' => 'S', 'Ș' => 'S', 'Š' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
        'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ū' => 'U', 'Ů' => 'U',
        'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U', 'Ŵ' => 'W', 'Ŷ' => 'Y',
        'Ÿ' => 'Y', 'Ý' => 'Y', 'Ź' => 'Z', 'Ż' => 'Z', 'Ž' => 'Z', 'à' => 'a',
        'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
        'å' => 'a', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
        'ď' => 'd', 'đ' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e', 'ƒ' => 'f',
        'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h', 'ħ' => 'h',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i', 'ĩ' => 'i',
        'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĵ' => 'j', 'ķ' => 'k', 'ĸ' => 'k',
        'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l', 'ŀ' => 'l', 'ñ' => 'n',
        'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ʼn' => 'n', 'ŋ' => 'n', 'ò' => 'o',
        'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o',
        'ŏ' => 'o', 'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'ś' => 's', 'š' => 's',
        'ť' => 't', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ū' => 'u', 'ů' => 'u',
        'ű' => 'u', 'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ÿ' => 'y',
        'ý' => 'y', 'ŷ' => 'y', 'ż' => 'z', 'ź' => 'z', 'ž' => 'z', 'Α' => 'A',
        'Ά' => 'A', 'Ἀ' => 'A', 'Ἁ' => 'A', 'Ἂ' => 'A', 'Ἃ' => 'A', 'Ἄ' => 'A',
        'Ἅ' => 'A', 'Ἆ' => 'A', 'Ἇ' => 'A', 'ᾈ' => 'A', 'ᾉ' => 'A', 'ᾊ' => 'A',
        'ᾋ' => 'A', 'ᾌ' => 'A', 'ᾍ' => 'A', 'ᾎ' => 'A', 'ᾏ' => 'A', 'Ᾰ' => 'A',
        'Ᾱ' => 'A', 'Ὰ' => 'A', 'ᾼ' => 'A', 'Β' => 'B', 'Γ' => 'G', 'Δ' => 'D',
        'Ε' => 'E', 'Έ' => 'E', 'Ἐ' => 'E', 'Ἑ' => 'E', 'Ἒ' => 'E', 'Ἓ' => 'E',
        'Ἔ' => 'E', 'Ἕ' => 'E', 'Ὲ' => 'E', 'Ζ' => 'Z', 'Η' => 'I', 'Ή' => 'I',
        'Ἠ' => 'I', 'Ἡ' => 'I', 'Ἢ' => 'I', 'Ἣ' => 'I', 'Ἤ' => 'I', 'Ἥ' => 'I',
        'Ἦ' => 'I', 'Ἧ' => 'I', 'ᾘ' => 'I', 'ᾙ' => 'I', 'ᾚ' => 'I', 'ᾛ' => 'I',
        'ᾜ' => 'I', 'ᾝ' => 'I', 'ᾞ' => 'I', 'ᾟ' => 'I', 'Ὴ' => 'I', 'ῌ' => 'I',
        'Θ' => 'T', 'Ι' => 'I', 'Ί' => 'I', 'Ϊ' => 'I', 'Ἰ' => 'I', 'Ἱ' => 'I',
        'Ἲ' => 'I', 'Ἳ' => 'I', 'Ἴ' => 'I', 'Ἵ' => 'I', 'Ἶ' => 'I', 'Ἷ' => 'I',
        'Ῐ' => 'I', 'Ῑ' => 'I', 'Ὶ' => 'I', 'Κ' => 'K', 'Λ' => 'L', 'Μ' => 'M',
        'Ν' => 'N', 'Ξ' => 'K', 'Ο' => 'O', 'Ό' => 'O', 'Ὀ' => 'O', 'Ὁ' => 'O',
        'Ὂ' => 'O', 'Ὃ' => 'O', 'Ὄ' => 'O', 'Ὅ' => 'O', 'Ὸ' => 'O', 'Π' => 'P',
        'Ρ' => 'R', 'Ῥ' => 'R', 'Σ' => 'S', 'Τ' => 'T', 'Υ' => 'Y', 'Ύ' => 'Y',
        'Ϋ' => 'Y', 'Ὑ' => 'Y', 'Ὓ' => 'Y', 'Ὕ' => 'Y', 'Ὗ' => 'Y', 'Ῠ' => 'Y',
        'Ῡ' => 'Y', 'Ὺ' => 'Y', 'Φ' => 'F', 'Χ' => 'X', 'Ψ' => 'P', 'Ω' => 'O',
        'Ώ' => 'O', 'Ὠ' => 'O', 'Ὡ' => 'O', 'Ὢ' => 'O', 'Ὣ' => 'O', 'Ὤ' => 'O',
        'Ὥ' => 'O', 'Ὦ' => 'O', 'Ὧ' => 'O', 'ᾨ' => 'O', 'ᾩ' => 'O', 'ᾪ' => 'O',
        'ᾫ' => 'O', 'ᾬ' => 'O', 'ᾭ' => 'O', 'ᾮ' => 'O', 'ᾯ' => 'O', 'Ὼ' => 'O',
        'ῼ' => 'O', 'α' => 'a', 'ά' => 'a', 'ἀ' => 'a', 'ἁ' => 'a', 'ἂ' => 'a',
        'ἃ' => 'a', 'ἄ' => 'a', 'ἅ' => 'a', 'ἆ' => 'a', 'ἇ' => 'a', 'ᾀ' => 'a',
        'ᾁ' => 'a', 'ᾂ' => 'a', 'ᾃ' => 'a', 'ᾄ' => 'a', 'ᾅ' => 'a', 'ᾆ' => 'a',
        'ᾇ' => 'a', 'ὰ' => 'a', 'ᾰ' => 'a', 'ᾱ' => 'a', 'ᾲ' => 'a', 'ᾳ' => 'a',
        'ᾴ' => 'a', 'ᾶ' => 'a', 'ᾷ' => 'a', 'β' => 'b', 'γ' => 'g', 'δ' => 'd',
        'ε' => 'e', 'έ' => 'e', 'ἐ' => 'e', 'ἑ' => 'e', 'ἒ' => 'e', 'ἓ' => 'e',
        'ἔ' => 'e', 'ἕ' => 'e', 'ὲ' => 'e', 'ζ' => 'z', 'η' => 'i', 'ή' => 'i',
        'ἠ' => 'i', 'ἡ' => 'i', 'ἢ' => 'i', 'ἣ' => 'i', 'ἤ' => 'i', 'ἥ' => 'i',
        'ἦ' => 'i', 'ἧ' => 'i', 'ᾐ' => 'i', 'ᾑ' => 'i', 'ᾒ' => 'i', 'ᾓ' => 'i',
        'ᾔ' => 'i', 'ᾕ' => 'i', 'ᾖ' => 'i', 'ᾗ' => 'i', 'ὴ' => 'i', 'ῂ' => 'i',
        'ῃ' => 'i', 'ῄ' => 'i', 'ῆ' => 'i', 'ῇ' => 'i', 'θ' => 't', 'ι' => 'i',
        'ί' => 'i', 'ϊ' => 'i', 'ΐ' => 'i', 'ἰ' => 'i', 'ἱ' => 'i', 'ἲ' => 'i',
        'ἳ' => 'i', 'ἴ' => 'i', 'ἵ' => 'i', 'ἶ' => 'i', 'ἷ' => 'i', 'ὶ' => 'i',
        'ῐ' => 'i', 'ῑ' => 'i', 'ῒ' => 'i', 'ῖ' => 'i', 'ῗ' => 'i', 'κ' => 'k',
        'λ' => 'l', 'μ' => 'm', 'ν' => 'n', 'ξ' => 'k', 'ο' => 'o', 'ό' => 'o',
        'ὀ' => 'o', 'ὁ' => 'o', 'ὂ' => 'o', 'ὃ' => 'o', 'ὄ' => 'o', 'ὅ' => 'o',
        'ὸ' => 'o', 'π' => 'p', 'ρ' => 'r', 'ῤ' => 'r', 'ῥ' => 'r', 'σ' => 's',
        'ς' => 's', 'τ' => 't', 'υ' => 'y', 'ύ' => 'y', 'ϋ' => 'y', 'ΰ' => 'y',
        'ὐ' => 'y', 'ὑ' => 'y', 'ὒ' => 'y', 'ὓ' => 'y', 'ὔ' => 'y', 'ὕ' => 'y',
        'ὖ' => 'y', 'ὗ' => 'y', 'ὺ' => 'y', 'ῠ' => 'y', 'ῡ' => 'y', 'ῢ' => 'y',
        'ῦ' => 'y', 'ῧ' => 'y', 'φ' => 'f', 'χ' => 'x', 'ψ' => 'p', 'ω' => 'o',
        'ώ' => 'o', 'ὠ' => 'o', 'ὡ' => 'o', 'ὢ' => 'o', 'ὣ' => 'o', 'ὤ' => 'o',
        'ὥ' => 'o', 'ὦ' => 'o', 'ὧ' => 'o', 'ᾠ' => 'o', 'ᾡ' => 'o', 'ᾢ' => 'o',
        'ᾣ' => 'o', 'ᾤ' => 'o', 'ᾥ' => 'o', 'ᾦ' => 'o', 'ᾧ' => 'o', 'ὼ' => 'o',
        'ῲ' => 'o', 'ῳ' => 'o', 'ῴ' => 'o', 'ῶ' => 'o', 'ῷ' => 'o', 'А' => 'A',
        'Б' => 'B', 'В' => 'V', 'Г' => 'G', 'Д' => 'D', 'Е' => 'E', 'Ё' => 'E',
        'Ж' => 'Z', 'З' => 'Z', 'И' => 'I', 'Й' => 'I', 'К' => 'K', 'Л' => 'L',
        'М' => 'M', 'Н' => 'N', 'О' => 'O', 'П' => 'P', 'Р' => 'R', 'С' => 'S',
        'Т' => 'T', 'У' => 'U', 'Ф' => 'F', 'Х' => 'K', 'Ц' => 'T', 'Ч' => 'C',
        'Ш' => 'S', 'Щ' => 'S', 'Ы' => 'Y', 'Э' => 'E', 'Ю' => 'Y', 'Я' => 'Y',
        // note: lowercase Cyrillic letters map to uppercase Latin letters in this table
        'а' => 'A', 'б' => 'B', 'в' => 'V', 'г' => 'G', 'д' => 'D', 'е' => 'E',
        'ё' => 'E', 'ж' => 'Z', 'з' => 'Z', 'и' => 'I', 'й' => 'I', 'к' => 'K',
        'л' => 'L', 'м' => 'M', 'н' => 'N', 'о' => 'O', 'п' => 'P', 'р' => 'R',
        'с' => 'S', 'т' => 'T', 'у' => 'U', 'ф' => 'F', 'х' => 'K', 'ц' => 'T',
        'ч' => 'C', 'ш' => 'S', 'щ' => 'S', 'ы' => 'Y', 'э' => 'E', 'ю' => 'Y',
        'я' => 'Y', 'ð' => 'd', 'Ð' => 'D', 'þ' => 't', 'Þ' => 'T', 'ა' => 'a',
        'ბ' => 'b', 'გ' => 'g', 'დ' => 'd', 'ე' => 'e', 'ვ' => 'v', 'ზ' => 'z',
        'თ' => 't', 'ი' => 'i', 'კ' => 'k', 'ლ' => 'l', 'მ' => 'm', 'ნ' => 'n',
        'ო' => 'o', 'პ' => 'p', 'ჟ' => 'z', 'რ' => 'r', 'ს' => 's', 'ტ' => 't',
        'უ' => 'u', 'ფ' => 'p', 'ქ' => 'k', 'ღ' => 'g', 'ყ' => 'q', 'შ' => 's',
        'ჩ' => 'c', 'ც' => 't', 'ძ' => 'd', 'წ' => 't', 'ჭ' => 'c', 'ხ' => 'k',
        'ჯ' => 'j', 'ჰ' => 'h'
    );
    $str = str_replace( array_keys( $transliteration ),
                        array_values( $transliteration ),
                        $str);
    return $str;
}
//- remove_accents()
A short str_replace using custom character arrays:
<?php
$original_string = "¿Dónde está el niño que vive aquí? En el témpano o en el iglú. ÁFRICA, MÉXICO, ÍNDICE, CANCIÓN y NÚMERO.";
$some_special_chars = array("á", "é", "í", "ó", "ú", "Á", "É", "Í", "Ó", "Ú", "ñ", "Ñ");
$replacement_chars = array("a", "e", "i", "o", "u", "A", "E", "I", "O", "U", "n", "N");
$replaced_string = str_replace($some_special_chars, $replacement_chars, $original_string);
echo $replaced_string; // outputs ¿Donde esta el nino que vive aqui? En el tempano o en el iglu. AFRICA, MEXICO, INDICE, CANCION y NUMERO.
?>
Especially when matching texts against each other or against keywords, it is helpful to normalize the texts first. The following function removes all diacritics (marks like accents) from a given UTF-8 encoded text and returns ASCII text.
Make sure the PHP Normalizer extension (intl and ICU) is installed.
Tip: you may also want to convert the text to lowercase before matching...
<?php
function normalizeUtf8String( $s)
{
    // keep the input so it can be returned unchanged if something goes wrong
    $original_string = $s;
    // Normalizer-class missing!
    if (! class_exists("Normalizer", $autoload = false))
        return $original_string;
    // maps German (umlauts) and other European characters onto two characters before just removing diacritics
    $s = preg_replace( '@\x{00c4}@u', "AE", $s ); // umlaut Ä => AE
    $s = preg_replace( '@\x{00d6}@u', "OE", $s ); // umlaut Ö => OE
    $s = preg_replace( '@\x{00dc}@u', "UE", $s ); // umlaut Ü => UE
    $s = preg_replace( '@\x{00e4}@u', "ae", $s ); // umlaut ä => ae
    $s = preg_replace( '@\x{00f6}@u', "oe", $s ); // umlaut ö => oe
    $s = preg_replace( '@\x{00fc}@u', "ue", $s ); // umlaut ü => ue
    $s = preg_replace( '@\x{00f1}@u', "ny", $s ); // ñ => ny
    $s = preg_replace( '@\x{00ff}@u', "yu", $s ); // ÿ => yu
    // maps special characters (characters with diacritics) on their base-character followed by the diacritical mark
    // example: Ú => U´, á => a´
    $s = Normalizer::normalize( $s, Normalizer::FORM_D );
    $s = preg_replace( '@\pM@u', "", $s ); // removes diacritics
    $s = preg_replace( '@\x{00df}@u', "ss", $s ); // maps German ß onto ss
    $s = preg_replace( '@\x{00c6}@u', "AE", $s ); // Æ => AE
    $s = preg_replace( '@\x{00e6}@u', "ae", $s ); // æ => ae
    $s = preg_replace( '@\x{0132}@u', "IJ", $s ); // IJ => IJ
    $s = preg_replace( '@\x{0133}@u', "ij", $s ); // ij => ij
    $s = preg_replace( '@\x{0152}@u', "OE", $s ); // Œ => OE
    $s = preg_replace( '@\x{0153}@u', "oe", $s ); // œ => oe
    $s = preg_replace( '@\x{00d0}@u', "D", $s ); // Ð => D
    $s = preg_replace( '@\x{0110}@u', "D", $s ); // Đ => D
    $s = preg_replace( '@\x{00f0}@u', "d", $s ); // ð => d
    $s = preg_replace( '@\x{0111}@u', "d", $s ); // đ => d
    $s = preg_replace( '@\x{0126}@u', "H", $s ); // Ħ => H
    $s = preg_replace( '@\x{0127}@u', "h", $s ); // ħ => h
    $s = preg_replace( '@\x{0131}@u', "i", $s ); // ı => i
    $s = preg_replace( '@\x{0138}@u', "k", $s ); // ĸ => k
    $s = preg_replace( '@\x{013f}@u', "L", $s ); // Ŀ => L
    $s = preg_replace( '@\x{0141}@u', "L", $s ); // Ł => L
    $s = preg_replace( '@\x{0140}@u', "l", $s ); // ŀ => l
    $s = preg_replace( '@\x{0142}@u', "l", $s ); // ł => l
    $s = preg_replace( '@\x{014a}@u', "N", $s ); // Ŋ => N
    $s = preg_replace( '@\x{0149}@u', "n", $s ); // ʼn => n
    $s = preg_replace( '@\x{014b}@u', "n", $s ); // ŋ => n
    $s = preg_replace( '@\x{00d8}@u', "O", $s ); // Ø => O
    $s = preg_replace( '@\x{00f8}@u', "o", $s ); // ø => o
    $s = preg_replace( '@\x{017f}@u', "s", $s ); // ſ => s
    $s = preg_replace( '@\x{00de}@u', "T", $s ); // Þ => T
    $s = preg_replace( '@\x{0166}@u', "T", $s ); // Ŧ => T
    $s = preg_replace( '@\x{00fe}@u', "t", $s ); // þ => t
    $s = preg_replace( '@\x{0167}@u', "t", $s ); // ŧ => t
    // remove all non-ASCII characters (anything outside U+0000..U+007F)
    $s = preg_replace( '@[^\x00-\x7f]@u', "", $s );
    // possible errors in UTF8-regular-expressions
    if (empty($s))
        return $original_string;
    else
        return $s;
}
?>
The above function is mainly based on the following article: http://ahinea.com/en/tech/accented-translate.html
include( … );
echo preg_replace(
    '/(\P{L})/ui', // replace all except members of Unicode class "letters", case insensitive
    '', // with nothing
    I18N_UnicodeNormalizer::toNFKD( 'ÅÉÏÔÙåéïôù' ) // ù → u + `
);
————
If none of the other solutions work for you, here is what worked for me:
<?php
$string = "áéíóúÁ—whatever";
// create an array of the hex codes of the characters you want to replace (formatted as shown) and whatever you want to replace them with.
$characters = array(
    "[\xF3]" => "&oacute;", //ó
    "[\xFC]" => "&uuml;", //ü
    "[\xF1]" => "&ntilde;", //ñ
    "[\xEB]" => "&euml;", //ë
    "[\xE9]" => "&eacute;", //é
    "[\xBD]" => "&frac12;", //½
);
// note that you must use a two-digit hex code: each pattern matches a single byte, so the input has to be in a single-byte encoding such as Windows-1252.
// So, for example, although the hex code for an em dash is 2014, you have to use 97 instead. ("[\x97]" => "&mdash;")
// separate the key->value array into two separate arrays. Or just make two arrays from the beginning, but it's easier to read this way.
$replaceThis = array();
$replaceWith = array();
foreach ($characters as $hex => $html) {
    $replaceThis[] = $hex;
    $replaceWith[] = $html;
}
$string = preg_replace($replaceThis, $replaceWith, $string);
?>
It may not be the most elegant solution, but it works and requires no knowledge of regular expressions.
People often use str_replace or strtr with a big list of "from" and "to" characters, even if that list looks ugly...
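A minimal sketch of the map-based approach, assuming a deliberately small illustrative table (the map contents and the helper name replace_with_map are made up for this example):

```php
<?php
// Illustrative, deliberately small map (an assumption, not a complete table).
$accent_map = array(
    'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u',
    'ã' => 'a', 'ñ' => 'n',
);

// Hypothetical helper: the array form of strtr() replaces whole substrings,
// so multi-byte UTF-8 keys are matched as complete characters.
function replace_with_map(string $text, array $map): string
{
    return strtr($text, $map);
}

echo replace_with_map('El niño está aquí', $accent_map), "\n"; // El nino esta aqui
```

The three-argument form strtr($str, $from, $to) translates byte by byte, so it is only safe with single-byte encodings; with UTF-8 input the array form shown here is the safer choice.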
Another solution is iconv with the //TRANSLIT option, but as far as I remember...
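A hedged sketch of the iconv variant follows. The wrapper name translit_ascii is hypothetical, and no expected output is shown because the result can vary with the iconv implementation and the current locale: some builds produce "a", others "?" or punctuation followed by the base letter, and some do not support //TRANSLIT at all.

```php
<?php
// Hypothetical wrapper around iconv(); returns false when conversion is not possible.
function translit_ascii(string $text)
{
    if (!function_exists('iconv')) {
        return false; // iconv extension not available
    }
    // @ suppresses the notice raised when a character cannot be converted
    // or when the implementation rejects the //TRANSLIT suffix.
    return @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
}

var_dump(translit_ascii('ã é'));
```

Because of this variability, it is worth checking the output on the target server before relying on it.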
Also, if you are using PHP 5.3, the new Normalizer class might be interesting ;-)