TL;DR
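You are mixing up the Latin-1 and Unicode character representations, which is why your regular expression does not return the expected results. I corrected this and removed some irrelevant characters from the class, to obtain this regular expression for Google Forms: `^[\x0A\x0D\x20-\x7E\xA0-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$`.
Your problem on mobile devices may be caused by the behavior of virtual keyboards, which can input unexpected quotation marks that your regular expression does not target (please read below).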
Detailed answer
In what follows, I use `255` for decimal notation and `\xFF` for hexadecimal notation.
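The problem is that you are designating characters by their numeric position in the Windows Latin-1 (CP1252) character set, whereas the Google RE2 regular expression library used by Google Forms designates characters by their Unicode code points (probably like most, if not all, modern regular expression engines).
For the first 256 positions (`\x00` to `\xFF`), the two sets agree everywhere except `\x80` to `\x9F`: Windows Latin-1 uses most of those positions for printable characters (including the curly quotation marks, the em dash and the euro sign), while Unicode reserves them for non-printable control characters. That is why the confusion goes unnoticed for most characters: the RE2 expression `^[\x0A-\xFF]*$` matches the Unicode characters from `\x0A` (line feed) to `\xFF` (ÿ), which coincide with the Windows Latin-1 characters at the same positions everywhere outside `\x80` to `\x9F`.
N.B.: some of these matched characters (`\x0A` to `\x1F` and `\x7F` to `\x9F`) are non-printable control characters.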
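However, to build an RE2 regular expression that matches characters whose Unicode position is above `\xFF`, you must use their Unicode values (code points).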
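As a quick local check, here is a minimal Go sketch. It assumes that Go's standard `regexp` package, which implements RE2 syntax, behaves like the RE2 engine behind Google Forms for these simple escapes; `matchesWhole` is a hypothetical helper written only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesWhole is a hypothetical helper: it reports whether the
// RE2-syntax pattern matches the entire string s.
func matchesWhole(pattern, s string) bool {
	return regexp.MustCompile(`^(?:` + pattern + `)$`).MatchString(s)
}

func main() {
	// In Windows Latin-1 (CP1252), position 0x91 is the left single quotation
	// mark, but in RE2 syntax \x91 denotes the Unicode control character U+0091.
	fmt.Println(matchesWhole(`\x91`, "\u2018")) // false
	fmt.Println(matchesWhole(`\x91`, "\u0091")) // true

	// Written with its Unicode code point, the quotation mark is matched.
	fmt.Println(matchesWhole(`\x{2018}`, "\u2018")) // true
}
```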
Let's compare several representations of the characters mentioned in your question:
| Character | Description | Position in the Windows Latin-1 (CP1252) character set | Position in the Unicode character set (code point) | Must match the regular expression |
|---|---|---|---|---|
| `"` | quotation mark (or double quote) | 34 or `\x22` | 34 or `\x22` | yes |
| `'` | apostrophe (or single quote) | 39 or `\x27` | 39 or `\x27` | yes |
| ‘ | left single quotation mark | 145 or `\x91` | 8216 or `\x{2018}` | yes |
| ’ | right single quotation mark | 146 or `\x92` | 8217 or `\x{2019}` | yes |
| “ | left double quotation mark | 147 or `\x93` | 8220 or `\x{201C}` | yes |
| ” | right double quotation mark | 148 or `\x94` | 8221 or `\x{201D}` | yes |
| — | em dash | 151 or `\x97` | 8212 or `\x{2014}` | no |
| € | euro sign | 128 or `\x80` | 8364 or `\x{20AC}` | no |
| 😀 | grinning face | not included | 128512 or `\x{1F600}` | no |
| other emojis | other emojis | not included | ... or `\x...` | no |
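To check the Unicode column of the table for any character, a small Go sketch like the following prints each character's code point in decimal and hexadecimal. Note that Go's `%U` verb prints the `U+2018` notation rather than RE2's `\x{2018}`; the function name `describe` is made up for this example.

```go
package main

import "fmt"

// describe returns the character, its decimal code point and its
// hexadecimal code point in U+XXXX notation.
func describe(r rune) string {
	return fmt.Sprintf("%c  %d or %U", r, r, r)
}

func main() {
	// Ranging over a string yields Unicode code points (runes), not bytes.
	for _, r := range "\"'‘’“”\u2014€😀" {
		fmt.Println(describe(r))
	}
}
```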
All the above shows that your regular expression `^[\x0A-\xFF]*$` matches the lower-position characters, but not the left and right quotation marks, which sit at much higher positions (well above `\xFF`) in Unicode. So you need to extend the character class with the code points of these specific marks, like this: `^[\x0A-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$`.
RE2 requires curly brackets around hexadecimal codes of three digits or more: without them, `\x2018` would be read as `\x20` (a space) followed by the literal characters `18`.
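A minimal Go sketch comparing the question's expression with the extended one, and showing the missing-braces pitfall. It assumes Go's `regexp` package (RE2 syntax) is a fair local stand-in for Google Forms; the sample input is made up.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// The expression from the question: Unicode U+000A to U+00FF only.
	original = regexp.MustCompile(`^[\x0A-\xFF]*$`)
	// The same class extended with the four curly quotation marks.
	extended = regexp.MustCompile(`^[\x0A-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$`)
	// Without braces, \x2018 is parsed as \x20 (a space) followed by "18".
	noBraces = regexp.MustCompile(`^\x2018$`)
)

func main() {
	input := "It’s “quoted”"
	fmt.Println(original.MatchString(input)) // false: ’ “ ” are above \xFF
	fmt.Println(extended.MatchString(input)) // true
	fmt.Println(noBraces.MatchString(" 18")) // true
	fmt.Println(noBraces.MatchString("‘"))   // false
}
```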
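Incidentally, it does not seem necessary to me to include all the control characters between `\x0A` and `\x1F` (only `\x0A`, line feed, and `\x0D`, carriage return, seem relevant to me). Positions `\x7F` to `\x9F` are also used for control (non-printable) characters, which should not be entered in your case. Hence this more relevant, though longer, expression: `^[\x0A\x0D\x20-\x7E\xA0-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$`. You can try it on regex101: http://regex101.com/r/noslEo/1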
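To see which inputs the refined expression accepts, here is a minimal Go sketch run against a few made-up samples. As before, it assumes Go's RE2-syntax `regexp` package behaves like the engine used by Google Forms.

```go
package main

import (
	"fmt"
	"regexp"
)

// refined allows line feed, carriage return, printable ASCII, the printable
// part of the Latin-1 supplement, and the four curly quotation marks.
var refined = regexp.MustCompile(`^[\x0A\x0D\x20-\x7E\xA0-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$`)

func main() {
	samples := []string{
		"Line one\r\nLine two",            // CR LF line break: accepted
		"“Curly” and \"straight\" quotes", // accepted
		"Tab\there",                       // tab (\x09) is excluded: rejected
		"Price: 10 €",                     // euro sign U+20AC: rejected
		"Wait \u2014 what?",               // em dash U+2014: rejected
		"Smile 😀",                         // emoji U+1F600: rejected
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, refined.MatchString(s))
	}
}
```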
By the way, these expressions exclude the euro sign, the em dash and emojis, as desired.
The mismatch with characters `\x22` and `\x27` on mobile devices may come from the virtual keyboard not entering exactly the character targeted by the regular expression: Unicode contains many quotation marks, and depending on the font their shapes can look very similar. You could include more quotation marks in your character class.
Also, be aware that the Google RE2 library does not support the `\p{Emoji}` character class.
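As an illustration, Go's `regexp` package, which follows RE2 syntax, accepts Unicode general categories and scripts such as `\p{Greek}` but rejects `\p{Emoji}` at compile time. This sketch assumes Google's RE2 behaves the same way; `compiles` is a hypothetical helper. The whitelist approach above avoids the issue anyway, since emojis fall outside the allowed class.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether pattern is accepted by Go's RE2-syntax parser.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`\p{Greek}`)) // true: Greek is a Unicode script
	fmt.Println(compiles(`\p{Emoji}`)) // false: Emoji is not a category or script
}
```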