Is there a standard way to detect directional characters?

I'm parsing a text file made from this Wikipedia article; basically I pressed Ctrl+A and copy/pasted all the content into a text file (I use it as an example). I'm trying to build a list of words with their counts, and for that I use a Scanner with this delimiter:

    sc.useDelimiter("[\\p{javaWhitespace}\\p{Punct}]+");

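It works pretty well for my needs, but when analyzing the result I saw something that looks like a blank token (again: https://stackoverflow.com/q/10765597/1140748). The character appears in the article right after "(nynorsk)". (Funnily enough, the character disappears when I copy/paste it here; in Gedit I can press the arrow keys over it and the cursor does not move.)

After further research, I found that this token is actually POP DIRECTIONAL FORMATTING (U+202C).

It is not the only directional character; judging from the Java documentation, there seem to be several defined ones.

So I'm wondering whether there is a standard way to detect these characters and, if possible, one that can easily be integrated into the delimiter pattern.

I would like to avoid making my own list, because I'm afraid I would forget some of them.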
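
For reference, the directionality values mentioned above can be queried with `Character.getDirectionality`. The following is only a minimal sketch, not a complete answer. The class name `DirectionalChars` and both method names are made up for illustration, and it only covers the embedding, override, and pop characters. Marks such as U+200E and U+200F are not caught, because they report an ordinary strong directionality.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named for illustration only.
class DirectionalChars {

    // True for explicit embedding/override characters and the
    // POP DIRECTIONAL FORMATTING character that closes them.
    static boolean isDirectionalFormat(int codePoint) {
        switch (Character.getDirectionality(codePoint)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING:
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE:
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING:
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE:
            case Character.DIRECTIONALITY_POP_DIRECTIONAL_FORMAT:
                return true;
            default:
                return false;
        }
    }

    // Returns the char offsets at which such characters occur in the text.
    static List<Integer> findDirectionalFormats(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isDirectionalFormat(cp)) {
                positions.add(i);
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return positions;
    }

    public static void main(String[] args) {
        // U+202C sits right after the closing parenthesis, at offset 9.
        System.out.println(findDirectionalFormats("(nynorsk)\u202C text")); // prints [9]
    }
}
```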

Best answer

You can always go the other way around and use a whitelist instead of a blacklist:

    sc.useDelimiter("[^\\p{L}]+");
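
To see the whitelist delimiter in action, here is a small sketch. The class name `LetterTokenizer`, the method `words`, and the sample string are assumptions made for illustration. The input places U+202C right after "(nynorsk)", as described in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, named for illustration only.
class LetterTokenizer {

    // Splits the text on every run of characters that are not Unicode letters.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter("[^\\p{L}]+");
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Parentheses, U+202C, spaces, and digits all count as delimiters.
        System.out.println(words("(nynorsk)\u202C is 1 of 2 forms"));
        // prints [nynorsk, is, of, forms]
    }
}
```

One consequence of this choice: `\p{L}` matches letters only, so digits and combining marks (`\p{M}`) also act as delimiters. If the text contains decomposed accents or words with digits that should stay intact, the character class would probably need to be widened.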