I need to separate natural, coherent text/sentences in emails from lists, signatures, greetings and so on before further processing.
For example:
Morning
last monday we did bla bla, lore Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua.
- list item 2
- list item 3
- list item 3
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid x ea commodi consequat. Quis aute iure reprehenderit in voluptate velit
Page: 1
Page: 1
e.g., c.
33 Sin Street, London
Mobile: 00 234534/234345
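Ideally the algorithm would match only the bold parts, i.e. the two paragraphs of running prose (the lines starting "last monday we did" and "Ut enim ad minim").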
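Is there a recommended approach, or are there even existing solutions to this problem? Should I try regular expressions, or something more statistical based on punctuation counts, line length and so on?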
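
To make the statistical idea concrete, here is a minimal sketch of the kind of heuristic I have in mind. It assumes the email is classified line by line, and the thresholds (`MIN_WORDS`, `MIN_ALPHA_RATIO`) and function names are hypothetical values picked for illustration, not tuned on real data.

```python
import re
from typing import List

# Hypothetical thresholds, chosen for illustration only.
MIN_WORDS = 8          # shorter lines are treated as greetings, signatures, etc.
MIN_ALPHA_RATIO = 0.7  # share of letters and spaces a prose line should have

# Lines starting with "-", "*", "+" or "1." / "1)" are treated as list items.
LIST_MARKER = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")
SENTENCE_PUNCT = re.compile(r"[.,;:!?]")


def looks_like_prose(line: str) -> bool:
    """Guess whether a single line is running prose."""
    text = line.strip()
    if not text or LIST_MARKER.match(text):
        return False
    if len(text.split()) < MIN_WORDS:
        return False
    # Phone numbers, addresses and similar lines have many non-letter characters.
    alpha = sum(ch.isalpha() or ch.isspace() for ch in text)
    if alpha / len(text) < MIN_ALPHA_RATIO:
        return False
    # Require at least one punctuation mark typical of sentences.
    return bool(SENTENCE_PUNCT.search(text))


def extract_prose(email_body: str) -> List[str]:
    """Return the lines of an email body that look like running prose."""
    return [line for line in email_body.splitlines() if looks_like_prose(line)]
```

On the example above this keeps only the two long paragraphs and drops the greeting, the list items and the signature lines, but I expect it to break on short sentences and on long list items, which is why I am asking whether something more robust exists.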