I have two databases, `a` and `b`, which contain titles of scientific papers. I want to merge these databases into a single database, `c`.
- It is possible that `a` contains titles which are not in `b`, and vice versa.
- It is possible that a title is in both databases `a` and `b`.
- It is possible that the case of the letters and the punctuation do not match:
  - "This is a Title." vs. "this is a title"
  - "This is - yet another - title." vs. "This is yet another title"
  - "The k-mean algorithm based on bla." vs. "The k mean Algorithm based on bla"
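
To make the case and punctuation mismatches concrete, here is a minimal Python sketch of a normalization step that would make the three example pairs compare equal. The function name `normalize_title` is hypothetical, and the rule it applies (lowercase, treat every run of punctuation or whitespace as a single space) is only an assumption about what should count as "the same title".

```python
import re


def normalize_title(title: str) -> str:
    """Hypothetical helper: lowercase a title and squash punctuation."""
    lowered = title.lower()
    # Replace every run of non-word characters (and underscores) with one space,
    # so "k-mean" and "k mean", or " - " and " ", end up identical.
    spaced = re.sub(r"[\W_]+", " ", lowered)
    return spaced.strip()


pairs = [
    ("This is a Title.", "this is a title"),
    ("This is - yet another - title.", "This is yet another title"),
    ("The k-mean algorithm based on bla.", "The k mean Algorithm based on bla"),
]

for left, right in pairs:
    # Each pair normalizes to the same string.
    print(normalize_title(left) == normalize_title(right))
```

If such a normalized value were stored in its own indexed column, titles that differ only in case or punctuation could be matched by plain equality, leaving fuzzy matching for the remaining titles only.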
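First, I thought of using a Levenshtein distance function inside MySQL to match the titles that are the same in both databases, but I am looking at more than ten million records and I do not know whether that would be fast enough. Then I thought of using a full-text search to match the titles, but as far as I know full-text search does not match common words, so it would not work well for similar titles that are in fact different.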
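
To illustrate why speed is the concern, here is a minimal Python sketch of the Levenshtein distance using the standard two-row dynamic programming table. It is not the MySQL function mentioned above, only an illustration of the same measure. A single comparison costs time proportional to the product of the two title lengths, and comparing every title in `a` with every title in `b` would repeat that for each pair of rows.

```python
def levenshtein(s: str, t: str) -> int:
    """Number of single-character edits needed to turn s into t."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of s and t[:j].
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(
                min(
                    previous[j] + 1,         # delete cs
                    current[j - 1] + 1,      # insert ct
                    previous[j - 1] + cost,  # substitute cs with ct
                )
            )
        previous = current
    return previous[-1]


# Small distance: the titles differ only in case and the final period.
print(levenshtein("This is a Title.", "this is a title"))
```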
I do not need a 100% match rate, but I want it to be as high as possible. Any advice?