English 中文(简体)
字符串子序列
原标题:Subsequence of strings
  • 时间:2012-05-23 21:27:05
  •  标签:
  • python

我执行了一个 python 函数, 返回两个字符串中最长的共同子序列。 现在, 我想执行一个函数, 返回任意多个字符串中最长的共同子序列 。

我找到了三个字符串的帮助:

dp[i, j, k] = / 1 + dp[i - 1, j - 1, k - 1] if A[i] = B[j] = C[k]
               max(dp[i - 1, j, k], dp[i, j - 1, k], dp[i, j, k - 1]) otherwise

But I don t really understand this hint. So, I d be thankful if anybody could help me. Best regards, Mark

问题回答

LCS 问题在于 NPCompllete 是一个数量不限的字符串 N. 意思是, 没有已知的多元算法能够解决这个问题。 也意味着您可以放弃 DP 解决方案 : p

这里可以连接到一种通勤方法, 以接近多个字符串的 LCS 。

http://www.aaai.org/ocs/index.php/AAAI/AAI10/paper/download/1559/2197

您可以在 O( Nlog( N)) 时间( N是序列的合并长度)下做这项工作, 方法是用滚动散列进行类似二进制搜索 。

请注意,尽可能长的共同序列的长度是最小序列的长度,即 smallestLength

初始化 :

  • assume the length of the longest common subsequence (which we call a) is a = smallestLength/2

算法 :

  • iteration_number += 1
  • scan through all lists (in parallel if you want!) and perform a rolling hash; this will generate len(list)-(a-1) hashes for each list
  • insert all the hashes into a set data structure (one set per list) to achieve O(1) lookup time
  • check to see if any of the hashes collide (take the intersection of all the sets): if there are one or more collisions, manually confirm that there is an a-length subsequence common subsequence in those positions, since the hashes might be wrong (though this will never occur in practice if you choose a sufficiently fine hash)
  • did you find a shared sequence?
    • if you find such a sequence, repeat the above steps, but increase the assumed length like you would in binary search (add smallestlength/2**iteration_number)
    • if you don t find such a sequence, repeat the above steps, but decrease the assumed length like you would in a binary search (subtract smallestlength/2**iteration_number)

Python3的解决方案是:

#Uses python3
import sys
def calc_cache_pos(strings, indexes):
    factor = 1
    pos = 0
    for s, i in zip(strings, indexes):
        pos += i * factor
        factor *= len(s)
    return pos

def lcs_back(strings, indexes, cache):
    if -1 in indexes:
        return ""
    match = all(strings[0][indexes[0]] == s[i]
                for s, i in zip(strings, indexes))
    if match:
        new_indexes = [i - 1 for i in indexes]
        result = lcs_back(strings, new_indexes, cache) + strings[0][indexes[0]]
    else:
        substrings = [""] * len(strings)
        for n in range(len(strings)):
            if indexes[n] > 0:
                new_indexes = indexes[:]
                new_indexes[n] -= 1
                cache_pos = calc_cache_pos(strings, new_indexes)
                if cache[cache_pos] is None:
                    substrings[n] = lcs_back(strings, new_indexes, cache)
                else:
                    substrings[n] = cache[cache_pos]
        result = max(substrings, key=len)
    cache[calc_cache_pos(strings, indexes)] = result
    return result

def lcs(strings):
        if len(strings) == 0:
        return ""
    elif len(strings) == 1:
        return strings[0]
    else:
        cache_size = 1
        for s in strings:
            cache_size *= len(s)
        cache = [None] * cache_size
        indexes = [len(s) - 1 for s in strings]
        return (lcs_back(strings, indexes, cache))
if __name__ ==  __main__ :
    input = sys.stdin.read()
    data = list(map(int, input.split()))
    an = data[0]
    data = data[1:]
    a1 = data[:an]
    data = data[an:]
    bn = data[0]
    data = data[1:]
    b1 = data[:bn]
    data = data[bn:]
    cn = data[0]
    data = data[1:]
    c1 = data[:cn]
    a =   
    for i in a1:
        a = a + i
    b =   
    for i in b1:
        b = b + i
    c =   
    for i in c1:
        c = c + i
    print(lcs([a, b, c]))

这里读取 3+ 数组的输入, 每个字符组后面有一个空格。 < 坚固> 在每一数组输入数组大小之前, 每个数组输入 。 输入

< 强度 > INPUT :

8 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8

a b a a a c b b b b b b

6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6

d c a b a b a

6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6

b b a a c a a a

<% 1>/% 1 < profession > :

baa( baa)

仅供参考:

from difflib import SequenceMatcher


def lcs_of_2(a, b):
    """
    get longest common string
    :param a:
    :param b:
    :return:
    """
    match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return a[match[0]: match[0] + match[2]]


def lcs_of_list(*args):
    """
    get longest common string of list
    :param args:
    :return:
    """
    if len(args) == 2:
        return lcs_of_2(args[0], args[1])
    first = args[0]
    remains = args[1:]
    return lcs_of_2(first, lcs_of_list(*remains))


if __name__ ==  __main__ :
    a =  abcdef 
    b =  abbbbabcdeffff 
    c =  bcdefff 
    print(lcs_of_list(a, b, c))

结果:

bcdef




相关问题
Can Django models use MySQL functions?

Is there a way to force Django models to pass a field to a MySQL function every time the model data is read or loaded? To clarify what I mean in SQL, I want the Django model to produce something like ...

An enterprise scheduler for python (like quartz)

I am looking for an enterprise tasks scheduler for python, like quartz is for Java. Requirements: Persistent: if the process restarts or the machine restarts, then all the jobs must stay there and ...

How to remove unique, then duplicate dictionaries in a list?

Given the following list that contains some duplicate and some unique dictionaries, what is the best method to remove unique dictionaries first, then reduce the duplicate dictionaries to single ...

What is suggested seed value to use with random.seed()?

Simple enough question: I m using python random module to generate random integers. I want to know what is the suggested value to use with the random.seed() function? Currently I am letting this ...

How can I make the PyDev editor selectively ignore errors?

I m using PyDev under Eclipse to write some Jython code. I ve got numerous instances where I need to do something like this: import com.work.project.component.client.Interface.ISubInterface as ...

How do I profile `paster serve` s startup time?

Python s paster serve app.ini is taking longer than I would like to be ready for the first request. I know how to profile requests with middleware, but how do I profile the initialization time? I ...

Pragmatically adding give-aways/freebies to an online store

Our business currently has an online store and recently we ve been offering free specials to our customers. Right now, we simply display the special and give the buyer a notice stating we will add the ...

Converting Dictionary to List? [duplicate]

I m trying to convert a Python dictionary into a Python list, in order to perform some calculations. #My dictionary dict = {} dict[ Capital ]="London" dict[ Food ]="Fish&Chips" dict[ 2012 ]="...

热门标签