Fast string search using bitwise operators
Answers

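If you're just looking through a file, you're pretty much guaranteed to be I/O-bound. A large buffer (~16K) and strstr() should be all you need. If the file is encoded in ASCII, search for just "gcagctgaaaca". If it is actually encoded in bits, just permute the possible accepted strings (there should be ~8; lop off the first byte) and use memmem() plus a tiny overlapping bit check.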

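I'll note here that glibc's strstr and memmem already search in linear time (Knuth-Morris-Pratt, or the related Two-Way algorithm in newer releases), so test their performance. It may surprise you.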
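The buffered approach can be sketched as follows. This is a minimal illustration in Python, not the answerer's code: `find_in_stream` is a hypothetical helper, and `bytes.find` stands in for strstr()/memmem(). The only subtle part is carrying the last `len(pattern) - 1` bytes over to the next read so that a match straddling two buffers is not missed.

```python
import io

def find_in_stream(stream, pattern, bufsize=16 * 1024):
    """Return the offset of the first occurrence of pattern in stream, or -1.

    Hypothetical helper: reads ~16K at a time and uses bytes.find as a
    stand-in for strstr()/memmem().
    """
    overlap = len(pattern) - 1   # bytes carried over between reads
    tail = b""                   # end of the previous window
    base = 0                     # stream offset of tail[0]
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            return -1
        window = tail + chunk
        hit = window.find(pattern)
        if hit != -1:
            return base + hit
        # Keep just enough bytes to catch a match that spans two reads.
        keep = min(overlap, len(window))
        base += len(window) - keep
        tail = window[len(window) - keep:]

data = b"a" * 16383 + b"gcagctgaaaaca" + b"t" * 100
print(find_in_stream(io.BytesIO(data), b"gcagctgaaaaca"))
```

In the usage line the pattern deliberately starts one byte before the 16K boundary, which is exactly the case the overlap exists for.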

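If you first encode/compress the DNA string with a lossless coding method (e.g. Huffman, exponential Golomb, etc.), then you get a ranked probability table ("coding tree") for DNA tokens made of various combinations of nucleotides (e.g. AAA and so on).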

What this means is that, once you compress your DNA:

  1. You'll probably be using fewer bits to store GCAGCTGAAAACA and other subsequences than the "unencoded" approach of always using two bits per base.
  2. You can walk through the coding tree or table to build an encoded search string, which will usually be shorter than the unencoded search string.
  3. You can apply the same family of exact search algorithms (e.g. Boyer-Moore) to locate this shorter, encoded search string.

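As for a parallelized approach, split the encoded target into N chunks and run the search algorithm on each chunk, using the shortened, encoded search string. By keeping track of the bit offsets of each chunk, you should be able to generate match positions.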
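A rough sketch of the chunking idea, under simplifying assumptions: the target here is a plain byte string rather than a Huffman-encoded bit stream, so offsets are byte offsets, not bit offsets, and `find_all` / `parallel_find` are hypothetical names. Each chunk is allowed to read `len(pattern) - 1` bytes past its end so that matches crossing a chunk boundary are still reported once.

```python
from concurrent.futures import ThreadPoolExecutor

def find_all(data, pattern, start, end):
    """All offsets of pattern in data whose match begins in [start, end)."""
    hits = []
    # Search slightly past `end` so a match that starts in this chunk
    # but finishes in the next one is still found (and only found here).
    limit = min(len(data), end + len(pattern) - 1)
    pos = data.find(pattern, start, limit)
    while pos != -1:
        hits.append(pos)
        pos = data.find(pattern, pos + 1, limit)
    return hits

def parallel_find(data, pattern, n_chunks=4):
    """Split data into n_chunks ranges and search each one in a worker."""
    size = -(-len(data) // n_chunks)  # ceiling division
    bounds = [(i * size, min(len(data), (i + 1) * size))
              for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = pool.map(lambda b: find_all(data, pattern, *b), bounds)
    # Chunks are processed in order, so the result is already sorted.
    return [pos for part in parts for pos in part]

data = b"ACGT" * 3 + b"GCAGCTGAAAACA" + b"TT" + b"GCAGCTGAAAACA"
print(parallel_find(data, b"GCAGCTGAAAACA"))
```

Note that threads are used here only to show the structure; in CPython they will probably not speed up a pure `bytes.find` workload, so a real implementation would likely use processes or native code.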

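Overall, this compression approach would be useful if you plan on doing millions of searches on sequence data that won't change. You'd be searching fewer bits, potentially many fewer in aggregate.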

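Boyer-Moore is a technique for finding substrings in plain strings. The basic idea is that if your substring is, say, 10 characters long, you can look at the character at position 9 in the string being searched. If that character is not part of your search string, you can simply start the search after that character. (If that character is indeed in your search string, the Boyer-Moore algorithm uses a look-up table to skip the optimal number of characters forward.)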

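It might be possible to reuse this idea for your packed representation of the genome string. After all, there are only 256 different bytes, so you could safely pre-calculate the skip table.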
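To make the 256-entry skip table concrete, here is a small sketch using the Horspool simplification of Boyer-Moore (bad-character rule only, keyed on the last byte of the window); full Boyer-Moore adds a second "good suffix" table that is omitted here. The function name `horspool_find` is hypothetical. Applied directly to packed data, a byte-wise search like this would only find byte-aligned occurrences, so it would have to be combined with shifted copies of the pattern.

```python
def horspool_find(haystack, needle):
    """Offset of the first occurrence of needle in haystack (bytes), or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # skip[b] = how far the window may slide when its last byte is b.
    # Bytes that do not occur in the needle allow a full jump of m.
    skip = [m] * 256
    for i, b in enumerate(needle[:-1]):
        skip[b] = m - 1 - i
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Look only at the last byte of the current window to pick the jump.
        pos += skip[haystack[pos + m - 1]]
    return -1

print(horspool_find(b"xxGCAGCTGAAAACAyy", b"GCAGCTGAAAACA"))
```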

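The benefit of encoding the alphabet into bit fields is compactness: one byte holds the equivalent of four characters. This is similar to some of the optimizations Google performs when searching for words.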

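This suggests four parallel executions, each with the (transformed) search string offset by one character (two bits). A quick-and-dirty approach might be to just look for the first or second byte of the search string, then check the extra bytes before and after it against the rest of the string, masking off the ends if necessary. The first search is handily done by the x86 instruction scasb. Subsequent byte matches can build upon the register values with cmpb.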
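The four-alignment idea might look like this in outline. Everything here is an illustrative assumption rather than the answerer's layout: bases are coded A=0, C=1, G=2, T=3, packed four per byte with the first base in the high bits, and `pack`, `aligned_pattern` and `find_packed` are hypothetical helpers. For each of the four shifts, the fully covered middle bytes are located with a plain byte search (the role scasb/memmem would play), and the partial first and last bytes are then checked through masks. The sketch assumes the pattern has at least 7 bases so that every shift has at least one fully covered byte.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding

def pack(seq):
    """Pack bases 2 bits each, 4 per byte, first base in the high bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (6 - 2 * (i % 4))
    return bytes(out)

def aligned_pattern(pattern, shift):
    """Pattern bytes and per-byte masks for a match starting `shift` bases into a byte."""
    total = shift + len(pattern)
    value = bytearray((total + 3) // 4)
    mask = bytearray(len(value))
    for i, base in enumerate(pattern, start=shift):
        bit = 6 - 2 * (i % 4)
        value[i // 4] |= CODE[base] << bit
        mask[i // 4] |= 3 << bit
    return bytes(value), bytes(mask)

def find_packed(packed, n_bases, pattern):
    """Base offsets at which pattern occurs in a packed sequence of n_bases bases."""
    hits = []
    for shift in range(4):
        value, mask = aligned_pattern(pattern, shift)
        # Bytes fully covered by the pattern can be searched as plain bytes.
        full = [i for i, m in enumerate(mask) if m == 0xFF]
        if not full:
            continue  # pattern shorter than 7 bases: not handled in this sketch
        lo, hi = full[0], full[-1] + 1
        core = value[lo:hi]
        pos = packed.find(core)
        while pos != -1:
            start = pos - lo            # byte where the aligned pattern begins
            base = start * 4 + shift    # corresponding base offset
            # Reject hits that run into the zero padding, then check every
            # byte through its mask (this covers the partial edge bytes).
            if (start >= 0 and base + len(pattern) <= n_bases
                    and all(packed[start + i] & mask[i] == value[i]
                            for i in range(len(value)))):
                hits.append(base)
            pos = packed.find(core, pos + 1)
    return sorted(hits)

seq = "TTACG" + "GCAGCTGAAAACA" + "CCGT" + "GCAGCTGAAAACA" + "AA"
print(find_packed(pack(seq), len(seq), "GCAGCTGAAAACA"))
```

Passing `n_bases` separately matters because the zero padding in the last byte is indistinguishable from a run of A's.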

You could create a state machine. In this topic, Fast algorithm to extract thousands of simple patterns out of large amounts of text, I used [f]lex to create the state machine for me. It would require some hackery to use the 4-letter (:= two-bit) alphabet, but it can be done using the same tables as generated by [f]lex. (You could even create your own fgetc()-like function which extracts two bits at a time from the input stream and keeps the other six bits for consecutive calls. Pushback will be a bit harder, but not undoable.)
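As an illustration of a state machine over the two-bit alphabet, the sketch below builds a Knuth-Morris-Pratt style automaton by hand instead of using [f]lex-generated tables, so it handles one pattern only. The encoding (A=0, C=1, G=2, T=3, four bases per byte, high bits first) and the names `pack`, `symbols`, `build_dfa` and `dfa_find` are assumptions made for the example. The `symbols` generator plays the role of the fgetc()-like function that hands out two bits at a time.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding

def pack(seq):
    """Pack bases 2 bits each, 4 per byte, first base in the high bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (6 - 2 * (i % 4))
    return bytes(out)

def symbols(packed, n_bases):
    """Yield one 2-bit symbol at a time from the packed bytes."""
    for i in range(n_bases):
        yield (packed[i // 4] >> (6 - 2 * (i % 4))) & 3

def build_dfa(pattern):
    """dfa[state][symbol] -> next state; state len(pattern) means 'matched'."""
    m = len(pattern)
    dfa = [[0] * 4 for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    fallback = 0  # state the machine would be in after reading pattern[1:state]
    for state in range(1, m + 1):
        dfa[state] = list(dfa[fallback])      # mismatch: behave like fallback
        if state < m:
            dfa[state][pattern[state]] = state + 1   # match: advance
            fallback = dfa[fallback][pattern[state]]
    return dfa

def dfa_find(packed, n_bases, pattern_text):
    """Base offsets of all (possibly overlapping) matches."""
    pattern = [CODE[b] for b in pattern_text]
    dfa = build_dfa(pattern)
    state, hits = 0, []
    for i, sym in enumerate(symbols(packed, n_bases)):
        state = dfa[state][sym]
        if state == len(pattern):
            hits.append(i - len(pattern) + 1)
    return hits

seq = "TTACG" + "GCAGCTGAAAACA" + "CCGT" + "GCAGCTGAAAACA" + "AA"
print(dfa_find(pack(seq), len(seq), "GCAGCTGAAAACA"))
```

Because the automaton never backs up, no pushback is needed in this single-pattern case; that difficulty only arises with lexer-style tables that need lookahead.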

BTW: I seriously doubt there is any gain in compressing the data to two bits per nucleotide, but that is a different matter.

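Okay, given your parameters, the problem isn't that hard, just not one you'd approach like a traditional string search problem. It more closely resembles a database table-join problem where the tables are much larger than RAM.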

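  • Select a good rolling hash function, aka buzhash. If you have billions of strings, you're looking for a hash with 64-bit values.

  • Create a hash table based on each 127-element search string. The table in memory only needs to store (hash, string-id), not the whole strings.

  • Scan your very large target string, computing the rolling hash and looking up each value of the hash in your table. Whenever there's a match, write the (string-id, target-offset) pair to a stream, possibly a file.

  • Reread your target string and the pair stream, loading search strings as needed to compare them against the target at each offset.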

I am assuming that loading all pattern strings into memory at once is prohibitive. There are ways to segment the hash table into something that is larger than RAM but not a traditional random-access hash file; if you're interested, search for "hybrid hash" and "grace hash", which are more common in the database world.

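I don't know if it's worth your while, but your pair stream gives you the perfect predictive input for managing a cache of pattern strings: Belady's classic page replacement algorithm.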




