Multiple short rules pattern matching algorithm

As the title says, we would like some suggestions for the fastest pattern-matching algorithm that fits the following constraints:

Long dictionary: 256

Short rules of variable length (from 1 up to 3 or 4 bytes deep at most)

Small number of rules (~150) if they are at most 3 bytes, or a moderate number (~1K)

Better performance than the AC-DFA currently used in Snort, or than AC-DFA-Split

Software-based (recent COTS systems such as E3 or E5). Ideally we would like to use some SIMD/SSE instructions, since SIMD registers are currently 128 bits wide and will soon be 256 bits wide, as opposed to the CPU's 64-bit general-purpose registers.

We started this project by prefiltering Snort's AC with the algorithm described in the Sigmatch paper, but sadly the results have not been that impressive (~12% improvement when compiling with GCC, but none with ICC).

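After that, we tried to exploit the new pattern-matching capabilities of SSE 4.2 through the IPP libraries, but got no performance gain at all. Doing it directly in machine code would probably be better, but it would certainly be more complex.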

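So we went back to the original idea. We are currently working along those lines. However, we know that unless we replace the proposed AC it will be very hard to improve performance. At least we would be able to support many more rules without a significant performance drop.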

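We know that bit-parallelism ideas use a lot of memory for long patterns. Here, though, the problem scope has been reduced to rules of 3 or 4 bytes at most, which makes bit-parallelism a feasible alternative.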
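To make the bit-parallelism idea concrete, here is a minimal sketch of the textbook multi-pattern Shift-And technique. It is not necessarily what any of the papers mentioned here use.

The sketch assumes the combined length of all rules fits in one 64-bit word, for example 16 rules of 4 bytes. The names `shift_and_t`, `sa_build` and `sa_count` are hypothetical. Supporting ~150 rules would need several words per step, and that is where 128-bit (and later 256-bit) SIMD registers could help.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: every rule occupies len(rule) consecutive bits of a
   single 64-bit state word, so the total length of all rules must be <= 64. */
typedef struct {
    uint64_t B[256];  /* B[c]: bit j set if the rule byte at bit position j is c */
    uint64_t init;    /* bit set at the first position of every rule */
    uint64_t final;   /* bit set at the last position of every rule */
} shift_and_t;

/* Returns 0 on success, -1 if a rule is empty or the rules exceed 64 bits.
   Rules are NUL-terminated strings here purely for brevity. */
static int sa_build(shift_and_t *sa, const char *const *rules, int nrules)
{
    memset(sa, 0, sizeof *sa);
    int pos = 0;
    for (int k = 0; k < nrules; k++) {
        size_t len = strlen(rules[k]);
        if (len == 0 || pos + (int)len > 64) return -1;
        sa->init |= UINT64_C(1) << pos;
        for (size_t j = 0; j < len; j++)
            sa->B[(unsigned char)rules[k][j]] |= UINT64_C(1) << (pos + (int)j);
        pos += (int)len;
        sa->final |= UINT64_C(1) << (pos - 1);
    }
    return 0;
}

/* Counts all (possibly overlapping) occurrences of all rules in text.
   The state update is one shift, one OR and one AND per input byte.
   A bit shifted from the end of one rule into the start of the next is
   harmless, because `init` sets every rule's first bit on each step anyway. */
static size_t sa_count(const shift_and_t *sa, const unsigned char *text, size_t n)
{
    uint64_t D = 0;
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        D = ((D << 1) | sa->init) & sa->B[text[i]];
        /* each set bit in D & final is a rule ending at position i */
        for (uint64_t m = D & sa->final; m != 0; m &= m - 1)
            hits++;
    }
    return hits;
}
```

Each rule only needs as many state bits as it has bytes, so the short-rule restriction is what keeps the state small. For example, 150 rules of 3 bytes need 450 bits, which fits in four 128-bit registers. The inner bit-counting loop could be replaced by a popcount instruction where one is available.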

In particular we have found Nedtries, but we would like to know what you think, or whether there are better alternatives.

Ideally, the source code would be in C and available under an open source license.


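IMHO, the idea would be to find something that advances one byte at a time while coping with the different rule sizes. It should do so in the most parallel way possible, exploiting SIMD/SSE, with as little branching as possible.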

I don't know whether it makes more sense to do this bit-wise or byte-wise.
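Here is a minimal sketch of the "one byte at a time, but 16 positions in parallel, with minimal branching" idea for a single rule. It uses SSE2 intrinsics, so it is x86/x86-64 only.

It uses the well-known first/last-byte filter:
- Compare 16 bytes against the rule's first byte.
- Compare the 16 bytes starting `m - 1` positions later against its last byte.
- AND the two results and verify only the surviving candidates.

`simd_count` is a hypothetical name, and `__builtin_ctz` assumes GCC or Clang. Handling many rules would need one filter pass per rule or a shared filter, which this sketch does not attempt.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: x86/x86-64 only */
#include <stddef.h>
#include <string.h>

/* Counts occurrences of pat (m bytes) in text (n bytes). 16 candidate start
   positions are tested per iteration by comparing the rule's first and last
   bytes in parallel and AND-ing the two results, so no per-byte branch is
   needed; only surviving candidates are verified with memcmp. */
static size_t simd_count(const unsigned char *text, size_t n,
                         const unsigned char *pat, size_t m)
{
    if (m == 0 || m > n) return 0;
    const __m128i first = _mm_set1_epi8((char)pat[0]);
    const __m128i last  = _mm_set1_epi8((char)pat[m - 1]);
    const size_t max_start = n - m;
    size_t hits = 0, i = 0;

    /* both 16-byte loads stay in bounds while i + 15 <= max_start */
    for (; i + 15 <= max_start; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(text + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(text + i + m - 1));
        unsigned mask = (unsigned)_mm_movemask_epi8(
            _mm_and_si128(_mm_cmpeq_epi8(a, first), _mm_cmpeq_epi8(b, last)));
        while (mask != 0) {
            unsigned bit = (unsigned)__builtin_ctz(mask);  /* GCC/Clang builtin */
            if (m <= 2 || memcmp(text + i + bit + 1, pat + 1, m - 2) == 0)
                hits++;
            mask &= mask - 1;  /* clear the lowest set bit */
        }
    }
    /* scalar tail for the last (< 16) start positions */
    for (; i <= max_start; i++)
        if (text[i] == pat[0] && text[i + m - 1] == pat[m - 1] &&
            (m <= 2 || memcmp(text + i + 1, pat + 1, m - 2) == 0))
            hits++;
    return hits;
}
```

With rules of at most 4 bytes, the memcmp verifies at most 2 middle bytes, so almost all the work is in the two vector compares.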


Back to a proper keyboard :D

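Essentially, most algorithms do not properly account for the capabilities and limitations of current hardware. They are very inefficient and very branchy. Nor do they exploit the features of existing COTS CPUs that allow some degree of parallelism (SIMD, SSE, ...).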

This is precisely what we are looking for: an algorithm (or an implementation of an existing algorithm) that properly takes all of this into account. It has the advantage of not needing to cover all rule lengths, just short ones.

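For example, I have seen some papers on NFAs claiming that nowadays their performance can be on par with DFAs, with much lower memory requirements. They attribute this to proper cache efficiency, increased parallelism, etc.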

Answers

Please take a look at: http://www.slideshare.net/bouma2

Support for 1- and 2-byte strings is similar to what Baxter wrote above. Nevertheless, it would help if you could provide:
- the number of single-byte and double-byte strings you expect to have in the DB, and
- the kind of traffic you expect to process (Internet, corporate, etc.).

After all, too many single-byte strings may end up producing a match on every byte. The idea of Bouma2 is to incorporate occurrence statistics into the preprocessing stage, thereby reducing the false-positive rate.
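For 1- and 2-byte strings, a direct lookup table is about as cheap as it gets. It needs 256 entries for single bytes and a 65,536-bit (8 KB) bitmap for byte pairs.

This is a generic sketch assuming exact matching, not Bouma2's implementation. `short_rules_t`, `sr_add` and `sr_count` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical direct-lookup tables for the shortest rules:
   one entry per possible byte and one bit per possible byte pair. */
typedef struct {
    uint8_t  single[256];       /* 1 if the byte is a 1-byte rule */
    uint64_t pair[65536 / 64];  /* bit (a << 8 | b) set if "ab" is a 2-byte rule */
} short_rules_t;

static void sr_init(short_rules_t *t) { memset(t, 0, sizeof *t); }

/* Rules of other lengths are ignored by this sketch. */
static void sr_add(short_rules_t *t, const unsigned char *rule, size_t len)
{
    if (len == 1) {
        t->single[rule[0]] = 1;
    } else if (len == 2) {
        unsigned idx = ((unsigned)rule[0] << 8) | rule[1];
        t->pair[idx >> 6] |= UINT64_C(1) << (idx & 63);
    }
}

/* Counts matches of all 1- and 2-byte rules; each test is a table lookup. */
static size_t sr_count(const short_rules_t *t, const unsigned char *s, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        hits += t->single[s[i]];
        if (i + 1 < n) {
            unsigned idx = ((unsigned)s[i] << 8) | s[i + 1];
            hits += (size_t)((t->pair[idx >> 6] >> (idx & 63)) & 1);
        }
    }
    return hits;
}
```

This also illustrates the concern about single-byte strings. If common bytes (for example a space or 0x00) are single-byte rules, `sr_count` reports a hit at a large fraction of positions. The number of such rules then matters far more than the lookup cost.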

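It sounds like you are already using high-performance pattern matching. Unless you have some clever new algorithm, or can point to some statistical bias in the data or in your rules, it will be hard to speed up the raw algorithms.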

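You might consider treating pairs of characters as the pattern-matching elements. This will make the branching factor of the state machine huge, but you presumably don't care much about memory. That might buy you a factor of two.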
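Here is a simplified illustration of treating a byte pair as one 16-bit symbol. At each text position, the pair indexes a 65,536-entry first-level table, which points to the (usually short) list of rules beginning with that pair.

This is a sketch under assumptions: rules of 2 to 4 bytes, and at most 1024 of them. It is not a full stride-2 automaton that consumes two bytes per step. `pair_index_t`, `pi_add` and `pi_count` are hypothetical names. It does show the memory trade-off: the first-level table alone is 256 KB.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PI_MAX_RULES 1024  /* hypothetical cap, in line with the ~1K rules mentioned */

typedef struct {
    int32_t head[65536];            /* first rule index per 2-byte symbol, -1 = none */
    int32_t next[PI_MAX_RULES];     /* next rule sharing the same leading pair */
    unsigned char rule[PI_MAX_RULES][4];
    uint8_t len[PI_MAX_RULES];
    int count;
} pair_index_t;

static void pi_init(pair_index_t *p)
{
    memset(p->head, 0xff, sizeof p->head);  /* all bytes 0xff => every entry is -1 */
    p->count = 0;
}

/* Adds a rule of 2..4 bytes; returns its index, or -1 on error. */
static int pi_add(pair_index_t *p, const unsigned char *r, size_t len)
{
    if (len < 2 || len > 4 || p->count == PI_MAX_RULES) return -1;
    int k = p->count++;
    unsigned sym = ((unsigned)r[0] << 8) | r[1];  /* the pair as one symbol */
    memcpy(p->rule[k], r, len);
    p->len[k] = (uint8_t)len;
    p->next[k] = p->head[sym];
    p->head[sym] = k;
    return k;
}

/* One 2-byte-symbol lookup per position; only rules sharing that pair
   have their remaining 0..2 bytes verified. */
static size_t pi_count(const pair_index_t *p, const unsigned char *s, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        unsigned sym = ((unsigned)s[i] << 8) | s[i + 1];
        for (int32_t k = p->head[sym]; k >= 0; k = p->next[k]) {
            size_t L = p->len[k];
            if (i + L <= n &&
                (L == 2 || memcmp(s + i + 2, p->rule[k] + 2, L - 2) == 0))
                hits++;
        }
    }
    return hits;
}
```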

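When they run out of steam algorithmically, people often resort to careful hand-coding in assembler, including clever use of SSE instructions.

One trick may help with handling unique sequences wherever they are found. Perform a series of comparisons against the elements, and form a boolean result by AND/OR-ing them rather than by conditional branching, because branches are expensive. SSE instructions might help here, although their alignment requirements may force you to replicate the data 4 or 8 times.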
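Here is a scalar sketch of the compare-and-combine trick for rules of at most 4 bytes. Each rule is packed into a 32-bit value with a byte mask, and each 4-byte window of the text is tested with one XOR and one AND per rule. The boolean results are added up rather than used in an `if`, so the inner loop has no data-dependent branch.

`packed_rule_t`, `pack_rule`, `load_window` and `branchless_count` are hypothetical names. An SSE version would compare several packed windows or rules per 128-bit register; that is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t value;  /* rule byte j stored at bits 8*j .. 8*j+7 */
    uint32_t mask;   /* 0xFF in each of the rule's len byte slots */
    uint32_t len;    /* 1..4 */
} packed_rule_t;

static packed_rule_t pack_rule(const unsigned char *r, uint32_t len)
{
    packed_rule_t p = {0, 0, len};
    for (uint32_t j = 0; j < len; j++) {
        p.value |= (uint32_t)r[j] << (8 * j);
        p.mask  |= UINT32_C(0xFF) << (8 * j);
    }
    return p;
}

/* Loads up to 4 bytes starting at s[i] (endian-independent); bytes past
   the end of the text read as 0. */
static uint32_t load_window(const unsigned char *s, size_t n, size_t i)
{
    uint32_t w = 0;
    for (uint32_t j = 0; j < 4 && i + j < n; j++)
        w |= (uint32_t)s[i + j] << (8 * j);
    return w;
}

/* Per rule: one XOR, one AND and a compare; the 0/1 results are summed. */
static size_t branchless_count(const packed_rule_t *rules, size_t nrules,
                               const unsigned char *s, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t w = load_window(s, n, i);
        for (size_t k = 0; k < nrules; k++) {
            size_t eq  = ((w ^ rules[k].value) & rules[k].mask) == 0;
            size_t fit = i + rules[k].len <= n;  /* rule must not run off the end */
            hits += eq & fit;
        }
    }
    return hits;
}
```

The `fit` term also prevents the zero padding from producing false matches for rules that end in 0x00 bytes.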

If you have long strings to search, you might distribute subsets of the rules across separate CPUs (threads). Partitioning the rules might be tricky.




