English 中文(简体)
Why does my finite state machine take so long to execute?
原标题:

I m working on a state machine which is supposed to extract function calls of the form

/* I am a comment */
//I am a comment
pref("this.is.a.string.which"can have QUOTES"", 123456);

where the extracted data would be pref("this.is.a.string.which"can have QUOTES"", 123456); from a file. Currently, to process a 41kb file, this process is taking close to a minute and a half. Is there something I m seriously misunderstanding here about this finite state machine?

#include <boost/algorithm/string.hpp>
std::vector<std::string> Foo()
{
    std::string fileData;
    //Fill filedata with the contents of a file
    std::vector<std::string> results;
    std::string::iterator begin = fileData.begin();
    std::string::iterator end = fileData.end();
    std::string::iterator stateZeroFoundLocation = fileData.begin();
    std::size_t state = 0;
    for(; begin < end; begin++)
    {
        switch (state)
        {
        case 0:
            if (boost::starts_with(boost::make_iterator_range(begin, end), "pref(")) {
                stateZeroFoundLocation = begin;
                begin += 4;
                state = 2;
            } else if (*begin ==  / )
                state = 1;
            break;
        case 1:
            state = 0;
            switch (*begin)
            {
            case  * :
                begin = boost::find_first(boost::make_iterator_range(begin, end), "*/").end();
                break;
            case  / :
                begin = std::find(begin, end, L 
 );
            }
            break;
        case 2:
            if (*begin ==  " )
                state = 3;
            break;
        case 3:
            switch(*begin)
            {
            case  \ :
                state = 4;
                break;
            case  " :
                state = 5;
            }
            break;
        case 4:
            state = 3;
            break;
        case 5:
            if (*begin ==  , )
                state = 6;
            break;
        case 6:
            if (*begin !=    )
                state = 7;
            break;
        case 7:
            switch(*begin)
            {
            case  " :
                state = 8;
                break;
            default:
                state = 10;
                break;
            }
            break;
        case 8:
            switch(*begin)
            {
            case  \ :
                state = 9;
                break;
            case  " :
                state = 10;
            }
            break;
        case 9:
            state = 8;
            break;
        case 10:
            if (*begin ==  ) )
                state = 11;
            break;
        case 11:
            if (*begin ==  ; )
                state = 12;
            break;
        case 12:
            state = 0;
            results.push_back(std::string(stateZeroFoundLocation, begin));
        };
    }
    return results;
}

Billy3

EDIT: Well this is one of the strangest things I ve ever seen. I just rebuilt the project and it s running reasonably again. Odd.

最佳回答

Unless your 41 kb file is mostly comments or prefs, it s going to spend most of its time in state 0. And for each character in state 0, you make a minimum of two function calls.

if (boost::starts_with(boost::make_iterator_range(begin, end), "pref(")) {

You can speed this up by pre-testing to see if the current character is p

if (*begin ==  p  && boost::starts_with(boost::make_iterator_range(begin, end), "pref(")) {

If the character isn t p then there is no need to make any function calls. In particular not creating a iterator, which is probably where the time is being spent.

问题回答

I don t know if this is part of the problem, but you have a typo in case 0, "perf" is misspelled as "pref".

Well it s hard to say just by looking at it...but I m guessing the find algorithms are doing it. Why are you searching within a FSM? By definition you re supposed to giving them one character at a time....Add more states. Also try making results a list, not a vector. A lot of copying going on with

vector<string>

But mostly: Profile it!

Finite state machines are a workable solution, but for text processing, its best to use a highly optimized finite state machine generator. In this case, a regular expression. Here it is as Perl regex:

# first clean the comments
$source =~ s|//.*$||;      # replace "// till end of line" with nothing
$source =~ s|/*.*?*/||s; # replace "/* any text until */" with nothing
                           # depending on your data, you may need a few other
                           # rules here to avoid blanking data, you could replace
                           # the comments with a unique identifier, and then
                           # expand any identifiers that the regex below returns

# then find your data
while ($source =~ /perf(s*"(.+?)",s*(d+)s*);/g) { 
   # matches your function signature and moves along source
   # do something with the captured groups, in this case $1 and $2
}

Since most regex libraries are Perl compatible, it shouldn t be hard to translate the syntax. If your search becomes more complicated, then a parser is in order.

If you are doing parsing, why not use a parser library.

I typically have Boost.Spirit.Qi in mind.

  • You express your grammar with EBNF-like expressions which certainly makes it easier for maintenance.
  • It s a header only library, so you don t have any problem of bringing a whole binary in the mix.

While I can appreciate the minimalist approach, I am afraid that your idea of coding a finite state machine on your own is not that wise. It works well with a toy example, but as the requirements add up you ll have one monstrous switch and it s going to become more and more complicated to understand what s going on.

And please, don t tell me you know it s not going to evolve: I don t believe in oracles ;)





相关问题
Undefined reference

I m getting this linker error. I know a way around it, but it s bugging me because another part of the project s linking fine and it s designed almost identically. First, I have namespace LCD. Then I ...

C++ Equivalent of Tidy

Is there an equivalent to tidy for HTML code for C++? I have searched on the internet, but I find nothing but C++ wrappers for tidy, etc... I think the keyword tidy is what has me hung up. I am ...

Template Classes in C++ ... a required skill set?

I m new to C++ and am wondering how much time I should invest in learning how to implement template classes. Are they widely used in industry, or is this something I should move through quickly?

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

typedef ing STL wstring

Why is it when i do the following i get errors when relating to with wchar_t? namespace Foo { typedef std::wstring String; } Now i declare all my strings as Foo::String through out the program, ...

C# Marshal / Pinvoke CBitmap?

I cannot figure out how to marshal a C++ CBitmap to a C# Bitmap or Image class. My import looks like this: [DllImport(@"test.dll", CharSet = CharSet.Unicode)] public static extern IntPtr ...

Window iconification status via Xlib

Is it possible to check with the means of pure X11/Xlib only whether the given window is iconified/minimized, and, if it is, how?

热门标签