How to get the sub-string lying in between two sub-strings in C?
I have packet capture code that writes the HTTP payload to a file. Now I want to extract the URL information from these dumps. For each packet, the payload begins like this:

GET /intl/en_com/images/logo_plain.png HTTP/1.1..Host: www.google.co.in..User-Agent: Mozilla/5.0

I would like to extract:

  1. the string between "GET" and "HTTP/1.1"
  2. the string between "Host:" and "User-Agent"

How can I do this in C? Are there any built-in string functions, or regular expressions?

Best answer

C doesn't have built-in regular expressions, though libraries are available: http://www.arglist.com/regex/ and http://www.pcre.org/ are the two I see most often.

For a task this simple, you can easily get away without using regexes though. Provided the lines are all less than some maximum length MAXLEN, just process them one line at a time:

char buf[MAXLEN];
char url[MAXLEN];
char host[MAXLEN];
int state = 0;      /* 0: haven't seen GET yet; 1: haven't seen Host yet */
FILE *f = fopen("my_input_file", "rb");

if (!f) {
    report_error_somehow();   /* must bail out here: f is NULL and cannot be read */
}

while (fgets(buf, sizeof buf, f)) {
    /* Strip trailing \r and \n */
    int len = strlen(buf);
    if (len >= 2 && buf[len - 1] == '\n' && buf[len - 2] == '\r') {
        buf[len - 2] = 0;
    } else {
        if (feof(f)) {
            /* Last line was not \r\n-terminated: probably OK to ignore */
        } else {
            /* Either the line was too long, or ends with \n but not \r\n. */
            report_error_somehow();
        }
    }

    if (state == 0 && !memcmp(buf, "GET ", 4)) {
        strcpy(url, buf + 4);    /* We know url[] is big enough */
        ++state;
    } else if (state == 1 && !memcmp(buf, "Host: ", 6)) {
        strcpy(host, buf + 6);   /* We know host[] is big enough */
        break;
    }
}

fclose(f);

This solution doesn't require buffering the entire file in memory as KennyTM's answer does (though that is fine, by the way, if you know the files are small). Notice that we use fgets() instead of the unsafe gets(), which is prone to overflowing buffers on long lines.
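One detail worth noting: strcpy(url, buf + 4) copies everything after "GET ", so url[] still ends with the protocol version (" HTTP/1.1"). A minimal sketch of trimming it in place follows; trim_http_version is a hypothetical helper name, not part of the answer above, and it assumes the version is the last space-separated token on the request line.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: if the string ends with " HTTP/<something>",
 * cut it off at that last space so only the request path remains.
 * Strings without such a suffix are left unchanged.
 */
static void trim_http_version(char *url)
{
    assert(url != NULL);

    char *sp = strrchr(url, ' ');          /* last space, if any */
    if (sp != NULL && strncmp(sp + 1, "HTTP/", 5) == 0)
        *sp = '\0';                        /* terminate before the version */
}
```

Calling trim_http_version(url) right after the loop would leave just the path in url[].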

Other answers

Look for the location of each delimiter using strchr (or strstr). Since the strings GET, HTTP/1.1 and Host: are of fixed length, the index and location of the path in between can be extracted easily.
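The strstr approach might be sketched roughly as follows. extract_between is a hypothetical helper name, not a standard function, and the sketch assumes the payload is already in memory as a NUL-terminated string. It also assumes the ".." in the dump stands for the CRLF bytes ("\r\n") that separate HTTP header lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text between the first occurrence of
 * `start` and the next occurrence of `end` in `src` into `out`
 * (capacity `out_size`, including the terminating NUL).
 * Returns 0 on success, -1 if either delimiter is missing or `out`
 * is too small.
 */
static int extract_between(const char *src, const char *start, const char *end,
                           char *out, size_t out_size)
{
    assert(src && start && end && out);

    const char *begin = strstr(src, start);
    if (begin == NULL)
        return -1;
    begin += strlen(start);              /* skip past the opening delimiter */

    const char *stop = strstr(begin, end);
    if (stop == NULL)
        return -1;

    size_t len = (size_t)(stop - begin);
    if (len >= out_size)
        return -1;                       /* no room for text plus terminator */

    memcpy(out, begin, len);
    out[len] = '\0';
    return 0;
}
```

With the payload from the question, extract_between(payload, "GET ", " HTTP/1.1", url, sizeof url) would yield the path, and extract_between(payload, "Host: ", "\r\n", host, sizeof host) the host name. Including the surrounding spaces in the delimiters saves a separate trimming step.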


If you want to use regular expressions, on POSIX-compliant systems there is regcomp(3), but that's also quite hard to use.
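For comparison, here is a minimal sketch of the POSIX route using regcomp, regexec and regfree from <regex.h>. match_request_path is a hypothetical helper name, and the pattern is only an illustrative guess at what a GET request line looks like. It is not a full HTTP parser.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` looks like "GET <path> HTTP/1.x",
 * copy <path> into `out` (capacity `out_size`) and return 0.
 * Returns -1 on no match, compile failure, or if `out` is too small.
 */
static int match_request_path(const char *line, char *out, size_t out_size)
{
    assert(line && out);

    regex_t re;
    regmatch_t m[2];                     /* m[0]: whole match, m[1]: the path */
    int rc = -1;

    if (regcomp(&re, "^GET ([^ ]+) HTTP/1\\.[01]", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < out_size) {
            memcpy(out, line + m[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&re);                        /* always release the compiled pattern */
    return rc;
}
```

In a real loop the pattern would be compiled once and reused for every line. It is compiled inside the helper here only to keep the sketch short.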




