C doesn't have built-in regular expressions, though libraries are available: http://www.arglist.com/regex/ and http://www.pcre.org/ are the two I see most often.
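If you do want to go the regex route, here is a minimal sketch. It assumes the POSIX `<regex.h>` interface (`regcomp`/`regexec`/`regfree`), which most Unix-like systems provide but which is not part of standard C. The helper name `match_host_line` and its return codes are made up for illustration.

```c
#include <assert.h>
#include <regex.h>    /* POSIX regex API; not part of standard C */
#include <string.h>

/* Hypothetical helper: if line starts with "Host: ", copy the rest of the
   line (up to any '\r' or '\n') into out, truncating if necessary.
   Returns 0 on a match, 1 on no match, -1 on any regex error. */
int match_host_line(const char *line, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: first capture group */
    int rc;

    assert(line != NULL && out != NULL && out_size > 0);

    if (regcomp(&re, "^Host: ([^\r\n]*)", REG_EXTENDED) != 0) {
        return -1;
    }

    rc = regexec(&re, line, 2, m, 0);
    if (rc == 0) {
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (n >= out_size) {
            n = out_size - 1;          /* truncate rather than overflow */
        }
        memcpy(out, line + m[1].rm_so, n);
        out[n] = '\0';
    }

    regfree(&re);
    if (rc == 0) {
        return 0;
    }
    return rc == REG_NOMATCH ? 1 : -1;
}
```

Compiling the pattern on every call keeps the sketch short; in a loop over many lines you would compile it once and reuse the `regex_t`.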
For a task this simple, though, you can easily get away without regexes. Provided the lines are all shorter than some maximum length MAXLEN, just process them one line at a time:
/* Requires <stdio.h> and <string.h> */
char buf[MAXLEN];
char url[MAXLEN];
char host[MAXLEN];
int state = 0; /* 0: haven't seen GET yet; 1: haven't seen Host yet */
FILE *f = fopen("my_input_file", "rb");
if (!f) {
report_error_somehow();
}
while (fgets(buf, sizeof buf, f)) {
/* Strip trailing \r and \n */
int len = strlen(buf);
if (len >= 2 && buf[len - 1] == '\n' && buf[len - 2] == '\r') {
buf[len - 2] = 0;
} else {
if (feof(f)) {
/* Last line was not \r\n-terminated: probably OK to ignore */
} else {
/* Either the line was too long, or ends with \n but not \r\n. */
report_error_somehow();
}
}
if (state == 0 && !memcmp(buf, "GET ", 4)) {
strcpy(url, buf + 4); /* We know url[] is big enough */
++state;
} else if (state == 1 && !memcmp(buf, "Host: ", 6)) {
strcpy(host, buf + 6); /* We know host[] is big enough */
break;
}
}
fclose(f);
This solution doesn't require buffering the entire file in memory as KennyTM's answer does (though that is fine, by the way, if you know the files are small). Notice that we use fgets() instead of the unsafe gets(), which is prone to overflowing buffers on long lines.
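For reference, here is the same line-at-a-time idea packaged as a self-contained function you can compile. Treat it as a sketch under a few assumptions of mine: the name `find_get_and_host`, the value of `MAXLEN`, and the return codes are all made up for illustration. It is also more lenient than the code above: it accepts lines ending in a bare `\n`, and it does not report over-long lines (fgets() simply splits them and each piece is treated as a line).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXLEN 1024   /* assumed upper bound on line length, terminator included */

/* Hypothetical helper: scans f for the first line starting with "GET " and
   then the first later line starting with "Host: ".
   url and host must each point to at least MAXLEN bytes.
   Returns 0 if both lines were found, -1 otherwise. */
int find_get_and_host(FILE *f, char *url, char *host)
{
    char buf[MAXLEN];
    int state = 0;   /* 0: haven't seen GET yet; 1: haven't seen Host yet */

    assert(f != NULL && url != NULL && host != NULL);

    while (fgets(buf, sizeof buf, f)) {
        /* strcspn() returns the index of the first '\r' or '\n', or the
           string length if there is none, so this strips the line ending. */
        buf[strcspn(buf, "\r\n")] = '\0';

        if (state == 0 && strncmp(buf, "GET ", 4) == 0) {
            strcpy(url, buf + 4);    /* fits: buf holds at most MAXLEN bytes */
            state = 1;
        } else if (state == 1 && strncmp(buf, "Host: ", 6) == 0) {
            strcpy(host, buf + 6);   /* fits for the same reason */
            return 0;
        }
    }
    return -1;
}
```

Call it with a FILE * opened in "rb" mode and two MAXLEN-sized buffers. As in the code above, everything after "GET " is copied, so for a typical request line url will still contain the trailing protocol version.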