Parsing Twitter feeds in C

I'm trying to figure out how to get the most recent latitude and longitude of a Twitter user (from the new Geo API data, i.e. the <geo:point> tag; you can see what it looks like in my Twitter user timeline XML feed). I also need to work out how old that data is (in seconds) from the <created_at> tag.

I'm trying to write this in C for use with an mbed microcontroller, so I can't use any big libraries (ideally I wouldn't use any libraries, but that might be a bad idea). The mbed site suggests a few light libraries (YAJL and FastXML seem useful), but my C knowledge is very basic and I'm unsure how to proceed.

Assuming I have the code for retrieving a Twitter user timeline into memory as a string and/or to disk (as either JSON or XML), how should I proceed?

At the moment I'm doing this scraping on my web server via PHP, but I'd rather have it done in C, as I hope to release the code when I'm done (and I don't want my poor server being rammed!). The PHP looks like this:

<?php
date_default_timezone_set('UTC');
try {
  $tweets = json_decode(file_get_contents("http://twitter.com/statuses/user_timeline.json?screen_name=".urlencode($_GET['screenname'])));
  foreach($tweets as $tweet) {
    if (is_array($tweet->geo->coordinates)) {
      echo date("U") - strtotime($tweet->created_at);
      echo ",{$tweet->geo->coordinates[0]},{$tweet->geo->coordinates[1]}";
      break;
    }
  }
} catch (Exception $e) {
  exit();
}

This works fairly well, but I have no idea how to turn it into C! Any ideas?

Here's a snippet of the XML I'm expecting to deal with:

<statuses type="array">
 <status>
  <created_at>Sat Dec 12 22:25:17 +0000 2009</created_at>
  <id>6611101548</id>
  <text>Hello stackoverflow! This tweet is geotagged.</text>
  <other tags/>
  <geo>
   <georss:point>52.946972 -1.182846</georss:point>
  </geo>
 </status>
 <status ...>
</statuses>

(By the way, the mbed is awesome. I'm having an amazing time with it despite my lack of advanced knowledge of C or electronics. They're in stock at Farnell for £32 and definitely worth the money!)

Best answer

Assuming you have all of the feed in memory, I would write a very crude and simple parser.

First, I'd write a high-level tokenizer. This tokenizer would return two types of tokens: XML tags and everything else.

So, if you had this as the XML source:

<tag arg="stuff">
    <tag2>data</tag2>
</tag>

That would return "<tag arg="stuff">" as the first token, "
    " (note the newline) as the second token, "<tag2>" as the third, and "data" as the fourth.

Something like this:

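#include <stdlib.h> // malloc
#include <string.h> // memcpy

// Fragment: assumes bufPtr points at a NUL-terminated feed and is not
// already at the terminator.
char *p = bufPtr;
char *start = p;
char *token;
char target;

if (*p == '<') {
    // found the start of a tag, let's look for the end
    target = '>';
} else {
    // not in a tag, so we'll search for one
    target = '<';
}
p++;
// stop at the target, or at the end of the buffer
while (*p != '\0' && *p != target) {
    p++;
}
if (target == '>' && *p == '>') {
    p++; // keep the closing '>' as part of the tag token
}
size_t length = p - start;
token = malloc(length + 1); // check for NULL in real code
memcpy(token, start, length);
token[length] = '\0'; // terminate result string
bufPtr = p; // advance for the next token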

(Caveat: my C is rusty, so there may well be some off-by-one errors in here, but the gist is good.)

Now that I'm getting these meta chunks of the XML, it's straightforward.

I just scan tokens until I see one that starts with your geo tag. Once you see this, you "know" the next token is your lat/long data. Grab that and parse it (perhaps with sscanf) to get your values.

What this does is effectively flatten your XML space. You don't really care how deep the tag is, and you really don't check whether it's well formed, or anything. You're pretty much assuming it's well formed and conforming.

Off the top of my head, I don't know if XML allows the < or > chars within a quoted tag attribute, but even if it does allow it, odds are good that this SPECIFIC XML does not, so it'll work. Otherwise you'll need to parse quoted stuff (not that much harder, but...).

Is this robust? Hell no. Very GIGO sensitive. But a simple check to make sure you don't run off the buffer's end should save you there.
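As a rough illustration of that last step, here is a minimal sketch that pulls the first lat/long pair out of a feed held in memory as a NUL-terminated string. To stay short it locates the tag with strstr rather than looping over tokens, and the function name first_geo_point is made up for this example. It also assumes the tag is spelled <georss:point>, as in the XML snippet from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: finds the first <georss:point> in a NUL-terminated
   feed and parses the "lat lon" pair that follows it.
   Returns 1 on success, 0 if the tag is missing or the data is malformed. */
int first_geo_point(const char *feed, double *lat, double *lon)
{
    static const char tag[] = "<georss:point>";
    const char *p;

    /* Null pointers here are a caller bug, not bad feed data. */
    assert(feed != NULL && lat != NULL && lon != NULL);

    p = strstr(feed, tag);
    if (p == NULL) {
        return 0;               /* no geotagged status in this feed */
    }
    p += sizeof tag - 1;        /* step over the tag to the text after it */

    /* The text looks like "52.946972 -1.182846</georss:point>";
       sscanf stops at the '<' after the second number. */
    return sscanf(p, "%lf %lf", lat, lon) == 2;
}
```

Because the timeline lists the newest status first, the first match should be the most recent geotagged tweet, but that ordering is an assumption about the feed and not something this code checks.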
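The same flat scan also reaches the <created_at> text the question asks about. What follows is only a sketch of turning a timestamp such as "Sat Dec 12 22:25:17 +0000 2009" into seconds since the Unix epoch without relying on strptime or timegm, which may not exist on a small target. The names parse_created_at and days_from_civil are invented for this example, and the day-count arithmetic is a well-known civil-calendar formula, not something taken from the answer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Days from 1970-01-01 to the given Gregorian date (m is 1..12). */
static long days_from_civil(int y, int m, int d)
{
    long era, yoe, doy, doe;

    y -= (m <= 2);                        /* treat Jan/Feb as end of previous year */
    era = (y >= 0 ? y : y - 399) / 400;   /* 400-year cycle number */
    yoe = y - era * 400;                  /* year within the cycle: 0..399 */
    doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Hypothetical helper: parses "Sat Dec 12 22:25:17 +0000 2009" into seconds
   since the Unix epoch (UTC). Returns 1 on success, 0 if malformed.
   A 32-bit long only covers dates up to early 2038. */
int parse_created_at(const char *s, long *epoch_out)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char mon[4];
    int day, hh, mm, ss, tz, year;
    const char *hit;
    long offset;

    assert(s != NULL && epoch_out != NULL);

    /* %*3s skips the weekday; "+0000" is read as a signed integer (hhmm). */
    if (sscanf(s, "%*3s %3s %d %d:%d:%d %d %d",
               mon, &day, &hh, &mm, &ss, &tz, &year) != 7) {
        return 0;
    }
    hit = strstr(months, mon);
    if (strlen(mon) != 3 || hit == NULL || (hit - months) % 3 != 0) {
        return 0;                         /* not a month abbreviation */
    }
    offset = (tz / 100) * 3600L + (tz % 100) * 60L;
    *epoch_out = days_from_civil(year, (int)(hit - months) / 3 + 1, day) * 86400L
               + hh * 3600L + mm * 60L + ss - offset;
    return 1;
}
```

The age in seconds is then the current UTC time minus this value, which assumes the device has had its clock set from somewhere.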
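One way that end-of-buffer check might look is sketched below. This variant takes an explicit end pointer and returns each token as a pointer plus length into the original buffer, so nothing is copied or malloc'd, which may suit a microcontroller better. The xml_token type and the next_token name are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token type: a view into the feed buffer, no copying. */
typedef struct {
    const char *start;   /* first character of the token */
    size_t length;       /* number of characters in the token */
} xml_token;

/* Reads one token starting at *cursor and advances *cursor past it.
   Returns 1 if a token was produced, 0 once the buffer is exhausted. */
int next_token(const char **cursor, const char *end, xml_token *out)
{
    const char *p;

    /* Null pointers here are a caller bug, not bad feed data. */
    assert(cursor != NULL && end != NULL && out != NULL);

    p = *cursor;
    if (p == NULL || p >= end) {
        return 0;
    }

    out->start = p;
    if (*p == '<') {
        /* Tag: scan to the closing '>' and include it in the token. */
        while (p < end && *p != '>') {
            p++;
        }
        if (p < end) {
            p++;         /* step past '>' unless the input was truncated */
        }
    } else {
        /* Text: scan up to, but not including, the next '<'. */
        while (p < end && *p != '<') {
            p++;
        }
    }
    out->length = (size_t)(p - out->start);
    *cursor = p;
    return 1;
}
```

A caller would start with a cursor at the beginning of the feed, set end to one past its last character, and call next_token in a loop until it returns 0. A truncated feed then yields a short final token and never reads past the buffer.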
