I have not done this particular thing, but I do do a lot of work with parsing captured packets in C/C++. I don't know if there exist Java libraries for any of this.
Essentially, you need to work your way up the protocol stack, starting with IP. The pcap data starts with the link-level header, but I don't think there's much in it that you're concerned about, other than ignoring non-IP packets.
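To make the link-level step concrete, here is a rough sketch in Python (chosen only for brevity). It assumes the capture's link type is Ethernet; other link types have different headers. The function name `ipv4_payload_from_ethernet` is mine, not from any library.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_VLAN = 0x8100  # 802.1Q tag; the real EtherType follows 4 bytes later


def ipv4_payload_from_ethernet(frame: bytes):
    """Return the bytes after the Ethernet header if the frame carries IPv4.

    Returns None for non-IPv4 frames and for frames too short to parse.
    Assumes an Ethernet link-level header (hypothetical helper, for illustration).
    """
    if len(frame) < 14:
        return None
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14
    if ethertype == ETHERTYPE_VLAN:
        # Skip a single VLAN tag and read the encapsulated EtherType.
        if len(frame) < 18:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, 16)
        offset = 18
    if ethertype != ETHERTYPE_IPV4:
        return None
    return frame[offset:]
```

Note that the returned bytes may include Ethernet padding after the IP datagram, so the IP Total Length field, not the frame length, should decide where the datagram ends.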
The trickiest thing with IP is reassembling fragmented datagrams. This is done using the More Fragments bit in the Flags field and the Fragment Offset field (counted in 8-byte units), combined with the Identification field to distinguish fragments from different datagrams. Then you use the Protocol field to identify TCP and UDP packets, and the Header Length field (counted in 32-bit words) to find the start of the corresponding header.
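As an illustration of the IP step, the following Python sketch parses the fixed IPv4 header and does a very simplified reassembly. The names `parse_ipv4_header` and `FragmentReassembler` are made up for this example, and the reassembler deliberately ignores overlapping fragments and timeouts, which a real tool would have to handle.

```python
import struct
from collections import namedtuple

IPv4Header = namedtuple(
    "IPv4Header",
    "header_len total_len ident more_fragments frag_offset protocol src dst",
)


def parse_ipv4_header(packet: bytes) -> IPv4Header:
    """Decode the fields of the fixed 20-byte IPv4 header that matter here."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    (ver_ihl, _tos, total_len, ident, flags_frag,
     _ttl, protocol, _cksum, src, dst) = struct.unpack_from("!BBHHHBBH4s4s", packet, 0)
    if ver_ihl >> 4 != 4:
        raise ValueError("not IPv4")
    return IPv4Header(
        header_len=(ver_ihl & 0x0F) * 4,          # field counts 32-bit words
        total_len=total_len,
        ident=ident,
        more_fragments=bool(flags_frag & 0x2000),
        frag_offset=(flags_frag & 0x1FFF) * 8,    # field counts 8-byte units
        protocol=protocol,                        # 6 = TCP, 17 = UDP
        src=src,
        dst=dst,
    )


class FragmentReassembler:
    """Collects fragments per datagram; no overlap handling, no timeouts."""

    def __init__(self):
        self._pending = {}

    def add(self, packet: bytes):
        """Return (protocol, payload) once a datagram is complete, else None."""
        hdr = parse_ipv4_header(packet)
        data = packet[hdr.header_len:hdr.total_len]
        if not hdr.more_fragments and hdr.frag_offset == 0:
            return hdr.protocol, data             # not fragmented at all
        key = (hdr.src, hdr.dst, hdr.protocol, hdr.ident)
        entry = self._pending.setdefault(key, {"parts": {}, "total": None})
        entry["parts"][hdr.frag_offset] = data
        if not hdr.more_fragments:
            # The last fragment tells us the full payload length.
            entry["total"] = hdr.frag_offset + len(data)
        if entry["total"] is None:
            return None
        buf = bytearray()
        for offset in sorted(entry["parts"]):
            if offset != len(buf):
                return None                       # still a gap somewhere
            buf += entry["parts"][offset]
        if len(buf) != entry["total"]:
            return None
        del self._pending[key]
        return hdr.protocol, bytes(buf)
```

Feeding every IP packet through `add` gives you whole datagrams regardless of whether they were fragmented, so the later steps do not need to care.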
The next step, for both TCP and UDP, is demultiplexing, separating out the various connections in the captured packet stream. Both protocols identify connections (well, UDP doesn't have connections per se, but I don't have a better word handy) by the 4-tuple of the source and destination IP address and the source and destination port, so a connection would be a sequence of packets that matches on all 4 of these values.
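A minimal sketch of the demultiplexing step in Python follows. The function name `demultiplex` and the input shape (tuples of protocol number, source IP, destination IP, and transport segment) are assumptions for the example. It sorts the two endpoints so that both directions of a conversation land under the same key, and it includes the protocol number so TCP and UDP traffic on the same ports stay separate.

```python
import struct
from collections import defaultdict

TCP, UDP = 6, 17  # IP protocol numbers


def demultiplex(packets):
    """Group transport segments into conversations.

    `packets` is an iterable of (protocol, src_ip, dst_ip, segment) tuples,
    where the IPs are 4-byte values and `segment` starts at the TCP or UDP
    header. Returns a dict mapping a conversation key to a list of
    (sender_endpoint, segment) pairs in capture order.
    """
    conversations = defaultdict(list)
    for protocol, src_ip, dst_ip, segment in packets:
        if protocol not in (TCP, UDP) or len(segment) < 4:
            continue
        # Both TCP and UDP headers begin with source port, destination port.
        src_port, dst_port = struct.unpack_from("!HH", segment, 0)
        sender = (src_ip, src_port)
        receiver = (dst_ip, dst_port)
        # Order the endpoints so A->B and B->A produce the same key.
        if sender <= receiver:
            key = (protocol, sender, receiver)
        else:
            key = (protocol, receiver, sender)
        conversations[key].append((sender, segment))
    return conversations
```

Keeping the sender with each segment lets you split a conversation back into its two directions later, which matters for TCP.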
Once that's done, for UDP, you're just about finished, unless you want to check the checksum. The Length field in the UDP header tells you how long the packet is; subtract 8 bytes for the header and there's your data.
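For example, pulling the data out of a UDP datagram might look like this in Python. The function name `udp_payload` is hypothetical, and the sketch skips checksum verification and simply rejects datagrams that were truncated by the capture.

```python
import struct


def udp_payload(datagram: bytes):
    """Return (src_port, dst_port, data) for a UDP datagram.

    `datagram` starts at the UDP header. The checksum is not verified.
    """
    if len(datagram) < 8:
        raise ValueError("too short for a UDP header")
    src_port, dst_port, length, _checksum = struct.unpack_from("!HHHH", datagram, 0)
    # Length covers the 8-byte header plus the data.
    if length < 8 or length > len(datagram):
        raise ValueError("bad or truncated UDP length")
    return src_port, dst_port, datagram[8:length]
```

Slicing by the Length field rather than by the buffer size keeps any trailing padding out of the data.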
TCP is somewhat more complicated, as you do indeed have to reassemble the stream. This is done using the sequence number in the header, combined with the length of the segment's data. The sum of these two tells you the next sequence number in the stream (SYN and FIN each count as one extra). Remember that you're keeping track of the traffic in two directions, each with its own sequence numbers.
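Here is a rough Python sketch of reassembling one direction of a TCP connection; you would keep one such object per direction. The names `parse_tcp` and `TcpStream` are mine. It assumes each segment has already been trimmed to the IP Total Length, treats the first segment seen as the start of the stream if no SYN was captured, and drops anything that begins before the expected sequence number, so partially overlapping retransmissions are not handled.

```python
import struct

SYN = 0x02
MOD = 2 ** 32  # sequence numbers wrap around at 32 bits


def parse_tcp(segment: bytes):
    """Return (sequence_number, flags, payload) for a TCP segment."""
    _sport, _dport, seq, _ack, off_flags = struct.unpack_from("!HHIIH", segment, 0)
    header_len = (off_flags >> 12) * 4   # Data Offset counts 32-bit words
    flags = off_flags & 0x01FF
    return seq, flags, segment[header_len:]


class TcpStream:
    """Reassembles the byte stream for ONE direction of a connection."""

    def __init__(self):
        self.next_seq = None      # sequence number of the next expected byte
        self.pending = {}         # out-of-order segments, keyed by seq
        self.data = bytearray()   # in-order stream reassembled so far

    def add(self, seq: int, flags: int, payload: bytes) -> None:
        if flags & SYN:
            # SYN consumes one sequence number; data starts right after it.
            self.next_seq = (seq + 1) % MOD
            return
        if self.next_seq is None:
            # Capture began mid-connection: assume this is where we start.
            self.next_seq = seq
        if (seq - self.next_seq) % MOD >= MOD // 2:
            return                # starts before next_seq: treat as retransmission
        if payload:
            self.pending.setdefault(seq, payload)
        # Append every segment that is now contiguous with the stream.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.data += chunk
            self.next_seq = (self.next_seq + len(chunk)) % MOD
```

Typical use is `stream.add(*parse_tcp(segment))` for each segment of one direction, reading `stream.data` as it grows.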
(This is a lot easier than writing an actual TCP implementation, as then you have to implement the Nagle algorithm and other minutiae.)
There's a lot of information on the net about the header formats; google "IP header" for starters. A network analyzer like Wireshark is indispensable for this work, as it will show you how your captured data is supposed to look. Indeed, as Wireshark is open source, you can probably find out a lot by looking at how it does things.