Understanding the `ctags -e` file format (ctags for Emacs)

I am using Exuberant Ctags, also known as "ctags -e", also known as just "etags", and I am trying to understand the format of the TAGS file generated by the etags command. In particular, I want to understand line 2 of the TAGS file.

Wikipedia says that line #2 is described like this:

{src_file},{size_of_tag_definition_data_in_bytes}

In practice, though, line 2 of the TAGS file for "foo.c" looks like this:

foo.c,1683

My question is how exactly this number, 1683, is arrived at.

I know it is the size of the "tag_definition" data, so what I want to know is: what exactly is the "tag_definition"?

I have tried looking through the ctags source code, but perhaps someone better at C than me will have more success figuring this out.

Thanks!

EDIT #2:

^L^J
hello.c,79^J
float foo (float x) {^?foo^A3,20^J
float bar () {^?bar^A7,59^J
int main() {^?main^A11,91^J

Alright, so if I understand correctly, "79" is the number of bytes in the TAGS file starting just after the newline that ends the "hello.c,79" line, down to and including the "91^J" at the end of the last entry.

Makes perfect sense.

Now, Wikipedia says the numbers 20, 59 and 91 in this example are the {byte_offset}.

What is the {byte_offset} offset from?

Thanks for all the help Ken!

Best answer

It's the number of bytes of tag data (the entry lines for that file) following the newline after the number.

Edit: It also doesn't include the ^L character that separates one file's tag data from the next. Remember that etags comes from a time long ago, when reading a 500KB file was an expensive operation. ;)

Here's a complete TAGS file, shown two ways. The first uses caret notation (^X) for control characters, so no character is invisible; the end-of-line characters that are implicit in your example appear here as ^J:

^L^J
hello.cc,45^J
int main(^?5,41^J
int foo(^?9,92^J
int bar(^?13,121^J
^L^J
hello.h,15^J
#define X ^?2,1^J

Here's the same file displayed in hex (the offsets in the left column are octal, 16 bytes per row):

0000000    0c  0a  68  65  6c  6c  6f  2e  63  63  2c  34  35  0a  69  6e
          ff  nl   h   e   l   l   o   .   c   c   ,   4   5  nl   i   n
0000020    74  20  6d  61  69  6e  28  7f  35  2c  34  31  0a  69  6e  74
           t  sp   m   a   i   n   ( del   5   ,   4   1  nl   i   n   t
0000040    20  66  6f  6f  28  7f  39  2c  39  32  0a  69  6e  74  20  62
          sp   f   o   o   ( del   9   ,   9   2  nl   i   n   t  sp   b
0000060    61  72  28  7f  31  33  2c  31  32  31  0a  0c  0a  68  65  6c
           a   r   ( del   1   3   ,   1   2   1  nl  ff  nl   h   e   l
0000100    6c  6f  2e  68  2c  31  35  0a  23  64  65  66  69  6e  65  20
           l   o   .   h   ,   1   5  nl   #   d   e   f   i   n   e  sp
0000120    58  20  7f  32  2c  31  0a                                    
           X  sp del   2   ,   1  nl
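As a check on the arithmetic, here is a minimal Python sketch that rebuilds the file shown above, plus the hello.c section from the question. The helper name `tags_section` is hypothetical (it is not part of ctags or Emacs). It computes each size field as the byte length of the entry lines only, leaving out both the ^L^J separator and the "file,size" header line itself:

```python
def tags_section(filename: str, entries: list[bytes]) -> bytes:
    """Build one etags section: ^L^J, the "file,size" header, then the entries.

    The size counts only the entry bytes (each entry plus its trailing
    newline), not the separator and not the header line.
    """
    body = b"".join(entry + b"\n" for entry in entries)
    header = f"{filename},{len(body)}\n".encode()
    return b"\x0c\n" + header + body


# \x7f is ^? (DEL) and \x01 is ^A, as in the examples above.
hello_cc = tags_section("hello.cc", [
    b"int main(\x7f5,41",
    b"int foo(\x7f9,92",
    b"int bar(\x7f13,121",
])
hello_h = tags_section("hello.h", [b"#define X \x7f2,1"])
hello_c = tags_section("hello.c", [
    b"float foo (float x) {\x7ffoo\x013,20",
    b"float bar () {\x7fbar\x017,59",
    b"int main() {\x7fmain\x0111,91",
])

tags = hello_cc + hello_h
print(hello_cc.split(b"\n")[1])  # b'hello.cc,45'
print(hello_h.split(b"\n")[1])   # b'hello.h,15'
print(hello_c.split(b"\n")[1])   # b'hello.c,79'
print(len(tags))                 # 87
```

The 87 bytes of `tags` match the hex dump byte for byte, which confirms that the size field starts counting right after the header's newline.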

There are two sets of tag data in this example: 45 bytes of data for hello.cc and 15 bytes for hello.h.

The hello.cc data starts on the line following "hello.cc,45^J" and runs for 45 bytes, which also happens to be exactly three complete lines. Byte counts are given so that code reading the file can simply allocate room for a 45-byte string and read 45 bytes. The "^L^J" line comes right after the 45 bytes of tag data; a reader uses it as a marker that more files remain, and also to verify that the file is properly formatted.

The hello.h data starts on the line following "hello.h,15^J" and runs for 15 bytes.
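The reading strategy described above (read the header, read exactly that many bytes, then expect ^L^J) can be sketched in Python as follows. This is a minimal illustration, not the parser Emacs actually uses; `parse_tags` is a hypothetical name, and the sketch assumes every section carries a numeric size, as in the example:

```python
def parse_tags(data: bytes) -> dict[str, list[bytes]]:
    """Split an etags TAGS file into {source file name: [entry lines]}."""
    sections: dict[str, list[bytes]] = {}
    pos = 0
    while pos < len(data):
        # Each section must begin with the ^L^J (form feed, newline) marker.
        if data[pos:pos + 2] != b"\x0c\n":
            raise ValueError(f"expected ^L^J at byte {pos}")
        pos += 2
        # Header line "name,size\n"; split on the last comma in case the
        # file name itself contains commas.
        end = data.index(b"\n", pos)
        name, _, size_field = data[pos:end].rpartition(b",")
        size = int(size_field)
        pos = end + 1
        # Read exactly `size` bytes of tag data following the header newline.
        body = data[pos:pos + size]
        if len(body) != size:
            raise ValueError(f"section {name!r} is truncated")
        pos += size
        sections[name.decode()] = [line for line in body.split(b"\n") if line]
    return sections


# The 87-byte example file (\x0c is ^L, \x7f is ^?).
tags = (b"\x0c\nhello.cc,45\n"
        b"int main(\x7f5,41\nint foo(\x7f9,92\nint bar(\x7f13,121\n"
        b"\x0c\nhello.h,15\n"
        b"#define X \x7f2,1\n")

for name, entries in parse_tags(tags).items():
    print(name, len(entries))
# hello.cc 3
# hello.h 1
```

If the size field is wrong, for example 44 instead of 45, the next read lands on the final newline of the hello.cc data instead of on ^L^J, so the sketch raises `ValueError`; this is the format check the answer mentions.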

Answer

The {byte_offset} for a tag entry is the number of bytes, counting from 0, from the start of the source file to the start of the line on which the tag is defined. The number before the byte offset is that line's number. In your example:

hello.c,79^J
float foo (float x) {^?foo^A3,20^J

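the line where foo is defined begins 20 bytes from the start of hello.c, i.e. 20 bytes precede it. You can verify that with a text editor that shows your cursor position in the file. You can also use the Unix tail command to display a file starting a number of bytes in. Note that tail numbers bytes from 1, so byte offset 20 corresponds to +21:

tail -c +21 hello.c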
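To make the two numbers concrete, here is a minimal Python sketch that splits an entry line into its fields and checks the byte offset against a source file. `parse_entry` and `line_start_offset` are hypothetical helper names. Since the contents of hello.c are not shown, the check uses the hello.h entry from the best answer: its `2,1` implies only that line 1 is a lone newline, and the rest of the hypothetical hello.h text below is made up for illustration.

```python
from typing import Optional


def parse_entry(line: bytes) -> tuple[bytes, Optional[bytes], int, int]:
    """Split an entry "pattern^?[name^A]line,offset" into
    (pattern, explicit tag name or None, line number, byte offset)."""
    pattern, _, rest = line.partition(b"\x7f")
    name = None
    if b"\x01" in rest:  # explicit tag name, as in the hello.c example
        name, _, rest = rest.partition(b"\x01")
    lineno, _, offset = rest.partition(b",")
    return pattern, name, int(lineno), int(offset)


def line_start_offset(source: bytes, lineno: int) -> int:
    """0-based byte offset of the first byte of 1-based line `lineno`."""
    offset = 0
    for _ in range(lineno - 1):
        offset = source.index(b"\n", offset) + 1
    return offset


print(parse_entry(b"float foo (float x) {\x7ffoo\x013,20"))
# (b'float foo (float x) {', b'foo', 3, 20)

# Hypothetical hello.h: only the leading newline is implied by the tag.
hello_h_source = b"\n#define X 1\n"
pattern, _, lineno, offset = parse_entry(b"#define X \x7f2,1")
assert line_start_offset(hello_h_source, lineno) == offset
# hello_h_source[offset:] is what `tail -c +N` prints with N = offset + 1,
# because tail counts bytes from 1.
assert hello_h_source[offset:].startswith(pattern)
```

The offset points at the start of the line, which is why the entry's pattern text (everything before ^?) matches the source bytes at that position.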



