English 中文(简体)
Static function-scoped pointers and memory leaks
原标题:

I ve written a simple library file with a function for reading lines from a file of any size. The function is called by passing in a stack-allocated buffer and size, but if the line is too big, a special heap-allocated buffer is initialized and used to pass back a larger line.

This heap-allocated buffer is function-scoped and declared static, initialized to NULL at the beginning of course. I ve written in some checks at the beginning of the function, to check if the heap buffer is non-null; if this is the case, then the previous line read was too long. Naturally, I free the heap buffer and set it back to NULL, thinking that the next read will likely only need to fill the stack-allocated buffer (it should be very rare to see lines over 1MB long, even in our application!).

I ve gone over the code and tested it fairly thoroughly, both by reading it carefully and by running a few tests. I am reasonably confident that the following invariant is maintained:

  • The heap buffer will be null (and will not leak any memory) on function return if the stack buffer is all that is needed.
  • If the heap buffer is not null, because it was needed, it will be freed on the next function call (and possibly reused if needed on that next line).

But I ve thought of a potential problem: If the last line in a file is too long, then since the function is presumably not called again, I m not sure I have any way to free the heap buffer-- it is function-scoped, after all.

So my question is, how do I go about freeing dynamically allocated memory in a function-scoped static pointer, ideally without calling the function again? (And ideally without making it a global variable, either!)

Code available on request. (I just haven t got access now, sorry. And I m hoping the question is sufficiently general and well-explained for it not to be needed, but by all means feel free to disabuse me of that notion!)


EDIT: I feel I should add a couple of notes about the usage of the function.

This particular function is used in the form of lines being read serially from a file, and then immediately copied into POD structs, one line per struct. Those are created on the heap as the file is read, and each one of those structs has a char pointer containing (a cleaned up version of) a line from the file. In order for these to persist, a copy already has to occur. (That was one of the big counterarguments brought up in many of the answers-- oh no, the line needs to be COPIED, oh dearie me).

As for multithreading, as I said this is designed to be used serially. No, it isn t thread safe, but I don t care.

Thanks for the multitude of responses, though! I ll read them more thoroughly when I get time. Currently, I m leaning towards either passing an extra pointer around or redesigning the function so that when fgets shows EOF, then I might just build the freeing logic there instead and the user hopefully won t need to worry about it.

最佳回答

If you can change the function, I would recommend changing the function interface itself. I know you have spent a lot of time debugging and testing it, but there are a few problems with your current implementation:

  • it is not thread-safe,
  • the user has no control over the data, so he must copy it if he needs it later, most likely in a buffer that s going to be malloc()ed, thus nullifying any advantage you got by the selective use of malloc() in your function,
  • most importantly, as you have discovered, a special action has to be taken by the user for a long last line.

Your users should not be worried by the implementation oddity of your function, they should be able to "just use it".

Unless you are doing it for educational purposes, I would recommend looking at this page, which has one implementation of "reading an arbitrarily long line from a stream", and links to other such implementations (each implementation is slightly different from the others, so you should be able to find one that you like).

Based upon your edit, MT-safe is not a requirement, and a copy is always going to happen. So, the most obvious design is one of the two:

  • Let the user supply a char **, which points to a buffer that your function will allocate, using a combination of malloc() and realloc() (if needed). It is the user s responsibility to free() it when done. That way, the user doesn t have to copy the data again, since he can pass a pointer to wherever the final destination of the data is.
  • return a char * that is allocated by your function. Again, it s the user s responsibility to free() it.

Both are pretty much equivalent.

For your current implementation, you can always return "not end of file" if the last line is very long, and doesn t end in a newline. Then, the user is going to call your function again, and then you can free your buffer. Personally, I would be happier with a function that allows me to read as many lines as I want, and not force me to go to the end of file.

问题回答

Aside from the difficulty of freeing that dynamically allocated buffer, there is another potential problem. It is not thread safe. Since it is a library function, then there is always the possibility that it will be used in a multi-threaded environment in the future.

It would probably be better to require the calling function to free the buffer via a related library function.

That could still be okay if you use the standard technique to indicate end-of-file (i.e have you read-line function return NULL).

What happens in this case is that after the final line is read, one more call to your read-line function will be needed so that it can return NULL to indicate that the end of file has been reached. In this last call, you can then free you buffer.

Two choices that occur immediately:

  1. Make the pointer to the heap-allocated buffer static but file scoped. Add a (static) function that checks if it is not null and if it is not null free()s it. Call atexit(free_func) at the start of the program, where free_func is the static function. You can have some global setup routine (caled by main()) where this is done.

  2. Don t worry about it; heap-allocated memory is released by the OS when your process exits, and the memory leak is not cumulative, so even if your program has a long life it won t raise an OOM exception (unless you have some other bug).

I assume your app is NOT multithreaded; in this case, you should not use a static buffer at all, or you should use thread-local data.

The interface you have chosen makes this an unsolvable problem:

  • The client must not know if the return value points to static or dynamic memory.

  • The return value must point to memory that outlives the call.

  • Any call might be the last.

I m not sure why you are troubled by this leak. After all, if the client reads a very long line, does something with the line, then does a ton of computation and allocation before reading the next line, you still have a big hunk of memory sitting around unused, clogging up the system. If this OK with you (arbitrary computation takes place before memory is reclaimed), you could just fess up that you re willing to retain dead memory indefinitely.

If you can t live with the leak, the simplest thing to do is to widen the interface so that the client can notify your function when the client is done with the memory. (Right now the contract with the client says that the client owns the memory until it calls your function again, at which point ownership reverts to your function.) Of course, to change the interface means either

  • adding a new function, which would require you to promote your pointer to be static but local to the compilation unit, or

  • adding some argument to the existing function (or overloading an argument) so that you have a call which means "I am done with your memory now, but I don t want another line".

A more radical change would be to rewrite the function to use dynamically allocated memory throughout its lifetime, gradually enlarging the block as needed until it is as large as the largest block ever read (or perhaps rounded up to the next power of two). Depending on actual cases this strategy may consume less address space than keeping a big static buffer.

In any case I m not convinced you should be worrying about this corner case. If you think this case matters, please edit your question to show us the evidence.

Instead of function scope, give it module scope (i.e. at file scope, but static, so it s not visible outside that file. Add a small function that frees the buffer, and use atexit() to assure that s called before the program exits. Alternative, don t worry about it -- a leak that happens only once, and is freed automatically as the program exits isn t particularly harmful.

I feel obliged to say that the design sounds to me like a recipe for disaster though. When you free the buffer, there s virtually no way to even guess whether it might still be in use. The user (apparently) has to keep track of where the data was returned, and copy the data to a new buffer if (and only if) you allocated one dynamically. In a multi-threading environment, you need to make the internal pointer thread-local to have any chance of working correctly at all. To the user, the function might do one of two entirely different things -- either return a buffer that s owned by the user, OR return a buffer that s owned by the function, and can only be used safely by allocating another buffer, and copying the data into the other buffer before the function is called again.

There s a few hacks I can think of, although both require moving the static declaration out of the function. I can t imagine why that would be a problem.

Using a GCC extension,

static char *buffer;
void use_buffer(size_t n) {
    buffer = realloc(buffer, n);
}
void cleanup_buffer() __attribute__((destructor)) {
    free(buffer);
}

Using C++,

static char *buffer;
static class buffer_guard {
    ~buffer_guard() { free(buffer); }
} my_buffer_guard;

In any case, I don t really like the design. In C, usually the caller is responsible for allocating/freeing memory that it needs to use, even if it s filled in by a callee.

BTW, compare with Glibc s nonstandard getline. It never uses static memory.

I was just going to comment below Mark s answer, but it may feel a little bit cramped. Still, this answer is in essence a comment on his answer, which I find very good in addition to being quick :).

Not only is your function not MT-safe, but even without threads, the interface to use it correctly is complicated. The caller must have finished with the previous result before calling the function again. If this code is still in use two years from now, someone will scratch his head trying to use it right... or worse, use it wrong without even thinking about it. That person could even be you...

Mark s suggestion (requiring the caller to free the buffer) is IMHO the most reasonable. But perhaps you don t trust malloc and free not to cause fragmentation in the long run, or have some other reason to prefer the static buffer solution. In this case you can keep the static buffer for ordinary-length lines, define a boolean flag that indicates if the static buffer is currently busy, and document that the following function (and not free) should be called with the address of the buffer when the caller no longer uses it:

char static_buffer[512];
int buffer_busy;

void free_buffer(char *p)
{
  if (p == static_buffer)
  {
     assert(buffer_busy);
     buffer_busy=0;
  }
  else free(p);
}

char *get_line(...)
{
  char *result;
  if (..short line..)
  {
     result = static_buffer;
     assert(!buffer_busy);
     buffer_busy=1;
  }
  else result = malloc(...);
  ...
  return result;
}

The only circumstances in which the assertions will trigger are circumstances in which your previous implementation would have silently gone wrong, and the overhead is very low compared to your existing solution (only toggling the flag, and asking the caller to call free_buffer when he s finished, which is cleaner). If the assertion in get_line in particular triggers, it means you needed dynamic allocation after all, because the caller could not be finished with a buffer at the time he was asking for another.

Note: this is still not MT-safe.





相关问题
Fastest method for running a binary search on a file in C?

For example, let s say I want to find a particular word or number in a file. The contents are in sorted order (obviously). Since I want to run a binary search on the file, it seems like a real waste ...

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

Tips for debugging a made-for-linux application on windows?

I m trying to find the source of a bug I have found in an open-source application. I have managed to get a build up and running on my Windows machine, but I m having trouble finding the spot in the ...

Trying to split by two delimiters and it doesn t work - C

I wrote below code to readin line by line from stdin ex. city=Boston;city=New York;city=Chicago and then split each line by ; delimiter and print each record. Then in yet another loop I try to ...

Good, free, easy-to-use C graphics libraries? [closed]

I was wondering if there were any good free graphics libraries for C that are easy to use? It s for plotting 2d and 3d graphs and then saving to a file. It s on a Linux system and there s no gnuplot ...

Encoding, decoding an integer to a char array

Please note that this is not homework and i did search before starting this new thread. I got Store an int in a char array? I was looking for an answer but didn t get any satisfactory answer in the ...

热门标签