gcc/g++: error when compiling large file

I have an auto-generated C++ source file, around 40 MB in size. It largely consists of push_back() calls for some vectors, together with the string constants to be pushed.

When I try to compile this file, g++ exits and says that it couldn't reserve enough virtual memory (around 3 GB). Googling this problem, I found that using the command-line switches

--param ggc-min-expand=0 --param ggc-min-heapsize=4096

may solve the problem. They, however, only seem to work when optimization is turned on.

1) Is this really the solution that I am looking for?

2) Or is there a faster, better way to do this (compiling takes ages with these options activated)?

Best wishes,

Alexander

Update: Thanks for all the good ideas. I tried most of them. Using an array instead of several push_back() operations reduced memory usage, but since the file I was trying to compile was so big, it still crashed, just later. In a way, this behaviour is really interesting, as there is not much to optimize in such a setting -- what does GCC do behind the scenes that costs so much memory? (I also compiled with all optimizations disabled and got the same result.)

The solution I have switched to now is reading the data from a binary object file that I created from the original data file using objcopy. This is what I originally did not want to do, because creating the data structures in a higher-level language (in this case Perl) was more convenient than doing it in C++.

However, getting this running under Win32 was more complicated than expected. objcopy seems to generate files in the ELF format, and it seems that some of the problems I had disappeared when I manually set the output format to pe-i386. The symbols in the object file are by default named after the file name, e.g. converting the file inbuilt_training_data.bin results in these two symbols: binary_inbuilt_training_data_bin_start and binary_inbuilt_training_data_bin_end. I found some tutorials on the web claiming that these symbols should be declared as extern char _binary_inbuilt_training_data_bin_start;, but this does not seem to be right -- only extern char binary_inbuilt_training_data_bin_start; worked for me.
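For reference, a minimal sketch of how the embedded data can be accessed from C++, assuming the symbol names described above for inbuilt_training_data.bin (the leading underscore and the exact objcopy options depend on the toolchain, so treat them as assumptions):

// Sketch only: symbol names follow the convention described above;
// some toolchains expect a leading underscore (_binary_...) instead.
// A possible objcopy invocation (assumption -- adjust output format/architecture):
//   objcopy -I binary -O pe-i386 -B i386 inbuilt_training_data.bin inbuilt_training_data.o
#include <cstddef>
#include <string>

extern char binary_inbuilt_training_data_bin_start;
extern char binary_inbuilt_training_data_bin_end;

std::string load_inbuilt_training_data() {
    const char *begin = &binary_inbuilt_training_data_bin_start;
    const char *end   = &binary_inbuilt_training_data_bin_end;
    return std::string(begin, static_cast<std::size_t>(end - begin));
}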

Answers

You may be better off using a constant data table instead. For example, instead of doing this:

void f() {
    a.push_back("one");
    a.push_back("two");
    a.push_back("three");
    // ...
}

try doing this:

const char *data[] = {
    "one",
    "two",
    "three",
    // ...
};

void f() {
    for (size_t i = 0; i < sizeof(data)/sizeof(data[0]); i++) {
        a.push_back(data[i]);
    }
}

The compiler will likely be much more efficient generating a large constant data table, rather than huge functions containing many push_back() calls.

Can you solve the same problem without generating 40 MB worth of C++? That's more than some operating systems I've used. A loop and some data files, perhaps?

It sounds like your autogenerated app looks like this:

push_back(data00001);
...
push_back(data99999);

Why don't you put the data into an external file and let the program read it in a loop?
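For illustration, a minimal sketch of that approach, assuming one entry per line in a hypothetical strings.txt data file:

#include <fstream>
#include <string>
#include <vector>

// Read one entry per line from an external data file instead of
// hard-coding thousands of push_back() calls in generated C++.
std::vector<std::string> load_strings(const char *path) {
    std::vector<std::string> result;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        result.push_back(line);
    }
    return result;
}

// Usage: std::vector<std::string> data = load_strings("strings.txt");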

If you're just generating a bunch of calls to push_back() in a row, you can refactor it into something like this:

// Old code:
v.push_back("foo");
v.push_back("bar");
v.push_back("baz");

// Change that to this:
{
    static const char *stuff[] = {"foo", "bar", "baz"};
    v.insert(v.end(), stuff, stuff + ARRAYCOUNT(stuff));
}

Where ARRAYCOUNT is a macro defined as follows:

#define ARRAYCOUNT(a) (sizeof(a) / sizeof(a[0]))

The extra level of braces is just to avoid name conflicts if you have many such blocks; alternatively, you can just generate a new unique name for the stuff placeholder.

If that still doesn't work, I suggest breaking your source file up into many smaller source files. That's easy if you have many separate functions; if you have one enormous function, you'll have to work a little harder, but it's still very doable.
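For illustration, a rough sketch of how the generated code could be split across translation units (the init_partN() functions and the sample strings are placeholders):

#include <string>
#include <vector>

// In the real setup each init_partN() would be emitted into its own
// generated .cpp file so that no single translation unit gets huge;
// they are shown together here only to keep the sketch self-contained.
static void init_part1(std::vector<std::string> &v) {
    static const char *chunk[] = {"one", "two", "three"};
    v.insert(v.end(), chunk, chunk + sizeof(chunk) / sizeof(chunk[0]));
}

static void init_part2(std::vector<std::string> &v) {
    static const char *chunk[] = {"four", "five", "six"};
    v.insert(v.end(), chunk, chunk + sizeof(chunk) / sizeof(chunk[0]));
}

void init_all(std::vector<std::string> &v) {
    init_part1(v);
    init_part2(v);
    // ... one call per generated part ...
}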

To complement some of the answers here, you may be better off generating a binary object file and linking it directly -- as opposed to compiling files consisting of const char[] arrays.

I had a similar problem working with gcc lately. (Around 60 MB of PNG data split into some 100 header files.) Including them all is the worst option: The amount of memory needed seems to grow exponentially with the size of the compilation unit.

If you cannot refactor your code, you could try to increase the amount of swap space you have, provided your operating system supports a large address space. This should work on 64-bit machines, but 3 gigabytes may be too much for a 32-bit system.




