Why does COW mmap fail with ENOMEM on (sparse) files larger than 4GB?

This happens on a 2.6.26-2-amd64 Linux kernel when trying to mmap a 5GB file with copy-on-write semantics (PROT_READ | PROT_WRITE and MAP_PRIVATE). Mapping files smaller than 4GB or using only PROT_READ works fine. This is not a soft resource limit issue as reported in this question; the virtual memory size limit is unlimited.

Here is the code that reproduces the problem (the actual code is part of Boost.Interprocess).

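#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        struct stat b;
        void *base;
        int fd = open("foo.bin", O_RDWR);

        /* Bail out early if the file cannot be opened or examined. */
        if (fd == -1 || fstat(fd, &b) == -1) {
                perror("foo.bin");
                return 1;
        }
        /* Private (copy-on-write), read/write mapping of the whole file. */
        base = mmap(0, b.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        return 0;
}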

and here is what happens:

dd if=/dev/zero of=foo.bin bs=1M seek=5000 count=1
./test-mmap
mmap: Cannot allocate memory

Here is the relevant strace (freshly compiled 4.5.20) output, as asked by nos.

open("foo.bin", O_RDWR)                 = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5243928576, ...}) = 0
mmap(NULL, 5243928576, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = -1 ENOMEM (Cannot allocate memory)
dup(2)                                  = 4
[...]
write(4, "mmap: Cannot allocate memory\n", 29mmap: Cannot allocate memory
) = 29
Best answer

Try passing MAP_NORESERVE in the flags field like this:

mmap(NULL, b.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, fd, 0);

It's likely that your swap and physical memory combined are less than the 5GB requested.
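As a minimal sketch of that change in context: the helper below maps a whole file copy-on-write with MAP_NORESERVE. The function name map_cow_noreserve is made up for illustration and is not part of the original code or of Boost.Interprocess.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* glibc: expose MAP_NORESERVE in strict modes */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the whole file copy-on-write without reserving
 * swap for it. Returns MAP_FAILED on any error; on success *len receives
 * the mapping length. */
void *map_cow_noreserve(const char *path, size_t *len)
{
        struct stat st;
        void *base;
        int fd = open(path, O_RDWR);

        if (fd == -1)
                return MAP_FAILED;
        if (fstat(fd, &st) == -1) {
                close(fd);
                return MAP_FAILED;
        }
        base = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_NORESERVE, fd, 0);
        close(fd);              /* the mapping stays valid after close() */
        if (base != MAP_FAILED)
                *len = (size_t)st.st_size;
        return base;
}
```

Writes through the returned pointer still land in private pages and never reach the file. The trade-off is the one described in the manual extract: without a reservation, a write can fail later if no memory is available.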

Alternatively, for testing purposes, you can switch the kernel to "always overcommit" mode; if the mapping then succeeds, you can make the code change above:

# echo 1 > /proc/sys/vm/overcommit_memory

Below are the relevant extracts from the manual pages.

mmap(2):

   MAP_NORESERVE
          Do  not reserve swap space for this mapping.  When swap space is
          reserved, one has the guarantee that it is  possible  to  modify
          the  mapping.   When  swap  space  is not reserved one might get
          SIGSEGV upon a write if no physical memory  is  available.   See
          also  the  discussion of the file /proc/sys/vm/overcommit_memory
          in proc(5).  In kernels before 2.6, this flag  only  had  effect
          for private writable mappings.

proc(5):

   /proc/sys/vm/overcommit_memory
          This file contains the kernel virtual  memory  accounting  mode.
          Values are:

                 0: heuristic overcommit (this is the default)
                 1: always overcommit, never check
                 2: always check, never overcommit

          In  mode 0, calls of mmap(2) with MAP_NORESERVE are not checked,
          and the default check is very weak, leading to the risk of  get‐
          ting a process "OOM-killed".  Under Linux 2.4 any non-zero value
          implies mode 1.  In mode 2  (available  since  Linux  2.6),  the
          total  virtual  address  space on the system is limited to (SS +
          RAM*(r/100)), where SS is the size of the swap space, and RAM is
          the  size  of  the physical memory, and r is the contents of the
          file /proc/sys/vm/overcommit_ratio.
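
If you want to check the accounting mode from inside a program rather than from the shell, something like the following should do. It is only a sketch: the helper names read_overcommit_mode and overcommit_mode_name are invented here, and the mode descriptions are taken from the proc(5) extract above.

```c
#include <stdio.h>

/* Hypothetical helper: returns the overcommit mode (0, 1 or 2),
 * or -1 if the file cannot be read or parsed. */
int read_overcommit_mode(void)
{
        int mode = -1;
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");

        if (f == NULL)
                return -1;
        if (fscanf(f, "%d", &mode) != 1)
                mode = -1;
        fclose(f);
        return mode;
}

/* Hypothetical helper: description of each mode, as worded in proc(5). */
const char *overcommit_mode_name(int mode)
{
        switch (mode) {
        case 0:  return "heuristic overcommit";
        case 1:  return "always overcommit, never check";
        case 2:  return "always check, never overcommit";
        default: return "unknown";
        }
}
```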
Answers

Quoting your memory, swap size and overcommit settings from your comment:

MemTotal: 4063428 kB SwapTotal: 514072 kB
$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio 
50

With overcommit_memory set to 0 ("heuristic overcommit"), you can't create a private, writable mapping that's larger than the current free memory and swap total. Since you only have about 4.5GB of memory plus swap, a 5GB mapping can never pass that check.
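
To spell out the arithmetic with the figures quoted above: (4063428 + 514072) kB = 4577500 kB, which is 4,687,360,000 bytes, while the mapping asks for 5,243,928,576 bytes. A rough sketch of that comparison follows. The function fits_ram_plus_swap is hypothetical and deliberately simplified; it compares against total RAM plus swap, which is only an upper bound on what the check described above (based on currently free memory) would allow.

```c
#include <stdint.h>

/* Hypothetical, simplified check: could a private writable mapping of
 * map_bytes ever fit in RAM plus swap? Sizes are in kB, as reported by
 * /proc/meminfo. This is an upper bound, not the kernel's actual code. */
int fits_ram_plus_swap(uint64_t mem_total_kb, uint64_t swap_total_kb,
                       uint64_t map_bytes)
{
        uint64_t limit_bytes = (mem_total_kb + swap_total_kb) * 1024;

        return map_bytes <= limit_bytes;
}
```

With the values from the question, fits_ram_plus_swap(4063428, 514072, 5243928576ULL) is 0, so the request cannot succeed without MAP_NORESERVE or more swap.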

Your options are either to use MAP_NORESERVE (as Matt Joiner suggests), if you're sure that you'll never dirty (write to) more pages in the mapping than you have free memory and swap for; or to significantly increase the size of your swap space.
