How to know whether a copy-on-write page is an actual copy?

When I create a copy-on-write mapping (MAP_PRIVATE) using mmap, some pages of this mapping are copied as soon as I write to addresses within them. At a certain point in my program I would like to figure out which pages have actually been copied. There is a call named mincore, but it only reports whether a page is in memory, which is not the same as the page having been copied.

Is there some way to figure out which pages have been copied?

Best answer

Following MarkR's advice, I tried going through the pagemap and kpageflags interfaces. Below is a quick test that checks whether a page in memory is SWAPBACKED, as the kernel calls it. One problem remains: kpageflags is accessible only to root.

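#define _GNU_SOURCE   /* for lseek64 */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
  unsigned long long pagesize=getpagesize();
  assert(pagesize>0);
  int pagecount=4;
  int filesize=pagesize*pagecount;
  int fd=open("test.dat", O_RDWR);
  if (fd<0)  // open returns -1 on failure; 0 is a valid descriptor
    {
      fd=open("test.dat", O_CREAT|O_RDWR,S_IRUSR|S_IWUSR);
      printf("Created test.dat testfile\n");
    }
  assert(fd>=0);
  int err=ftruncate(fd,filesize);
  assert(!err);

  // Map the file copy-on-write
  char* M=(char*)mmap(NULL, filesize, PROT_READ|PROT_WRITE, MAP_PRIVATE,fd,0);
  assert(M!=(char*)MAP_FAILED);
  printf("Successfully created private mapping\n");
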
The test setup contains 4 pages. Pages 0 and 2 are dirty:

  strcpy(M,"I feel so dirty\n");
  strcpy(M+pagesize*2,"Christ on crutches\n");

Page 3 is only read from:

  char t=M[pagesize*3];

Page 1 is not accessed at all.

The pagemap file maps the process's virtual pages to physical page frames, whose flags can then be looked up in the global kpageflags file. See /usr/src/linux/Documentation/vm/pagemap.txt for the details.

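  int mapfd=open("/proc/self/pagemap",O_RDONLY);
  assert(mapfd>=0);
  assert(sizeof(long long)==8);
  // pagemap holds one 8-byte entry per virtual page, indexed by virtual page number
  unsigned long long target=((unsigned long)(void*)M)/pagesize;
  // keep the offset in a 64-bit type: it does not fit in an int on 64-bit systems
  off64_t pos=lseek64(mapfd, target*8, SEEK_SET);
  assert(pos==(off64_t)(target*8));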

Here we read the pagemap entry (which contains the page frame number) for each of our virtual pages:

  unsigned long long page2pfn[pagecount];
  ssize_t nread=read(mapfd,page2pfn,sizeof(long long)*pagecount);
  if (nread<0)
    perror("Reading pagemap");
  else if(nread!=(ssize_t)(pagecount*8))
    printf("Could only read %zd bytes\n",nread);

Now we read the actual page flags for each page frame that is present:

  int pageflags=open("/proc/kpageflags",O_RDONLY);
  assert(pageflags>=0);
  for(int i = 0 ; i < pagecount; i++)
    {
      unsigned long long v2a=page2pfn[i];
      printf("Page: %d, flag %llx\n",i,page2pfn[i]);

      if(v2a&0x8000000000000000ULL) // Bit 63: is the virtual page present?
        {
        // Bits 0-54 of a pagemap entry hold the page frame number
        unsigned long long pfn=v2a&0x7fffffffffffffULL;
        off64_t flagpos=lseek64(pageflags,pfn*8,SEEK_SET);
        assert(flagpos==(off64_t)(pfn*8));
        unsigned long long pf;
        ssize_t n=read(pageflags,&pf,8);
        assert(n==8);
        // Bit 14 of a kpageflags entry is SWAPBACKED
        printf("pageflags are %llx with SWAPBACKED: %d\n",pf,(int)((pf>>14)&1));
        }
    }
  return 0;
}

All in all, I'm not particularly happy with this approach, since it requires access to a file that we generally can't read and it is bloody complicated (how about a simple kernel call to retrieve the page flags?).

Answers

I usually use mprotect to set my tracked copy-on-write pages to read-only, then handle the resulting SIGSEGVs by marking the given page dirty and enabling writing.

It isn't ideal, but the overhead is quite manageable, and it can be combined with mincore and similar calls for more complicated optimizations, such as managing your working set size or approximating pointer information for pages you expect to have been swapped out. This lets the runtime system cooperate with the kernel rather than fight it.

It is not easy, but it is possible to determine this. To find out whether a page is a copy of another page (possibly another process's), you need to do the following (on recentish kernels):

  1. Read the entry in /proc/pid/pagemap for the appropriate pages in the process(es)
  2. Interrogate /proc/kpageflags

You can then determine that two pages are actually the same page, in memory.

It is fairly tricky to do this: you need to be root, and whatever you do will probably have some race conditions in it. But it is possible.

Copy-on-write is implemented using the memory protection scheme of the virtual memory hardware.

When a read-only page is written to, a page fault occurs. The page fault handler checks whether the page carries the copy-on-write flag: if so, a new page is allocated, the contents of the old page are copied into it, and the write is retried.

The new page is neither read-only nor copy-on-write; the link to the original page is completely broken.

So all you need to do is test the memory protection flags for the page.

On Windows, the API is QueryWorkingSet; see the explanation at VirtualQueryEx. I don't know what the corresponding Linux API is.

I gave an answer to someone with a similar goal and referenced a question similar to yours.

I think bmargulies' answer to that question fits what you need perfectly when the two ideas are combined.

I don't recall such an API being exported. Why do you want to do such a thing? (What is the root of the problem you're solving?)

You might want to take a look at /proc/[pid]/smaps, which provides fairly detailed per-mapping statistics of pages used, copied, and stored.

Again, why would you want to do that? If you're sure this approach is the only one (usually, virtual memory is used and forgotten about), you might want to consider writing a kernel module that provides such functionality.




