Question

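On Linux systems, the pthreads library provides a function (posix_memalign) for cache alignment to prevent false sharing. To choose a specific NUMA node of the architecture, we can use the libnuma library. What I want is something that does both. I bind certain threads to certain processors, and I want to allocate local data structures for each thread from the corresponding NUMA node, in order to reduce the latency of memory operations for those threads. How can I do this?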

Answer 1

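If you're just looking to get the alignment functionality around a NUMA allocator, you can easily build your own.

The idea is to call the unaligned malloc() with a little more space than requested, then return the first aligned address inside that block. To be able to free it later, you need to store the base address at a known location.

Here's an example. Just substitute the names with whatever is appropriate: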

pint         //  An unsigned integer that is large enough to store a pointer.
NUMA_malloc  //  The NUMA malloc function
NUMA_free    //  The NUMA free function

void* my_NUMA_malloc(size_t bytes,size_t align, /* NUMA parameters */ ){

    //  The NUMA malloc function (over-allocate: room for alignment padding
    //  plus one pint in which to store the base address)
    void *ptr = NUMA_malloc(
        (size_t)(bytes + align + sizeof(pint)),
        /* NUMA parameters */
    );

    if (ptr == NULL)
        return NULL;

    //  Get aligned return address (align must be a power of two)
    pint *ret = (pint*)((((pint)ptr + sizeof(pint)) & ~(pint)(align - 1)) + align);

    //  Save the free pointer
    ret[-1] = (pint)ptr;

    return ret;
}

void my_NUMA_free(void *ptr){
    if (ptr == NULL)
        return;

    //  Get the free pointer
    ptr = (void*)(((pint*)ptr)[-1]);

    //  The NUMA free function
    NUMA_free(ptr);
}

When you use this, you need to call my_NUMA_free for anything allocated with my_NUMA_malloc.
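To make the placeholders above concrete, here is a minimal sketch of the same over-allocate-and-align idea. It is an illustration under stated assumptions, not a drop-in libnuma wrapper: raw_alloc and raw_free are hypothetical stand-ins (backed by plain malloc/free here) for whatever node-local allocator you use. Since libnuma's numa_free() takes the allocation size as its second argument, the sketch also stores the total size next to the base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the node-local allocator. With libnuma these
   would be calls such as numa_alloc_onnode(size, node) and
   numa_free(start, size); plain malloc/free keep the sketch self-contained. */
static void *raw_alloc(size_t size) { return malloc(size); }
static void raw_free(void *start, size_t size) { (void)size; free(start); }

/* Bookkeeping stored immediately below the aligned address. */
struct aligned_header {
    void  *base;   /* address returned by raw_alloc */
    size_t total;  /* size passed to raw_alloc, needed by a sized free */
};

void *aligned_alloc_sketch(size_t bytes, size_t align)
{
    /* The mask trick below only works for a nonzero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Keep the header itself properly aligned. */
    if (align < _Alignof(struct aligned_header))
        align = _Alignof(struct aligned_header);

    size_t total = bytes + align + sizeof(struct aligned_header);
    void *base = raw_alloc(total);
    if (base == NULL)
        return NULL;

    /* Skip past the header, then round up to the next multiple of align. */
    uintptr_t start = (uintptr_t)base + sizeof(struct aligned_header);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* Record what the free routine will need. */
    struct aligned_header *h = (struct aligned_header *)aligned - 1;
    h->base = base;
    h->total = total;
    return (void *)aligned;
}

void aligned_free_sketch(void *ptr)
{
    if (ptr == NULL)
        return;
    struct aligned_header *h = (struct aligned_header *)ptr - 1;
    raw_free(h->base, h->total);
}
```

For example, aligned_alloc_sketch(100, 64) returns a pointer whose address is a multiple of 64, and that pointer must be released with aligned_free_sketch, never with the raw free routine.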

Answer 2

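The numa_alloc_*() functions in libnuma allocate whole pages of memory, typically 4096 bytes. Cache lines are typically 64 bytes. Since 4096 is a multiple of 64, anything that comes back from numa_alloc_*() will already be aligned at the cache-line level.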
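If you would rather check this on your own machine than rely on the typical figures, a small sketch along these lines can help. The helper names are hypothetical, and the 64-byte line size is an assumption you pass in yourself; sysconf(_SC_PAGESIZE) is the standard POSIX way to ask for the page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* True if ptr is a multiple of align (align must be a nonzero power of two). */
bool is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* True if the system page size is a whole multiple of line_size, in which
   case every page-aligned allocation is also cache-line aligned. */
bool page_holds_whole_lines(size_t line_size)
{
    long page = sysconf(_SC_PAGESIZE);
    return page > 0 && line_size > 0 && (size_t)page % line_size == 0;
}
```

Calling is_aligned(p, 64) on a pointer returned by a page-granular allocator should then report true whenever page_holds_whole_lines(64) does.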

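Beware the numa_alloc_*() functions, however. The man page says they are slower than a corresponding malloc(), which I'm sure is true, but the much bigger problem I've found is that simultaneous allocations from numa_alloc_*() running on lots of cores at once suffer massive contention problems. In my case, replacing malloc() with numa_alloc_onnode() was a wash (everything I gained by using local memory was offset by increased allocation/free time); tcmalloc was faster than either. I was performing thousands of 12-16 KB mallocs on 32 threads/cores at once. Timing experiments showed that it wasn't the single-threaded speed of numa_alloc_onnode() that was responsible for the large amount of time my process spent performing allocations, which left lock/contention issues as the likely cause.

The solution I've adopted is to numa_alloc_onnode() large chunks of memory once, and then distribute it to the threads running on each node as needed. I use gcc atomic builtins to allow each thread (I pin threads to CPUs) to grab from the memory allocated on its node. You can align the distributions to the cache-line size as they are made, if you want: I do. This approach beats the pants off even tcmalloc (which is thread-aware but not NUMA-aware; at least the Debian Squeeze version doesn't seem to be).

The downside of this approach is that you can't free the individual distributions (well, not without a lot more work, anyway); you can only free the whole underlying on-node allocations. However, if this is temporary on-node scratch space for a function call, or you can otherwise specify exactly when that memory is no longer needed, then this approach works very well. It obviously helps if you can predict how much memory you need to allocate on each node, too.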

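@nandu: I won't post the full source; it's long and in places tied to something else I do, which makes it less than perfectly transparent. What I will post is a slightly shortened version of my new malloc() function to illustrate the core idea: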

void *my_malloc(struct node_memory *nm,int node,long size)
{
  long off,obytes;

  // round up size to the nearest cache line size
  // (optional, though some rounding is essential to avoid misalignment problems)

  if ((obytes = (size % CACHE_LINE_SIZE)) > 0)
    size += CACHE_LINE_SIZE - obytes;

  // atomically increase the offset for the requested node by size

  if (((off = __sync_fetch_and_add(&(nm->off[node]),size)) + size) > nm->bytes) {
    fprintf(stderr,"Out of allocated memory on node %d\n",node);
    return(NULL);
  }
  else
    return((void *) (nm->ptr[node] + off));

}

where

struct node_memory {
  long bytes;         // the number of bytes of memory allocated on each node
  char **ptr;         // ptr array of ptrs to the base of the memory on each node
  long *off;          // array of offsets from those bases (in bytes)
  int nptrs;          // the size of the ptr[] and off[] arrays
};

and nm->ptr[node] is set up using numa_alloc_onnode().

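I usually store allowable node information in the structure too, so my_malloc() can check that node requests are sensible without making function calls; I also check that nm exists and that size is sensible. The function __sync_fetch_and_add() is a gcc builtin atomic function; if you're not compiling with gcc you'll need something else. I use atomics because, in my limited experience, they are much faster than mutexes under high thread/core count conditions (as on 4P NUMA machines).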
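For compilers without the gcc builtin, one possible "something else" is C11's <stdatomic.h>. The following is a rough sketch of the same per-node bump allocator using atomic_fetch_add, including the setup and sanity checks described above. It is not the code from this answer: struct node_pool and the node_pool_* functions are hypothetical names, CACHE_LINE_SIZE of 64 is an assumption, and aligned_alloc stands in for numa_alloc_onnode so that the sketch builds without libnuma.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64 /* assumed; query the hardware in real code */

struct node_pool {
    size_t bytes;        /* bytes reserved on each node */
    int nnodes;          /* length of base[] and off[] */
    char **base;         /* base address of each node's chunk */
    atomic_size_t *off;  /* next free offset within each chunk */
};

static size_t round_up_line(size_t n)
{
    return (n + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
}

/* Releases every per-node chunk; individual allocations cannot be freed. */
void node_pool_destroy(struct node_pool *np)
{
    if (np->base != NULL)
        for (int n = 0; n < np->nnodes; n++)
            free(np->base[n]);
    free(np->base);
    free(np->off);
    np->base = NULL;
    np->off = NULL;
    np->nnodes = 0;
}

int node_pool_init(struct node_pool *np, int nnodes, size_t bytes)
{
    assert(np != NULL && nnodes > 0 && bytes > 0);
    np->bytes = round_up_line(bytes);
    np->nnodes = nnodes;
    np->base = calloc((size_t)nnodes, sizeof *np->base);
    np->off = calloc((size_t)nnodes, sizeof *np->off);
    if (np->base == NULL || np->off == NULL) {
        node_pool_destroy(np);
        return -1;
    }
    for (int n = 0; n < nnodes; n++) {
        /* Stand-in: with libnuma this would be numa_alloc_onnode(bytes, n). */
        np->base[n] = aligned_alloc(CACHE_LINE_SIZE, np->bytes);
        if (np->base[n] == NULL) {
            node_pool_destroy(np);
            return -1;
        }
        atomic_init(&np->off[n], 0);
    }
    return 0;
}

void *node_pool_alloc(struct node_pool *np, int node, size_t size)
{
    /* Sanity checks on the request before touching shared state. */
    if (node < 0 || node >= np->nnodes || size == 0 || size > np->bytes)
        return NULL;
    size = round_up_line(size);

    /* Atomically claim [off, off + size) on the requested node. */
    size_t off = atomic_fetch_add(&np->off[node], size);
    if (off > np->bytes || size > np->bytes - off)
        return NULL; /* chunk exhausted */
    return np->base[node] + off;
}
```

As in the original function, a failed request still advances the offset, so once a node's chunk is exhausted it stays exhausted until the whole pool is destroyed and rebuilt.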
