How do I operate on a huge matrix (100000x100000) stored as nested list?

Circumstances

I have a procedure that constructs a matrix from a given list of values. The list keeps growing, to something like 100 thousand or a million values, which in turn results in a million x million matrix.

In the procedure, I do some add/sub/div/multiply operations on the matrix, based either on the row, the column, or just the element.

Issues

Since the matrix is so big, I do not think doing the whole manipulation in memory would work.

Question

Therefore, my question is: how should I manipulate this huge matrix and the huge value list? Where should I store it, how should I read it, and so on, so that I can carry out my operations on the matrix without the computer getting stuck?

Best answer

First and foremost, such a matrix would have 10G elements. Considering that any useful operation would then need 30G elements (two inputs and one output), each taking 4-8 bytes, you cannot expect to do this at all on a 32-bit computer using any sort of in-memory technique. To solve this, I would use a) a genuine 64-bit machine, b) memory-mapped binary files for storage, and c) ditch Python.
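As a rough illustration of point b), here is a minimal sketch that keeps the matrices in raw binary files and adds them one row at a time through the standard-library `mmap` module. The file layout (row-major 32-bit floats) and the helper names `write_matrix`, `read_row`, `add_matrices` and `demo` are assumptions made for this example, not something from the question. The per-element loop in pure Python is far too slow for the real sizes; the sketch only shows the access pattern.

```python
import mmap
import os
import tempfile
from array import array

TYPECODE = "f"                          # C float, 4 bytes on common platforms
ITEM_SIZE = array(TYPECODE).itemsize


def write_matrix(path, rows):
    """Write an iterable of rows to `path` as raw row-major floats."""
    with open(path, "wb") as f:
        for row in rows:
            array(TYPECODE, row).tofile(f)


def read_row(buf, r, n_cols):
    """Decode row `r` from a memory-mapped buffer."""
    row_bytes = n_cols * ITEM_SIZE
    row = array(TYPECODE)
    row.frombytes(buf[r * row_bytes:(r + 1) * row_bytes])
    return row


def add_matrices(path_a, path_b, path_out, n_rows, n_cols):
    """Compute out = a + b one row at a time; only a few rows are in RAM."""
    row_bytes = n_cols * ITEM_SIZE
    with open(path_out, "wb") as f:
        f.truncate(n_rows * row_bytes)   # pre-size the output file
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb, \
            open(path_out, "r+b") as fo:
        ma = mmap.mmap(fa.fileno(), 0, access=mmap.ACCESS_READ)
        mb = mmap.mmap(fb.fileno(), 0, access=mmap.ACCESS_READ)
        mo = mmap.mmap(fo.fileno(), 0, access=mmap.ACCESS_WRITE)
        try:
            for r in range(n_rows):
                a = read_row(ma, r, n_cols)
                b = read_row(mb, r, n_cols)
                out = array(TYPECODE, (x + y for x, y in zip(a, b)))
                mo[r * row_bytes:(r + 1) * row_bytes] = out.tobytes()
            mo.flush()
        finally:
            ma.close()
            mb.close()
            mo.close()


def demo():
    """Tiny 2 x 3 example in a temporary directory."""
    with tempfile.TemporaryDirectory() as d:
        pa, pb, po = (os.path.join(d, n) for n in ("a.bin", "b.bin", "out.bin"))
        write_matrix(pa, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        write_matrix(pb, [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])
        add_matrices(pa, pb, po, n_rows=2, n_cols=3)
        with open(po, "rb") as f:
            data = array(TYPECODE)
            data.fromfile(f, 6)
        return data.tolist()
```

Because the operating system pages the mapped files in and out on demand, the process never holds more than a few rows in its own memory, which is the property that matters at the 100000 x 100000 scale.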

Update

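By my calculation, if you have 2 input matrices and 1 output matrix of 100000 x 100000 32-bit float/integer elements, that is 120 GB (not GiB, though) of data. Assume that on a home computer you can sustain a constant 100 MB/s of I/O bandwidth. Since any operation, including addition and subtraction, has to touch every single element of the matrices, the absolute lower bound is 120 GB / (100 MB/s) = 1200 seconds, or 20 minutes, for a single matrix operation. That is when written in C, using the operating system as efficiently as possible, with memory-mapped I/O and so on. For a million by million matrix, each operation takes 100 times as long, about 33 hours, or close to a day and a half. And since the hard disk is saturated during that time, the computer might be completely unusable.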
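The estimate can be checked with a few lines of arithmetic. The function name `min_seconds` and its parameter names are made up for this sketch; the numbers (3 matrices, 4 bytes per element, 100 MB/s) are the ones assumed in the answer.

```python
def min_seconds(n, matrices=3, bytes_per_element=4, bandwidth=100e6):
    """Lower bound on the time to stream `matrices` n x n matrices once.

    bandwidth is in bytes per second (100e6 = 100 MB/s).
    """
    total_bytes = matrices * n * n * bytes_per_element
    return total_bytes / bandwidth


small = min_seconds(100_000)      # 100000 x 100000 case
large = min_seconds(1_000_000)    # million x million case

print(small / 60, "minutes")
print(large / 3600, "hours")
```

The bound scales with the square of the matrix side, which is why a tenfold larger side costs a hundred times as long.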

Other answers

I suggest using NumPy. It's quite fast at arithmetic operations.

Have you considered using a dictionary? If the matrix is very sparse, storing it like this might be feasible:

matrix = {
 (101, 10213) : "value1",
 (1099, 78933) : "value2"
}
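To show how arithmetic could work on such a dictionary, here is a minimal sketch, assuming numeric values and treating any missing key as zero. The helper names `sparse_get`, `sparse_add` and `scale_row` are hypothetical and exist only for this illustration.

```python
def sparse_get(m, row, col):
    """Return the element at (row, col); absent keys count as zero."""
    return m.get((row, col), 0)


def sparse_add(a, b):
    """Element-wise sum of two sparse matrices stored as {(row, col): value}."""
    result = dict(a)
    for key, value in b.items():
        total = result.get(key, 0) + value
        if total == 0:
            result.pop(key, None)     # keep the dictionary sparse
        else:
            result[key] = total
    return result


def scale_row(m, row, factor):
    """Multiply every stored element of one row by `factor`."""
    return {(r, c): (v * factor if r == row else v) for (r, c), v in m.items()}


a = {(101, 10213): 2.0, (1099, 78933): 5.0}
b = {(101, 10213): 3.0, (1099, 78933): -5.0, (7, 7): 1.0}

total = sparse_add(a, b)
scaled = scale_row(a, 101, 10)
```

Memory use grows with the number of non-zero entries rather than with the full matrix size, so this only helps when most elements really are zero.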

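Your data structure is not feasible as arrays; it is too large. If the matrix is, for example, a binary matrix, you could look into storage representations such as hashing large blocks of zeros to the same bucket.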




