Programmatically add column names to numpy ndarray

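I'm trying to add column names to a NumPy ndarray and then select columns by name. But it doesn't work. I can't tell whether the problem occurs when I add the names or later, when I try to access the columns.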

Here's my code:

data = np.genfromtxt(csv_file, delimiter=',', dtype=np.float, skip_header=1)

#Add headers
csv_names = [s.strip('"') for s in file(csv_file, 'r').readline().strip().split(',')]
data = data.astype(np.dtype([(n, 'float64') for n in csv_names]))

Diagnostics based on the dimensions match what I expect:

print len(csv_names)
>> 108
print data.shape
>> (1652, 108)

"print data.dtype.names" also returns the expected output.

But when I start accessing columns by their field names, strange things happen. The "column" is still an array with 108 columns...

print data["EDUC"].shape
>> (1652, 108)

...and it appears to contain more missing values than there are rows in the data set.

print np.sum(np.isnan(data["EDUC"]))
>> 27976

Adding headers should be a trivial operation, but I've been fighting this bug for hours. Help!

Best answer

The problem is that you are thinking in terms of spreadsheet-like arrays, whereas NumPy uses different concepts.

Here is what you must know about NumPy:

  1. NumPy arrays only contain elements of a single type.
  2. If you need spreadsheet-like "columns", this type must be some tuple-like type. Such arrays are called Structured Arrays, because their elements are structures (i.e. tuples).

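In your case, NumPy would therefore take your 2-dimensional regular array and produce a one-dimensional array whose element type is a 108-element tuple (the spreadsheet-like array you have in mind is 2-dimensional).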

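These choices were probably made for efficiency reasons: all the elements of an array have the same type and therefore the same size, so they can be accessed at a low level very simply and quickly.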
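A minimal sketch of why the astype() call in the question leaves the array two-dimensional. The 3x2 array and the names AGE and EDUC are made-up stand-ins for the 1652x108 data, not values from the question.

```python
import numpy as np

# Hypothetical 3x2 stand-in for the 1652x108 array in the question.
plain = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
names = ["AGE", "EDUC"]  # illustrative column names only
dtype = np.dtype([(n, "float64") for n in names])

# astype() converts every scalar element into a whole record, so the
# array keeps its 2-D shape and each field repeats the original value.
cast = plain.astype(dtype)

print(cast.shape)          # (3, 2)
print(cast["EDUC"].shape)  # (3, 2)
```

Because every field mirrors the full original array, counting NaNs in data["EDUC"] counts the NaNs of the whole data set, which is consistent with the question seeing more missing values than rows.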

As user545424 showed, there is a simple NumPy answer to what you want to do (genfromtxt() accepts a names argument that supplies the column names).

If you want to convert your array from a regular NumPy ndarray to a structured array, you can do:

data.view(dtype=[(n, 'float64') for n in csv_names]).reshape(len(data))

(You were close: you used astype() instead of view().)
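A small self-contained sketch of the view() approach, again with a made-up 3x2 array and illustrative field names. It assumes the data is C-contiguous float64, which np.ascontiguousarray enforces here.

```python
import numpy as np

# Hypothetical stand-in data; the names are illustrative only.
plain = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
names = ["AGE", "EDUC"]
dtype = np.dtype([(n, "float64") for n in names])

# view() reinterprets each row's bytes as one record, so it needs
# C-contiguous float64 data whose row width equals the record size.
contiguous = np.ascontiguousarray(plain, dtype="float64")
structured = contiguous.view(dtype).reshape(len(contiguous))

print(structured.shape)    # (3,)
print(structured["EDUC"])  # [2. 4. 6.]
```

Note that view() does not copy: the structured array shares memory with the original, so writing to one changes the other.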

You could also check the answers to quite a few Stack Overflow questions, including "Converting a 2D numpy array to a structured array" (https://stackoverflow.com/questions/3622850/converting-a-2d-numpy-array-to-a-structured-array) and "How to convert regular numpy array to record array?" (https://stackoverflow.com/questions/7724711).

Other answers

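Unfortunately, I don't know what is going on when you try to add the field names, but I do know that you can build the array you want directly from the file: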

data = np.genfromtxt(csv_file, delimiter=',', names=True)
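A runnable sketch of the same call, using an in-memory file so it does not depend on the asker's CSV. The header and values are invented for illustration, and the header is unquoted here, unlike the file in the question.

```python
import io
import numpy as np

# Hypothetical CSV contents standing in for csv_file.
csv_text = "AGE,EDUC\n30,12\n41,16\n"

# names=True reads the field names from the first line of the file.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

print(data.dtype.names)  # ('AGE', 'EDUC')
print(data.shape)        # (2,)
print(data["EDUC"])      # [12. 16.]
```

The result is one-dimensional, with one record per row, and each field name selects a true column.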

EDIT:

It seems that adding field names only works when the input is a list of tuples:

data = np.array(list(map(tuple, data)), [(n, 'float64') for n in csv_names])
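As a hedged alternative that is not part of the original answers: newer NumPy versions (1.16 and later, as far as I know) ship a helper, numpy.lib.recfunctions.unstructured_to_structured, that performs the same conversion without building Python tuples. The sketch below compares it with the list-of-tuples approach on made-up data.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in data; the names are illustrative only.
plain = np.array([[1.0, 2.0], [3.0, 4.0]])
names = ["AGE", "EDUC"]
dtype = np.dtype([(n, "float64") for n in names])

# Approach from the answer: one tuple per row becomes one record.
from_tuples = np.array(list(map(tuple, plain)), dtype=dtype)

# Helper-based approach: the last axis is turned into the fields.
from_helper = rfn.unstructured_to_structured(plain, dtype)

print(from_tuples["EDUC"])  # [2. 4.]
print(from_helper["EDUC"])  # [2. 4.]
```

Both calls return a new one-dimensional structured array; the helper avoids the per-row Python loop, which may matter for large files.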



