Find row where values for column is maximal in a pandas DataFrame

How can I find the row for which the value of a specific column is maximal?

df.max() will give me the maximal value for each column, but I don't know how to get the corresponding row.

Best answer

Use the pandas idxmax function. It's straightforward:

>>> import pandas
>>> import numpy as np
>>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
>>> df
          A         B         C
0  1.232853 -1.979459 -0.573626
1  0.140767  0.394940  1.068890
2  0.742023  1.343977 -0.579745
3  2.125299 -0.649328 -0.211692
4 -0.187253  1.908618 -1.862934
>>> df['A'].idxmax()
3
>>> df['B'].idxmax()
4
>>> df['C'].idxmax()
1
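  • Alternatively, you could also use numpy.argmax, such as numpy.argmax(df['A']). At the time of writing it provided the same thing and appeared at least as fast as idxmax in cursory observations (in current pandas it returns the integer position rather than the label).

  • idxmax() returns index labels, not integers.

  • Example: if you have string values as your index labels, like rows 'a' through 'e', you might want to know that the max occurs in row 4 (not row 'd').

  • If you want the integer position of that label within the Index, you have to get it manually (which can be tricky now that duplicate row labels are allowed).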
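A minimal sketch of the label-versus-position distinction, using a hand-built Series with string labels 'a' through 'e' (the values are made up so that the maximum sits in the fourth row, zero-based position 3). It assumes pandas 1.0 or later, where Series.argmax returns a position; Index.get_loc converts a label to its position, which is only unambiguous when the index is unique.

```python
import pandas as pd

# Made-up data: string labels 'a'..'e', maximum stored under label 'd'
s = pd.Series([0.5, 1.2, -0.3, 2.1, 0.9], index=list("abcde"))

label = s.idxmax()                    # label of the max row: 'd'
position = s.argmax()                 # zero-based integer position (pandas >= 1.0): 3
via_get_loc = s.index.get_loc(label)  # label -> position; safe only for a unique index

print(label, position, via_get_loc)   # d 3 3
```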


HISTORICAL NOTES:

  • idxmax() used to be called argmax() prior to 0.11
  • argmax (as a label-returning alias of idxmax) was deprecated before 1.0.0; since 1.0.0, Series.argmax returns the integer position of the maximum instead, like NumPy's argmax
  • As of pandas 0.16, argmax still existed and performed the same function (though it appeared to run more slowly than idxmax).
  • The original argmax function returned the integer position within the index of the row location of the maximum element.
  • pandas moved to using row labels instead of integer indices. Positional integer indices used to be very common, more common than labels, especially in applications where duplicate row labels are common.

For example, consider this toy DataFrame with a duplicate row label:

In [19]: dfrm
Out[19]: 
          A         B         C
a  0.143693  0.653810  0.586007
b  0.623582  0.312903  0.919076
c  0.165438  0.889809  0.000967
d  0.308245  0.787776  0.571195
e  0.870068  0.935626  0.606911
f  0.037602  0.855193  0.728495
g  0.605366  0.338105  0.696460
h  0.000000  0.090814  0.963927
i  0.688343  0.188468  0.352213
i  0.879000  0.105039  0.900260

In [20]: dfrm['A'].idxmax()
Out[20]: 'i'

In [21]: dfrm.loc[dfrm['A'].idxmax()]  # .ix instead of .loc in older versions of pandas
Out[21]: 
          A         B         C
i  0.688343  0.188468  0.352213
i  0.879000  0.105039  0.900260

So here a naive use of idxmax is not sufficient, whereas the old form of argmax would correctly provide the positional location of the max row (in this case, position 9).
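A small reproduction of the duplicate-label problem, with made-up values and a hypothetical frame where label 'i' appears twice. Looking up the label returned by idxmax yields both 'i' rows, while taking the position from the underlying NumPy array (to_numpy().argmax()) and using .iloc selects exactly the one row that holds the maximum.

```python
import pandas as pd

# Made-up frame with a duplicated row label 'i', mirroring the session above
dfrm = pd.DataFrame(
    {"A": [0.1, 0.6, 0.2, 0.9], "B": [0.5, 0.3, 0.8, 0.1]},
    index=["a", "b", "i", "i"],
)

label = dfrm["A"].idxmax()           # 'i', which is ambiguous here
by_label = dfrm.loc[label]           # returns BOTH 'i' rows as a DataFrame

pos = dfrm["A"].to_numpy().argmax()  # 3, the row that actually holds the max
by_position = dfrm.iloc[pos]         # a single row, returned as a Series

print(len(by_label), pos, by_position["A"])  # 2 3 0.9
```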

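This is exactly one of those nasty, bug-prone behaviors of dynamically typed languages that makes this sort of thing so unfortunate, and worth beating a dead horse over. If you are writing systems code and your system suddenly gets used on some data sets that are not cleaned properly before being joined, it's very easy to end up with duplicate row labels, especially string labels like a CUSIP or SEDOL identifier for financial assets. You can't easily use the type system to help you out, and you may not be able to enforce uniqueness on the index without running into unexpectedly missing data.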

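So you're left hoping that your unit tests covered everything (they didn't, or more likely no one wrote any tests). Otherwise (most likely) you're just left waiting to see whether you smack into this bug at runtime, in which case you probably have to drop many hours' worth of results from the database you were writing to, bang your head against the wall in IPython trying to reproduce the problem by hand, and finally figure out that it happens because idxmax can only report the label of the max row, and be disappointed.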

Other answers

You might also try idxmax:

In [5]: df = pandas.DataFrame(np.random.randn(10,3),columns=['A','B','C'])

In [6]: df
Out[6]: 
          A         B         C
0  2.001289  0.482561  1.579985
1 -0.991646 -0.387835  1.320236
2  0.143826 -1.096889  1.486508
3 -0.193056 -0.499020  1.536540
4 -2.083647 -3.074591  0.175772
5 -0.186138 -1.949731  0.287432
6 -0.480790 -1.771560 -0.930234
7  0.227383 -0.278253  2.102004
8 -0.002592  1.434192 -1.624915
9  0.404911 -2.167599 -0.452900

In [7]: df.idxmax()
Out[7]: 
A    0
B    8
C    7

e.g.

In [8]: df.loc[df['A'].idxmax()]
Out[8]: 
A    2.001289
B    0.482561
C    1.579985
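To fetch the full row for every column's maximum in one step, the Series returned by df.idxmax() can itself be passed to .loc. A short sketch with made-up values, assuming a unique index; relabeling the result by column name is optional and only for readability.

```python
import pandas as pd

df = pd.DataFrame({"A": [2.0, -1.0, 0.1], "B": [0.5, -0.4, 1.4], "C": [1.6, 1.3, 2.1]})

# df.idxmax() gives one row label per column; .loc fetches those rows
max_rows = df.loc[df.idxmax()]
max_rows.index = df.columns  # row k of the result is where column k peaks
print(max_rows)
```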

Both answers above return only one index if multiple rows take the maximum value. If you want all the rows, there does not seem to be a built-in function, but it is not hard to do. Below is an example for a Series; the same can be done for a DataFrame:

In [1]: from pandas import Series, DataFrame

In [2]: s=Series([2,4,4,3],index=['a','b','c','d'])

In [3]: s.idxmax()
Out[3]: 'b'

In [4]: s[s==s.max()]
Out[4]: 
b    4
c    4
dtype: int64
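The same idea for a DataFrame, as a sketch with made-up data: a boolean mask on one column keeps every row tied for the maximum.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 4, 3], "B": [9, 8, 7, 6]}, index=["a", "b", "c", "d"])

# True wherever column A equals its maximum, so tied rows are all kept
tied_rows = df[df["A"] == df["A"].max()]
print(tied_rows)
```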
df.iloc[df['columnX'].argmax()]

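argmax() gives the integer position of the max value in columnX; iloc can then use that position to get the corresponding row of the DataFrame.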
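A quick check of that pairing, assuming pandas 1.0 or later (where Series.argmax is positional) and a made-up frame whose labels are not 0..n-1. On a unique index, argmax with .iloc and idxmax with .loc land on the same row.

```python
import pandas as pd

# Made-up data; 'columnX' is the placeholder name used in the answer
df = pd.DataFrame({"columnX": [3, 7, 5]}, index=[10, 20, 30])

row_by_pos = df.iloc[df["columnX"].argmax()]    # argmax -> position 1
row_by_label = df.loc[df["columnX"].idxmax()]   # idxmax -> label 20

print(row_by_pos.name, row_by_label.name)  # 20 20
```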

A more compact and readable solution using query() is like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])
print(df)

# find row with maximum A
df.query('A == A.max()')

It also returns a DataFrame instead of a Series, which is handy for some use cases.
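Because query is a plain equality filter, ties are kept. A sketch with made-up data that also shows an equivalent form: compute the maximum first and reference the local variable inside the query string with @.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 5, 2], "B": [0, 1, 2, 3]})

# The query from the answer; both rows where A == 5 are returned
top = df.query("A == A.max()")

# Equivalent: compute the max in Python, then refer to it with '@'
a_max = df["A"].max()
top_at = df.query("A == @a_max")

print(top)
```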

Very simple: we have df as below, and we want to print the row with the max value in C:

A  B  C
x  1  4
y  2  10
z  5  9

In:

df.loc[df['C'] == df['C'].max()]   # condition check

Out:

A B C
y 2 10

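If you want the entire row instead of just the id, you can use df.nlargest and pass in how many 'top' rows you want, as well as which column or columns to rank by.

df.nlargest(2, ['A'])

will give you the rows corresponding to the top 2 values of A.

Use df.nsmallest for min values.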
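nlargest also takes a keep argument ('first', 'last' or 'all'); with keep='all', rows tied at the cut-off are all retained. A sketch with made-up data showing the difference for n=1, plus the nsmallest counterpart.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 9, 9, 1], "B": list("wxyz")})

first_only = df.nlargest(1, "A")           # default keep='first': a single row
all_max = df.nlargest(1, "A", keep="all")  # every row tied for the top value
bottom = df.nsmallest(1, "A")              # minimum instead of maximum

print(all_max)
```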

mx.iloc[0].idxmax()

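This line of code gives you the column label of the maximum value within one row of a DataFrame: here mx is the DataFrame, and iloc[0] selects the row at position 0.

Consider this DataFrame: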
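A sketch with a made-up frame mx: the row-wise call returns a column label rather than the value, and passing axis=1 to DataFrame.idxmax does the same for every row at once.

```python
import pandas as pd

mx = pd.DataFrame({"A": [1, 8], "B": [5, 2], "C": [3, 4]})

first_row_best = mx.iloc[0].idxmax()  # column holding the max of row 0: 'B'
per_row_best = mx.idxmax(axis=1)      # the same for every row
print(first_row_best, per_row_best.tolist())  # B ['B', 'A']
```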

[In]: df = pd.DataFrame(np.random.randn(4,3),columns=['A','B','C'])
[Out]:
          A         B         C
0 -0.253233  0.226313  1.223688
1  0.472606  1.017674  1.520032
2  1.454875  1.066637  0.381890
3 -0.054181  0.234305 -0.557915

Assuming one wants to know the rows where column "C" is max, the following will do the work:

[In]: df[df['C']==df['C'].max()]
[Out]:
          A         B         C
1  0.472606  1.017674  1.520032

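What worked for me is:

df[df['colX'] == df['colX'].max()]

Then you get the row(s) in df with the maximum value of colX.

Then, if you just want the index, you can add .index at the end of the query.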
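A sketch of both variants, reusing the small x/y/z table from the earlier answer as made-up data, with column C standing in for colX.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y", "z"], "B": [1, 2, 5], "C": [4, 10, 9]})

mask = df["C"] == df["C"].max()
rows = df[mask]          # the full matching row(s), as a DataFrame
labels = df[mask].index  # only their index labels
print(labels.tolist())   # [1]
```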

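The idxmax of a DataFrame returns the label index of the row with the maximum value, and the behavior of argmax depends on the version of pandas (at the time of writing it returned a warning). If you want to use the positional index, you can do the following: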

max_row = df['A'].values.argmax()

or

import numpy as np
max_row = np.argmax(df['A'].values)

Note that if you use np.argmax(df['A']), it behaves the same as df['A'].argmax().
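argmax only reports the first position of the maximum. If ties matter, every position can be collected with np.flatnonzero on the underlying array and fed to .iloc, which also sidesteps duplicate labels. A sketch with made-up data.

```python
import numpy as np
import pandas as pd

# Made-up data with a duplicated label 'q'
df = pd.DataFrame({"A": [4, 7, 1, 7]}, index=["p", "q", "r", "q"])

values = df["A"].to_numpy()
first_pos = values.argmax()                       # first position of the max: 1
all_pos = np.flatnonzero(values == values.max())  # every position of the max: [1, 3]
rows = df.iloc[all_pos]                           # positional lookup, unaffected by duplicate labels
```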

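Use:

data.loc[data['A'].idxmax()]

data['A'].idxmax() finds the label of the row holding the max value; data.loc[...] returns that row. (With the default integer index, label and position coincide, which is why data.iloc also works in that case.)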

If there are ties in the maximum values, then idxmax returns the index of only the first max value. For example, in the following DataFrame:

   A  B  C
0  1  0  1
1  0  0  1
2  0  0  0
3  0  1  1
4  1  0  0

df.idxmax() returns

A    0
B    3
C    0
dtype: int64

Now, if we want all indices corresponding to the max values, we can use max + eq to create a boolean DataFrame, then use it on df.index to filter out the indices:

out = df.eq(df.max()).apply(lambda x: df.index[x].tolist())

Output:

A       [0, 4]
B          [3]
C    [0, 1, 3]
dtype: object
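The same result can be built without apply, as a plain dict keyed by column. A sketch using the 0/1 table above; converting the mask with to_numpy() before indexing df.index is a defensive choice, not a requirement stated in the answer.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 0, 0, 0, 1], "B": [0, 0, 0, 1, 0], "C": [1, 1, 0, 1, 0]})

# For each column, the labels of every row equal to that column's maximum
out = {
    col: df.index[(df[col] == df[col].max()).to_numpy()].tolist()
    for col in df.columns
}
print(out)  # {'A': [0, 4], 'B': [3], 'C': [0, 1, 3]}
```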



