Question

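How do I find the row for which the value of a specific column is maximal?

df.max() gives me the maximum value of each column, but I don't know how to get the corresponding row.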

Answer 1

Use the pandas idxmax function. It's straightforward:

>>> import pandas
>>> import numpy as np
>>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
>>> df
          A         B         C
0  1.232853 -1.979459 -0.573626
1  0.140767  0.394940  1.068890
2  0.742023  1.343977 -0.579745
3  2.125299 -0.649328 -0.211692
4 -0.187253  1.908618 -1.862934
>>> df['A'].idxmax()
3
>>> df['B'].idxmax()
4
>>> df['C'].idxmax()
1

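Alternatively, you could also use numpy.argmax, such as numpy.argmax(df['A']). With the default integer index shown here it gives the same result, and in cursory observations it appears to be at least as fast as idxmax.

idxmax() returns index labels, not integers.
Example: if you have string values as your index labels, like rows 'a' through 'e', you might want to know that the max occurs in row 4 (not row 'd').
If you want the integer position of that label within the Index, you have to get it manually (which can be tricky now that duplicate row labels are allowed).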
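
As a minimal sketch of getting the position "manually" when the labels are unique: the Series below is hypothetical data, and Index.get_loc only returns a plain integer when the label occurs once (positions are zero-based).

```python
import pandas as pd

# Hypothetical data: five rows labeled 'a' through 'e'.
s = pd.Series([0.3, 1.2, 0.7, 2.5, 0.1], index=list("abcde"))

label = s.idxmax()                 # label of the row holding the maximum
position = s.index.get_loc(label)  # zero-based position; an int only if the label is unique

print(label, position)
```

With duplicate labels, get_loc no longer returns a single integer, which is the situation discussed further down.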

HISTORICAL NOTES:

idxmax() used to be called argmax() prior to 0.11.
argmax was deprecated prior to 1.0.0, and the old label-returning behavior was removed entirely in 1.0.0.
Back in pandas 0.16, argmax still existed and performed the same function (though it appeared to run more slowly than idxmax).
The argmax function returned the integer position, within the index, of the row holding the maximum element.
pandas then moved to using row labels instead of integer indices. Positional integer indices used to be very common, more common than labels, especially in applications where duplicate row labels are common.

For example, consider this toy DataFrame with a duplicate row label:

In [19]: dfrm
Out[19]: 
          A         B         C
a  0.143693  0.653810  0.586007
b  0.623582  0.312903  0.919076
c  0.165438  0.889809  0.000967
d  0.308245  0.787776  0.571195
e  0.870068  0.935626  0.606911
f  0.037602  0.855193  0.728495
g  0.605366  0.338105  0.696460
h  0.000000  0.090814  0.963927
i  0.688343  0.188468  0.352213
i  0.879000  0.105039  0.900260

In [20]: dfrm['A'].idxmax()
Out[20]: 'i'

In [21]: dfrm.loc[dfrm['A'].idxmax()]  # .ix instead of .loc in older versions of pandas
Out[21]: 
          A         B         C
i  0.688343  0.188468  0.352213
i  0.879000  0.105039  0.900260

So here a naive use of idxmax is not sufficient, whereas the old form of argmax would correctly provide the positional location of the max row (in this case, position 9).
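
One way to recover the position today, sketched under the assumption that going through the underlying NumPy array is acceptable. The frame df_dup and its values are hypothetical stand-ins for the toy data above.

```python
import pandas as pd

# Hypothetical frame with a repeated row label 'i'.
df_dup = pd.DataFrame(
    {"A": [0.61, 0.00, 0.69, 0.88], "B": [0.34, 0.09, 0.19, 0.11]},
    index=["g", "h", "i", "i"],
)

# Label-based lookup is ambiguous: both rows labeled 'i' come back.
by_label = df_dup.loc[df_dup["A"].idxmax()]

# NumPy's argmax works on the raw values, so it ignores labels entirely.
pos = df_dup["A"].to_numpy().argmax()
row = df_dup.iloc[pos]  # exactly one row, selected by position

print(len(by_label), pos)
print(row)
```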

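This is exactly one of those nasty, bug-prone behaviors in dynamically typed languages that makes this sort of thing so unfortunate, and worth beating a dead horse over. If you are writing systems code and your system suddenly gets used on data sets that were not cleaned properly before being joined, it is very easy to end up with duplicate row labels, especially string labels like a CUSIP or SEDOL identifier for financial assets. You can't easily use the type system to help you out, and you may not be able to enforce uniqueness on the index without running into unexpectedly missing data.

So you are left hoping that your unit tests covered everything (they didn't, or more likely no one wrote any tests). Otherwise (most likely) you are just left waiting to see whether you smack into this error at runtime, in which case you probably have to throw away many hours of work from the database you were writing results to, bang your head against the wall in IPython trying to reproduce the problem by hand, and finally figure out that idxmax only reports the label of the max row and does not give you its position automatically, which is disappointing.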

Answer 2

You might also try idxmax:

In [5]: df = pandas.DataFrame(np.random.randn(10,3),columns=['A','B','C'])

In [6]: df
Out[6]: 
          A         B         C
0  2.001289  0.482561  1.579985
1 -0.991646 -0.387835  1.320236
2  0.143826 -1.096889  1.486508
3 -0.193056 -0.499020  1.536540
4 -2.083647 -3.074591  0.175772
5 -0.186138 -1.949731  0.287432
6 -0.480790 -1.771560 -0.930234
7  0.227383 -0.278253  2.102004
8 -0.002592  1.434192 -1.624915
9  0.404911 -2.167599 -0.452900

In [7]: df.idxmax()
Out[7]: 
A    0
B    8
C    7

e.g.

In [8]: df.loc[df['A'].idxmax()]
Out[8]: 
A    2.001289
B    0.482561
C    1.579985

Answer 3

Both answers above return only one index when several rows share the maximum value. If you want all of those rows, there does not seem to be a function for it, but it is not hard to do. Below is an example for a Series; the same can be done for a DataFrame:

In [1]: from pandas import Series, DataFrame

In [2]: s=Series([2,4,4,3],index=['a','b','c','d'])

In [3]: s.idxmax()
Out[3]: 'b'

In [4]: s[s==s.max()]
Out[4]: 
b    4
c    4
dtype: int64
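
The DataFrame version might look like the sketch below. The frame df_ties is made up for illustration and mirrors the Series above in its column A.

```python
import pandas as pd

# Hypothetical frame: column A has its maximum (4) in rows 'b' and 'c'.
df_ties = pd.DataFrame({"A": [2, 4, 4, 3], "B": [1, 0, 5, 2]}, index=list("abcd"))

# The boolean mask keeps every row whose A equals the column maximum.
max_rows = df_ties[df_ties["A"] == df_ties["A"].max()]

print(max_rows)
```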

Answer 4


df.iloc[df['columnX'].argmax()]

argmax() gives the position of the maximum value in column X, and iloc then retrieves that row of the DataFrame df.

Answer 5

A more compact and readable solution using query() is like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
print(df)

# find row with maximum A
df.query('A == A.max()')

It also returns a DataFrame instead of a Series, which is handy for some use cases.
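
To make the difference in return type concrete, here is a small sketch with a hypothetical frame df_q, comparing query() against a label lookup via idxmax.

```python
import pandas as pd

# Hypothetical data; the maximum of A sits in row 1.
df_q = pd.DataFrame({"A": [1.0, 3.5, 2.0], "B": [10, 20, 30]})

as_frame = df_q.query("A == A.max()")      # DataFrame with the matching row(s)
as_series = df_q.loc[df_q["A"].idxmax()]   # a single row, returned as a Series

print(type(as_frame).__name__, type(as_series).__name__)
```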

Answer 6

Very simple: we have a df and we want to print the row whose value in C is the maximum:

In:

df.loc[df['C'] == df['C'].max()]   # condition check

Out:

A B C
y 2 10

Answer 7

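If you want the entire row instead of just the id, you can use df.nlargest, passing in how many 'top' rows you want and the column/columns to rank by.

df.nlargest(2, ['A'])

will give you the rows corresponding to the top 2 values of A.

Use df.nsmallest for min values.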
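
A short sketch of both calls on a hypothetical frame df_n. The keep="all" argument is an addition not mentioned in the answer; it asks nlargest to return every row tied for the last requested place.

```python
import pandas as pd

# Hypothetical data: the value 9 appears twice in column A.
df_n = pd.DataFrame({"A": [3, 9, 5, 9, 1], "B": list("vwxyz")})

top2 = df_n.nlargest(2, "A")               # the two rows with the largest A
top_all = df_n.nlargest(1, "A", keep="all")  # every row tied for the maximum
bottom1 = df_n.nsmallest(1, "A")           # the row with the smallest A

print(top2)
print(top_all)
print(bottom1)
```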

Answer 8

The direct "argmax" solution does not work for me.

Here is the previous example, provided by @ely (https://stackoverflow.com/users/567620/ely):

>>> import pandas
>>> import numpy as np
>>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
>>> df
          A         B         C
0  1.232853 -1.979459 -0.573626
1  0.140767  0.394940  1.068890
2  0.742023  1.343977 -0.579745
3  2.125299 -0.649328 -0.211692
4 -0.187253  1.908618 -1.862934
>>> df['A'].argmax()
3
>>> df['B'].argmax()
4
>>> df['C'].argmax()
1

It returns the following message:

FutureWarning: 'argmax' is deprecated, use 'idxmax' instead. The behavior of 'argmax'
will be corrected to return the positional maximum in the future.
Use 'series.values.argmax' to get the position of the maximum now.

So my solution is:

df['A'].values.argmax()

Answer 9

mx.iloc[0].idxmax()

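This line of code shows how to locate the maximum value within a single row of a DataFrame: mx is the DataFrame, iloc[0] selects the row at position 0, and idxmax() returns the column label where that row's maximum occurs.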
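
A small sketch of that call, using made-up values for mx. The axis=1 variant at the end is an addition that does the same thing for every row at once.

```python
import pandas as pd

# Hypothetical data for mx.
mx = pd.DataFrame({"A": [1, 8, 3], "B": [7, 2, 3], "C": [4, 5, 9]})

col_of_max = mx.iloc[0].idxmax()  # column label of the largest value in row 0
max_value = mx.iloc[0].max()      # the largest value itself

# The same question asked for every row: one column label per row.
per_row = mx.idxmax(axis=1)

print(col_of_max, max_value)
print(per_row)
```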

Answer 10

Considering this dataframe

[In]: df = pd.DataFrame(np.random.randn(4,3),columns=['A','B','C'])
[Out]:
          A         B         C
0 -0.253233  0.226313  1.223688
1  0.472606  1.017674  1.520032
2  1.454875  1.066637  0.381890
3 -0.054181  0.234305 -0.557915

Assuming one wants to know the rows where column "C" is max, the following will do the work

[In]: df[df['C']==df['C'].max()]
[Out]:
          A         B         C
1  0.472606  1.017674  1.520032

Answer 11

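What worked for me is:

df[df['colX'] == df['colX'].max()]

This returns the row of df in which colX has its maximum value.

Then, if you just want the index, you can append an index accessor (for example .index) to the end of the query.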

Answer 12

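DataFrame's idxmax returns the label index of the row with the maximum value, while the behavior of argmax depends on the version of pandas (right now it returns a warning). If you want the positional index, you can do the following:

max_row = df['A'].values.argmax()

or

import numpy as np
max_row = np.argmax(df['A'].values)

Note that if you use np.argmax(df['A']), it behaves the same as df['A'].argmax().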

Answer 13

Use:

data.iloc[data['A'].idxmax()]

data['A'].idxmax() finds the location of the max value in terms of row (as a label, so this relies on the default integer index).
data.iloc[] returns that row.

Answer 14

If there are ties in the maximum values, then idxmax returns the index of only the first max value. For example, for a DataFrame df whose columns contain tied maxima,

df.idxmax() returns

A    0
B    3
C    0
dtype: int64

Now, if we want all the indices corresponding to the max values, we can use max + eq to create a boolean DataFrame, then use it on df.index to filter the indexes:

out = df.eq(df.max()).apply(lambda x: df.index[x].tolist())

Output:

A       [0, 4]
B          [3]
C    [0, 1, 3]
dtype: object
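
For a runnable version of this, the sketch below uses a hypothetical DataFrame. Its values are not from the answer; they were picked only so that the ties fall in the same places as in the outputs shown above.

```python
import pandas as pd

# Hypothetical values: A ties at rows 0 and 4, C ties at rows 0, 1 and 3.
df = pd.DataFrame({
    "A": [5, 1, 2, 3, 5],
    "B": [1, 2, 0, 9, 3],
    "C": [7, 7, 1, 7, 2],
})

first_only = df.idxmax()  # only the first index of each column's maximum

# eq() marks every cell equal to its column maximum; apply() then turns
# each boolean column into the list of matching index labels.
out = df.eq(df.max()).apply(lambda x: df.index[x].tolist())

print(first_only)
print(out)
```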
