How to Check list containing NaN [closed]
  • Time: 2013-12-02 02:13:25
  • Tags:
  • python
Closed. This question needs details or clarity. It is not currently accepting answers.

In my for loop, my code generates a list like this one:

list([0.0,0.0]/sum([0.0,0.0]))

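The loop generates all sorts of other number vectors, but it also generates [nan, nan]. To avoid that, I tried to guard against it with a condition like the one below, but it does not return True.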

nan in list([0.0,0.0]/sum([0.0,0.0]))
>>> False

Shouldn't it return True?

Libraries I've loaded:

import PerformanceAnalytics as perf
import DataAnalyticsHelpers
import DataHelpers as data
import OptimizationHelpers as optim
from matplotlib.pylab import *
from pandas.io.data import DataReader
from datetime import datetime,date,time
import tradingWithPython as twp
import tradingWithPython.lib.yahooFinance as data_downloader # used to get data from yahoo finance
import pandas as pd # as always.
import numpy as np
import zipline as zp
from scipy.optimize import minimize
from itertools import product, combinations
import time
from math import isnan
Best answer

I think this makes sense because you are pulling numpy into scope indirectly via the star import.

>>> import numpy as np
>>> [0.0,0.0]/0
Traceback (most recent call last):
  File "<ipython-input-3-aae9e30b3430>", line 1, in <module>
    [0.0,0.0]/0
TypeError: unsupported operand type(s) for /: 'list' and 'int'

>>> [0.0,0.0]/np.float64(0)
array([ nan,  nan])

When you did

from matplotlib.pylab import *

it replaced the built-in sum with numpy.sum:

>>> from matplotlib.pylab import *
>>> sum is np.sum
True
>>> [0.0,0.0]/sum([0.0, 0.0])
array([ nan,  nan])

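You can test that this nan object (nan isn't unique in general) is in a list, because list membership checks identity first. If you try the same test on an array, it appears to test by equality only, and nan != nan: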

>>> nan == nan
False
>>> nan == nan, nan is nan
(False, True)
>>> nan in [nan]
True
>>> nan in np.array([nan])
False
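The list result follows from how built-in sequences implement `in`: an element matches if it is the same object or compares equal. A minimal pure-Python sketch of that rule is below; `list_contains` is a hypothetical helper written only for illustration, not part of any library. NumPy arrays, by contrast, appear to compare by value only, which is why the array test above returns False.

```python
def list_contains(items, x):
    """Mimic `x in items` for a built-in list: identity first, then equality."""
    return any(item is x or item == x for item in items)

nan = float("nan")

# Same object on both sides: found through the identity check.
same_object_found = list_contains([nan], nan)

# A different NaN object: identity fails, and nan == nan is False.
other_object_found = list_contains([float("nan")], nan)
```

This is also why `nan in some_list` is unreliable as a NaN check: it only succeeds when the list happens to hold the very same object.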

You can use np.isnan:

>>> np.isnan([nan, nan])
array([ True,  True], dtype=bool)
>>> np.isnan([nan, nan]).any()
True
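Applied to the loop in the question, the check could be used to skip vectors that came out as NaN. This is only a sketch under assumptions: `normalize` is a hypothetical helper name, and the inputs are made-up examples standing in for the vectors the loop produces.

```python
import numpy as np

def normalize(weights):
    """Divide by the sum; return None when the result contains NaN (e.g. 0/0)."""
    values = np.asarray(weights, dtype=float)
    # Silence the divide/invalid warnings that 0/0 would otherwise emit.
    with np.errstate(divide="ignore", invalid="ignore"):
        result = values / values.sum()
    return None if np.isnan(result).any() else result

# Keep only the vectors that normalized cleanly.
candidates = ([0.0, 0.0], [1.0, 3.0])
kept = [v for v in (normalize(w) for w in candidates) if v is not None]
```

Returning None keeps the NaN test in one place, so the calling loop only has to filter on `is not None`.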
Other answers

You should use the math module:

>>> import math
>>> math.isnan(item)
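Since `math.isnan` works on one number at a time, checking a whole list needs a loop. A minimal sketch, assuming the list may also hold non-float items that should simply be ignored (`has_nan` is a hypothetical helper name):

```python
import math

def has_nan(values):
    """Return True if any element of `values` is a float NaN."""
    # math.isnan raises TypeError for non-numeric items such as strings,
    # so only float instances are passed to it.
    return any(isinstance(v, float) and math.isnan(v) for v in values)

with_nan = has_nan([0.0, float("nan")])
without_nan = has_nan([1, 2.0, "x"])
```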

Maybe this is what you are looking for...

import numpy as np

a = [2,3,np.nan]
b = bool(np.isnan(np.array(a)).any())  # True if any element is NaN
print(b)

TLDR: best performance

(scroll down for graphs/charts of performance)

has_nan = any(each!=each for each in your_list)

Python NaN is more Weird than you think

import math
my_nan = float("NaN")

list1 = [ math.nan ]
list2 = [ float("NaN") ]
list3 = [ my_nan ]

math.nan     in list1       # True
math.nan     in list2       # False
math.nan     in list3       # False

float("NaN") in list1       # False
float("NaN") in list2       # False
float("NaN") in list3       # False

my_nan       in list1       # False
my_nan       in list2       # False
my_nan       in list3       # True

# also makes sets really annoying:

set1 = set([math.nan    , math.nan    , math.nan     ])
set2 = set([my_nan      , float("nan"), my_nan       ])
set3 = set([float("nan"), float("nan"), float("nan") ])

len(set1)   # >>> 1
len(set2)   # >>> 2
len(set3)   # >>> 3

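For whatever reason, this is how Python treats NaN: membership tests check identity before equality, so only the very same NaN object is ever found.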
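One way to work around the set behaviour shown above is to handle NaN separately while deduplicating. The sketch below is an assumption about what is wanted (all NaN objects collapsed into a single entry); `unique_with_single_nan` is a hypothetical helper, and it assumes the remaining values can be sorted.

```python
import math

def unique_with_single_nan(values):
    """Deduplicate floats, collapsing every NaN object into one entry."""
    seen = set()
    found_nan = False
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            found_nan = True  # remember that a NaN occurred, but keep it out of the set
        else:
            seen.add(v)
    result = sorted(seen)
    if found_nan:
        result.append(math.nan)  # a single NaN placed at the end
    return result

deduped = unique_with_single_nan([float("nan"), 1.0, float("nan"), 1.0, 2.0])
```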

Optimal Performance Answers

If your data is in a list/tuple, this is the fastest way to check*

has_nan = any(each!=each for each in your_list)
# from math import isnan #<- is slow

*This is the fastest way unless:

  1. You know that 99.9% of the time the array won't have NaN (and/or 99.9% of the time it will have NaN as the LAST element)
  2. The list is always over 6000 elements
  3. You really really really care about a 0.5% improvement
  4. And you have numpy installed

Then this is slightly faster:

# NOTE: can *average* 4x slower for small lists
has_nan = numpy.isnan(numpy.array(your_list)).any()

[Figure: perf_compare_1 (lower is better)]

However, if the data is already in a numpy array, then this is the fastest way:

# when len() == 20,000 this is literally 100 times faster than the pure-python approach
has_nan = numpy.isnan(your_array).any()

[Figure: perf_compare_2]

Here is the performance benchmark (numpy 1.23, python 3.11):

import timeit
from math import isnan
import numpy
import random
from statistics import mean as average
import json


values = []
sample_size = 200
for index in range(1,15):
    list_size = 2**index
    source = list(range(0,list_size))
    cases = {
        "pure python: !=":[],
        "pure python: isnan":[],
        "numpy convert to array":[],
        "numpy (data already in array)":[],
    }
    pure_times = []
    numpy_times = []
    numpy_prebuilts = []
    for _ in range(0,sample_size):
        index = random.randint(-(list_size-1),list_size-1)
        local_source = list(source)
        if index >= 0:
            local_source[index] = float("NaN")
        local_source = tuple(local_source)
        prebuilt = numpy.array(local_source)
        
        cases["pure python: !="].append(timeit.timeit(lambda: any(each!=each for each in local_source  ), number=1_000))
        cases["pure python: isnan"].append(timeit.timeit(lambda: any(isnan(each) for each in local_source  ), number=1_000))
        cases["numpy convert to array"].append(timeit.timeit(lambda: numpy.isnan(numpy.array(local_source)).any(), number=1_000))
        cases["numpy (data already in array)"].append(timeit.timeit(lambda: numpy.isnan(prebuilt).any(), number=1_000))
    
    for each_key, each_value in cases.items():
        cases[each_key] = average(each_value)
    
    print(json.dumps({ "list_size":list_size, **cases,}))
    values.append({ "number of elements":list_size, **cases },)
    
# draw chart
import pandas
import plotly.express as px
df = pandas.DataFrame(values)
df = pandas.melt(df, value_vars=list(cases.keys()), id_vars=["number of elements"])
df["computation time"] = df["value"]
df.sort_values(by=["variable","number of elements"], inplace=True)
fig = px.line(df, color="variable",x="number of elements",y="computation time")
fig.update_layout(xaxis_type="log",yaxis_type="log")
fig.show()



