Interpolate NaN values in a numpy array

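Is there a quick way of replacing all NaN values in a numpy array with (say) linearly interpolated values?

For example,

[1 1 1 nan nan 2 2 nan 0]

would be converted into

[1 1 1 1.3 1.6 2 2  1  0]
Best answer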

Let's first define a simple helper function to make it more straightforward to handle indices and logical indices of NaNs:

import numpy as np

def nan_helper(y):
    """Helper to handle indices and logical indices of NaNs.

    Input:
        - y, 1d numpy array with possible NaNs
    Output:
        - nans, logical indices of NaNs
        - index, a function, with signature indices= index(logical_indices),
          to convert logical indices of NaNs to  equivalent  indices
    Example:
        >>> # linear interpolation of NaNs
        >>> nans, x= nan_helper(y)
        >>> y[nans]= np.interp(x(nans), x(~nans), y[~nans])
    """

    return np.isnan(y), lambda z: z.nonzero()[0]

Now nan_helper(.) can be used like this:

>>> y = np.array([1, 1, 1, np.nan, np.nan, 2, 2, np.nan, 0])
>>>
>>> nans, x = nan_helper(y)
>>> y[nans] = np.interp(x(nans), x(~nans), y[~nans])
>>>
>>> print(y.round(2))
[ 1.    1.    1.    1.33  1.67  2.    2.    1.    0.  ]

---
Although it may at first seem like overkill to define a separate function just to do things like this:

>>> nans, x= np.isnan(y), lambda z: z.nonzero()[0]

it will eventually pay dividends.

So, whenever you are working with NaN-related data, just encapsulate all the (new, NaN-related) functionality you need in specific helper function(s). Your code base will be more coherent and readable, because it follows easily understandable idioms.

Interpolation, indeed, is a nice context to see how NaN handling is done, but similar techniques are utilized in various other contexts as well.
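
As a rough sketch of that encapsulation idea, the two steps can be wrapped in one function that returns a filled copy instead of modifying its argument. The name interpolate_nans is hypothetical and not part of the answer above. Note that np.interp does not extrapolate by default: NaNs before the first or after the last valid value take the nearest valid value.

```python
import numpy as np

def interpolate_nans(y):
    """Return a copy of the 1-D array y with NaNs linearly interpolated.

    Hypothetical wrapper, shown only to illustrate the helper-function idea.
    """
    y = np.array(y, dtype=float)   # np.array copies, so the input is untouched
    nans = np.isnan(y)
    idx = np.arange(y.size)
    # Evaluate the piecewise-linear curve through the valid points
    # at the positions of the NaNs.
    y[nans] = np.interp(idx[nans], idx[~nans], y[~nans])
    return y

original = np.array([np.nan, 1.0, np.nan, 3.0, np.nan])
filled = interpolate_nans(original)
print(filled)   # border NaNs are filled with the nearest valid value
```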

Other answers

I came up with this code:

import numpy as np
nan = np.nan

A = np.array([1, nan, nan, 2, 2, nan, 0])

ok = ~np.isnan(A)                       # boolean mask of valid entries
xp = ok.ravel().nonzero()[0]            # indices of valid entries
fp = A[~np.isnan(A)]                    # valid values
x  = np.isnan(A).ravel().nonzero()[0]   # indices of NaNs

A[np.isnan(A)] = np.interp(x, xp, fp)

print(A)

This prints

 [ 1.          1.33333333  1.66666667  2.          2.          1.          0.        ]

Just use numpy's logical indexing and the where statement to apply a 1D interpolation.

import numpy as np
from scipy import interpolate

def fill_nan(A):
    '''
    interpolate to fill nan values
    '''
    inds = np.arange(A.shape[0])
    good = np.where(np.isfinite(A))
    f = interpolate.interp1d(inds[good], A[good],bounds_error=False)
    B = np.where(np.isfinite(A),A,f(inds))
    return B

For two-dimensional data, SciPy's griddata works fairly well for me:

>>> import numpy as np
>>> from scipy.interpolate import griddata
>>>
>>> # SETUP
>>> a = np.arange(25).reshape((5, 5)).astype(float)
>>> a
array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.]])
>>> a[np.random.randint(2, size=(5, 5)).astype(bool)] = np.nan
>>> a
array([[ nan,  nan,  nan,   3.,   4.],
       [ nan,   6.,   7.,  nan,  nan],
       [ 10.,  nan,  nan,  13.,  nan],
       [ 15.,  16.,  17.,  nan,  19.],
       [ nan,  nan,  22.,  23.,  nan]])
>>>
>>> # THE INTERPOLATION
>>> x, y = np.indices(a.shape)
>>> interp = np.array(a)
>>> interp[np.isnan(interp)] = griddata(
...     (x[~np.isnan(a)], y[~np.isnan(a)]), # points we know
...     a[~np.isnan(a)],                    # values we know
...     (x[np.isnan(a)], y[np.isnan(a)]))   # points to interpolate
>>> interp
array([[ nan,  nan,  nan,   3.,   4.],
       [ nan,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.],
       [ nan,  nan,  22.,  23.,  nan]])

I am using it on 3D images, operating on 2D slices (4000 slices of 350x350). The whole operation still takes about an hour.

It might be easier to change how the data is being generated in the first place, but if not:

bad_indexes = np.isnan(data)

Create a boolean array indicating where the nans are

good_indexes = np.logical_not(bad_indexes)

Create a boolean array indicating where the good values are

good_data = data[good_indexes]

A restricted version of the original data excluding the nans

interpolated = np.interp(bad_indexes.nonzero()[0], good_indexes.nonzero()[0], good_data)

Run all the bad indexes through interpolation

data[bad_indexes] = interpolated

Replace the original data with the interpolated values.

Or, building on Winston's answer:

import numpy as np

def pad(data):
    # Interpolate the NaNs of a 1-D array in place and return it
    bad_indexes = np.isnan(data)
    good_indexes = np.logical_not(bad_indexes)
    good_data = data[good_indexes]
    interpolated = np.interp(bad_indexes.nonzero()[0], good_indexes.nonzero()[0], good_data)
    data[bad_indexes] = interpolated
    return data

A = np.array([[1, 20, 300],
              [np.nan, np.nan, np.nan],
              [3, 40, 500]])

# Apply pad to every column (axis 0)
A = np.apply_along_axis(pad, 0, A)
print(A)

Result

[[   1.   20.  300.]
 [   2.   30.  400.]
 [   3.   40.  500.]]
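
One caveat with applying a 1-D fill column by column: a column that contains only NaNs has no sample points to interpolate from. The following is a minimal sketch of a guarded variant, assuming such columns should simply be left as NaN. The name pad_safe is hypothetical and it returns a copy rather than modifying its input.

```python
import numpy as np

def pad_safe(column):
    """Interpolate NaNs in a 1-D array; leave all-NaN input unchanged.

    Hypothetical guarded variant of a per-column fill, for illustration.
    """
    column = np.array(column, dtype=float)   # work on a copy
    bad = np.isnan(column)
    if bad.all() or not bad.any():
        # Nothing to interpolate from, or nothing to fill.
        return column
    idx = np.arange(column.size)
    column[bad] = np.interp(idx[bad], idx[~bad], column[~bad])
    return column

A = np.array([[1.0, np.nan, 300.0],
              [np.nan, np.nan, np.nan],
              [3.0, np.nan, 500.0]])

# Column 0 and 2 are filled, column 1 stays NaN.
filled = np.apply_along_axis(pad_safe, 0, A)
print(filled)
```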

I use interpolation to replace all NaN values.

import numpy as np

A = np.array([1, np.nan, np.nan, 2, 2, np.nan, 0])
np.interp(np.arange(len(A)),
          np.arange(len(A))[~np.isnan(A)],
          A[~np.isnan(A)])

Output:

array([1. , 1.33333333, 1.66666667, 2. , 2. , 1. , 0. ])

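A slightly optimized version based on BRYAN WOODS's answer. It correctly handles the start and end values of the source data, and it is 25-30% faster than the original version. You can also use different kinds of interpolation (see the scipy.interpolate.interp1d documentation for details).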

import numpy as np
from scipy.interpolate import interp1d

def fill_nans_scipy1(padata, pkind='linear'):
    """
    Interpolates data to fill nan values

    Parameters:
        padata : nd array
            source data with np.nan values
        pkind : str
            kind of interpolation, passed on to interp1d

    Returns:
        nd array
            resulting data with interpolated values instead of nans
    """
    aindexes = np.arange(padata.shape[0])
    agood_indexes, = np.where(np.isfinite(padata))
    f = interp1d(agood_indexes
               , padata[agood_indexes]
               , bounds_error=False
               , copy=False
               , fill_value="extrapolate"
               , kind=pkind)
    return f(aindexes)

In [17]: adata = np.array([1, 2, np.nan, 4])
In [18]: adata
Out[18]: array([ 1.,  2., nan,  4.])
In [19]: fill_nans_scipy1(adata)
Out[19]: array([1., 2., 3., 4.])
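
If SciPy is not available, a similar linear fill with extrapolated ends can be sketched with NumPy alone. This is an illustration under stated assumptions, not the answer's method: the name fill_nans_linear_extrapolate is hypothetical, the input must contain at least two finite values, and border NaNs are filled by extending the line through the two outermost valid points on each side.

```python
import numpy as np

def fill_nans_linear_extrapolate(y):
    """Fill NaNs linearly; extend the end segments to fill border NaNs.

    Hypothetical NumPy-only sketch, assuming at least two finite values.
    """
    y = np.asarray(y, dtype=float)
    idx = np.arange(y.size)
    good = np.isfinite(y)
    if good.sum() < 2:
        raise ValueError("need at least two finite values to extrapolate")
    xp, fp = idx[good], y[good]

    # Interior NaNs: ordinary linear interpolation (borders are clamped here).
    out = np.interp(idx, xp, fp)

    # Left border: continue the slope of the first valid segment.
    left = idx < xp[0]
    slope_left = (fp[1] - fp[0]) / (xp[1] - xp[0])
    out[left] = fp[0] + slope_left * (idx[left] - xp[0])

    # Right border: continue the slope of the last valid segment.
    right = idx > xp[-1]
    slope_right = (fp[-1] - fp[-2]) / (xp[-1] - xp[-2])
    out[right] = fp[-1] + slope_right * (idx[right] - xp[-1])
    return out

result = fill_nans_linear_extrapolate([np.nan, 2.0, np.nan, 4.0, np.nan])
print(result)
```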

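I needed an approach that would also fill in NaNs at the start and end of the data, which the main answer does not appear to do.

The function I came up with uses a linear regression to fill in the NaNs. This overcomes my problem: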

import numpy as np

def linearly_interpolate_nans(y):
    # Fit a linear regression to the non-nan y values

    # Create X matrix for linreg with an intercept and an index
    X = np.vstack((np.ones(len(y)), np.arange(len(y))))
    
    # Get the non-NaN values of X and y
    X_fit = X[:, ~np.isnan(y)]
    y_fit = y[~np.isnan(y)].reshape(-1, 1)
    
    # Estimate the coefficients of the linear regression
    beta = np.linalg.lstsq(X_fit.T, y_fit)[0]
    
    # Fill in all the nan values using the predicted coefficients
    y.flat[np.isnan(y)] = np.dot(X[:, np.isnan(y)].T, beta)
    return y

For example:

# Make an array according to some linear function
y = np.arange(12) * 1.5 + 10.

# First and last value are NaN
y[0] = np.nan
y[-1] = np.nan

# 30% of other values are NaN
for i in range(len(y)):
    if np.random.rand() > 0.7:
        y[i] = np.nan
        
# NaNs are filled in!
print(y)
print(linearly_interpolate_nans(y))
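
Note that a regression fill places every NaN on one global best-fit line, so on data that is not linear the filled points need not line up with their neighbours. A minimal sketch of the same idea, using np.polyfit instead of np.linalg.lstsq (a swapped-in call, not the code above) and the hypothetical name fill_nans_with_line_fit:

```python
import numpy as np

def fill_nans_with_line_fit(y):
    """Replace NaNs with values from a least-squares line through the valid points.

    Hypothetical sketch; valid points are kept as they are.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(y.size)
    good = ~np.isnan(y)
    # For deg=1, polyfit returns the coefficients as [slope, intercept].
    slope, intercept = np.polyfit(x[good], y[good], deg=1)
    out = y.copy()
    out[~good] = slope * x[~good] + intercept
    return out

# On exactly linear data the fit recovers the missing values, borders included.
line = np.arange(12) * 1.5 + 10.0
gappy = line.copy()
gappy[[0, 5, 11]] = np.nan
recovered = fill_nans_with_line_fit(gappy)
print(recovered)
```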

Importing scipy looks like overkill to me. Here's a simple way using numpy and maintaining the same conventions as np.interp:

import numpy as np

def interp_nans(x: [float], left=None, right=None, period=None) -> [float]:
    """
    e.g. [1 1 1 nan nan 2 2 nan 0] -> [1 1 1 1.3 1.6 2 2  1  0]
    """
    # Positions and values of the finite entries
    xp = [i for i, yi in enumerate(x) if np.isfinite(yi)]
    fp = [yi for i, yi in enumerate(x) if np.isfinite(yi)]
    return list(np.interp(x=list(range(len(x))), xp=xp, fp=fp, left=left, right=right, period=period))
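
To make the np.interp conventions concrete, here is a small sketch of what the left, right and period arguments do to NaNs at the borders. It calls np.interp directly; the variable names are only illustrative.

```python
import numpy as np

y = np.array([np.nan, 1.0, np.nan, 3.0, np.nan])
idx = np.arange(y.size)
good = np.isfinite(y)

# Default: border NaNs take the nearest valid value.
default = np.interp(idx, idx[good], y[good])

# left/right: explicit fill values for positions outside the valid range.
custom = np.interp(idx, idx[good], y[good], left=0.0, right=-1.0)

# period: treat the index as cyclic, so border NaNs are interpolated
# between the last and the first valid value (left/right are ignored).
cyclic = np.interp(idx, idx[good], y[good], period=y.size)

print(default)
print(custom)
print(cyclic)
```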

Interpolation and extrapolation with padding keywords

The following solution interpolates the NaN values in an array with np.interp wherever a finite value is present on both sides. NaN values at the borders are handled by np.pad with modes like 'constant' or 'reflect'.


    import numpy as np
    import matplotlib.pyplot as plt
    
    
    def extrainterpolate_nans_1d(
            arr, kws_pad=({'mode': 'edge'}, {'mode': 'edge'})
            ):
        """Interpolates and extrapolates nan values.
    
        Interpolation is linear, compare np.interp(..).
        Extrapolation works with pad keywords, compare np.pad(..).
    
        Parameters
        ----------
        arr : np.ndarray, shape (N,)
            Array to replace nans in.
        kws_pad : dict or (dict, dict)
            kwargs for np.pad on left and right side
    
        Returns
        -------
        np.ndarray, shape (N,)
            Copy of arr with inner nans interpolated and border nans padded.
    
        See Also
        --------
        https://numpy.org/doc/stable/reference/generated/numpy.interp.html
        https://numpy.org/doc/stable/reference/generated/numpy.pad.html
        https://stackoverflow.com/a/43821453/7128154
        """
        assert arr.ndim == 1
        if isinstance(kws_pad, dict):
            kws_pad_left = kws_pad
            kws_pad_right = kws_pad
        else:
            assert len(kws_pad) == 2
            assert isinstance(kws_pad[0], dict)
            assert isinstance(kws_pad[1], dict)
            kws_pad_left = kws_pad[0]
            kws_pad_right = kws_pad[1]
    
        arr_ip = arr.copy()
    
        # interpolation
        inds = np.arange(len(arr_ip))
        nan_msk = np.isnan(arr_ip)
        arr_ip[nan_msk] = np.interp(inds[nan_msk], inds[~nan_msk], arr[~nan_msk])
    
        # determine pad range
        i0 = next(
            (ids for ids, val in np.ndenumerate(arr) if not np.isnan(val)), 0)[0]
        i1 = next(
            (ids for ids, val in np.ndenumerate(arr[::-1]) if not np.isnan(val)), 0)[0]
        i1 = len(arr) - i1
        # print('pad in range [0:{:}] and [{:}:{:}]'.format(i0, i1, len(arr)))
    
        # pad
        arr_pad = np.pad(
            arr_ip[i0:], pad_width=[(i0, 0)], **kws_pad_left)
        arr_pad = np.pad(
            arr_pad[:i1], pad_width=[(0, len(arr) - i1)], **kws_pad_right)
    
        return arr_pad
    
    
    # setup data
    ys = np.arange(30, dtype=float)**2/20
    ys[:5] = np.nan
    ys[20:] = 20
    ys[28:] = np.nan
    ys[[7, 13, 14, 18, 22]] = np.nan
    
    
    ys_ie0 = extrainterpolate_nans_1d(ys)
    kws_pad_sym = {'mode': 'symmetric'}
    kws_pad_const7 = {'mode': 'constant', 'constant_values': 7.}
    ys_ie1 = extrainterpolate_nans_1d(ys, kws_pad=(kws_pad_sym, kws_pad_const7))
    ys_ie2 = extrainterpolate_nans_1d(ys, kws_pad=(kws_pad_const7, kws_pad_sym))
    
    fig, ax = plt.subplots()
    
    
    ax.scatter(np.arange(len(ys)), ys, s=15**2, label='ys')
    ax.scatter(np.arange(len(ys)), ys_ie0, s=8**2, label='ys_ie0, left_pad edge, right_pad edge')
    ax.scatter(np.arange(len(ys)), ys_ie1, s=6**2, label='ys_ie1, left_pad symmetric, right_pad 7')
    ax.scatter(np.arange(len(ys)), ys_ie2, s=4**2, label='ys_ie2, left_pad 7, right_pad symmetric')
    ax.legend()
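
To see what the padding modes contribute on their own, here is a small sketch that pads two values onto the left of a short array with np.pad. The array and variable names are only illustrative; the pad_width form matches the one used in the function above.

```python
import numpy as np

core = np.array([1.0, 2.0, 3.0])

# 'edge' repeats the border value: 1, 1, 1, 2, 3
edge = np.pad(core, pad_width=[(2, 0)], mode='edge')

# 'reflect' mirrors around the border value without repeating it: 3, 2, 1, 2, 3
reflect = np.pad(core, pad_width=[(2, 0)], mode='reflect')

# 'symmetric' mirrors and repeats the border value: 2, 1, 1, 2, 3
symmetric = np.pad(core, pad_width=[(2, 0)], mode='symmetric')

# 'constant' fills with a fixed value: 7, 7, 1, 2, 3
constant = np.pad(core, pad_width=[(2, 0)], mode='constant', constant_values=7.0)

for name, arr in [('edge', edge), ('reflect', reflect),
                  ('symmetric', symmetric), ('constant', constant)]:
    print(name, arr)
```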



