English 中文(简体)
使用python删除曲线下的数据点
原标题:Remove data points below a curve with python

I need to compare some theoretical data with real data in python. The theoretical data comes from resolving an equation. To improve the comparative I would like to remove data points that fall far from the theoretical curve. I mean, I want to remove the points below and above red dashed lines in the figure (made with matplotlib). Data points and theoretical curves

理论曲线和数据点都是不同长度的数组。

我可以尝试以粗略的方式去除这些点,例如:可以使用以下方法检测第一个上点:

data2[(data2.redshift<0.4)&data2.dmodulus>1]
rec.array([( 1997o , 0.374, 1.0203223485103787, 0.44354759972859786)], dtype=[( SN_name ,  |S10 ), ( redshift ,  <f8 ), ( dmodulus ,  <f8 ), ( dmodulus_error ,  <f8 )])    

但我想用一种不那么粗暴的眼神。

那么,有人能帮我找到一种简单的方法来消除问题点吗?

非常感谢。

问题回答

这可能有些过头了,而且是基于你的评论

Both the theoretical curves and the data points are arrays of different length.

我会做以下事情:

  1. Truncate the data set so that its x values lie within the max and min values of the theoretical set.
  2. Interpolate the theoretical curve using scipy.interpolate.interp1d and the above truncated data x values. The reason for step (1) is to satisfy the constraints of interp1d.
  3. Use numpy.where to find data y values that are out side the range of acceptable theory values.
  4. DONT discard these values, as was suggested in comments and other answers. If you want for clarity, point them out by plotting the inliners one color and the outliers an other color.

我想这是一个接近你想要的剧本。希望它能帮助你完成你想要的:

import numpy as np
import scipy.interpolate as interpolate
import matplotlib.pyplot as plt

# make up data
def makeUpData():
       Make many more data points (x,y,yerr) than theory (x,y),
    with theory yerr corresponding to a constant "sigma" in y, 
    about x,y value   
    NX= 150
    dataX = (np.random.rand(NX)*1.1)**2
    dataY = (1.5*dataX+np.random.rand(NX)**2)*dataX
    dataErr = np.random.rand(NX)*dataX*1.3
    theoryX = np.arange(0,1,0.1)
    theoryY = theoryX*theoryX*1.5
    theoryErr = 0.5
    return dataX,dataY,dataErr,theoryX,theoryY,theoryErr

def makeSameXrange(theoryX,dataX,dataY):
       
    Truncate the dataX and dataY ranges so that dataX min and max are with in
    the max and min of theoryX.
       
    minT,maxT = theoryX.min(),theoryX.max()
    goodIdxMax = np.where(dataX<maxT)
    goodIdxMin = np.where(dataX[goodIdxMax]>minT)
    return (dataX[goodIdxMax])[goodIdxMin],(dataY[goodIdxMax])[goodIdxMin]

# take  theory  and get values at every  data  x point
def theoryYatDataX(theoryX,theoryY,dataX):
       For every dataX point, find interpolated thoeryY value. theoryx needed
    for interpolation.   
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])

# collect valid points
def findInlierSet(dataX,dataY,interpTheoryY,thoeryErr):
       Find where theoryY-theoryErr < dataY theoryY+theoryErr and return
    valid indicies.   
    withinUpper = np.where(dataY<(interpTheoryY+theoryErr))
    withinLower = np.where(dataY[withinUpper]
                    >(interpTheoryY[withinUpper]-theoryErr))
    return (dataX[withinUpper])[withinLower],(dataY[withinUpper])[withinLower]

def findOutlierSet(dataX,dataY,interpTheoryY,thoeryErr):
       Find where theoryY-theoryErr < dataY theoryY+theoryErr and return
    valid indicies.   
    withinUpper = np.where(dataY>(interpTheoryY+theoryErr))
    withinLower = np.where(dataY<(interpTheoryY-theoryErr))
    return (dataX[withinUpper],dataY[withinUpper],
            dataX[withinLower],dataY[withinLower])
if __name__ == "__main__":

    dataX,dataY,dataErr,theoryX,theoryY,theoryErr = makeUpData()

    TruncDataX,TruncDataY = makeSameXrange(theoryX,dataX,dataY)

    interpTheoryY = theoryYatDataX(theoryX,theoryY,TruncDataX)

    inDataX,inDataY = findInlierSet(TruncDataX,TruncDataY,interpTheoryY,
                                    theoryErr)

    outUpX,outUpY,outDownX,outDownY = findOutlierSet(TruncDataX,
                                                    TruncDataY,
                                                    interpTheoryY,
                                                    theoryErr)
    #print inlierIndex
    fig = plt.figure()
    ax = fig.add_subplot(211)

    ax.errorbar(dataX,dataY,dataErr,fmt= . ,color= k )
    ax.plot(theoryX,theoryY, r- )
    ax.plot(theoryX,theoryY+theoryErr, r-- )
    ax.plot(theoryX,theoryY-theoryErr, r-- )
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    ax = fig.add_subplot(212)

    ax.plot(inDataX,inDataY, ko )
    ax.plot(outUpX,outUpY, bo )
    ax.plot(outDownX,outDownY, ro )
    ax.plot(theoryX,theoryY, r- )
    ax.plot(theoryX,theoryY+theoryErr, r-- )
    ax.plot(theoryX,theoryY-theoryErr, r-- )
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    fig.savefig( findInliers.png )

This figure is the result: enter image description here

最后,我使用了一些Yann代码:

def theoryYatDataX(theoryX,theoryY,dataX):
   For every dataX point, find interpolated theoryY value. theoryx needed
for interpolation.   
f = interpolate.interp1d(theoryX,theoryY)
return f(dataX[np.where(dataX<np.max(theoryX))])

def findOutlierSet(data,interpTheoryY,theoryErr):
       Find where theoryY-theoryErr < dataY theoryY+theoryErr and return
    valid indicies.   

    up = np.where(data.dmodulus > (interpTheoryY+theoryErr))
    low = np.where(data.dmodulus < (interpTheoryY-theoryErr))
    # join all the index together in a flat array
    out = np.hstack([up,low]).ravel()

    index = np.array(np.ones(len(data),dtype=bool))
    index[out]=False

    datain = data[index]
    dataout = data[out]

    return datain, dataout

def selectdata(data,theoryX,theoryY):
    """
    Data selection: z<1 and +-0.5 LFLRW separation
    """
    # Select data with redshift z<1
    data1 = data[data.redshift < 1]

    # From modulus to light distance:
    data1.dmodulus, data1.dmodulus_error = modulus2distance(data1.dmodulus,data1.dmodulus_error)

    # redshift data order
    data1.sort(order= redshift )

    # Outliers: distance to LFLRW curve bigger than +-0.5
    theoryErr = 0.5
    # Theory curve Interpolation to get the same points as data
    interpy = theoryYatDataX(theoryX,theoryY,data1.redshift)

    datain, dataout = findOutlierSet(data1,interpy,theoryErr)
    return datain, dataout

使用这些函数,我最终可以获得:

谢谢大家的帮助。

只要看看红色曲线和点之间的差异,如果它大于红色曲线和红色虚线之间的差异就可以将其移除。

diff=np.abs(points-red_curve)
index= (diff>(dashed_curve-redcurve))
filtered=points[index]

但请认真对待尼克LH的评论。你的数据在没有任何过滤的情况下看起来很好,你的“outliers”都有很大的错误,不会对拟合产生太大影响。

您可以使用numpy.where()来确定哪些xy对符合绘图标准,也可以枚举来做几乎相同的事情。示例:

x_list = [ 1,  2,  3,  4,  5,  6 ]
y_list = [ f , o , o , b , a , r ]

result = [y_list[i] for i, x in enumerate(x_list) if 2 <= x < 5]

print result

我相信你可以改变条件,使上面例子中的2和5是曲线的函数





相关问题
Can Django models use MySQL functions?

Is there a way to force Django models to pass a field to a MySQL function every time the model data is read or loaded? To clarify what I mean in SQL, I want the Django model to produce something like ...

An enterprise scheduler for python (like quartz)

I am looking for an enterprise tasks scheduler for python, like quartz is for Java. Requirements: Persistent: if the process restarts or the machine restarts, then all the jobs must stay there and ...

How to remove unique, then duplicate dictionaries in a list?

Given the following list that contains some duplicate and some unique dictionaries, what is the best method to remove unique dictionaries first, then reduce the duplicate dictionaries to single ...

What is suggested seed value to use with random.seed()?

Simple enough question: I m using python random module to generate random integers. I want to know what is the suggested value to use with the random.seed() function? Currently I am letting this ...

How can I make the PyDev editor selectively ignore errors?

I m using PyDev under Eclipse to write some Jython code. I ve got numerous instances where I need to do something like this: import com.work.project.component.client.Interface.ISubInterface as ...

How do I profile `paster serve` s startup time?

Python s paster serve app.ini is taking longer than I would like to be ready for the first request. I know how to profile requests with middleware, but how do I profile the initialization time? I ...

Pragmatically adding give-aways/freebies to an online store

Our business currently has an online store and recently we ve been offering free specials to our customers. Right now, we simply display the special and give the buyer a notice stating we will add the ...

Converting Dictionary to List? [duplicate]

I m trying to convert a Python dictionary into a Python list, in order to perform some calculations. #My dictionary dict = {} dict[ Capital ]="London" dict[ Food ]="Fish&Chips" dict[ 2012 ]="...

热门标签