Creating a "distance" dataframe from a pivot table

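I have a relational table (think "person has-a item"), and I have successfully turned it into a pivot table. I would like to take the next step and build a matrix of the distances between any given pair of users.

I know that if I wanted a one-off distance, I could basically do the following: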

df = ...
df["val"] = 1

pivot = df.pivot(index="person", columns="hasa", values="val").fillna(0)

# compute one distance (people are the row index of the pivot, so select rows with .loc)
(pivot.loc["Alice"] - pivot.loc["Carol"]).abs().sum()

I don't know how to get from here to the full dataframe.


Initial table

person  hasa
Alice   Apple
Bob     Banana
Carol   Carrot
Bob     Apple

Pivot table

       Apple  Banana  Carrot
Alice      1       0       0
Bob        1       1       0
Carol      0       0       1

Goal table

       Alice  Bob  Carol
Alice      0    1      2
Bob        1    0      3
Carol      2    3      0
Answers

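Let's start simple. You can compute the distances between the rows of your pivot table by creating another DataFrame to store them. The main logic is to iterate over each row of the pivot table and compute its distance to every other row.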

import pandas as pd

# Create initial DataFrame
df = pd.DataFrame({
    "person": ["Alice", "Bob", "Carol", "Bob"],
    "hasa": ["Apple", "Banana", "Carrot", "Apple"],
})

# Pivot DataFrame: a helper column of ones gives pivot_table something to aggregate
pivot_df = pd.pivot_table(df.assign(val=1), index="person", columns="hasa",
                          values="val", aggfunc="sum", fill_value=0)

# Create a zero-filled DataFrame for the distances
distance_df = pd.DataFrame(0, index=pivot_df.index, columns=pivot_df.index)

# Fill DataFrame with distances (sum of absolute differences between two rows)
for person1 in pivot_df.index:
    for person2 in pivot_df.index:
        distance_df.loc[person1, person2] = (pivot_df.loc[person1] - pivot_df.loc[person2]).abs().sum()

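Now distance_df should hold the correct values. Note that the resulting DataFrame is symmetric about the main diagonal (because the distance from Alice to Bob is the same as the distance from Bob to Alice). The main diagonal is zero (because the distance between a person and themselves is always zero).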
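Because the result is symmetric with a zero diagonal, each unordered pair only needs to be computed once. The following is a minimal sketch of that idea, not part of the original answer; it assumes the sample data from the question and uses `pd.crosstab` to build the 0/1 table.

```python
from itertools import combinations

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "person": ["Alice", "Bob", "Carol", "Bob"],
    "hasa": ["Apple", "Banana", "Carrot", "Apple"],
})

# 0/1 table: one row per person, one column per item
pivot_df = pd.crosstab(df["person"], df["hasa"])

# Start from zeros so the diagonal is already correct
distance_df = pd.DataFrame(0, index=pivot_df.index, columns=pivot_df.index)

# Visit each unordered pair once and mirror the value across the diagonal
for person1, person2 in combinations(pivot_df.index, 2):
    d = (pivot_df.loc[person1] - pivot_df.loc[person2]).abs().sum()
    distance_df.loc[person1, person2] = d
    distance_df.loc[person2, person1] = d

print(distance_df)
```

This roughly halves the number of row comparisons, which only matters once the number of people grows large.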

You can use crosstab directly, then cdist:

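import pandas as pd
from scipy.spatial.distance import cdist

pivot = pd.crosstab(df["person"], df["hasa"])

# For 0/1 data, the squared Euclidean distance equals the number of differing items
out = pd.DataFrame(cdist(pivot, pivot)**2,
                   index=pivot.index,
                   columns=pivot.index,
                  )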
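Squaring the Euclidean distance matches the goal table only when every cell is 0 or 1, and squaring a square root can leave small floating-point residue. As a variant that is not in the original answer, the sketch below asks `cdist` for the Manhattan distance directly via `metric="cityblock"`; it assumes the question's sample data.

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Sample data from the question
df = pd.DataFrame({
    "person": ["Alice", "Bob", "Carol", "Bob"],
    "hasa": ["Apple", "Banana", "Carrot", "Apple"],
})

pivot = pd.crosstab(df["person"], df["hasa"])
values = pivot.to_numpy()

# "cityblock" is the sum of absolute differences, the same quantity as
# (row_a - row_b).abs().sum() in the question
out = pd.DataFrame(
    cdist(values, values, metric="cityblock"),
    index=pivot.index,
    columns=pivot.index,
).astype(int)

print(out)
```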

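Alternatively, you can use corr with a custom function as the second step. Be aware that corr always writes 1 on the diagonal regardless of what the callable returns, so the diagonal has to be reset to 0 afterwards to match the goal table:

out = pivot.T.corr(lambda a, b: (a != b).sum())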
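`DataFrame.corr` with a callable fills the diagonal with 1 no matter what the callable returns, so the self-distances come out as 1 rather than 0. A minimal sketch of one way to correct that, assuming the question's sample data (the `mask` step is an addition, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "person": ["Alice", "Bob", "Carol", "Bob"],
    "hasa": ["Apple", "Banana", "Carrot", "Apple"],
})

pivot = pd.crosstab(df["person"], df["hasa"])

# Count the items on which two people differ; people must be columns for corr
out = pivot.T.corr(method=lambda a, b: (a != b).sum())

# corr() hard-codes 1.0 on the diagonal; overwrite it with the zero self-distance
out = out.mask(np.eye(len(out), dtype=bool), 0).astype(int)

print(out)
```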

Or manually, using
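The manual route is not spelled out here. One possible sketch, assuming NumPy broadcasting is acceptable (that choice is an assumption, not something the answer states) and using the question's sample data:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "person": ["Alice", "Bob", "Carol", "Bob"],
    "hasa": ["Apple", "Banana", "Carrot", "Apple"],
})

pivot = pd.crosstab(df["person"], df["hasa"])
a = pivot.to_numpy()

# a[:, None, :] has shape (n, 1, k) and a[None, :, :] has shape (1, n, k);
# subtracting broadcasts to (n, n, k), one difference vector per pair of people
dist = np.abs(a[:, None, :] - a[None, :, :]).sum(axis=2)

out = pd.DataFrame(dist, index=pivot.index, columns=pivot.index)
print(out)
```

The intermediate array holds n × n × k values, so this trades memory for speed and suits small to medium tables.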




