How can I remove a column from a sparse matrix efficiently?

If I am using the sparse.lil_matrix format, how can I remove a column from the matrix easily and efficiently?

Best answer

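I have wanted this myself for a while, and there is not actually a good built-in way to do it. Here is one way. I chose to subclass lil_matrix and add a removecol function. If you prefer, you can add the removecol function to the lil_matrix class in your lib/site-packages/scipy/sparse/lil.py file. Here is the code: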

from scipy import sparse
from bisect import bisect_left

class lil2(sparse.lil_matrix):
    def removecol(self, j):
        # Allow negative indices, as with ordinary Python sequences.
        if j < 0:
            j += self.shape[1]

        if j < 0 or j >= self.shape[1]:
            raise IndexError('column index out of bounds')

        rows = self.rows
        data = self.data
        for i in range(self.shape[0]):
            # rows[i] is sorted, so bisect finds where column j would sit.
            pos = bisect_left(rows[i], j)
            if pos == len(rows[i]):
                continue
            elif rows[i][pos] == j:
                rows[i].pop(pos)
                data[i].pop(pos)
                if pos == len(rows[i]):
                    continue
            # Shift the remaining column indices left by one.
            for pos2 in range(pos, len(rows[i])):
                rows[i][pos2] -= 1

        self._shape = (self._shape[0], self._shape[1] - 1)

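I have tried it out and have not found any bugs. I certainly think it is better than slicing the column out, which as far as I know just creates a new matrix.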
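To make the index bookkeeping easier to follow, here is a minimal sketch of the same idea on plain Python lists, with no SciPy dependency. The function name remove_col_from_rows and the 3x4 example matrix are made up for illustration; rows and data only mimic the layout that lil_matrix keeps in its .rows and .data attributes.

```python
from bisect import bisect_left


def remove_col_from_rows(rows, data, j):
    """Drop column j in place from LIL-style row lists.

    rows[i] holds the sorted column indices of the nonzeros in row i,
    data[i] holds the matching values.
    """
    for cols, vals in zip(rows, data):
        pos = bisect_left(cols, j)
        # Remove the stored entry for column j, if this row has one.
        if pos < len(cols) and cols[pos] == j:
            cols.pop(pos)
            vals.pop(pos)
        # Every column to the right of j shifts left by one.
        for k in range(pos, len(cols)):
            cols[k] -= 1


# Hypothetical 3x4 matrix:
# [[1, 0, 2, 0],
#  [0, 3, 0, 4],
#  [0, 0, 0, 5]]
rows = [[0, 2], [1, 3], [3]]
data = [[1, 2], [3, 4], [5]]
remove_col_from_rows(rows, data, 2)
print(rows)  # [[0], [1, 2], [2]]
print(data)  # [[1], [3, 4], [5]]
```

The work per row is a binary search plus a pass over the entries to the right of j, so the cost depends on the number of stored nonzeros rather than on the full matrix size.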

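I decided to write a removerow function as well, but I don't think it is as nice as removecol. I am limited by not being able to remove a row from an ndarray in the way I would like. Here is removerow, which can be added to the class above.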

    # Requires "import numpy" at the top of the module.
    def removerow(self, i):
        if i < 0:
            i += self.shape[0]

        if i < 0 or i >= self.shape[0]:
            raise IndexError('row index out of bounds')

        # numpy.delete returns new arrays without entry i.
        self.rows = numpy.delete(self.rows, i, 0)
        self.data = numpy.delete(self.data, i, 0)
        self._shape = (self._shape[0] - 1, self._shape[1])

Perhaps I should submit these functions to the SciPy repository.

Other answers

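Much simpler and faster. You might not even need the conversion to csr, but I know for sure that it works with csr sparse matrices, and converting between the formats shouldn't be an issue. Note that col_list here holds the indices of the columns to keep, since slicing selects columns.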

from scipy import sparse

x_new = sparse.lil_matrix(sparse.csr_matrix(x)[:,col_list])

For a sparse CSR matrix (X) and a list of indices to drop (index_to_drop):

# sorted() keeps the remaining columns in their original order
to_keep = sorted(set(range(X.shape[1])) - set(index_to_drop))
new_X = X[:, to_keep]

Converting lil_matrices to csr_matrices is easy. See tocsr() in the lil_matrix documentation.

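Note, however, that converting from csr back to lil with tolil() is expensive. So this choice is a good one when you do not need the matrix to be in lil format.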
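As a rough end-to-end sketch of the CSR route described above: the matrix contents and the index_to_drop values below are made-up example data, and np.setdiff1d is swapped in for the set arithmetic because it returns the kept indices already sorted.

```python
import numpy as np
from scipy import sparse

# Hypothetical example data: a small 3x5 matrix stored as LIL.
dense = np.array([[1, 0, 2, 0, 0],
                  [0, 3, 0, 4, 0],
                  [0, 0, 0, 0, 5]])
X = sparse.lil_matrix(dense)
index_to_drop = [1, 3]

# Columns to keep, in ascending order.
to_keep = np.setdiff1d(np.arange(X.shape[1]), index_to_drop)

# Slice in CSR, then convert back only if LIL is really needed.
new_X = X.tocsr()[:, to_keep]
new_X_lil = new_X.tolil()

print(new_X.shape)  # (3, 3)
```

If the rest of the pipeline is happy with CSR, the final tolil() call can simply be left out.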

My answer may well be wrong, but I am curious: why wouldn't something like the following be efficient?

Say your lil_matrix is called mat and you want to remove the i-th column:

from scipy.sparse import hstack
mat = hstack([mat[:, 0:i], mat[:, i+1:]])

The matrix will be a coo_matrix after this, but you can convert it back to a lil_matrix.

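OK, I understand that this has to build the two matrices inside hstack before the result is assigned to the mat variable, so for a moment it is like holding the original matrix plus one more. But I guess that if the matrix is sparse enough there shouldn't be any memory problems (since memory, and time, is the whole reason for using sparse matrices).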
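A small sketch of the hstack approach, checked against a dense reference. The 3x4 matrix and the choice i = 1 are made-up example values; passing format="lil" asks hstack for a LIL result directly, so no separate conversion call is needed afterwards.

```python
import numpy as np
from scipy import sparse

# Hypothetical example data.
dense = np.array([[1, 0, 2, 0],
                  [0, 3, 0, 4],
                  [0, 0, 0, 5]])
mat = sparse.lil_matrix(dense)
i = 1  # column to drop

# Stitch together the columns left of i and the columns right of i.
mat = sparse.hstack([mat[:, :i], mat[:, i + 1:]], format="lil")

print(mat.shape)  # (3, 3)
```

This still builds new matrices rather than editing in place, which is the trade-off discussed above.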


from bisect import bisect_left

def removecols(W, col_list):
    if min(col_list) < 0 or max(col_list) >= W.shape[1]:
        raise IndexError('column index out of bounds')
    # Process columns from right to left so that earlier removals
    # do not shift the indices still waiting to be removed.
    cols = sorted(set(col_list), reverse=True)
    rows = W.rows
    data = W.data
    for i in range(W.shape[0]):
        for j in cols:
            pos = bisect_left(rows[i], j)
            if pos == len(rows[i]):
                continue
            elif rows[i][pos] == j:
                rows[i].pop(pos)
                data[i].pop(pos)
                if pos == len(rows[i]):
                    continue
            for pos2 in range(pos, len(rows[i])):
                rows[i][pos2] -= 1
    W._shape = (W._shape[0], W._shape[1] - len(cols))
    return W

I just rewrote your code so that it accepts col_list as input; maybe it will be helpful to someone.

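Looking at the notes for each sparse matrix format, specifically the csc matrix in our case, it has the following advantages, as listed in the documentation.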

  • efficient arithmetic operations CSC + CSC, CSC * CSC, etc.
  • efficient column slicing
  • fast matrix vector products (CSR, BSR may be faster)

If you have the column indices you want to remove, just use slicing. For removing rows, use a csr matrix, since it is efficient at row slicing.
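A short sketch of that advice, assuming made-up data and hypothetical rows_to_drop and cols_to_drop lists: rows are dropped while the matrix is in CSR, and columns are dropped after converting to CSC.

```python
import numpy as np
from scipy import sparse

# Hypothetical example data: a 3x4 matrix.
dense = np.arange(12).reshape(3, 4)
X = sparse.csr_matrix(dense)

rows_to_drop = [0]
cols_to_drop = [1, 2]
keep_rows = np.setdiff1d(np.arange(X.shape[0]), rows_to_drop)
keep_cols = np.setdiff1d(np.arange(X.shape[1]), cols_to_drop)

X_rows = X[keep_rows, :]          # row selection in CSR
X_cols = X.tocsc()[:, keep_cols]  # column selection in CSC

print(X_rows.shape, X_cols.shape)  # (2, 4) (3, 2)
```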




