Question

If I'm using the sparse.lil_matrix format, how can I remove a column from the matrix easily and efficiently?

Answer 1

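I've wanted this myself, and there really isn't a good built-in way to do it yet. Here's one way. I chose to create a subclass of lil_matrix and add a removecol method. If you prefer, you can add the removecol function to the lil_matrix class in your lib/site-packages/scipy/sparse/lil.py file instead. Here's the code: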

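from scipy import sparse
from bisect import bisect_left

class lil2(sparse.lil_matrix):
    def removecol(self, j):
        # Allow negative indices, as in normal Python indexing
        if j < 0:
            j += self.shape[1]

        if j < 0 or j >= self.shape[1]:
            raise IndexError('column index out of bounds')

        rows = self.rows
        data = self.data
        for i in range(self.shape[0]):
            # rows[i] is the sorted list of column indices stored in row i
            pos = bisect_left(rows[i], j)
            if pos == len(rows[i]):
                continue  # every stored entry in this row lies left of column j
            elif rows[i][pos] == j:
                # drop the entry that sits in column j
                rows[i].pop(pos)
                data[i].pop(pos)
                if pos == len(rows[i]):
                    continue
            # entries to the right of column j move one column to the left
            for pos2 in range(pos, len(rows[i])):
                rows[i][pos2] -= 1

        self._shape = (self._shape[0], self._shape[1] - 1)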

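I've tested it and haven't found any bugs. I certainly think it's better than slicing out the column, since as far as I know slicing just creates a new matrix.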
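To make the slicing comparison concrete, here is a small sketch (the 3x4 matrix and the column index j are example values) showing that slicing with a list of kept columns returns a separate lil_matrix and leaves the original untouched:

```python
import numpy as np
from scipy import sparse

m = sparse.lil_matrix(np.arange(12).reshape(3, 4))
j = 2  # column to drop (example value)

# Slicing with the list of columns to keep builds a brand-new lil_matrix...
keep = [c for c in range(m.shape[1]) if c != j]
sliced = m[:, keep]

# ...while the original object keeps all of its columns.
print(sliced.shape)  # (3, 3)
print(m.shape)       # (3, 4)
print(sliced is m)   # False
```

Whether that extra copy matters depends on how large the matrix is and how often you remove columns.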

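I decided to write a removerow function too, but I don't think it's as good as removecol. I'm limited by not being able to remove a row from an ndarray the way I'd like. Here is removerow, which can be added to the class above.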

    def removerow(self, i):
        # Requires "import numpy" alongside the imports above
        if i < 0:
            i += self.shape[0]

        if i < 0 or i >= self.shape[0]:
            raise IndexError('row index out of bounds')

        # numpy.delete returns new arrays that are one row shorter
        self.rows = numpy.delete(self.rows, i, 0)
        self.data = numpy.delete(self.data, i, 0)
        self._shape = (self._shape[0] - 1, self._shape[1])
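The limitation comes from how lil_matrix stores its rows. A small sketch (the identity matrix is just an example) showing that rows is a 1-D NumPy array of dtype object holding one list per row, and that numpy.delete hands back a new array instead of shrinking the existing one, which is why removerow has to reassign self.rows and self.data:

```python
import numpy as np
from scipy import sparse

m = sparse.lil_matrix(np.eye(3))

# One Python list of column indices per row, stored in an object array.
print(m.rows.dtype)  # object

# numpy.delete cannot remove an element in place; it returns a new,
# shorter array and leaves m.rows as it was.
new_rows = np.delete(m.rows, 1, 0)
print(len(new_rows), len(m.rows))  # 2 3
print(new_rows is m.rows)          # False
```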

Maybe I should submit these functions to the SciPy repository.

Answer 2

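Simpler and faster. You may not even need to convert to csr format, but I know it works with csr sparse matrices, and converting between the two shouldn't be a problem.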

from scipy import sparse

# col_list holds the indices of the columns to KEEP
x_new = sparse.lil_matrix(sparse.csr_matrix(x)[:,col_list])
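On the "you may not even need csr" point, a quick sketch (example matrix and col_list assumed) comparing the csr round trip with indexing the lil_matrix directly; both give the same lil result:

```python
import numpy as np
from scipy import sparse

x = sparse.lil_matrix(np.arange(12).reshape(3, 4))
col_list = [0, 1, 3]  # columns to KEEP (example values); column 2 is dropped

# Route from the answer: convert to CSR, slice, convert back to lil.
via_csr = sparse.lil_matrix(sparse.csr_matrix(x)[:, col_list])

# lil_matrix accepts the same column indexing directly.
direct = x[:, col_list]

print(via_csr.format, direct.format)                  # lil lil
print((via_csr.toarray() == direct.toarray()).all())  # True
```

Which route is faster will depend on the matrix; the csr route pays for two conversions but slices a compressed format.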

Answer 3

For a sparse CSR matrix (X) and a list of column indices to drop (index_to_drop):

to_keep = sorted(set(range(X.shape[1])) - set(index_to_drop))  # sorted keeps the column order
new_X = X[:,to_keep]

Converting lil_matrices to csr_matrices is easy. See tocsr() in the lil_matrix documentation.

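Note, however, that converting from csr back to lil with tolil() is expensive. So this option works best when you don't need the matrix in lil format afterwards.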
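A variant of the same idea, sketched with example data: numpy.setdiff1d returns the sorted unique values of the first array that are absent from the second, so it builds an ordered keep-list in one call. The final tolil() is only needed if you really want lil format again, and it is the expensive step mentioned above:

```python
import numpy as np
from scipy import sparse

X = sparse.csr_matrix(np.arange(15).reshape(3, 5))
index_to_drop = [3, 1]  # example values, in any order

# Sorted column indices that are NOT being dropped.
to_keep = np.setdiff1d(np.arange(X.shape[1]), index_to_drop)
new_X = X[:, to_keep]

print(to_keep)      # [0 2 4]
print(new_X.shape)  # (3, 3)

# Only if lil format is required afterwards:
new_lil = new_X.tolil()
```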

Answer 4

I may be wrong here, but I'd like to know why something like the following wouldn't be efficient:

Say your lil_matrix is called mat and you want to remove the i-th column:

from scipy.sparse import hstack

mat = hstack([mat[:, 0:i], mat[:, i+1:]])

After this the matrix will be a coo_matrix, but you can convert it back to a lil_matrix.

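OK, my understanding is that this has to build the two matrices inside hstack before assigning the result to the mat variable, so for a moment it is like holding the original matrix plus another one. But I'd guess that if the matrix is sparse enough, there shouldn't be any memory problem (since saving memory, and time, is the whole reason for using sparse matrices).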
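For the conversion back to lil, scipy.sparse.hstack takes a format argument, so the result can be requested as lil directly. A sketch with an example matrix and column index:

```python
import numpy as np
from scipy import sparse

mat = sparse.lil_matrix(np.arange(12).reshape(3, 4))
i = 1  # column to remove (example value)

# format='lil' asks hstack to return a lil_matrix, so no separate
# conversion from coo is needed afterwards.
mat = sparse.hstack([mat[:, :i], mat[:, i + 1:]], format='lil')

print(mat.format, mat.shape)  # lil (3, 3)
```

Internally hstack still builds an intermediate matrix, so the memory point made above still applies.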

Answer 5


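from bisect import bisect_left

def removecols(W, col_list):
    # W is a lil_matrix; it is modified in place and also returned.
    # Work from the highest index down, so shifting the entries after one
    # removed column does not change the indices of columns still to remove.
    cols = sorted(set(col_list), reverse=True)
    if not cols:
        return W
    if cols[-1] < 0 or cols[0] >= W.shape[1]:
        raise IndexError('column index out of bounds')
    rows = W.rows
    data = W.data
    for i in range(W.shape[0]):
        for j in cols:
            pos = bisect_left(rows[i], j)
            if pos == len(rows[i]):
                continue
            elif rows[i][pos] == j:
                rows[i].pop(pos)
                data[i].pop(pos)
                if pos == len(rows[i]):
                    continue
            for pos2 in range(pos, len(rows[i])):
                rows[i][pos2] -= 1
    W._shape = (W._shape[0], W._shape[1] - len(cols))
    return W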

I just rewrote your code so that it accepts col_list as input; maybe it will help someone.
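The loop above does a bisect per removed column for every row. A hypothetical single-pass variant (remove_cols_single_pass is a made-up name for this sketch) instead visits each stored entry once, skips entries in dropped columns, and shifts the rest left by the number of dropped columns before them. Like the answers above, it writes the private _shape attribute, which is an assumption about SciPy internals that may not hold in every version:

```python
from bisect import bisect_left

import numpy as np
from scipy import sparse


def remove_cols_single_pass(W, col_list):
    """Hypothetical helper: remove several columns from a lil_matrix in place."""
    drop = sorted(set(col_list))
    if not drop:
        return W
    if drop[0] < 0 or drop[-1] >= W.shape[1]:
        raise IndexError('column index out of bounds')
    drop_set = set(drop)
    for i in range(W.shape[0]):
        new_cols, new_vals = [], []
        for c, v in zip(W.rows[i], W.data[i]):
            if c in drop_set:
                continue
            # bisect_left counts the dropped columns that lie left of c
            new_cols.append(c - bisect_left(drop, c))
            new_vals.append(v)
        # overwrite the existing row lists in place
        W.rows[i][:] = new_cols
        W.data[i][:] = new_vals
    # private attribute, as in the answers above (assumption about SciPy internals)
    W._shape = (W.shape[0], W.shape[1] - len(drop))
    return W


A = np.arange(20).reshape(4, 5)
W = sparse.lil_matrix(A)
remove_cols_single_pass(W, [3, 1])
print(W.shape)  # (4, 3)
```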

Answer 6

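Looking at the notes for each sparse matrix format, specifically the csc matrix in our case, it has the following advantages, as listed in the documentation: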

efficient arithmetic operations CSC + CSC, CSC * CSC, etc.
efficient column slicing
fast matrix vector products (CSR, BSR may be faster)

If you have the column indices you want to remove, just use slicing. For removing rows, use a csr matrix, since it is efficient at row slicing.
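A sketch of that advice with example data: convert once to CSC and slice columns with a boolean keep-mask, and do the same with CSR for rows:

```python
import numpy as np
from scipy import sparse

dense = np.arange(30).reshape(5, 6) % 4
A = sparse.lil_matrix(dense)

# Columns: CSC is efficient at column slicing.
cols_to_drop = [1, 4]  # example values
col_mask = np.ones(A.shape[1], dtype=bool)
col_mask[cols_to_drop] = False
A_cols = A.tocsc()[:, col_mask]

# Rows: CSR is efficient at row slicing.
rows_to_drop = [0, 3]  # example values
row_mask = np.ones(A.shape[0], dtype=bool)
row_mask[rows_to_drop] = False
A_rows = A.tocsr()[row_mask]

print(A_cols.shape, A_rows.shape)  # (5, 4) (3, 6)
```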
