Question

I want to implement a simple SVM classifier, in the case of high-dimensional binary data (text), for which I think a simple linear SVM is best. The reason for implementing it myself is basically that I want to learn how it works, so using a library is not what I want.

The problem is that most tutorials go up to an equation that can be solved as a "quadratic problem", but they never show an actual algorithm! So could you point me either to a very simple implementation I could study, or (better) to a tutorial that goes all the way to the implementation details?

Thanks a lot!

Answer 1

Some pseudocode for the Sequential Minimal Optimization (SMO) method can be found in this paper by John C. Platt: Fast Training of Support Vector Machines using Sequential Minimal Optimization. There is also a Java implementation of the SMO algorithm, which is developed for research and educational purpose (SVM-JAVA).

Other commonly used methods to solve the QP optimization problem include:

constrained conjugate gradients
interior point methods
active set methods

But be aware that some math knowledge is needed to understand this things (Lagrange multipliers, Karush–Kuhn–Tucker conditions, etc.).

Answer 2

Are you interested in using kernels or not? Without kernels, the best way to solve these kinds of optimization problems is through various forms of stochastic gradient descent. A good version is described in http://ttic.uchicago.edu/~shai/papers/ShalevSiSr07.pdf and that has an explicit algorithm.

The explicit algorithm does not work with kernels but can be modified; however, it would be more complex, both in terms of code and runtime complexity.

Answer 3

Have a look at liblinear and for non linear SVM s at libsvm

Answer 4

The following paper "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM" top of page 11 describes the Pegasos algorithm also for kernels.It can be downloaded from http://ttic.uchicago.edu/~nati/Publications/PegasosMPB.pdf

It appears to be a hybrid of coordinate descent and subgradient descent. Also, line 6 of the algorithm is wrong. In the predicate the second appearance of y_i_t should be replaced with y_j instead.

Answer 5

I would like to add a little supplement to the answer about original Platt s work. There is a bit simplified version presented in Stanford Lecture Notes, but derivation of all the formulas should be found somewhere else (e.g. this random notes I found on the Internet).

If it s ok to deviate from original implementations, I can propose you my own variation of the SMO algorithm that follows.

class SVM:
  def __init__(self, kernel= linear , C=10000.0, max_iter=100000, degree=3, gamma=1):
    self.kernel = { poly :lambda x,y: np.dot(x, y.T)**degree,
                    rbf :lambda x,y:np.exp(-gamma*np.sum((y-x[:,np.newaxis])**2,axis=-1)),
                    linear :lambda x,y: np.dot(x, y.T)}[kernel]
    self.C = C
    self.max_iter = max_iter

  def restrict_to_square(self, t, v0, u):
    t = (np.clip(v0 + t*u, 0, self.C) - v0)[1]/u[1]
    return (np.clip(v0 + t*u, 0, self.C) - v0)[0]/u[0]

  def fit(self, X, y):
    self.X = X.copy()
    self.y = y * 2 - 1
    self.lambdas = np.zeros_like(self.y, dtype=float)
    self.K = self.kernel(self.X, self.X) * self.y[:,np.newaxis] * self.y
    
    for _ in range(self.max_iter):
      for idxM in range(len(self.lambdas)):
        idxL = np.random.randint(0, len(self.lambdas))
        Q = self.K[[[idxM, idxM], [idxL, idxL]], [[idxM, idxL], [idxM, idxL]]]
        v0 = self.lambdas[[idxM, idxL]]
        k0 = 1 - np.sum(self.lambdas * self.K[[idxM, idxL]], axis=1)
        u = np.array([-self.y[idxL], self.y[idxM]])
        t_max = np.dot(k0, u) / (np.dot(np.dot(Q, u), u) + 1E-15)
        self.lambdas[[idxM, idxL]] = v0 + u * self.restrict_to_square(t_max, v0, u)
    
    idx, = np.nonzero(self.lambdas > 1E-15)
    self.b = np.sum((1.0-np.sum(self.K[idx]*self.lambdas, axis=1))*self.y[idx])/len(idx)
  
  def decision_function(self, X):
    return np.sum(self.kernel(X, self.X) * self.y * self.lambdas, axis=1) + self.b

In simple cases it works not much worth than sklearn.svm.SVC, comparison shown below (I have posted code that generates these images on GitHub)

I used quite different approach to derive formulas, you may want to check my preprint on ResearchGate for details.

友情链接