MLCourse:Kernel SVM
From Dahuawiki
By incorporating kernel techniques with support vector machine, we can derive a nonlinear classifier with large margin in feature space.
Kernel SVM (Strict formulation)
Suppose we have a set of training samples
. Given a feature mapping φ(x), the strict formulation (without slack variables) of support vector machine in the feature space is
This is called primal form of the SVM problem.
In order to applying kernel tricks, we can turn it into its dual form such that the features only appear in inner products. The Lagrangian of the above problem is
At the optima, the derivatives of the Lagrangian with respect to the parameters zeros. Thus,
Plugging them to the Lagrangian, we can derive the dual form according to the Slater Conditions as follows
It is also a convex quadratic optimization problem. However, the variables to be solved in the dual form are α.
With the optimal values of α obtained, the parameters can be derived as follows.
Hence, the discriminant function is given by
While the offset θ0 can be computed from the following constraint on support vector,
Here, the i-th sample is support vector, i.e.
. In principle, each support vector results in the same offset value. However, there will be some deviation due to the computation resolution. Thus it is advisable to take th median of the solutions given by all support vectors.
The geometric margin (in the feature space) of the classifier is
Kernel SVM (Relaxed formulation)
We can introduce slack variables to allow some training samples violate the constraint with finite penalty. The primal form of the problem is
The Lagrangian is given by
At optima, the derivatives of the Lagrangian are zeros.
Then the dual form can be derived by combining the Lagrangian with the equations above
subject to
Solving the dual problem, we can obtain the optimal values of α. Then, the parameter can be given by
As a result, the discriminant function is
In order to see what how to derive the offset parameter θ0. We further analyze the constraints. According to the KKT condition, at optima, we have
In addition, we have
There are three possible cases.
-
- It means that the margin constraint is satisfied and inactive. The i-th sample is not support vector. The value of
is determined by the nonnegative constraint, which is active.
- It means that the margin constraint is satisfied and inactive. The i-th sample is not support vector. The value of
-
- It means that both the margin constraint and the nonnegative constraint are active. The i-th sample lies at margin, and is support vector. Since the nonnegative constraint is active,
.
- It means that both the margin constraint and the nonnegative constraint are active. The i-th sample lies at margin, and is support vector. Since the nonnegative constraint is active,
-
- It means that the margin constraint is active, while nonnegative constraint is not tight. It is possible that
, and thus the i-th sample probably violates the margin constraint. The samples in this case are also support vectors.
- It means that the margin constraint is active, while nonnegative constraint is not tight. It is possible that
Consequently, the offset θ0 can be obtained from the support vectors with
based on the following equation
