MLCourse:Mid-term-exam-points

From Dahuawiki

Basic concepts

  • samples, labels
  • classifier
    a mapping of samples to labels
  • training set, testing set
  • training error
  • the goal of learning
    to find a prediction rule that generalizes well
  • cross-validation
    • leave-one-out cross-validation
    • relation to generalization (a low cross-validation error suggests good generalization, but does not guarantee it)
  • maximum likelihood estimation
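
A minimal sketch of leave-one-out cross-validation: each sample is held out once, the classifier is trained on the rest, and the held-out mistakes are averaged. The nearest-centroid classifier and the 1-D toy data are assumptions made for illustration only; any training procedure could be substituted.

```python
def fit_centroids(samples, labels):
    """Hypothetical stand-in classifier: store the mean of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Predict the label whose class mean is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def loo_error(samples, labels):
    """Fraction of samples misclassified when each one is held out in turn."""
    mistakes = 0
    n = len(samples)
    for i in range(n):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit_centroids(train_x, train_y)
        if predict(model, samples[i]) != labels[i]:
            mistakes += 1
    return mistakes / n


# Toy 1-D data (illustrative, not from the course material).
samples = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
labels = [-1, -1, -1, 1, 1, 1]
loo = loo_error(samples, labels)
```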

Linear classifiers

  • the formulation of linear classifier
  • decision boundary
  • zero-one loss
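
The three items above can be sketched together: a linear classifier predicts the sign of theta . x + theta0, the decision boundary is the set where that score is zero, and the training error is the average zero-one loss. The tie-breaking convention and the toy data are assumptions.

```python
def predict(theta, theta0, x):
    """Linear classifier: the sign of theta . x + theta0."""
    score = sum(t * xi for t, xi in zip(theta, x)) + theta0
    # Assumption: points exactly on the decision boundary are labeled -1.
    return 1 if score > 0 else -1


def zero_one_loss(prediction, label):
    """1 for a mistake, 0 for a correct prediction."""
    return 0 if prediction == label else 1


def training_error(theta, theta0, samples, labels):
    """Average zero-one loss over the training set."""
    losses = [zero_one_loss(predict(theta, theta0, x), y)
              for x, y in zip(samples, labels)]
    return sum(losses) / len(losses)


# Illustrative data: the last point lies on the wrong side of the boundary.
samples = [(2.0, 1.0), (0.0, 3.0), (-1.0, -1.0), (1.0, -3.0)]
labels = [1, 1, -1, 1]
error = training_error([1.0, 1.0], -0.5, samples, labels)
```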

Perceptron

  • update rule
  • convergence (bound and conditions)
  • the generalization guarantees (with feedback)
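
A minimal sketch of the perceptron update rule, assuming a classifier through the origin (no offset) and labels in {-1, +1}: whenever y (theta . x) <= 0, set theta <- theta + y x. If the data are linearly separable with margin gamma and all ||x|| <= R, the number of updates is at most (R / gamma)^2. The data and the `max_epochs` cap are illustrative.

```python
def perceptron(samples, labels, max_epochs=100):
    """Train a perceptron through the origin; return (theta, mistakes)."""
    theta = [0.0] * len(samples[0])
    mistakes = 0
    for _ in range(max_epochs):
        updated = False
        for x, y in zip(samples, labels):
            score = sum(t * xi for t, xi in zip(theta, x))
            if y * score <= 0:  # mistake (or on the boundary): update
                theta = [t + y * xi for t, xi in zip(theta, x)]
                mistakes += 1
                updated = True
        if not updated:  # a full pass without mistakes: converged
            break
    return theta, mistakes


# Linearly separable toy data (illustrative).
samples = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]
theta, mistakes = perceptron(samples, labels)
```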

Linear SVM

  • strict formulation (without slack variables)
  • support vectors
  • leave-one-out error and the number of SVs
  • relaxed formulation (with slack variables)
    • trade off
    • what are support vectors in this case?
  • regularization
    • desired objective and regularization penalty
    • hinge loss
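
A sketch of the regularized view of the relaxed SVM: the objective is a regularization penalty plus the average hinge loss, max(0, 1 - y (theta . x + theta0)). The exact weighting (lam / 2 on the penalty, an average rather than a sum of losses) is an assumed convention; the course may scale the two terms differently.

```python
def hinge_loss(margin):
    """Hinge loss of a functional margin y * (theta . x + theta0)."""
    return max(0.0, 1.0 - margin)


def svm_objective(theta, theta0, samples, labels, lam):
    """(lam / 2) * ||theta||^2 + average hinge loss (assumed convention)."""
    penalty = 0.5 * lam * sum(t * t for t in theta)
    losses = []
    for x, y in zip(samples, labels):
        score = sum(t * xi for t, xi in zip(theta, x)) + theta0
        losses.append(hinge_loss(y * score))
    return penalty + sum(losses) / len(losses)


# Illustrative data: the second point is inside the margin (margin 0.5),
# so it contributes a hinge loss of 0.5 even though it is classified correctly.
samples = [(2.0, 0.0), (0.5, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1]
value = svm_objective([1.0, 0.0], 0.0, samples, labels, lam=1.0)
```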

Logistic Regression

  • the discriminative formulation
  • log-odds of the class probabilities are linear in the input
  • MLE estimates
  • log-loss (-log p)
  • need regularization (when samples may be linearly separable)
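
A sketch of why regularization is needed on separable data, assuming labels in {-1, +1} and P(y | x) = 1 / (1 + exp(-y theta . x)): scaling up a separating theta keeps lowering the log-loss, so the unregularized MLE has no finite solution. The 1-D data are illustrative.

```python
import math


def log_loss(theta, samples, labels):
    """Average of -log P(y | x) under the logistic model."""
    total = 0.0
    for x, y in zip(samples, labels):
        z = y * sum(t * xi for t, xi in zip(theta, x))
        total += math.log1p(math.exp(-z))  # -log(1 / (1 + exp(-z)))
    return total / len(samples)


# Linearly separable 1-D data: theta = [s] separates it for every s > 0.
samples = [(1.0,), (2.0,), (-1.0,), (-2.0,)]
labels = [1, 1, -1, -1]

# The loss keeps shrinking toward 0 as the scale of theta grows.
losses = [log_loss([s], samples, labels) for s in (1.0, 10.0, 100.0)]
```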

Linear Regression

  • formulation
    • probabilistic formulation
    • prediction rule
    • the optimal solution
  • bias and variance of the estimates
    • mean squared error
  • ridge regression
    • regularization
    • trade-off between bias and variance reduction
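
A sketch of the ridge estimate in the simplest setting, assuming a 1-D input and no offset, where the general solution (X^T X + lam I)^{-1} X^T y reduces to sum(x y) / (sum(x^2) + lam). Whether lam is additionally scaled by the number of samples is a convention this outline does not fix. Increasing lam shrinks the estimate toward zero, which adds bias and reduces variance.

```python
def ridge_1d(xs, ys, lam):
    """Ridge estimate for y ~ theta * x (1-D input, no offset term)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Noise-free illustrative data generated with theta = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

theta_ls = ridge_1d(xs, ys, lam=0.0)      # ordinary least squares
theta_ridge = ridge_1d(xs, ys, lam=14.0)  # shrunk toward zero
```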

Active Learning

  • what is active learning
  • active learning for linear regression
    • minimizing MSE of parameter estimates
    • selecting most uncertain input
      a convex quadratic function of x
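
A sketch of selecting the most uncertain input for linear regression. Up to the noise variance (omitted here), the variance of the prediction at x is x^T (X^T X)^{-1} x, a convex quadratic function of x, so the selection is the candidate that maximizes it. The 2-D restriction, the candidate pool, and the helper names are assumptions for illustration.

```python
def gram_2d(rows):
    """X^T X for a list of 2-D input rows."""
    return [[sum(r[i] * r[j] for r in rows) for j in range(2)]
            for i in range(2)]


def inverse_2x2(a):
    """Inverse of a 2x2 matrix (assumes it is non-singular)."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def predictive_variance(a_inv, x):
    """x^T A^{-1} x, the prediction variance up to the noise variance."""
    return sum(x[i] * a_inv[i][j] * x[j] for i in range(2) for j in range(2))


# Inputs queried so far: the first direction has been sampled twice.
X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
a_inv = inverse_2x2(gram_2d(X))

# Hypothetical pool of unit-length candidates to query next.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
best = max(candidates, key=lambda x: predictive_variance(a_inv, x))
```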

Kernels

what is a kernel

  • definitions
    • inner product of features
    • gram matrix is always positive semi-definite
  • kernel construction rules
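
A sketch of both definitions for one concrete kernel, chosen here as an example: for 1-D inputs, k(x, z) = (1 + x z)^2 equals the inner product of the features phi(x) = (1, sqrt(2) x, x^2). Because the Gram matrix is built from inner products, c^T K c = ||sum_i c_i phi(x_i)||^2 >= 0 for every c.

```python
import math


def kernel(x, z):
    """Quadratic polynomial kernel on 1-D inputs."""
    return (1.0 + x * z) ** 2


def phi(x):
    """Explicit feature map whose inner product reproduces the kernel."""
    return (1.0, math.sqrt(2.0) * x, x * x)


def gram(xs):
    """Gram matrix K[i][j] = k(x_i, x_j)."""
    return [[kernel(a, b) for b in xs] for a in xs]


def quadratic_form(K, c):
    """c^T K c, non-negative for every c when K is positive semi-definite."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))


xs = [-1.0, 0.5, 2.0]
K = gram(xs)
q = quadratic_form(K, [1.0, -2.0, 0.5])  # one arbitrary coefficient vector
```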

kernel regression

  • parameters lie in the span of training features (a consequence of regularization)
  • kernelized prediction
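
A sketch of kernelized ridge regression on two training points, assuming the convention alpha = (K + lam I)^{-1} y and prediction f(x) = sum_i alpha_i k(x_i, x). With the linear kernel k(x, z) = x z, the implied parameter sum_i alpha_i x_i matches the primal ridge solution, which illustrates that the parameters lie in the span of the training features. The 2x2 inverse keeps the example dependency-free.

```python
def linear_kernel(x, z):
    """Linear kernel on 1-D inputs."""
    return x * z


def fit_alpha(xs, ys, lam, kernel):
    """Solve (K + lam I) alpha = y for exactly two training points."""
    a = kernel(xs[0], xs[0]) + lam
    b = kernel(xs[0], xs[1])
    c = kernel(xs[1], xs[0])
    d = kernel(xs[1], xs[1]) + lam
    det = a * d - b * c
    return [(d * ys[0] - b * ys[1]) / det, (-c * ys[0] + a * ys[1]) / det]


def predict(alpha, xs, kernel, x):
    """Kernelized prediction: sum_i alpha_i k(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, xs))


xs = [1.0, 2.0]
ys = [2.0, 4.0]
alpha = fit_alpha(xs, ys, lam=1.0, kernel=linear_kernel)

# Parameter implied by the dual solution, and the primal ridge solution.
theta_dual = sum(a * x for a, x in zip(alpha, xs))
theta_primal = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + 1.0)
```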

kernel perceptron

  • solution
  • algorithm
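
A sketch of the kernel perceptron, assuming no offset term: the solution has the form theta = sum_i alpha_i y_i phi(x_i), where alpha_i counts the mistakes made on sample i, so training and prediction only need kernel evaluations. The quadratic kernel and the 1-D data (not linearly separable in the input space) are illustrative choices.

```python
def kernel(x, z):
    """Quadratic polynomial kernel on 1-D inputs (illustrative choice)."""
    return (1.0 + x * z) ** 2


def score(alpha, xs, ys, x):
    """Discriminant sum_i alpha_i y_i k(x_i, x)."""
    return sum(a * y * kernel(xi, x) for a, xi, y in zip(alpha, xs, ys))


def kernel_perceptron(xs, ys, max_epochs=100):
    """Return mistake counts alpha, one per training sample."""
    alpha = [0] * len(xs)
    for _ in range(max_epochs):
        updated = False
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * score(alpha, xs, ys, x) <= 0:  # mistake on sample i
                alpha[i] += 1
                updated = True
        if not updated:  # a full pass without mistakes: converged
            break
    return alpha


def predict(alpha, xs, ys, x):
    """Kernelized prediction rule."""
    return 1 if score(alpha, xs, ys, x) > 0 else -1


# Positive class in the middle, negative class on both sides.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, 1, 1, -1]
alpha = kernel_perceptron(xs, ys)
```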

kernel SVM

  • primal form and dual form
  • kernelized prediction rule
  • constraints on α
  • geometric margin

kernel optimization

  • kernel parameterization
  • optimization criterion
    • surrogate measure of generalization error
      cross-validation, margin
    • kernel alignment
  • kernel normalization
    • margin depends on scale
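
A sketch of kernel alignment and kernel normalization, assuming the common definitions: alignment is the Frobenius inner product of the Gram matrix with the target matrix y y^T divided by the product of their Frobenius norms, and the normalized kernel is k(x, z) / sqrt(k(x, x) k(z, z)), which removes the dependence on feature scale. The linear kernel and the data are illustrative.

```python
import math


def frobenius(A, B):
    """Frobenius inner product sum_ij A_ij B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def alignment(K, ys):
    """Alignment between a Gram matrix K and the target matrix y y^T."""
    target = [[yi * yj for yj in ys] for yi in ys]
    return frobenius(K, target) / math.sqrt(
        frobenius(K, K) * frobenius(target, target))


def normalize(K):
    """Normalized Gram matrix: K_ij / sqrt(K_ii K_jj)."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]


# Linear kernel on 1-D inputs with different magnitudes.
xs = [1.0, 2.0, -1.5]
ys = [1, 1, -1]
K = [[a * b for b in xs] for a in xs]
K_norm = normalize(K)

before = alignment(K, ys)
after = alignment(K_norm, ys)
```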

Model selection

  • goal: pursue good generalization
  • model -> class of functions
    • nested model (sub-class)
  • empirical risk, (expected) risk
  • minimum probability of error classifier
  • over-fitting
  • relation between training error/test error and model complexity

Structural Risk Minimization

  • complexity penalty
    depends on model, training set size, and confidence
  • upper bound guarantee of generalization error
    • it is a probabilistic guarantee
    • SRM -> best guarantee
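
A sketch of structural risk minimization over nested models. The penalty used here is the standard bound for a finite hypothesis class, sqrt((log |H| + log(1 / delta)) / (2 n)), which holds with probability at least 1 - delta; this is an assumption for illustration, since the outline does not say which complexity measure the course uses (a VC-dimension penalty would play the same role). The class sizes and training errors are made-up numbers.

```python
import math


def complexity_penalty(class_size, n, delta):
    """Penalty depending on the model, training set size, and confidence."""
    return math.sqrt((math.log(class_size) + math.log(1.0 / delta)) / (2.0 * n))


def srm_select(train_errors, class_sizes, n, delta):
    """Return (index of the model with the smallest bound, all bounds)."""
    bounds = [err + complexity_penalty(size, n, delta)
              for err, size in zip(train_errors, class_sizes)]
    best = min(range(len(bounds)), key=lambda i: bounds[i])
    return best, bounds


# Hypothetical nested models: richer classes fit the training set better
# but pay a larger complexity penalty.
class_sizes = [10, 10**3, 10**6, 10**12]
train_errors = [0.30, 0.12, 0.11, 0.105]
best, bounds = srm_select(train_errors, class_sizes, n=1000, delta=0.05)
```

Each bound is a probabilistic guarantee on the generalization error; SRM picks the model whose guarantee is best, not necessarily the one with the lowest training error.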