Papers:Texture:sparse-coding


Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?

B.A. Olshausen and D.J. Field.
Vision Research (1997)

Summary

The paper proposes a method to encode images by sparse coding, based on a linear generative model with an overcomplete basis. The goal is to derive sparse, statistically independent features that represent an image and explain how it is formed.

The generative model is a simple linear superposition:

I(\mathbf{x}) = \sum_{i=1}^K a_i \phi_i(\mathbf{x}) + v(\mathbf{x}) = \Phi a + v

Here, I is the image, a = [a_1, a_2, \ldots, a_K] are the coefficients of the superposition, \Phi = [\phi_1, \phi_2, \ldots, \phi_K] is the set of basis functions, which is overcomplete (K exceeds the number of pixels), v is a noise term, and \mathbf{x} indexes pixel position in the image.
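A minimal NumPy sketch of sampling one patch from this generative model. The sizes (8x8 patches, so 64 pixels, and K = 128 basis functions), the random unit-norm basis standing in for a learned one, the Laplace-distributed values for five active coefficients, and the noise level are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 64, 128                        # assumed: 8x8 patch (n pixels), K > n basis functions
Phi = rng.standard_normal((n, K))     # random stand-ins for the basis functions phi_i
Phi /= np.linalg.norm(Phi, axis=0)    # unit-norm columns

a = np.zeros(K)                       # sparse coefficients: only 5 of 128 are nonzero
active = rng.choice(K, size=5, replace=False)
a[active] = rng.laplace(scale=1.0, size=5)

sigma_I = 0.05                        # assumed noise level
v = sigma_I * rng.standard_normal(n)  # Gaussian noise term v
I = Phi @ a + v                       # I = Phi a + v, one pixel per row of Phi
image = I.reshape(8, 8)               # back to patch layout (pixel positions x)
```

Because K > n, many different coefficient vectors reproduce the same I equally well; the prior below is what picks among them.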

The probability of an image given the basis is then obtained by marginalizing over the coefficients:

P(I|\Phi) = \int P(I|a, \Phi) P(a) da

Here

P(I|a, \Phi) = \frac{1}{Z_I} \exp\left( - \frac{||I - \Phi a||^2}{2 \sigma_I^2} \right),

and

P(a) = \prod_{i=1}^K P(a_i) = \prod_{i=1}^K \frac{1}{Z_a} \exp\left(-\beta S(a_i)\right)

are, respectively, the likelihood, which measures how well the model fits the image, and the prior, which explicitly encourages sparseness by penalizing coefficient activity. To promote sparseness, S is chosen so that P(a_i) is peaked at zero with heavier tails than a Gaussian, i.e., S(a) grows more slowly than a^2 for large |a|; most coefficients are then pushed toward zero while a few are allowed to be large.
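To make the role of S concrete, a small sketch comparing two sparsity costs on a sparse and a dense coefficient vector with the same Euclidean norm. The choices S(a) = log(1 + a^2) and S(a) = |a| are assumptions here (common choices giving Cauchy-like and Laplacian priors), as is beta = 1:

```python
import numpy as np

def S_log(a):
    return np.log1p(a ** 2)   # prior proportional to (1 + a^2)^(-beta), Cauchy-like

def S_abs(a):
    return np.abs(a)          # Laplacian prior

def neg_log_prior(a, S, beta=1.0):
    # -log P(a) with the normalizing constant K * log(Z_a) dropped
    return beta * np.sum(S(a))

sparse = np.array([2.0, 0.0, 0.0, 0.0])   # both vectors have Euclidean norm 2
dense = np.array([1.0, 1.0, 1.0, 1.0])
for S in (S_log, S_abs):
    print(S.__name__, neg_log_prior(sparse, S), neg_log_prior(dense, S))
```

Both costs charge less for concentrating the activity in one coefficient (log 5 ≈ 1.61 vs. 4 log 2 ≈ 2.77, and 2 vs. 4), whereas a Gaussian cost sum(a**2) would give 4 for both and so express no preference.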

Learning jointly adapts the overcomplete basis and recovers the coefficients for the training images by solving

\Phi^* = \arg\min_{\Phi} \langle \min_a E(I, a|\Phi) \rangle

where \langle \cdot \rangle denotes the average over the training images and

E(I, a|\Phi) = ||I - \Phi a||^2 + \lambda \sum_{i=1}^K S(a_i).
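The link between E and the Bayesian model above is worth spelling out: taking the negative log of the likelihood times the prior, with the same symbols, gives

```latex
-\log\left[ P(I|a,\Phi)\, P(a) \right]
  = \frac{||I - \Phi a||^2}{2 \sigma_I^2} + \beta \sum_{i=1}^K S(a_i) + \log Z_I + K \log Z_a
  = \frac{1}{2 \sigma_I^2}\, E(I, a|\Phi) + \text{const},
\qquad \text{with } \lambda = 2 \beta \sigma_I^2 .
```

So \min_a E picks the most probable coefficients given I (MAP inference), and the outer minimization over \Phi roughly corresponds to maximizing \langle \log P(I|\Phi) \rangle with the integral over a replaced by its value at that peak.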

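The problem is solved by gradient descent, alternating between two steps: for each training image, the coefficients a are found by descending E with \Phi fixed; \Phi is then updated by a gradient step on the energy averaged over images at those coefficients.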
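A minimal sketch of this alternating scheme in NumPy, not the authors' code. Assumptions: S(a) = log(1 + a^2), a fixed step size 1/L for the inner descent, mini-batches of patches stored as rows, and a simple unit-norm renormalization of the basis functions after each update (the paper controls the basis norms differently; some such control is needed, since otherwise \Phi can grow while a shrinks to dodge the sparsity penalty). The function names are hypothetical:

```python
import numpy as np

def S(a):
    # Assumed sparsity cost S(a) = log(1 + a^2)
    return np.log1p(a ** 2)

def dS(a):
    return 2 * a / (1 + a ** 2)

def energy(I, a, Phi, lam):
    # E(I, a | Phi) = ||I - Phi a||^2 + lam * sum_i S(a_i)
    r = I - Phi @ a
    return r @ r + lam * np.sum(S(a))

def infer(I, Phi, lam, steps=200):
    # Inner problem: min_a E with Phi fixed, by plain gradient descent.
    # Step 1/L, where L = 2*||Phi||_2^2 + 2*lam bounds the curvature of E.
    lr = 1.0 / (2 * np.linalg.norm(Phi, 2) ** 2 + 2 * lam)
    a = np.zeros(Phi.shape[1])
    for _ in range(steps):
        grad = -2 * Phi.T @ (I - Phi @ a) + lam * dS(a)
        a = a - lr * grad
    return a

def learn(patches, K, lam=0.1, iters=50, batch=50, eta=0.05, seed=0):
    # Outer problem: gradient step on Phi averaged over a batch of patches.
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((patches.shape[1], K))
    Phi /= np.linalg.norm(Phi, axis=0)
    for _ in range(iters):
        X = patches[rng.choice(len(patches), size=batch, replace=False)]
        A = np.stack([infer(x, Phi, lam) for x in X])  # batch x K coefficients
        R = X - A @ Phi.T                                # batch x n residuals
        Phi += eta * R.T @ A / batch                     # descends E: dE/dPhi = -2 r a^T
        Phi /= np.linalg.norm(Phi, axis=0)               # assumed unit-norm constraint
    return Phi
```

The basis update is proportional to the average of residual times coefficient, so each basis function moves toward the part of the image its coefficient failed to explain.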

Two key aspects of the method are the overcomplete basis and the sparse prior. The former gives the generative model more flexibility in matching image structure, while the latter selects, among the many coefficient vectors that an overcomplete basis allows, the one that encodes the image most efficiently. The authors argue that the competition between basis functions implicitly makes the mapping from image to coefficients nonlinear, which allows the code to reduce higher-order forms of redundancy, beyond the pairwise correlations that a linear transform such as PCA removes.
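A toy illustration of this competition, assuming S(a) = |a| and using iterative soft-thresholding (ISTA) for the minimization over a, which is a swapped-in solver, not the paper's. Three unit-norm basis functions live in a 2-pixel "image"; the input lies exactly along the third one:

```python
import numpy as np

# Overcomplete toy basis: three unit-norm basis functions (columns) for 2 pixels
Phi = np.array([[1.0, 0.0, 1 / np.sqrt(2)],
                [0.0, 1.0, 1 / np.sqrt(2)]])
I = np.array([1.0, 1.0])
lam = 0.1

# Linear code: the minimum-norm least-squares solution spreads over all three
a_linear = np.linalg.pinv(Phi) @ I          # approximately [0.5, 0.5, 0.707]

def soft(z, t):
    # Soft-thresholding, the proximal operator of t * |a|
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Sparse code: minimize ||I - Phi a||^2 + lam * sum |a_i| by ISTA
step = 1.0 / (2 * np.linalg.norm(Phi, 2) ** 2)  # 1/L for the quadratic term
a_sparse = np.zeros(3)
for _ in range(500):
    grad = -2 * Phi.T @ (I - Phi @ a_sparse)
    a_sparse = soft(a_sparse - step * grad, step * lam)
# a_sparse is approximately [0, 0, sqrt(2) - lam/2] = [0, 0, 1.364]
```

The third basis function "explains away" the input and the other two are switched off entirely. Whether a coefficient is active depends on what the other basis functions are doing, which no fixed linear map from I to a can reproduce.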

Note that in this paper, the basis functions are localized by manually partitioning the image into small patches and confining the learning of the bases to those patches.
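Partitioning can be sketched as splitting an image into non-overlapping square tiles, each flattened into a vector I for the model above. The helper name tile_patches and the 8x8 default size are hypothetical, and dropping edge pixels that do not fill a whole tile is a simplifying assumption:

```python
import numpy as np

def tile_patches(image, size=8):
    # Hypothetical helper: cut a 2-D image into non-overlapping size x size tiles.
    H, W = image.shape
    H, W = H - H % size, W - W % size   # drop edge pixels that don't fill a whole tile
    tiles = image[:H, :W].reshape(H // size, size, W // size, size).swapaxes(1, 2)
    return tiles.reshape(-1, size * size)  # one flattened patch per row, row-major order
```

Every basis function learned from such rows has support at most size x size pixels, which is the sense in which localization here comes from the partitioning rather than from the prior.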

My Comments

  • It is a pioneering work that introduced sparse coding into the vision literature. Since then, sparse coding has received increasing attention in the community and is considered more suitable for modeling natural images than non-sparse codes such as PCA.
  • Sparsity is enforced softly through the prior distribution, as a penalty on coefficient activity rather than a hard limit on the number of nonzero coefficients.
  • A sparse prior by itself does not guarantee localized bases. The localization in this paper is achieved by manually dividing the image into local patches. I am considering whether the locality requirement could be incorporated into the prior distribution, just as has been done for sparsity.
  • The paper did not compare the sparse coding method with other representative works including PCA, LFA, and NMF.
  • In the context of texture modeling, sparse and localized bases are essentially a good set of textons.

