Papers:Texture:Leung-Malik-3dtexton
Back to Texture Modeling
Representing and Recognizing the Visual Appearance of Materials Using Three-dimensional Textons
- T. Leung and J. Malik.
- IJCV (2001).
- Downloaded from Malik's home page
Summary
This work revisits the notion of the texton and proposes a simple yet effective method for modeling and describing textures, one that has since been widely adopted in the computer vision literature.
The model is based on filtering and clustering (vector quantization). There are three aspects:
- vocabulary construction
The training images are first processed with a bank of orientation- and frequency-selective linear filters, so that the local pattern around each pixel is characterized by a vector of filter responses. Because local patterns in a texture repeat, the authors propose clustering all the extracted vectors (with K-means in the paper). The cluster centers are called textons, and together they form the vocabulary, also known as a dictionary or codebook elsewhere in the literature.
- texture representation
Once the vocabulary has been obtained, it can be used to represent any input texture. Each input image is first filtered to extract the response vector at every pixel, and each vector is then quantized to a texton in the vocabulary. Counting how often each texton occurs yields a texton histogram. To compare two textures, we measure the distance between their texton histograms; the paper uses the chi-square measure.
- texture reconstruction
The authors also present a simple method for reconstructing a textured image from its texton map: the filter responses are first recovered by looking up each texton in the vocabulary, and the pixel values are then estimated by linear least squares. When only the texton histogram is available, a texton map can be obtained by sampling.
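The vocabulary construction and texture representation steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy filter bank (four oriented derivative-of-Gaussian filters at one scale), the plain Lloyd-style K-means, and all function names and parameter values are assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_filter_bank(size=7, sigma=1.5, n_orient=4):
    """Toy bank of oriented first-derivative-of-Gaussian filters.
    (Assumption: stands in for the paper's larger filter bank.)"""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    bank = []
    for k in range(n_orient):
        theta = np.pi * k / n_orient
        u = x * np.cos(theta) + y * np.sin(theta)  # coordinate along theta
        f = -u * g
        bank.append(f / np.abs(f).sum())
    return np.stack(bank)  # shape: (n_filters, size, size)


def filter_responses(image, bank):
    """Correlate the image with every filter ('valid' region only) and
    return one response vector per pixel: (n_pixels, n_filters)."""
    size = bank.shape[-1]
    windows = sliding_window_view(image, (size, size))
    resp = np.einsum("hwij,fij->hwf", windows, bank)
    return resp.reshape(-1, bank.shape[0])


def assign(X, centers):
    """Vector quantization: index of the nearest center for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; the returned centers play the role of textons."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def texton_histogram(image, bank, centers):
    """Normalized count of how often each texton occurs in the image."""
    labels = assign(filter_responses(image, bank), centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()


def chi_square(h1, h2, eps=1e-12):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```

A hypothetical usage: build `bank = make_filter_bank()`, stack the responses of all training images into one array, call `kmeans` on it to obtain the textons, then compare two images with `chi_square(texton_histogram(a, bank, centers), texton_histogram(b, bank, centers))`. The pairwise distance in `assign` is computed densely, so this sketch only suits small images and vocabularies.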
The paper further generalizes the 2D texton model to 3D: each pixel is characterized by concatenating the filter response vectors extracted at that pixel under different photometric conditions. These concatenated vectors are then clustered to build the 3D texton vocabulary. The authors argue that, because the vectors from different photometric conditions are concatenated, the resulting textons encode information across all the lighting and viewing conditions.
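The concatenation step can be illustrated with a short sketch. It assumes the images are already registered, so that row i of every response array refers to the same surface point; the function name and the toy sizes (3 conditions, 100 pixels, 4 filters) are hypothetical and chosen only for the example.

```python
import numpy as np


def concat_across_conditions(responses):
    """Build the per-pixel vectors used for 3D textons.

    responses: sequence of arrays, each (n_pixels, n_filters), one per
    image of the same surface under a different lighting/viewing condition.
    Returns an array of shape (n_pixels, n_conditions * n_filters).
    """
    responses = [np.asarray(r, dtype=float) for r in responses]
    n_pixels = responses[0].shape[0]
    if any(r.shape[0] != n_pixels for r in responses):
        raise ValueError("images must be aligned: same number of pixels in each")
    return np.concatenate(responses, axis=1)


# Toy data standing in for real filter responses (assumption).
rng = np.random.default_rng(0)
per_condition = [rng.standard_normal((100, 4)) for _ in range(3)]
X3d = concat_across_conditions(per_condition)
print(X3d.shape)  # (100, 12)
```

Clustering the rows of `X3d` with any vector quantizer (K-means in the paper) would then give the 3D textons. The sketch also makes the alignment requirement visible: the concatenation is only meaningful when each row indexes the same physical point in every image.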
My Comments
- Although the stated goal of this paper is to address texture modeling under varying photometric conditions (lighting and viewing), in my opinion its significance lies mainly in the 2D part, which revisits the texton concept and introduces the filtering-clustering paradigm. Later literature shows that this paradigm became very popular thanks to its simplicity and effectiveness, and many subsequent works build on it.
- The 3D extension has some limitations:
- The training set requires aligned texture images captured under a series of conditions, which are often not available.
- It is difficult to recognize a single textured image using the 3D model. To address this, the authors propose MCMC sampling for labeling the textons; however, this sacrifices simplicity and elegance.
- The concatenated vector representation is sensitive to environmental changes.
- The paper uses K-means for clustering, and thus for building the vocabulary. Are there better choices? In addition, how to choose the vocabulary size and how to handle outliers in the training set remain open issues.
