We have demonstrated that a simple mechanism, which can be implemented in surprisingly small amounts of physical hardware (or perhaps neural mechanisms?), exhibits behavior comparable to that of small children in the task of learning and using phonological knowledge. In our theory, phonological knowledge is encapsulated as a set of boolean constraints. These constraints operate on the classical linguistic representation of a pattern of sound in terms of phonemes and binary distinctive features. The knowledge is applied in phonological performance by classical constraint propagation. The constraints are learned by an incremental process with two phases: detecting correlations and summarizing the accumulated regularities. These summaries are specifications of the particular boolean constraints to be imposed. The summaries may be incrementally generalized or specialized as new data appear. As a bonus, the compiled summaries may be read out as recognizable rules of classical linguistics.
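To make the pipeline concrete, here is a deliberately toy sketch in Python: phonemes as vectors of binary distinctive features, a single boolean constraint applied by propagation, and a caricature of the two learning phases. The feature inventory and the plural-voicing constraint are our own illustrative simplifications, not the actual rule set or representation used by the mechanism.

```python
# Toy illustration (not the paper's implementation): phonemes are vectors
# of binary distinctive features; phonological knowledge is a boolean
# constraint relating those features; learning detects a correlation in
# examples and summarizes it as a candidate constraint.

# A tiny, made-up feature inventory.
FEATURES = {
    "t": {"voiced": 0, "sibilant": 0},
    "d": {"voiced": 1, "sibilant": 0},
    "g": {"voiced": 1, "sibilant": 0},
    "k": {"voiced": 0, "sibilant": 0},
    "s": {"voiced": 0, "sibilant": 1},
    "z": {"voiced": 1, "sibilant": 1},
}

def plural_suffix(stem_final):
    """Apply the constraint by propagation: the suffix's [voiced] bit
    must agree with the stem-final segment's [voiced] bit (the
    epenthesis case after sibilants is ignored here)."""
    voiced = FEATURES[stem_final]["voiced"]
    return "z" if voiced else "s"

def learn_constraint(examples):
    """Caricature of the two learning phases: detect a correlation
    between the stem-final [voiced] bit and the suffix's [voiced]
    bit, then summarize it as a boolean constraint."""
    agree = all(FEATURES[stem[-1]]["voiced"] == FEATURES[suffix]["voiced"]
                for stem, suffix in examples)
    return "suffix.voiced == stem_final.voiced" if agree else None

print(plural_suffix("g"))   # "z" (dog -> dogz)
print(plural_suffix("k"))   # "s" (book -> books)
print(learn_constraint([("dog", "z"), ("cat", "s")]))
```

Note that two examples suffice for the toy learner to propose the constraint, which gestures at the almost one-shot character of the real mechanism; the real system must also generalize, specialize, and tolerate exceptions.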
Our mechanism has successfully learned a portion of English phonology. It yields almost one-shot learning, similar to that observed in children: it takes only a few carelessly chosen examples to learn the important rules; there is no unreasonable repetition of the data; and there is no requirement to zealously correct erroneous behavior. The mechanism tolerates noise and exceptions. It learns higher-order constraints as its knowledge grows. Furthermore, the intermediate states of learning produce errors just like those produced by children as they are learning phonology.
While this mechanism has been tested on a chunk of English phonology, it has not yet been tested extensively in all corners of English, and it has not yet been tried on other languages (though we have started testing our theory on learning Hebrew verb patterns). This is important, because we intend this to be a strong theory: It had better work in all cases--it is either right or wrong--there are very few parameters that we can wiggle to extend the coverage of the theory if it is wrong.
Over the past few years there has been a heated debate between
advocates of ``Connectionism'' and advocates of more traditional
``Symbolic Artificial Intelligence.'' We believe that contemplation
of our mechanism for acquiring and using phonological knowledge can
shed considerable light on this debate. The
essence here is in understanding the relationship between the signals
in the neural circuits of the brain and the symbols that they are said
to represent.
Consider first an ordinary computer. Are there symbols in the
computer? No, there are transistors in the computer, and capacitors,
and wires interconnecting them, etc. It is a connectionist
system. There are voltages on the nodes and currents in the
wires. We as
programmers interpret the patterns of voltages as representations of
our symbols and symbolic expressions. We impose patterns we call
programs that cause the patterns of data voltages to evolve in a way
that we interpret as the manipulation of symbolic expressions that we
intend. Thus the symbols and symbolic expressions are a compact and
useful way of describing the behavior of the connectionist system. We
as engineers arrange for our connectionist system to exhibit behavior
that we can usefully describe as the manipulation of our symbols.
In much the same way, auditory signals are analog trajectories through a low-dimensional space--pressure on the eardrum. By signal processing these are transformed into trajectories in a high-dimensional space that linguists abstract, approximate, and describe in terms of phonemes and their distinctive features. This high-dimensional space is very sparsely populated by linguistic utterances. Because of the sparsity of this space, we can easily interpret configurations in this space as discrete symbolic expressions and interpret behaviors in this space as symbolic manipulations.
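The sparsity argument can be illustrated with a small sketch: with n binary distinctive features there are 2**n possible feature vectors, but a language uses only a few dozen phonemes, so a noisy analog point in the space can be snapped unambiguously to a discrete symbol. The four-feature vectors below are invented for illustration; real inventories use on the order of a dozen or more features.

```python
# Toy illustration of sparsity: 2**4 = 16 possible feature vectors,
# only 6 occupied by phonemes. A noisy point (e.g., the output of
# signal processing on an utterance) is interpreted as the nearest
# discrete phoneme. Feature vectors are made up for illustration.

PHONEMES = {
    "p": (0, 0, 0, 0), "b": (1, 0, 0, 0),
    "t": (0, 1, 0, 0), "d": (1, 1, 0, 0),
    "s": (0, 1, 1, 0), "z": (1, 1, 1, 0),
}

def snap(noisy):
    """Interpret a noisy analog point as the nearest discrete symbol
    (squared Euclidean distance over the feature vector)."""
    return min(PHONEMES,
               key=lambda p: sum((a - b) ** 2
                                 for a, b in zip(PHONEMES[p], noisy)))

print(snap((0.9, 1.1, 0.2, 0.0)))  # "d": the closest occupied vector
```

Because the occupied points are few and well separated, the discrete symbolic reading of a configuration is robust to noise; in a densely populated space no such clean symbolic interpretation would be available.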
It may be the case that the linguistic representation is necessarily sparse because that is the key to making a simple, efficient, one-shot learning algorithm. Thus sparseness of the representation, and the attendant possibility of symbolic description, is just a consequence of the fact that human language is learnable and understandable by mechanisms that are evolvable and implementable in realistic biological systems. In fact, we believe this model of learning is applicable to problem areas outside phonology.
So in the case of phonology at least, the Connectionist/Symbolic distinction is a matter of level of detail. Everything is implemented in terms of neurons or transistors, depending on whether we are building neural circuits or hardware. However, because the representation of linguistic information is sparse, we can think of the data as bits and the mechanisms as shift registers and boolean constraints. If we were dealing with the details of muscle control we would probably have a much denser representation and then we would want to think in terms of approximations of multivariate functions. But when it is possible to abstract symbols we obtain a tremendous advantage. We get the power to express descriptions of mechanisms in a compact form that is convenient for communication to other scientists, or as part of an engineering design.
So what of signals and symbols? There are signals in the brain, and when possible, there are symbols in the mind.
Acknowledgments
We thank Morris Halle for teaching us elementary phonology and helping us to get started in this research. We thank Tom Knight for showing us that an electrical implementation of constraints is feasible. We thank Julie Sussman for numerous criticisms and suggestions to improve the presentation of the material. We thank Patrick Winston for his thoughtful critique and ideas for extending this work to elucidate questions on the general principles of language and learning. We thank Hal Abelson, Jon Allen, Rodney Brooks, Mike Maxwell, Steve Pinker, Elisha Sacks, Peter Szolovits, and Feng Zhao for comments on the manuscript.