The corpus consists of 250 words. The words are common nouns (about 50) and verbs (about 200)
that first-graders might know. The nouns are the singular and plural
forms of common animals and everyday objects (e.g., cat, cats, dog,
dogs, cup, cups, man, men). The corpus includes most of the regular
and irregular verbs used in the psycholinguistic experiments of Marcus
et al. [9] on English tenses (e.g., go, went, play,
played, kick, kicked).
Consistent with the observation that a human learner receives little explicit correction, the corpus contains only positive examples. However, the lack of external negative evidence does not rule out the possibility that the learner can generate internal negative examples when testing hypotheses. These internal negative examples, as we have seen, play a significant role in the rapid learning of classifiers.
The data record for each word in the corpus has five fields: (1) a word identifier, (2) the word's spelling, (3) a unique meaning identifier (e.g., ``cat'' and ``cats'' have the same meaning id, but ``cat'' and ``dog'' do not), (4) the word's pronunciation as a sequence of phonemes, and (5) its grammatical status (e.g., whether it is a noun or verb, singular or plural, present or past). The data records for ``cat(s)'' and ``dog(s)'' are shown below:
    word-id  spelling  meaning-id  pronunciation  grammar
    --------------------------------------------------------------
    12789    cat       6601        k.ae.t.        Noun Sing
    12956    cats      6601        k.ae.t.s.      Noun Plu
    25815    dog       13185       d.).g.         Noun Sing
    25869    dogs      13185       d.).g.z.       Noun Plu
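To make the record layout concrete, the following is a minimal sketch of how such records might be parsed and compared. The names `WordRecord`, `parse_record`, and `same_meaning` are hypothetical and introduced only for illustration; the sketch assumes whitespace-separated fields and a period after every phoneme, as in the four records above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WordRecord:
    """One corpus entry (hypothetical container, not from the paper)."""
    word_id: int
    spelling: str
    meaning_id: int
    phonemes: Tuple[str, ...]   # e.g. ("k", "ae", "t")
    grammar: Tuple[str, ...]    # e.g. ("Noun", "Sing")


def parse_record(line: str) -> WordRecord:
    """Parse one whitespace-separated record line.

    Assumes the pronunciation field terminates every phoneme with '.',
    so splitting on '.' and dropping empty pieces recovers the phonemes.
    """
    fields = line.split()
    word_id, spelling, meaning_id, pronunciation = fields[:4]
    phonemes = tuple(p for p in pronunciation.split(".") if p)
    return WordRecord(int(word_id), spelling, int(meaning_id),
                      phonemes, tuple(fields[4:]))


def same_meaning(a: WordRecord, b: WordRecord) -> bool:
    """Two word forms share a meaning exactly when their meaning ids match."""
    return a.meaning_id == b.meaning_id


# The four example records from the text.
RAW = """\
12789 cat 6601 k.ae.t. Noun Sing
12956 cats 6601 k.ae.t.s. Noun Plu
25815 dog 13185 d.).g. Noun Sing
25869 dogs 13185 d.).g.z. Noun Plu
"""

RECORDS = [parse_record(line) for line in RAW.splitlines()]
```

Under these assumptions, `same_meaning(RECORDS[0], RECORDS[1])` holds for ``cat'' and ``cats'', while ``cat'' and ``dog'' are kept apart by their distinct meaning ids.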
The data records are pre-processed to produce bit vector inputs for the performance model and learner. The outputs of the performance model and learner are also bit vectors, which typically have a straightforward symbolic interpretation.
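As a purely illustrative sketch of what a bit vector with a symbolic interpretation can look like, the code below encodes the grammatical status of a word as one bit per tag and decodes it back. The bit layout `GRAMMAR_BITS`, the tag names `Verb`, `Pres`, and `Past`, and both function names are assumptions made for this example; the encoding actually used in the experiments may differ.

```python
# Hypothetical bit layout: one position per grammatical tag.
# Only "Noun", "Sing", and "Plu" appear in the sample records;
# the remaining tag names are assumed for illustration.
GRAMMAR_BITS = ("Noun", "Verb", "Sing", "Plu", "Pres", "Past")


def encode_grammar(tags):
    """Return a list of 0/1 bits, with a 1 wherever the tag is present."""
    unknown = set(tags) - set(GRAMMAR_BITS)
    if unknown:
        raise ValueError(f"unknown grammar tags: {sorted(unknown)}")
    return [1 if name in tags else 0 for name in GRAMMAR_BITS]


def decode_grammar(bits):
    """Read a bit vector back as a tuple of tag names (the symbolic view)."""
    if len(bits) != len(GRAMMAR_BITS):
        raise ValueError("bit vector has the wrong length")
    return tuple(name for name, bit in zip(GRAMMAR_BITS, bits) if bit)


plural_noun = encode_grammar(("Noun", "Plu"))
```

With this layout, a plural noun becomes `[1, 0, 0, 1, 0, 0]`, and decoding that vector returns `("Noun", "Plu")`, so each output bit can be read off directly as a grammatical feature.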
In all the experiments below, we use the same parameter settings for
the beam search width (in
the generalization algorithm) and the excitation threshold (in
classifier excitation). The results are not sensitive to the
particular parameter settings.