abstract
- Although recurrent neural nets have been moderately successful in learning to emulate finite-state machines (FSMs), the continuous internal state dynamics of a neural net are not well matched to the discrete behavior of an FSM. We describe an architecture, called DOLCE, that allows discrete states to evolve in a net as learning progresses. DOLCE consists of a standard recurrent neural net trained by gradient descent and an adaptive clustering technique that quantizes the state space. We describe two implementations of DOLCE. The first, DOLCE(u), uses an adaptive clustering scheme in an unsupervised mode to determine both the number of clusters and the partitioning of the state space as learning progresses. The second, DOLCE(s), uses a Gaussian mixture model in a supervised learning framework to infer the states of an FSM. DOLCE(s) is based on the assumption that a finite set of discrete internal states is required for the task, and that the actual network state belongs to this set but has been corrupted by noise due to inaccuracy in the weights. DOLCE(s) learns to recover, from the noisy state, the discrete state with maximum a posteriori probability. Simulations show that both implementations of DOLCE lead to a significant improvement in generalization performance over earlier neural net approaches to FSM induction. The idea of adaptive quantization is not specific to DOLCE and can be applied in other domains as well.
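
To make the MAP state-recovery idea behind DOLCE(s) concrete, the sketch below shows one way a noisy hidden state could be snapped to the most probable component of a Gaussian mixture. The function name `map_quantize`, the spherical-noise covariances, and all numerical values are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch, assuming the hidden state is one of K discrete states
# (the mixture means) corrupted by isotropic Gaussian noise. Names and
# values are hypothetical, not the authors' code.
import numpy as np

def map_quantize(hidden_state, means, variances, priors):
    """Return the mean of the mixture component with maximum a posteriori
    probability for a noisy hidden-state vector, plus its index."""
    # log p(k | h) = log pi_k + log N(h; mu_k, sigma_k^2 I) + const
    diffs = hidden_state - means                       # (K, D) residuals
    sq_dist = np.sum(diffs ** 2, axis=1)               # squared distances
    d = means.shape[1]
    log_post = (np.log(priors)
                - 0.5 * d * np.log(2 * np.pi * variances)
                - 0.5 * sq_dist / variances)
    k = int(np.argmax(log_post))                       # MAP cluster index
    return means[k], k

# Toy usage: three presumed discrete states in a 2-D hidden space.
means = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
variances = np.array([0.01, 0.01, 0.02])               # per-component sigma^2
priors = np.array([0.4, 0.4, 0.2])
noisy = np.array([0.85, 0.18])                         # state corrupted by weight noise
clean, k = map_quantize(noisy, means, variances, priors)
print(k, clean)                                        # -> 0 [0.9 0.1]
```

In this reading, replacing the noisy state with the recovered cluster mean is what quantizes the continuous state space into discrete FSM-like states as training proceeds.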