Hostname: page-component-cd9895bd7-8ctnn Total loading time: 0 Render date: 2024-12-23T11:47:29.848Z Has data issue: false hasContentIssue false

Learning deterministic regular grammars from stochastic samples in polynomial time

Published online by Cambridge University Press:  15 August 2002

Rafael C. Carrasco
Affiliation:
Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, 03071 Alicante, Spain; ([email protected])
Jose Oncina
Affiliation:
Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, 03071 Alicante, Spain; ([email protected])
Get access

Abstract

In this paper, the identification of stochastic regular languages is addressed. For this purpose, we propose a class of algorithms which allow for the identification of the structure of the minimal stochastic automaton generating the language. It is shown that the time needed grows only linearly with the size of the sample set and a measure of the complexity of the task is provided. Experimentally, our implementation proves very fast for application purposes.

Type
Research Article
Copyright
© EDP Sciences, 1999

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

D. Angluin, Identifying languages from stochastic examples. Internal Report YALEU/DCS/RR-614 (1988).
M. Anthony and N. Biggs, Computational learning theory. Cambridge University Press, Cambridge (1992).
R.C. Carrasco and J. Oncina, Learning stochastic regular grammars by means of a state merging method, in Grammatical Inference and Applications, R.C. Carrasco and J. Oncina Eds., Springer-Verlag, Berlin, Lecture Notes in Artificial Intelligence 862 (1994).
Casta, M.A. no, F. Casacuberta and E. Vidal, Simulation of stochastic regular grammars through simple recurrent networks, in New Trends in Neural Computation, J. Mira, J. Cabestany and A. Prieto Eds., Springer Verlag, Lecture Notes in Computer Science 686 (1993) 210-215. CrossRef
T.M. Cover and J.A. Thomas, Elements of information theory. John Wiley and Sons, New York (1991).
W. Feller, An introduction to probability theory and its applications, John Wiley and Sons, New York (1950).
K.S. Fu, Syntactic pattern recognition and applications, Prentice Hall, Englewood Cliffs, N.J. (1982).
C.L. Giles, C.B. Miller, D. Chen, H.H. Chen, G.Z. Sun and Y.C. Lee, Learning and extracting finite state automata with second order recurrent neural networks. Neural Computation 4 (1992) 393-405.
Gold, E.M., Language identification in the limit. Inform. and Control 10 (1967) 447-474. CrossRef
W. Hoeffding, Probability inequalities for sums of bounded random variables. Amer. Statist. Association J. 58 (1963) 13-30.
J.E. Hopcroft and J.D. Ullman, Introduction to automata theory, languages and computation, Addison Wesley, Reading, Massachusetts (1979).
K. Lang, Random DFA's can be approximately learned from sparse uniform examples, in Proc. of the 5th Annual ACM Workshop on Computational Learning Theory (1992).
Maryanski, F.J. and Booth, T.L., Inference of finite-state probabilistic grammars. IEEE Trans. Comput. C26 (1997) 521-536.
J. Oncina and P. García, Inferring regular languages in polynomial time, in Pattern Recognition and Image Analysis, N. Pérez de la Blanca, A. Sanfeliu and E. Vidal Eds., World Scientific (1992).
J.B. Pollack, The induction of dynamical recognizers. Machine Learning 7 (1991) 227-252.
A.S. Reber, Implicit learning of artificial grammars. J. Verbal Learning and Verbal Behaviour 6 (1967) 855-863.
D. Ron, Y. Singer and N. Tishby, On the learnability and usage of acyclic probabilistic finite automata, in Proc. of the 8th Annual Conference on Computational Learning Theory (COLT'95), ACM Press, New York (1995) 31-40.
Smith, A.W. and Zipser, D., Learning sequential structure with the real-time recurrent learning algorithm. Internat. J. Neural Systems 1 (1989) 125-131. CrossRef
A. Stolcke and S. Omohundro, Hidden Markov model induction by Bayesian model merging, in Advances in Neural Information Processing Systems 5, C.L. Giles, S.J. Hanson and J.D. Cowan Eds., Morgan Kaufman, Menlo Park, California (1993).
van der Mude, A. and Walker, A., On the inference of stochastic regular grammars. Inform. and Control 38 (1978) 310-329. CrossRef
R.L. Watrous and G.M. Kuhn, Induction of finite-state languages using second-order recurrent networks. Neural Computation 4 (1992) 406-414.