AI Seminar
-------------------------------
Tuesday, April 12th, 2005
4:00 pm - 5:30 pm
175 ATL (Large Conference Room)

"Towards a Universal Framework for Tree Transduction"

Stuart Shieber
Artificial Intelligence Research Group
Harvard University
===============================

The typical natural-language pipeline can be thought of as proceeding by successive transformation of various data structures, especially strings and trees. For instance, low-level speech processing can be viewed as transduction of strings of speech samples into phoneme strings, then into triphone strings, finally into word strings. Morphological processes can similarly be modeled as character string transductions. For this reason, weighted finite-state transducers (WFST), a general formalism for string-to-string transduction, can serve as a kind of universal formalism for representing low-level natural-language processes.

Higher-level natural-language processes can also be thought of as transductions, but on more highly structured representations, in particular, trees. Semantic interpretation can be viewed as a transduction from a syntactic parse tree to a tree of semantic operations whose simplification to logical form can be viewed as a further transduction. Machine translation systems have been viewed as tree transductions of various sorts as well. This raises the question as to whether there is a universal formalism for natural-language tree transduction that can play the same role there that WFST plays for string transduction.

In this talk, we explore this question, proposing that the characterization of classical tree transducers in terms of bimorphisms, little known outside the formal language theory community, can be used as a unifying framework for a wide variety of tree transduction formalisms. The framework also places so-called synchronous grammar formalisms into the tree transducer family for the first time.