AI Seminar
-------------------------------
Tuesday, November 25th, 2003
4:00 pm - 5:30 pm
175 ATL (Large Conference Room)

"Understanding the Yarowsky Algorithm"

Steven Abney
Department of Electrical Engineering and Computer Science
University of Michigan
----------------------------------

For many language processing tasks, large amounts of unlabeled data are available, but labeled data is lacking and expensive to create in large quantities. Hence there is considerable interest in bootstrapping, or semi-supervised learning.

I spoke last year in the AI seminar about two bootstrapping algorithms: co-training and the Yarowsky algorithm. I gave an analysis that showed they were effective, given certain assumptions about the training data. I will report this time on new results in which I discard the data assumptions and take an entirely different approach to analysing the Yarowsky algorithm. Specifically, I consider a number of variations of the algorithm, and show that they optimize either likelihood (defined in what seems a natural way for partially labeled data) or a related objective function that lower-bounds likelihood.