******* Language, Cognition, and Computation Lecture Series *******
This talk will focus on two areas of research in statistical or machine learning approaches to natural language processing. In the first part of the talk I'll describe research on statistical approaches to natural language parsing. I will first review work on generative, history-based models, and describe how these methods give a probabilistic model for a number of syntactic phenomena. The talk will then cover more recent discriminative or non-parametric approaches to the parsing problem. A key feature of these methods is their flexibility in the parse-tree features that can be incorporated.
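As a rough sketch of the general idea (not the specific models covered in the talk), a history-based model factors the probability of a parse tree T into a sequence of derivation decisions d_1, ..., d_n, each conditioned on the "history" of decisions made so far:

    P(T) = P(d_1) * P(d_2 | d_1) * ... * P(d_n | d_1, ..., d_{n-1})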
In
the second part of the talk I'll discuss research on unsupervised,
or
partially supervised, approaches to natural language problems. Much
of
the work in statistical NLP has considered supervised training: A
human
annotator marks examples (for example part-of-speech tag
sequences,
parse tree structures, or named entities in text) which are
then
used to train a model that recovers similar structures on test
data.
Unfortunately, manual labeling of data can be laborious, and may
simply
not be feasible in some domains. I will discuss
statistical
models, and experimental results, showing that in some
cases
unlabeled examples can drastically reduce the need for
supervised
training examples.
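As a schematic illustration of the supervised setup just described (a toy most-frequent-tag baseline on made-up data, not the models from the talk):

    # Hand-annotated training examples: (word, part-of-speech tag) pairs,
    # standing in for the marked-up data a human annotator would produce.
    from collections import Counter, defaultdict

    train = [
        [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
        [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    ]

    # "Training": count how often each word receives each tag.
    counts = defaultdict(Counter)
    for sentence in train:
        for word, tag in sentence:
            counts[word][tag] += 1

    def tag(words):
        # "Testing": predict each word's most frequent training tag,
        # falling back to "NN" for words never seen in training.
        return [counts[w].most_common(1)[0][0] if w in counts else "NN"
                for w in words]

    print(tag(["the", "dog", "sleeps"]))  # -> ['DT', 'NN', 'VBZ']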
Bio:
Michael
Collins did his undergraduate studies in Electrical
Engineering
at Cambridge University, and went on to do a Masters in
Speech
and Language Processing, also at Cambridge. He received his PhD
from University of Pennsylvania in 1998. In his dissertation, Mike worked on
statistical methods for natural language parsing which led to one of the highest
performing
parsers in the field. After his PhD, he was a researcher AT&T
labs-research
from January 1999 until November 2002. Since January
2003
he has been an Assistant Professor in the EECS department at MIT,
and
in the MIT AI Lab. His research interests are in natural language
processing and machine learning.