James Allen
(much of this summary is due to Brian Bradley)
12.0 INTRODUCTION
"The term reference is traditionally used to name the study of how phrases in a sentence connect to objects in the real world."
There are two major forms of noun phrase reference:
1) ANAPHORIC reference
-- a NP that refers to an object mentioned earlier
-- in the same sentence; or,
-- in an earlier sentence.
2) NONANAPHORIC reference
-- a NP that refers to an object not previously mentioned.
Some cases are not easily classified, such as:
1) a set of objects later referred to individually
2) an object whose sub-parts are later referred to.
Forms other than NP can also be referenced:
1) whole sentences which constitute events
2) events introduced with verbs
12.1 SIMPLE REFERENCE
"The simplest view of reference is that noun phrases either introduce new objects to the context or identify objects already in the context."
Two main types:
1) INDEFINITE reference introduces new objects into the context. ex: a dog, some people
2) DEFINITE reference identifies objects already existing in the context. ex: the dog, these people, Jack Florey
In software:
1) Indefinite reference is handled by creating a new term of the appropriate type and adding information to it with subsequent anaphoric references.
2a) Nonanaphoric definite references (proper names) can usually just be put in a simple lookup table.
2b) Nonanaphoric definite references ('the' forms) require searching the knowledge base for matching objects and hoping you get exactly one (more complete queries can combine several parameters).
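A minimal sketch of the three cases above, assuming a toy knowledge base held in a dictionary. The class name Context, its methods, and the property names are hypothetical and only illustrate the idea; they are not taken from the text.

```python
import itertools

class Context:
    """Toy discourse context: typed objects plus a proper-name table (illustrative only)."""

    def __init__(self):
        self.kb = {}       # term id -> {"type": ..., other properties}
        self.names = {}    # proper name -> term id
        self._counter = itertools.count(1)

    def new_term(self, type_, **props):
        # Create a fresh term of the given type, e.g. "dog1"
        term = f"{type_}{next(self._counter)}"
        self.kb[term] = {"type": type_, **props}
        return term

    def resolve_indefinite(self, type_, **props):
        # "a dog": always introduces a new object into the context
        return self.new_term(type_, **props)

    def resolve_proper_name(self, name):
        # "Jack Florey": simple table lookup; None if the name is unknown
        return self.names.get(name)

    def resolve_definite(self, type_, **props):
        # "the dog": search the knowledge base; succeed only on exactly one match
        wanted = {"type": type_, **props}
        matches = [t for t, p in self.kb.items()
                   if all(p.get(k) == v for k, v in wanted.items())]
        return matches[0] if len(matches) == 1 else None

ctx = Context()
brown = ctx.resolve_indefinite("dog", color="brown")   # "dog1"
black = ctx.resolve_indefinite("dog", color="black")   # "dog2"
print(ctx.resolve_definite("dog"))                     # None: two dogs match
print(ctx.resolve_definite("dog", color="black"))      # dog2
```

With two dogs in the context, "the dog" fails while "the black dog" succeeds, which is the "hoping you get exactly one" problem in miniature.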
"The major weakness of the approach described in this section is the failure to account for context."
12.2 SIMPLE ANAPHORIC REFERENCE
"There are significant syntactic constraints on what objects an anaphoric NP may refer to. ...In particular, in order for a definite NP, NP2 (called the ANAPHORA), to corefer with another NP, NP1 (called the ANTECEDENT), the following conditions must usually hold:
1) they must agree in number, person, and gender;
ex: *Jack went to the party and she got drunk. vs Jack went to the party and he got drunk.
2) NP1 should precede NP2; and
ex: *He said Jack wants to leave. vs Jack said he wants to leave.
3) if NP1 is the subject of the clause that contains NP2, then NP2 must be in the reflexive form; otherwise, NP2 must not be in the reflexive form."
ex: *Jack saw him in the mirror. vs
Jack saw himself in the mirror.
Two different techniques must be used for handling anaphora:
1) one for INTERSENTENTIAL uses (the antecedent is in a previous sentence)
-- reflexive pronouns must not appear
-- nonreflexive pronouns may appear
2) one for INTRASENTENTIAL uses (the antecedent is in the same sentence)
-- reflexive pronouns may appear
-- nonreflexive pronouns may appear
INTRASENTENTIAL REFERENCE
One simplified algorithm determines what to do with a NP as follows:
1) if proper name, indefinite NP, or definite NP, follow 12.1 algorithm and store the result in a new slot REF.
2) if reflexive pronoun, the REF is set to the value in the sentence's SUBJ slot (assuming compatible number and gender)
3) if nonreflexive pronoun, all preceding NPs (save the SUBJ) are tested for number and gender compatibility. Any matches are possible antecedents. If none are found, try intersentential anaphora methods.
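The pronoun cases (steps 2 and 3) can be sketched as follows, assuming each NP is a dictionary with optional "number" and "gender" features. The function names agrees and intrasentential, and the feature encoding, are assumptions made for illustration.

```python
def agrees(a, b):
    """Number and gender compatibility; a missing feature is treated as unspecified."""
    return all(a.get(f) is None or b.get(f) is None or a.get(f) == b.get(f)
               for f in ("number", "gender"))

def intrasentential(np, preceding, subj):
    """Return the candidate antecedents for a pronoun within its own sentence.

    np        -- the pronoun, with np["kind"] in {"reflexive", "pronoun"}
    preceding -- the NPs that occur before np in the sentence
    subj      -- the NP filling the sentence's SUBJ slot
    An empty result means: try the intersentential methods.
    """
    if np["kind"] == "reflexive":
        # Step 2: a reflexive takes the subject, if number and gender are compatible
        return [subj] if agrees(np, subj) else []
    if np["kind"] == "pronoun":
        # Step 3: every preceding NP except the subject that agrees is a candidate
        return [c for c in preceding if c is not subj and agrees(np, c)]
    raise ValueError("not a pronoun: use the procedure of section 12.1")

jack = {"word": "Jack", "number": "sg", "gender": "m"}
himself = {"kind": "reflexive", "number": "sg", "gender": "m"}
# "Jack saw himself in the mirror."
print(intrasentential(himself, [jack], jack))
```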
INTERSENTENTIAL ANAPHORA
"The main technique for handling intersentential anaphora is the maintenance of a record of all objects mentioned in the preceding sentences. This record consists of the syntactic analysis of the immediately preceding sentences, plus an ordered list of all referents mentioned in the last several sentences, called the history list... Given this structure, to find the referent of an anaphoric NP (whose antecedent was not identified in the intrasentence stage), you search the history list, starting from the most recently mentioned objects, until you find one that satisfies the number and person information of the pronoun."
The previous algorithm extended to cover intersentential anaphora:
1) if proper name or indefinite NP, use section 12.1 to identify or create the referent.
2) if reflexive pronoun, use intrasentential reference techniques.
3) if nonreflexive pronoun, apply techniques for intrasentential anaphora and if this fails apply techniques for intersentential anaphora.
4) if definite NP, then apply intersentential anaphora technique and if this fails apply techniques in 12.1.
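A sketch of the history list and the four-way dispatch above. The names HistoryList and resolve, the "kind" and "constraints" fields, and the two callbacks (intra for the intrasentential step, simple for the section 12.1 procedure) are hypothetical stand-ins, not the book's own code.

```python
from collections import deque

class HistoryList:
    """Referents in order of mention, most recent first."""

    def __init__(self, max_len=20):
        # max_len stands in for "the last several sentences" (arbitrary value)
        self.items = deque(maxlen=max_len)

    def mention(self, referent):
        # Re-mentioned referents are simply added again; a most-recent-first
        # search is unaffected by the older duplicate
        self.items.appendleft(referent)

    def find(self, constraints):
        # Search from the most recently mentioned object backwards
        for ref in self.items:
            if all(ref.get(k) == v for k, v in constraints.items()):
                return ref
        return None

def resolve(np, history, intra, simple):
    """Dispatch on the NP kind, following cases 1) to 4) above."""
    kind = np["kind"]
    if kind in ("proper", "indefinite"):
        ref = simple(np)
    elif kind == "reflexive":
        ref = intra(np)
    elif kind == "pronoun":
        ref = intra(np)
        if ref is None:
            ref = history.find(np["constraints"])
    elif kind == "definite":
        ref = history.find(np["constraints"])
        if ref is None:
            ref = simple(np)
    else:
        raise ValueError(f"unknown NP kind: {kind}")
    if ref is not None:
        history.mention(ref)
    return ref
```

Searching most recent first is what encodes the recency preference: the first compatible referent found wins.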
12.3 EXTENDING THE HISTORY LIST MECHANISM
"So far the system has handled definite noun phrases in two ways: checking the history list and searching the knowledge base for a unique object that satisfies the description. In many other cases, however, the NP is related to an object on the history list but is not identical to it. Then you must find this connection to identify the appropriate referent."
ex: A: Do you have a pencil I can borrow?
B: I have a red pen or a pencil.
A: I'll take the pencil.
B: Oh, sorry, the lead is broken, so you'll have to take the pen.
"The definite NP 'the pencil' is analyzed successfully as an intersentential anaphora."
REFERENCE TO RELATED OBJECTS
However, "the history list mechanism is inadequate for analyzing the NP 'the lead'...To analyze this word correctly, the system needs to know that a lead is a part of a pencil and recognize that the referent of the lead is the appropriate subpart of the previously mentioned pencil"
"You can take one of two general approaches to this problem. The first approach is to extend the history list with objects related to the mentioned objects... The second approach is to define certain relationships, such as PART-OF: when matching a description against a history list, you also match against objects that can be related to an object on the history list by these relationships. In either case, whether explicitly or implicitly, you have to identify the set of objects that are closely related to each object mentioned."
"...one attractive possibility is that the closely related objects are ... placed on the history list [as subparts]"
But this may need to be extended to cover cases like "When we entered the kitchen, we saw that the gas had been left on." (the gas is related to the kitchen only through the stove: kitchen-stove-gas)
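A sketch of the second approach (matching through a relationship such as PART-OF), extended to follow chains of links so that the kitchen-stove-gas case is covered. The PART_OF table, the depth limit, and the function names are illustrative assumptions; the kitchen-stove link is really "closely related" rather than strict part-of.

```python
# part type -> whole type (made-up facts for illustration, not a real knowledge base)
PART_OF = {
    "lead": "pencil",
    "gas": "stove",
    "stove": "kitchen",
}

def related_to(part_type, whole_type, max_depth=3):
    """True if part_type reaches whole_type by following at most max_depth links."""
    current = part_type
    for _ in range(max_depth):
        current = PART_OF.get(current)
        if current is None:
            return False
        if current == whole_type:
            return True
    return False

def resolve_definite(head, history):
    """history: list of (type, id) pairs, most recent first."""
    for type_, ident in history:
        if type_ == head:
            return ident                   # ordinary anaphoric match
    for type_, ident in history:
        if related_to(head, type_):
            return f"{head}-of-{ident}"    # implicit subpart of a mentioned object
    return None

history = [("pen", "pen1"), ("pencil", "pencil1")]
print(resolve_definite("pencil", history))   # pencil1
print(resolve_definite("lead", history))     # lead-of-pencil1
```

The depth limit is one crude way of bounding what counts as "closely related"; the text leaves that question open.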
Another possible approach would be to use spreading activation models:
1) find the semantic relationships between the current NP and previously mentioned objects
2) take the strongest semantic connection as the "winner"
DEFINITE REFERENCES AND SETS
"Plural noun phrases refer to sets of objects and, as a result, introduce new complications into the process of finding the intended referent...
ex: We found a gold coin and a rusted knife. We took +these objects+ to the police.
We found seven coins. +The oldest+ was dated 1823.
I used two scuba tanks. +The 1600 psi tank+ was my favorite.
To handle these cases, the history list mechanism needs extending again. Objects on the list may be combined to form sets, and sets may be broken apart into their individual components."
Plural descriptions involve two classes of information:
1) information about the set itself
eg: cardinality info - two, several, both and some other modifiers
2) information about the individual members of the set
There is a choice of algorithms for set construction:
1) sets can be built and added to the history list when the individual objects are added
-- certain useful syntactic information is available locally
-- however, many sets would be built which are never referred to
2) sets can be built only after subsequent reference to an object is made
-- only needed sets would be constructed
3) hybrid strategies
-- every algorithm has its own characteristic strengths and weaknesses
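As an illustration of option 2 (building a set only when a plural reference needs it), here is a sketch that collects recent individuals from the history list on demand. The function collect_set, the membership test, and the object encoding are assumptions; a real system would also bound how far back it looks.

```python
def collect_set(history, is_member, size=None):
    """Build a set from the most recent individuals that satisfy is_member.

    history   -- list of objects, most recent first
    is_member -- predicate derived from the plural description (e.g. "objects")
    size      -- required cardinality, if the description gives one ("both", "two")
    Returns None if no suitable set of two or more members can be formed.
    """
    members = []
    for obj in history:
        if is_member(obj):
            members.append(obj)
            if size is not None and len(members) == size:
                break
    if len(members) < 2 or (size is not None and len(members) != size):
        return None
    return {"type": "set", "members": members}

# "We found a gold coin and a rusted knife. We took these objects to the police."
history = [
    {"id": "knife1", "type": "knife"},
    {"id": "coin1", "type": "coin"},
    {"id": "we", "type": "person"},
]
objects = collect_set(history, lambda o: o["type"] in {"knife", "coin"})
print([m["id"] for m in objects["members"]])   # ['knife1', 'coin1']
```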
"When dealing with examples that involve reference to elements of a previously mentioned set, you may not know anything about the individuals that make up a set until they are referred to."
"The use of the superlative and comparative adjectives explicitly signals a selection from a set. In these cases you can immediately try to identify the set in question."
ex: We saw two elephants. The larger one snorted. ("Larger" indicates that the set has two elements.)
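A sketch of selection by a comparative or superlative, assuming (unlike the harder case quoted above) that the relevant attribute of each member is already known. The function names and the set encoding are hypothetical.

```python
def find_set(history, head, size=None):
    """Most recent set whose members have type `head` and, if given, cardinality `size`."""
    for entry in history:                      # most recent first
        if entry["type"] == head and len(entry["members"]) > 1:
            if size is None or len(entry["members"]) == size:
                return entry
    return None

def select_extreme(entry, attribute, largest=True):
    """Pick the member with the largest (or smallest) value of `attribute`."""
    pick = max if largest else min
    return pick(entry["members"], key=lambda m: m[attribute])

# "We saw two elephants. The larger one snorted."
elephants = {"type": "elephant",
             "members": [{"id": "e1", "size": 3.1}, {"id": "e2", "size": 2.7}]}
history = [elephants]
pair = find_set(history, "elephant", size=2)   # a comparative signals a two-element set
print(select_extreme(pair, "size")["id"])      # e1
```

For "The oldest was dated 1823", the same selection would run with largest=False over a "year" attribute, with no size restriction.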
"Perhaps the most complex cases occur with singular definite NPs that add new information as they pick out an element of a previously mentioned set"
ex: I used two scuba tanks. +The 1600 psi tank+ was my favorite. (the algorithm must recognize that the 1600 psi modifier is new information and ignore it when searching for the tank)
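One way to sketch this case: match on the head noun only, and if no already-known member carries the modifiers, instantiate a new member of the set with the modifiers attached as new information. The function name and the set encoding ("size" plus a list of "known" members) are assumptions made for illustration.

```python
def resolve_with_new_info(head, modifiers, history):
    """history: most-recent-first list of sets {"type", "size", "known": [...]}."""
    for entry in history:
        if entry["type"] != head:
            continue
        # 1) a member already described with exactly these modifiers
        for member in entry["known"]:
            if all(member.get(k) == v for k, v in modifiers.items()):
                return member
        # 2) otherwise treat the modifiers as new information about a
        #    member of the set that has not been individuated yet
        if len(entry["known"]) < entry["size"]:
            member = {"type": head, **modifiers}
            entry["known"].append(member)
            return member
    return None

# "I used two scuba tanks. The 1600 psi tank was my favorite."
tanks = {"type": "tank", "size": 2, "known": []}
favorite = resolve_with_new_info("tank", {"psi": 1600}, [tanks])
print(favorite)   # {'type': 'tank', 'psi': 1600}
```

The cardinality check keeps the sketch from inventing a third tank once both members have been described.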
THE EXTENDED ALGORITHM FOR DEFINITE NPs
The tactics suggested up to this point are combined into a simple algorithm.
12.4 ONE-ANAPHORA AND VP-ANAPHORA
ONE-ANAPHORA
"One class of reference involves referring back to a syntactic structure previously mentioned and then reinterpreting it to produce a new semantic interpretation. Typical examples include the use of the pronouns 'one' or 'some' or concern prepositional phrase or relative clause modifiers on the pronoun. Sentences involving this form of anaphora often include a modifier such as 'too,' 'as well,' and 'also'..."
ex: I saw two bears in the woods.
Bill saw some in the parking lot too.
"Other cases arise in using numbers..."
ex: Reserve a seat for me on the flight. Reserve one for Jack too.
One way to deal with this is to match the syntax of the anaphoric phrase against a previous NP on the history list, then "construct a new syntactic structure by using the structure found on the history list with new values substituted from the anaphoric phrase."
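A sketch of the substitution step, assuming each phrase is stored as a flat slot structure with a head, a determiner, and a dictionary of modifiers. This encoding glosses over whether a modifier attaches to the NP or the VP, and the function name is hypothetical.

```python
import copy

def resolve_one_anaphora(anaphor, history):
    """Copy the most recent full NP structure and substitute the anaphor's values."""
    for antecedent in history:                 # most recent first
        if antecedent.get("head") in ("one", "some"):
            continue                           # skip earlier anaphoric phrases
        new_np = copy.deepcopy(antecedent)     # leave the history entry untouched
        new_np["mods"].update(anaphor.get("mods", {}))
        if "det" in anaphor:
            new_np["det"] = anaphor["det"]
        return new_np
    return None

# "Reserve a seat for me on the flight. Reserve one for Jack too."
seat = {"head": "seat", "det": "a", "mods": {"for": "me", "on": "the flight"}}
one = {"head": "one", "mods": {"for": "Jack"}}
print(resolve_one_anaphora(one, [seat]))
# {'head': 'seat', 'det': 'a', 'mods': {'for': 'Jack', 'on': 'the flight'}}
```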
"The main weakness of this method is that it is particularly sensitive to the way things are said."
VP- AND SENTENCE ANAPHORA
Here are examples of pronominal reference to objects introduced by sentences and verb phrases.
ex: Jack lost the race. It surprised Sam.
Jack congratulated the winner. After some hesitation, Sam did it too.
"To handle these types of examples, the history list needs to contain constituents other than just NPs." (namely events and VP constituents)
12.5 OTHER PROBLEMS IN REFERENCE
"Thus far the chapter has concentrated on definite reference and pronominal reference. There are many other forms of noun phrases whose interpretation depends on referencing techniques. This section considers some of these issues, although there are not as yet any well-established techniques for handling these problems."
REFERENTIAL AND ATTRIBUTIVE NOUN PHRASES
"A basic distinction can be made between REFERENTIAL noun phrases, where the NP is used to refer to a particular object, and ATTRIBUTIVE uses of noun phrases, where the NP is used to describe a set of characteristics. In general there is no way to detect the use of a noun phrase from its syntactic form, though the sentential content in which the NP occurs may strongly favor a particular interpretation."
ex: I saw the chairman of the board. (referential) vs
I want to be the chairman of the board. (attributive)
INDEFINITE NOUN PHRASES
"In some cases an indefinite NP does not introduce an object available for subsequent reference."
ex: I don't own a dog. It has brown fur.
"It is difficult to determine a strong constraint for such cases, however."
ex: I don't own a dog. It was stolen.
QUANTIFIERS
"The quantifiers 'all,' 'each,' 'every,' and 'some' often create a context that complicates subsequent reference."
GENERICS
"Other complications arise with the generic use of descriptions, such as in the sentence 'Lions are dangerous.'" (vs "All lions are dangerous.")
"In addition, a speaker can generalize from a specific object mentioned in one sentence to a class of that object and comment on the class in a succeeding sentence."
ex: Each boy received a model airplane. They are good presents.
FORWARD REFERENCE
"Another class of reference that has not yet been considered in this chapter occurs when the pronoun precedes its referent in the text... In such cases there is no antecedent for the pronoun at the time it is encountered, and the antecedent appears later in the text! The models presented earlier cannot be adapted to handle such behavior easily...
ex: When he returned home, John found his door open.
...To deal with such cases a better solution would be to devise a special-purpose mechanism that is signaled by certain syntactic structures..."
12.6 ELLIPSIS
"Ellipsis involves the use of sentences that appear ill-formed because they do not form complete sentences. Typically, the parts that are missing can be extracted from the previous sentence."
ex: Where are the bananas? The peaches?
SYNTACTIC CONSTRAINTS ON ELLIPSIS
"The hypothesis underlying the following analysis of ellipsis is that the phrase given in an elliptical utterance (the input fragment) must correspond in structure to a subconstituent in the previous utterances (the target fragment)."
AN ALGORITHM BASED ON SYNTAX
"To handle ellipsis the system needs to maintain the complete syntactic analysis of the last sentence (or two sentences, in the case of a dialog)..."
(an algorithm which matches syntax is described)
SEMANTIC PREFERENCES
In some cases syntax alone is not enough.
ex: A: Did the clerk put the ice cream in the refrigerator?
B: No.
A: The TV dinners? (matches ice cream)
vs
A: The manager? (matches clerk)
vs
A: The freezer? (matches refrigerator)
"A technique that can account for each of the preceding examples is to compute a semantic similarity measure between the input fragment and the potential target fragments and to select the closest one."
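One simple similarity measure is path distance in a type hierarchy; the sketch below uses it to pick the target fragment for each of the three examples. The ISA table is a made-up toy hierarchy and the function names are hypothetical; the text does not commit to any particular measure.

```python
# child type -> parent type (toy hierarchy, invented for illustration)
ISA = {
    "ice cream": "frozen food", "TV dinners": "frozen food",
    "frozen food": "food", "food": "thing",
    "clerk": "employee", "manager": "employee",
    "employee": "person", "person": "thing",
    "refrigerator": "appliance", "freezer": "appliance",
    "appliance": "thing",
}

def ancestors(t):
    """The chain from t up to the root, starting with t itself."""
    chain = [t]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain

def distance(a, b):
    """Number of ISA links between a and b through their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = [x for x in up_a if x in up_b]
    if not common:
        return float("inf")
    return up_a.index(common[0]) + up_b.index(common[0])

def best_target(fragment, targets):
    """Select the target fragment semantically closest to the input fragment."""
    return min(targets, key=lambda t: distance(fragment, t))

targets = ["clerk", "ice cream", "refrigerator"]
for fragment in ("TV dinners", "manager", "freezer"):
    print(fragment, "->", best_target(fragment, targets))
# TV dinners -> ice cream
# manager -> clerk
# freezer -> refrigerator
```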
SUMMARY
"One of the basic techniques for dealing with the reference problem is the use of history lists to model the local context necessary to handle anaphoric reference and ellipsis. The success of this approach depends strongly on a hypothesis of recency: The referent for an anaphoric NP is the object mentioned most recently that fits all the constraints imposed by the form of the anaphora (that is, number, gender, person, reflexive) and the selectional restrictions imposed by the sentence containing the anaphora."