Towards Persuasive Dialog Systems
Time-Based Language Modeling
Responsiveness in Spoken Tutorial Dialogs
Cross-Cultural Misperceptions of Dialog Behaviors
Automating the Discovery of Dialog Cue Patterns
Speaking Rate in Dialog
All speech recognizers include a component that predicts, from the preceding context, which words are likely to come next. Today these components, known as language models, operate at the symbol level, abstracted away from the details of how and when the words are spoken. Spoken language, however, is not just a symbolic or mathematical object: it is produced and understood by human brains, which have specific processing constraints, and these constraints can directly affect what is said, and when, in dialog.
This project is developing language models and "dialog models" that explicitly use the timing of words. Inspired by psychological research suggesting that dialog and language behaviors result from multiple simultaneously active cognitive processes, our working assumption is that the words likely to be spoken at a given time depend, probabilistically, on the time elapsed since various reference points: for example, since the speaker began talking, since the speaker's last disfluency, or since the listener's last back-channel. Statistical and machine-learning analyses of large corpora of human-human spoken dialogs are revealing patterns and regularities, which we are using to build language models with improved predictive power.
These language models implicitly represent some aspects of dialog dynamics, and so may contribute to an integrated understanding of dialog as a human ability. They are also likely to improve speech recognition accuracy, enabling the development of spoken language systems that are more accurate, more efficient, and more useful.
(with Alejandro Vega, Shreyas Karkhedkar, Ben Walker, Nisha Kiran, and
Shubhra Datta)
Funded by the NSF, as award IIS-0914868, 10-1-09 through 9-30-12.
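The core assumption above, that word probabilities shift with the time elapsed since a reference point, can be illustrated with a minimal sketch. It assumes transcripts in which each word is tagged with the seconds elapsed since a single reference event (for instance, the start of the speaker's turn), conditions a unigram distribution on a coarse time bin, and interpolates that with the overall word distribution. The bin edges, the interpolation weight `lam`, and the class name are hypothetical; a fuller model would combine several reference points, as described above, with ordinary word context.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from typing import Iterable, Tuple

# Hypothetical bin edges, in seconds since the reference event.
BIN_EDGES = [0.5, 1.0, 2.0, 4.0, 8.0]


def time_bin(elapsed: float) -> int:
    """Map elapsed seconds to a coarse bin index, 0..len(BIN_EDGES)."""
    return bisect_right(BIN_EDGES, elapsed)


class TimeConditionedUnigram:
    """P(word | time bin), add-one smoothed and interpolated with P(word)."""

    def __init__(self, lam: float = 0.5):
        self.lam = lam  # hypothetical weight on the time-conditioned estimate
        self.bin_counts = defaultdict(Counter)  # bin index -> word counts
        self.global_counts = Counter()

    def train(self, tokens: Iterable[Tuple[str, float]]) -> None:
        """tokens: (word, seconds elapsed since the reference event) pairs."""
        for word, elapsed in tokens:
            self.bin_counts[time_bin(elapsed)][word] += 1
            self.global_counts[word] += 1

    @staticmethod
    def _smoothed(counts: Counter, word: str, v: int) -> float:
        # Add-one smoothing over a vocabulary of size v.
        return (counts[word] + 1) / (sum(counts.values()) + v)

    def prob(self, word: str, elapsed: float) -> float:
        v = len(self.global_counts) + 1  # +1 pools all unseen words into one type
        p_bin = self._smoothed(self.bin_counts[time_bin(elapsed)], word, v)
        p_all = self._smoothed(self.global_counts, word, v)
        return self.lam * p_bin + (1 - self.lam) * p_all
```

Interpolating with the time-independent distribution keeps sparse time bins from producing extreme estimates; with more data, `lam` could be raised or tuned on held-out dialogs.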
An important skill in human-human conversation is the ability to pick
up on nuances in the other's speech and from them to infer whether the
other is confident, frustrated, confused, etc. Good conversants can
then use this information to adjust their own speaking style and so
show supportiveness to the other speaker. We are examining these
abilities as they are deployed in tutorial dialog. By analyzing a
corpus of dialogs with a skilled tutor, we have found rules for which
acknowledgment to use when, and in what prosodic form, based on the
user's current cognitive and communicative state, as revealed by his
or her prosody and recent behavior. Experiments show that users
prefer systems that produce acknowledgments chosen appropriately in
this way. (Rafael Escalante, continuing work by Tasha Hollingsed and
by Wataru Tsukahara)
The range of applications for spoken dialog systems today is very
limited: only information access and very simple transactions are
commonly supported. Even these systems are generally disliked and
avoided whenever practical. Ultimately, however, spoken dialog
systems have the potential to be better than other media (web
interfaces etc.) for certain kinds of interaction. We are examining
persuasive dialogs, where the speakers use their language and vocal
repertoire in a highly adaptive, highly effective, and "charming" way.
So far we have collected a corpus of dialogs between freshmen and a
staff member encouraging them to consider graduate school as an option,
identified the content nuggets, and built two baseline systems, one
with text input and output and one in VoiceXML. We are now working to
discover the high-level persuasive strategies and low-level engaging
behaviors in this corpus, and to embody these in a spoken dialog
system. (Jaime C. Acosta)
The rules governing real-time interpersonal interaction are today not
well understood. With only a few exceptions, there are no
quantitative, predictive rules explaining how to respond in real-time,
in the sub-second range, in order to be an effective communicator in a
given culture. This can be a problem in intercultural interactions;
if an American knows only the words of a foreign language, not the
rules of interaction, he can easily appear uninterested, ill-informed,
thoughtless, discourteous, passive, indecisive, untrusting, dull,
pushy, or worse. Short of long-term cultural exposure, there are
today no reliable ways to train speakers to understand and follow such
rules and attain mastery of interaction at the sub-second level.
The purpose of this research is to increase our knowledge and know-how
in this area, and specifically to develop methods for training
learners of Arabic to master these behaviors and thereby appear more
polite. So far we have focused on back-channel feedback. We have
developed a training sequence that enables learners to acquire a basic
Arabic back-channel skill, namely, producing feedback immediately
after the speaker produces a sharp pitch downslope. This training
sequence includes software that gives learners feedback on their
attempts to produce the cue themselves, and on their performance as
they play the role of an attentive listener responding to one side of
a pre-recorded dialog. Preliminary experiments indicate that this
training is effective.
(with Yaffa Al Bayyari, David Novick, and Marisa Flecha-Garcia)
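The two kinds of feedback described above both depend on locating the cue and checking the timing of responses to it. The following is a minimal sketch, not the project's training software: it assumes a pitch track sampled every 10 ms with `None` (or 0.0) for unvoiced frames, treats a "sharp downslope" as a fall of at least 15% over 10 frames, and counts a learner's back-channel as timely if it starts within 0.7 s after a cue. The function names and all three thresholds are hypothetical placeholders.

```python
from typing import Dict, List, Optional

FRAME_S = 0.01  # assumed frame step of the pitch tracker: 10 ms


def find_downslopes(f0: List[Optional[float]], window: int = 10,
                    min_drop_ratio: float = 0.15) -> List[float]:
    """Return times (in seconds) at which pitch has fallen by at least
    min_drop_ratio relative to its value `window` frames earlier.
    Unvoiced frames are None (or 0.0) and never trigger a cue."""
    cues = []
    i = window
    while i < len(f0):
        start, end = f0[i - window], f0[i]
        if start and end and (start - end) / start >= min_drop_ratio:
            cues.append(i * FRAME_S)
            i += window  # skip ahead so one fall is reported only once
        else:
            i += 1
    return cues


def score_responses(cue_times: List[float], response_times: List[float],
                    max_delay: float = 0.7) -> Dict[str, int]:
    """Match each cue to the first unused learner response that starts
    within max_delay seconds after it; report hits, misses, and
    responses that followed no cue (spurious)."""
    used = set()
    hits = 0
    for cue in cue_times:
        match = next(
            (j for j, r in enumerate(response_times)
             if j not in used and 0.0 <= r - cue <= max_delay),
            None,
        )
        if match is not None:
            used.add(match)
            hits += 1
    return {"hits": hits, "missed": len(cue_times) - hits,
            "spurious": len(response_times) - len(used)}
```

Reporting misses and spurious responses separately lets a trainer tell a learner who is too hesitant apart from one who back-channels indiscriminately.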
Building better dialog systems requires a better understanding of the
low-level details of human communication. However, the dynamics of
interaction at the very short time-scales characteristic of swift dialog
are not accessible to casual observation. Progress here depends on tools
for systematically analyzing these patterns of behavior. In recent
years some excellent freeware tools for audio data transcription,
phonetic analysis, and manipulation have appeared; however, none
directly supports the sorts of search, comparison, hypothesis
formulation, and hypothesis evaluation essential to advancing
scientific understanding and to engineering highly responsive systems.
We have begun prototyping toolkits for this kind of analysis, to
determine what functionality linguists and others need and how best
to provide it (see Downloads), and we
are developing a method for automatically identifying important dialog
cues from conversation data in any language. (with Joshua McCartney)
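To make the idea of automatic cue discovery concrete, here is one crude screening statistic. It is a generic illustration, not necessarily the method under development. It assumes frame-indexed prosodic feature tracks (the names `pitch` and `energy` below are hypothetical) and a list of frames at which a listener response such as a back-channel begins. For each feature it compares the mean in a window just before those events against the mean elsewhere, scaled by the feature's overall spread, and ranks features by the size of that difference. The window length is likewise a placeholder.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def cue_score(feature: Sequence[float], event_frames: Sequence[int],
              window: int = 20) -> float:
    """Standardized difference between the mean of `feature` in the
    `window` frames preceding each event and its mean over all other frames."""
    before = set()
    for e in event_frames:
        before.update(range(max(0, e - window), e))
    inside = [x for i, x in enumerate(feature) if i in before]
    outside = [x for i, x in enumerate(feature) if i not in before]
    if not inside or not outside:
        return 0.0
    spread = stdev(inside + outside)  # overall spread of the feature
    if spread == 0:
        return 0.0  # a constant feature cannot act as a cue
    return (mean(inside) - mean(outside)) / spread


def rank_features(features: Dict[str, Sequence[float]],
                  event_frames: Sequence[int],
                  window: int = 20) -> List[Tuple[str, float]]:
    """Rank candidate cue features by the magnitude of their cue_score."""
    scores = {name: cue_score(track, event_frames, window)
              for name, track in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Because the statistic needs only feature tracks and event times, not transcripts, a screen of this kind could in principle be run on conversation data in any language; the sign of the score indicates whether the feature tends to be low or high before the event.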
The pacing of today's dialog systems tends to be rigid. In addition
to the problems of turn-taking, the speaking rate itself is generally
fixed. For example, the automated read-out of the requested number at the end
of directory assistance calls is at a fixed rate: too slow for some
people and too fast for others. We have found that the speaking rate
can be adapted automatically for these dialogs, based only on the
user's speaking rate and response latency. However, simple
correlations break down for more complex types of dialog; there the
speaking rate appears to depend more on dialog act and dialog state.
(with S. Kumar Mamidipally, continuing work by Satoshi Nakagawa)
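For the simple directory-assistance case, adapting the rate from just two user measurements can be sketched as a clamped linear rule. This is an illustrative sketch only: the function name, the linear form, every constant, and the direction of the latency effect (slower responders get slower output) are assumptions, not the relationship actually found in this work.

```python
def system_rate(user_rate: float, latency_s: float,
                base_rate: float = 4.0, ref_latency_s: float = 1.0,
                k_rate: float = 0.5, k_latency: float = 0.4,
                min_rate: float = 2.5, max_rate: float = 6.0) -> float:
    """Choose a speaking rate (syllables per second) for reading out a number.

    Moves partway toward the user's own speaking rate and slows down for
    users who take longer than ref_latency_s to respond. All constants are
    hypothetical placeholders, not fitted values."""
    rate = (base_rate
            + k_rate * (user_rate - base_rate)
            - k_latency * (latency_s - ref_latency_s))
    return max(min_rate, min(max_rate, rate))  # keep within a plausible range
```

In a deployed system the chosen value could be passed to the synthesizer as a relative rate, for instance through the `rate` attribute of the SSML `prosody` element. As noted above, such simple correlations are not expected to carry over to more complex dialogs, where dialog act and dialog state matter more.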
Up to Nigel Ward.