Nigel Ward: Research Projects

Towards Persuasive Dialog Systems

Time-Based Language Modeling

Responsiveness in Spoken Tutorial Dialogs

Cross-Cultural Misperceptions of Dialog Behaviors

Automating the Discovery of Dialog Cue Patterns

Speaking Rate in Dialog

Time-Based Language Modeling

Speech recognizers all include a component for predicting, based on the past context, which words are likely to appear next. Today these components, known as language models, operate at the symbol level, abstracted away from the details of how and when the words are spoken. Spoken language, however, is not just a symbolic or mathematical object: it is produced and understood by human brains, which have specific processing constraints, and these constraints can directly affect what is said when in dialog.

This project is developing language models and "dialog models" that explicitly use the information in the timings of words. Inspired by psychological research suggesting that dialog and language behaviors result from multiple simultaneously active cognitive processes, the working assumption is that the words likely to be spoken at a given time depend, probabilistically, on the elapsed time since various reference points: for example, since the speaker began talking, since the speaker's last disfluency, or since the listener's last back-channel. Statistical analyses of large corpora of human-human spoken dialogs, using machine learning methods, are revealing patterns and regularities that are being used to build language models with improved predictive power.
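As a rough illustration of this working assumption, the sketch below conditions word probabilities on the elapsed time since a single reference point (for instance, the start of the speaker's turn) and interpolates them with an ordinary time-independent unigram. The bucket edges, the interpolation weight `lam`, add-one smoothing, the class name `TimeConditionedUnigram`, and the toy data are all hypothetical choices for illustration; the project's actual models use several reference points and may differ substantially.

```python
from collections import Counter, defaultdict

# Hypothetical bucket edges, in seconds since the reference point.
BUCKET_EDGES = (0.5, 1.0, 2.0, 4.0)


def time_bucket(elapsed: float) -> int:
    """Map elapsed seconds since the reference point to a coarse bucket index."""
    for i, edge in enumerate(BUCKET_EDGES):
        if elapsed < edge:
            return i
    return len(BUCKET_EDGES)


class TimeConditionedUnigram:
    """Word probabilities conditioned on elapsed time since one reference
    point, linearly interpolated with a time-independent unigram."""

    def __init__(self, vocab_size: int, lam: float = 0.5):
        self.vocab_size = vocab_size  # assumed closed vocabulary of known size
        self.lam = lam                # hypothetical weight on the time-conditioned estimate
        self.global_counts: Counter = Counter()
        self.bucket_counts: defaultdict = defaultdict(Counter)

    def train(self, tokens):
        """tokens: iterable of (word, seconds elapsed since the reference point)."""
        for word, elapsed in tokens:
            self.global_counts[word] += 1
            self.bucket_counts[time_bucket(elapsed)][word] += 1

    def _add_one(self, counts: Counter, word: str) -> float:
        # Add-one smoothing keeps unseen words at nonzero probability.
        return (counts[word] + 1) / (sum(counts.values()) + self.vocab_size)

    def prob(self, word: str, elapsed: float) -> float:
        """P(word | elapsed time), mixing the bucket estimate with the global one."""
        bucket = self.bucket_counts.get(time_bucket(elapsed), Counter())
        p_time = self._add_one(bucket, word)
        p_global = self._add_one(self.global_counts, word)
        return self.lam * p_time + (1 - self.lam) * p_global


# Toy usage with invented data: "uh" occurs early in turns, "okay" late.
lm = TimeConditionedUnigram(vocab_size=3)
lm.train([("uh", 0.1), ("uh", 0.3), ("so", 0.4),
          ("okay", 5.0), ("okay", 6.0), ("so", 4.5)])
print(lm.prob("uh", 0.2), lm.prob("uh", 5.0))
```

With this toy data the model assigns "uh" a higher probability shortly after the reference point than several seconds later, which is the kind of timing-dependent regularity the project aims to exploit; a real system would combine many such reference points rather than one.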

These language models implicitly represent some aspects of dialog dynamics, and so may contribute to an integrated understanding of the nature of dialog as a human ability. They are also likely to improve speech recognition accuracy, enabling the development of spoken language systems that are more accurate, more efficient, and more useful.

(with Alejandro Vega, Shreyas Karkhedkar, Ben Walker, Nisha Kiran, and Shubhra Datta)

Funded by the NSF, as award IIS-0914868, 10-1-09 through 9-30-12.

Responsiveness in Spoken Tutorial Dialogs

An important ability in human-human conversation is picking up on the nuances of the other person's speech and inferring from them whether he or she is confident, frustrated, confused, etc. Good conversants can then use this information to adjust their own speaking style to show supportiveness. We are examining these abilities as they are deployed in tutorial dialog. By analyzing a corpus of dialogs with a skilled tutor, we have found rules for which acknowledgment to use when, and in what prosodic form, based on the user's current cognitive and communicative state, as revealed by his or her prosody and recent behavior. Experiments show that users prefer systems that produce acknowledgments chosen appropriately in this way. (Rafael Escalante, continuing work by Tasha Hollingsed and by Wataru Tsukahara)

Towards Persuasive Dialog Systems

The range of applications for spoken dialog systems today is very limited: only information access and very simple transactions are commonly supported. Even these systems are generally disliked and avoided whenever practical. Ultimately, however, spoken dialog systems have the potential to be better than other media (web interfaces, etc.) for certain kinds of interaction. We are examining persuasive dialogs, where the speakers use their language and vocal repertoire in a highly adaptive, highly effective, and "charming" way. So far we have collected a corpus of dialogs between freshmen and a staff member who wanted them to consider graduate school as an option, identified the content nuggets, and built two baseline systems: one with text input and output and one in VoiceXML. We are now working to discover the high-level persuasive strategies and low-level engaging behaviors in this corpus, and to embody these in a spoken dialog system. (Jaime C. Acosta)

Cross-Cultural Misperceptions of Dialog Behaviors

The rules governing real-time interpersonal interaction are today not well understood. With only a few exceptions, there are no quantitative, predictive rules explaining how to respond in real-time, in the sub-second range, in order to be an effective communicator in a given culture. This can be a problem in intercultural interactions; if an American knows only the words of a foreign language, not the rules of interaction, he can easily appear uninterested, ill-informed, thoughtless, discourteous, passive, indecisive, untrusting, dull, pushy, or worse. Short of long-term cultural exposure, there are today no reliable ways to train speakers to understand and follow such rules and attain mastery of interaction at the sub-second level. The purpose of this research is to increase our knowledge and know-how in this area.

The aim of this project is to develop methods for training learners of Arabic to master these behaviors and thereby appear more polite. So far we have focused on back-channel feedback. We have developed a training sequence that enables learners to acquire a basic Arabic back-channel skill: producing feedback immediately after the speaker produces a sharp pitch downslope. This training sequence includes software that gives learners feedback both on their attempts to produce the cue themselves and on their performance as they play the role of an attentive listener responding to one side of a pre-recorded dialog. Preliminary experiments indicate that this training is effective. (with Yaffa Al Bayyari, David Novick, and Marisa Flecha-Garcia)
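The back-channel cue described here can be illustrated with a small detector that scans a pitch track for sharp falls, plus a scorer that checks whether a learner's responses follow those cues promptly. This is a minimal sketch assuming a pitch track has already been extracted at 10 ms frames (with `None` for unvoiced frames); every constant (window length, 2-semitone drop, response window) and both function names are hypothetical placeholders, not parameters of the project's training software.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Every constant here is a hypothetical placeholder, not a project parameter.
FRAME_SEC = 0.01                  # pitch track assumed sampled every 10 ms
WINDOW_SEC = 0.1                  # span over which a fall is measured
MIN_DROP_SEMITONES = 2.0          # minimum fall within WINDOW_SEC to count as "sharp"
RESPONSE_WINDOW_SEC = (0.0, 0.6)  # acceptable learner delay after a cue


def find_downslope_cues(f0: Sequence[Optional[float]]) -> List[float]:
    """Return times (s) at which a sharp pitch fall first becomes detectable.

    f0 holds one pitch value in Hz per frame; None marks unvoiced frames.
    A run of consecutive qualifying windows is reported once, at the end of
    its first window, i.e. the earliest moment a listener could react.
    """
    step = round(WINDOW_SEC / FRAME_SEC)
    cues: List[float] = []
    in_fall = False
    for i in range(len(f0) - step):
        a, b = f0[i], f0[i + step]
        falling = (
            a is not None and b is not None and a > 0 and b > 0
            # Pitch difference in semitones: 12 * log2(f_start / f_end).
            and 12 * math.log2(a / b) >= MIN_DROP_SEMITONES
        )
        if falling and not in_fall:
            cues.append((i + step) * FRAME_SEC)
        in_fall = falling
    return cues


def score_responses(cue_times: Sequence[float],
                    response_times: Sequence[float],
                    window: Tuple[float, float] = RESPONSE_WINDOW_SEC) -> Tuple[int, int]:
    """Return (cues answered by a learner response within the window, total cues)."""
    lo, hi = window
    hits = sum(1 for c in cue_times
               if any(lo <= r - c <= hi for r in response_times))
    return hits, len(cue_times)
```

Measuring the fall in semitones rather than hertz makes the same threshold usable for speakers with different pitch ranges. In a training tool, `find_downslope_cues` would run over the pre-recorded speaker's side of the dialog, and `score_responses` would compare the resulting cue times with the times of the learner's back-channels.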

Automating the Discovery of Dialog Cue Patterns

Building better dialog systems requires a better understanding of the low-level details of human communication. However, the dynamics of interaction at the extreme time scales characteristic of swift dialog are not accessible to casual observation. Progress here depends on tools for systematically analyzing these patterns of behavior. In recent years some excellent freeware tools for audio data transcription, phonetic analysis, and manipulation have appeared; however, none directly supports the sorts of search, comparison, hypothesis formulation, and hypothesis evaluation essential to advancing scientific understanding and to engineering highly responsive systems. We have begun prototyping toolkits for this kind of analysis, to determine what functionality linguists and others need and how best to provide it (Downloads), and are developing a method for automatically identifying important dialog cues from conversation data in any language. (with Joshua McCartney)
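One way to picture the hypothesis-evaluation step is to score each candidate cue detector by how well its firings line up with listener responses such as back-channels. The sketch below assumes that both the cue firings and the listener events are already available as timestamps; the precision/coverage/F1 criterion, the `max_lag` window, and the names `CueScore`, `score_cue`, and `rank_cues` are illustrative assumptions, not the identification method this project is developing.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class CueScore:
    name: str
    precision: float  # fraction of cue firings followed by a listener event
    coverage: float   # fraction of listener events preceded by a cue firing

    @property
    def f1(self) -> float:
        """Harmonic mean of precision and coverage (0 when both are 0)."""
        total = self.precision + self.coverage
        return 0.0 if total == 0 else 2 * self.precision * self.coverage / total


def _any_in(sorted_times: List[float], lo: float, hi: float) -> bool:
    """True if some time t in sorted_times satisfies lo <= t <= hi."""
    i = bisect_left(sorted_times, lo)
    return i < len(sorted_times) and sorted_times[i] <= hi


def score_cue(name: str, cue_times: Sequence[float],
              event_times: Sequence[float], max_lag: float) -> CueScore:
    """Compare one candidate cue's firing times with listener event times."""
    cues, events = sorted(cue_times), sorted(event_times)
    hits = sum(_any_in(events, c, c + max_lag) for c in cues)
    covered = sum(_any_in(cues, e - max_lag, e) for e in events)
    return CueScore(
        name,
        precision=hits / len(cues) if cues else 0.0,
        coverage=covered / len(events) if events else 0.0,
    )


def rank_cues(candidates: Dict[str, Sequence[float]],
              event_times: Sequence[float], max_lag: float) -> List[CueScore]:
    """Score every candidate cue detector and sort best-first by F1."""
    scores = [score_cue(n, t, event_times, max_lag)
              for n, t in candidates.items()]
    return sorted(scores, key=lambda s: s.f1, reverse=True)
```

Scoring both directions matters: a detector that fires constantly would catch every back-channel (high coverage) but mostly at the wrong times (low precision), while a very conservative one shows the opposite pattern.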

Speaking Rate in Dialog

The pacing of today's dialog systems tends to be rigid. In addition to the problems of turn-taking, the speaking rate itself is generally fixed. For example, the automatic number-giving at the end of directory assistance calls proceeds at a fixed rate: too slow for some people and too fast for others. We have found that for these dialogs the speaking rate can be adapted automatically, based just on the user's speaking rate and response latency. However, simple correlations break down for more complex types of dialog; there the speaking rate appears to depend more on dialog act and dialog state. (with S. Kumar Mamidipally, continuing work by Satoshi Nakagawa)
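For the simple directory-assistance case, adaptation of this kind could be expressed as a small policy that maps the user's speaking rate and response latency to a system output rate. The sketch below assumes a linear rule clamped to a safe range, with rates in syllables per second; the direction and size of each effect, every default value, and the `RatePolicy` name are hypothetical placeholders rather than values found in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatePolicy:
    # All defaults are hypothetical placeholders, not fitted values.
    base_rate: float = 4.0      # system syllables/s for an "average" user
    ref_user_rate: float = 4.0  # user syllables/s treated as average
    ref_latency: float = 1.0    # user response latency (s) treated as average
    rate_gain: float = 0.5      # how strongly the system tracks the user's rate
    latency_gain: float = 0.8   # syllables/s slowed per extra second of latency
    min_rate: float = 2.5       # never speak slower than this
    max_rate: float = 6.0       # never speak faster than this

    def choose_rate(self, user_rate: float, latency: float) -> float:
        """Speed up for fast, quick-responding users; slow down otherwise."""
        rate = (self.base_rate
                + self.rate_gain * (user_rate - self.ref_user_rate)
                - self.latency_gain * (latency - self.ref_latency))
        # Clamp so extreme inputs cannot produce unintelligible output.
        return min(self.max_rate, max(self.min_rate, rate))


policy = RatePolicy()
print(policy.choose_rate(user_rate=6.0, latency=0.5))
```

As the paragraph above notes, such simple correlations break down for more complex dialogs, so a rule like this would at most suit short, formulaic exchanges; elsewhere the rate would need to depend on dialog act and dialog state.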


See also: Publications
Research Themes in Dialog System Usability
Earlier Projects
Interactive Systems Group Projects Page
