Fair Witness:*
Capturing Patient-Provider Encounter
through Text, Speech, and Dialogue Processing

MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)
Clinical Decision Making Group


Complete and accurate collection of clinical data in the course of health care is a long-standing goal that has not been achieved either by manual record-keeping or through electronic record systems. This proposed project addresses the problem from the beginning of the clinical process, by aiming to improve the capture of relevant medical facts during the face-to-face interaction between a patient and provider. Instead of relying on the provider’s fallible memory to record facts after the visit, the proposed system will “listen” to the conversation, use automatic speech recognition to produce an (imperfect) record of what was said, and apply a variety of text analysis and extraction methods to create a draft record of the encounter. Further, it will provide an interface that should permit patients and providers to examine the facts that were recorded and to correct and complete them, also using speech as the primary interface.

The project’s aims are to develop and integrate the components needed to accomplish this goal; to create a testbed, in collaboration with researchers at the environmental health clinic of a children’s hospital, in which experiments can guide system development and assess progress; and to conduct a series of evaluations against several objectives. First, the research will characterize the ability of the speech recognition, information extraction, and information organization components to process the target conversations. Second, it will evaluate the hypothesis that this system can collect a more complete and accurate record than what is routinely collected. Subsequently, it will evaluate the time taken by clinicians to use the system, the extent to which the system is seen to disrupt the patient-provider encounter, the ability of patients to use the system to make additions and corrections to their records, and the subjective response of both patients and providers to use of the system.

Success in this effort should lead to better clinical care that is based on more complete and accurate data. Clinical data are also becoming an important resource for translational medicine research, where improved data are highly valuable.


The following people are working on the project or have worked on it and made significant contributions:


As of April 2010, we have made a good start on many aspects of the project. We have, however, found many tasks to be more difficult than we had anticipated. We review accomplishments in several categories.

Institutional Arrangements

We have developed study protocols, consent documents, a patient recruitment protocol, and procedures for storing and analyzing data within the secure environment of Children's Hospital Boston (CHB). These have all been approved by COUHES, MIT's Committee on the Use of Humans as Experimental Subjects, and by CHB's IRB. We have also established procedures to allow our students to be credentialed as researchers at CHB, which gives them network and physical access. We have worked with the CHB IT staff so that we can install our own computer systems and software at CHB.

Data Collection

We very quickly discovered that we needed to record and save the sound files from doctor-patient interviews, to permit subsequent analysis in various ways. We have experimented with several different microphone and recording technologies, and have settled on a combination of Philips digital recorders and Sennheiser microphones that are worn by both doctors and patients during sessions. We have obtained enough of these devices to capture every interaction that meets our criteria: consent of the patient, English speakers, and cases that involve toxic exposure (usually lead).

As of April 2010, we have collected recordings of about fifty conversations between a staff member of the PEHC and a patient's parent. We have also obtained thousands of past medical records to use in training our language models. We have had considerable difficulty in getting usable transcripts of our recordings using Dragon NaturallySpeaking (DNS). Some of our problems were anticipated, such as DNS's inability to deal with multiple voices. We also had unanticipated problems, including suboptimal microphone performance and unexpectedly high error rates from DNS. We have arranged for a commercial transcription company to produce transcripts of our recorded conversations, which will provide training data for our model building as well as data for testing further components of the system we are building. We are also exploring alternative speech understanding software to try to improve the accuracy of our automated transcriptions.
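One common way to quantify the error rate of an automated transcript against a human-produced reference is word error rate (WER): the word-level edit distance between the two, divided by the number of words in the reference. The sketch below is illustrative only. The text above does not say which metric was used, and the function name and the simple lowercase, whitespace-based tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: both transcripts are tokenized by lowercasing and
    splitting on whitespace; real transcripts would also need
    punctuation and speaker labels normalized first.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "the child was exposed to lead paint"
    automatic = "the child was exposed to led paint"
    print(f"WER: {word_error_rate(human, automatic):.3f}")
```

With commercial transcripts as the reference, a measure of this kind could be computed per conversation to compare microphones or alternative recognizers on the same recordings.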

System Components

The prototype intelligent listening framework we created before the official start of the project took an integrated approach, using the DNS software development kit (SDK) to access that program's capabilities. To give us additional flexibility and to decouple the language processing framework from the speech understanding framework, we developed two smaller programs: (1) a program that interfaces with DNS through its SDK and provides a SOAP interface that delivers the interpreted text, its estimate of accuracy, alternative interpretations, and (if desired) the voice file, and (2) a SOAP receiver interfaced to our language processing framework that can further interpret DNS's output. This permits the speech interpretation and the subsequent steps to occur on different machines, possibly under different operating systems, and allows more computational resources to be brought to bear on the overall problem simultaneously. We have also further refined our language processing framework, LATE, and wrapped the WEKA machine learning toolkit so that its various tools can be used to build more sophisticated interpretive models. As part of another project, we have also resurrected annotation tools that will be useful in producing ground truth annotations for machine learning.
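The decoupling described above depends on a message that carries the recognizer's text, its accuracy estimate, alternative interpretations, and optionally the audio. The sketch below illustrates one possible shape for such a message. It is not the project's implementation: it uses JSON in place of SOAP to stay within the Python standard library, and every name in it (`RecognitionResult`, the field names, `encode`, `decode`, `hypotheses_to_consider`, and the 0.0 to 1.0 confidence scale) is a hypothetical choice made for illustration.

```python
import base64
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RecognitionResult:
    """Hypothetical payload sent from the recognizer wrapper to the text pipeline."""
    text: str                      # recognizer's best interpretation
    confidence: float              # recognizer's accuracy estimate (assumed 0.0 to 1.0)
    alternatives: List[str] = field(default_factory=list)
    audio: Optional[bytes] = None  # voice data, included only if requested


def encode(result: RecognitionResult) -> str:
    """Sender side: serialize a result so it can cross a machine boundary."""
    payload = asdict(result)
    if result.audio is not None:
        # JSON cannot carry raw bytes, so the audio is base64-encoded.
        payload["audio"] = base64.b64encode(result.audio).decode("ascii")
    return json.dumps(payload)


def decode(message: str) -> RecognitionResult:
    """Receiver side: rebuild the result for the language processing step."""
    payload = json.loads(message)
    if payload.get("audio") is not None:
        payload["audio"] = base64.b64decode(payload["audio"])
    return RecognitionResult(**payload)


def hypotheses_to_consider(result: RecognitionResult, threshold: float = 0.8) -> List[str]:
    """Pass on the alternatives only when the recognizer is unsure (assumed policy)."""
    if result.confidence >= threshold:
        return [result.text]
    return [result.text] + result.alternatives


if __name__ == "__main__":
    sent = RecognitionResult(
        text="blood led level was ten",
        confidence=0.62,
        alternatives=["blood lead level was ten"],
    )
    received = decode(encode(sent))
    print(hypotheses_to_consider(received))
```

Keeping the alternatives in the message lets downstream language processing, which has access to medical vocabulary and context, reconsider a low-confidence interpretation without rerunning the recognizer.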

Interaction Design

We have created a design for the presentation of data to the doctor and patient in an interaction. This follows forms currently used by PEHC physicians to record the most important facts gleaned from a patient interview, and provides the target that is to be populated from the speech record. We are upgrading monitors in the PEHC to support display and interaction with these data.


  1. Jeffrey Klann and Peter Szolovits. An intelligent listening framework for capturing encounter notes from a doctor-patient dialog. BMC Medical Informatics and Decision Making, vol. 9, Suppl. 1, p. S3, 2009.
  2. Alex Rothberg. Interfacing Dragon Naturally Speaking with a Lisp Text Processing System. Undergraduate Advanced Project, EECS, MIT, May 2009.

*Robert Heinlein, in his science fiction novel Stranger in a Strange Land, posits a type of professional, a Fair Witness, who possesses a total and accurate memory of what he is asked to witness, and is incorruptible. This seems like a fitting icon for what we are trying to achieve in this project.

Last updated 04/09/2010, Peter Szolovits.