Speech Recognition Technology: Maximizing Efficiency in Document Preparation and Case Management

November 2001

By Edward S. Rosenthal

Table of Contents

  1. General Overview
  2. Speech Recognition Lexicon
  3. Six Critical Success Factors
  4. Speech Recognition Manufacturers
  5. Workflow Models
  6. The Role of the Consultant
  7. Getting It to Do What You Hoped It Would Do

I. General Overview

The last four years have witnessed a significant increase in the real-world utilization of speech recognition technologies within the legal environment. While few would debate the ultimate dominance of the Speech Recognition Interface (SRI) for many PC-based tasks, there continues to be a lively discussion about the merits and timing of introducing speech recognition within various aspects of legal practices. The focus of this presentation (and chapter) shall be on the practical elements of using speech recognition technologies as they currently exist to maximize organizational efficiency and productivity in document preparation and case management.

Any discussion of technology within the practice of law needs to address the fact that the legal environment does not present a homogeneous model. Practices span a broad range from the generalist sole practitioner to large multinational law firms. While experience has taught us that size does not relate to the excellence of law being practiced, it certainly may have implications for the practical decision(s) associated with adopting and implementing new technologies. Throughout this discussion, attention will be paid to organizational/systemic vulnerabilities when adopting speech recognition as a 'mission critical' application.

II. Speech Recognition Lexicon

As with most technologies, speech recognition utilizes its own 'jargon'. Below are a number of SR-specific terms:

Continuous Speech
First introduced by Dragon Systems (Newton, MA) in May of 1997, continuous speech allows the user to speak in a natural-sounding way.
Command and Control
Using speech recognition to operate the PC and exercise commands (as opposed to typing words).
Discrete Speech
This was the style of speech-to-text input that was used from the early 1980s until 1997. It required the user to speak in a pulsed, or packetized, fashion where each word was a distinct phrase. Although it was surprisingly effective, most users found speaking in this stilted fashion unacceptable.
Speech to Text
The ability of software programs to convert the spoken word to textual characters on a computer.
Text to Speech
The ability of software to convert printed words from the computer to human sounding speech using speech synthesis technology.
Analog Input
Electronic input of the human voice (most typically with a condenser microphone) through a computer soundcard/sound chip.
Digital Input
Capturing the human voice and converting it to a digital signal before it reaches the computer interface.
Enterprise
Technology that is intended to be installed and deployed over a network, as opposed to a single PC installation.
Graphical User Interface (GUI)
The iconic representation of computer functions, known to most users as its proprietary Windows iteration.
Speech Recognition Interface (SRI)
Software and operating system designed to utilize speech recognition as the primary interface device.
Universal Serial Bus (USB)
New type of connectivity for peripheral devices. Many new microphones connect to the computer through the USB port and digitize the signal prior to entry to the computer.

III. Six Critical Success Factors

The successful implementation and use of speech recognition technology requires careful coordination of many underlying factors. Some are technical, some are end-user driven. We believe that there are six distinct identifiable factors that lead to the successful implementation of speech recognition technologies.

1. Platform

Speech recognition software requires a computer system(s) that has properly selected componentry and architecture to optimize the speech recognition application. This certainly exceeds just basic advice regarding minimum processor speed and RAM. It includes selecting the proper soundcard, coordinating network card and sound device, etc.

The current configuration of the NGT VoicePro computer system (designed to optimize speech recognition applications) is as follows:

Intel Pentium IV 1.5 GHz processor; MSI Logic 845 motherboard (supporting PC 133 SDRAM); 512 MB PC 133 SDRAM; Sound Blaster Live! Value PCI sound card; ATI Expert Pro 2000 32 MB video card; 40 GB IDE hard drive (7200 RPM); 10/100 DLink Ethernet; 56K Rockwell modem; special multimedia case (with front side connectivity). Generally, we will utilize Windows 2000 as the operating system (as of the time of this writing, Windows XP is approximately two weeks away from launch).

Additionally, we have started designing computer systems with removable hard drive drawers. This allows for an efficient system of checks and balances where speech recognition serves as a mission critical application. Conceptually, should a system component fail, the hard drive could be easily removed and placed in a backup unit. This means that IT needs to maintain only one backup support system for any number of users. Conversely, should a user's hard drive fail, a replacement hard drive can easily be inserted with the appropriate backup data files. Both strategies minimize potential downtime, while giving IT adequate time to repair a failed unit.

2. Input Device

There has been a real proliferation of different types of input devices available for use with speech recognition technologies. Probably one of the most critical factors in selecting an input device is to know that the device is speech recognition certified. Even high-quality microphones may not be appropriate for use with speech recognition software, since the optimal microphone will select certain frequencies and bandwidth for input.

Most software includes a corded headset microphone of varying quality. As a general rule, the more expensive software includes a better headset microphone. In addition to head-worn microphones, one can get desktop and monitor-top microphones, handheld microphones, 'gooseneck' microphones (which are worn around the neck), etc. Cordless microphones that do not require the user to be 'tethered' to the computer have become increasingly popular as well.

It must be remembered, however, that selecting the correct microphone has as much to do with matching it to the computer system as with its inherent speech recognition properties. (See the Role of the Consultant.) My current preference for a microphone, which I use on a daily basis, is the Shure Bros. TCHS cordless input microphone. It has very high quality components and broadcasts on an FM frequency up to 300 feet to my computer. In addition, a company called GN Netcom has introduced its model 9020D, which broadcasts on a 2.4 GHz frequency and can be used to support both speech input to the computer and use with the telephone. This means that one can easily switch between talking to the two most significant technologies that most attorneys will interact with on a regular basis.

3. Software Selection

The field of software developers/distributors of speech recognition technology is fairly narrow. However, each developer produces a range of products. It is unfortunate that an otherwise successful user may often find their initial efforts fail due to inadequate selection of product. Most often this has to do with purchasing inexpensive software when a more sophisticated product is necessary for the job to be done. However, it also has to do with matching the product and features to the required job. Some of the things to look for when selecting a speech recognition software product are:

Other important issues revolve around matching operating systems, application suites, and speech input and related peripheral technologies. Not every version of every piece of software plays well with others. Therefore, version matching should be an important consideration when making a selection. We also recommend that speech recognition technology be considered as one component of long-range IT planning, whether for a sole practitioner or a large integrated corporation.

4. Training

Users need to be realistic when considering integrating a new technology, whether speech recognition or another new technology, within a busy legal practice. Most people benefit from professional training when attempting to master new skills. Considering that most attorneys operate on billable time, it has been our experience that their time is most effectively used by hiring an outside consultant(s) to provide training services, with the focus on the practical integration of speech recognition technology within the desired workflow model of the legal practice. Training should always be end-user focused, and driven by the tasks the end-user is hoping to accomplish.

A primary consideration in designing training and implementation support should be minimizing culture change. Specifically, this means that an outside consultant should consider current workflow models, users' orientation toward PC computing, and other related variables pertaining to the introduction of a new technology. The more harmonious the introduction of new technology is with current practices, the greater the likelihood that practicing attorneys will have the opportunity to maximize the benefit of new technology. The cultural transition to speech input technology has several significant challenges associated with it. Our experience has been that discord can be minimized with the proper use of language models (whether custom or third party), preprogrammed macro libraries, and proper templates. If these can be well integrated into the introductory training, groups of attorneys tend to move forward better and exhibit a higher acceptance rate of speech input technologies.

5. System Maintenance

Computer systems are like automobiles. They require regular maintenance. In fact, the Windows operating systems have built-in utilities to accomplish this maintenance. One of the most critical elements of system maintenance is to have an installed and viable virus checking program. Equally critical is that this program be updated on a regular basis, since the nature of virus code is changing all the time. We currently use Norton AntiVirus 2001 and Innoculan PCCillin, and find both to be good choices for most users. In addition, the following maintenance should be performed on a biweekly basis: deleting Temporary and Temporary Internet files, emptying the Recycle Bin, running ScanDisk (if available), and running system defragmentation. The latter is an often overlooked aspect of system maintenance, but proves to be critical in maintaining speed and system stability when using speech input utilities.
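As a rough illustration of the temporary-file cleanup step, the sketch below lists files in a folder that have not been modified for a given number of days. It is only a sketch under stated assumptions: the function name find_stale_files, the 14-day threshold, and the choice of Python are ours for illustration, and the routine reports candidates only. It deletes nothing and is not a substitute for the built-in Windows utilities.

```python
import time
from pathlib import Path


def find_stale_files(directory, max_age_days=14, now=None):
    """Return files under `directory` not modified in the last `max_age_days` days.

    Hypothetical helper for illustration; it only reports, it never deletes.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    stale = []
    for path in Path(directory).rglob("*"):
        try:
            if path.is_file() and path.stat().st_mtime < cutoff:
                stale.append(path)
        except OSError:
            # Skip files that vanish or cannot be read while scanning.
            continue
    return sorted(stale)
```

Pointing the function at a temporary folder and reviewing the returned list before removing anything keeps the step safe for a system that holds voice files.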

6. Technical Support

Speech recognition technologies move from being a novelty to a mission critical application very quickly. Once an individual or organization has changed their workflow model from tape-based dictation systems to direct dictation to the PC (or to using a digital recording device with the PC), downtime can cause severe problems. Implementing speech recognition technology requires insight about who will provide the ongoing technical support that may be needed, what turnaround times for support can be tolerated within the organization, and what sort of technical redundancies can be built in to eliminate downtime. There are many aspects of computer system design and selection that can help minimize downtime with speech input when it serves as a mission critical application. These should be discussed with a qualified professional wherever possible prior to implementing a pilot program or full implementation of speech recognition technologies.

Some Final Comments About Implementation

It is our general recommendation that the implementation of speech recognition technology within an organization have clearly defined objectives coupled with a defined implementation plan. Larger organizations should 'pilot' the technology in order to eliminate problems prior to broader implementation.

As indicated above, speech recognition technology, which may initially be perceived as a niche technology by stakeholders, can quickly become a mission critical application. As such, it is generally recommended that redundancies be built into the implementation plan. These redundancies include simple elements like backup input devices, backup of voice file data, etc. They can also include hot swappable spare computer systems that can easily replace a failed system with a minimum of downtime. (See the Role of the Consultant.)

IV. Speech Recognition Manufacturers

At the time of this writing, there are three primary manufacturers of general-purpose large vocabulary continuous speech voice recognition products. These manufacturers generate three distinct product lines. The table below provides an overview:

Manufacturers of Speech Recognition Products

Manufacturer          Product Version         Product Edition
L&H Dragon Systems    NaturallySpeaking 5.0   Preferred/Professional/Legal
IBM                   ViaVoice 9.0            Professional/Legal
Phillips Speech       Viva                    N/A
Microsoft Corp.       Win XP/Office 2002      N/A

As mentioned previously, selecting the correct product and edition can be an important factor in the successful implementation of speech recognition technology. Both Dragon Systems and IBM offer Legal modules for their underlying products. These may often be a more appropriate choice for an organization wishing to have the most rapid implementation with the least amount of personnel and product training. Add-on language models not only modify the vocabulary (dictionary) but also modify the underlying contextual data that assists in overall accuracy. These language models can also make saying something like 'Federal Second' result in the transcription of 'Fed2nd', which can certainly minimize initial frustration.
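To make the idea of a spoken form mapping to a written form concrete, here is a minimal sketch in Python. It is an assumption-laden illustration only: commercial language models adjust statistical context inside the recognizer, whereas this sketch simply substitutes text after the fact. The table WRITTEN_FORMS and the function apply_written_forms are hypothetical names, and the only pair taken from the discussion above is Federal Second to Fed2nd.

```python
import re

# Hypothetical spoken-form -> written-form table.
# Only the "federal second" entry comes from the text above.
WRITTEN_FORMS = {
    "federal second": "Fed2nd",
}


def apply_written_forms(text, forms=WRITTEN_FORMS):
    """Replace each spoken phrase with its written form, ignoring case."""
    for spoken, written in forms.items():
        pattern = re.compile(r"\b" + re.escape(spoken) + r"\b", re.IGNORECASE)
        # A function replacement avoids any special handling of backslashes.
        text = pattern.sub(lambda match, w=written: w, text)
    return text
```

For example, apply_written_forms("see 123 Federal Second 456") returns "see 123 Fed2nd 456".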

Both Dragon Systems and IBM also offer 'final mile' customization of their products. This means that, in the hands of a specialist, voice-activated forms, templates, and actions can be created. One of the more dramatic examples of this is that, utilizing the NaturalVoc programming tool, a competent consultant can create a custom language model for a given attorney or law firm. This means that it will not only recognize generic 'legalisms' but know almost intuitively how an individual constructs sentences and documents. In addition, complete legal documents can be assembled by voice, often with only a few simply generated text macros. This type of customization not only can reduce the amount of initial training time, but also increases the 'perceived value' of the technology and hence minimizes the psychological impact of early frustrations.
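The notion of a text macro can be sketched as a lookup from a spoken command to boilerplate text with blanks to fill in. The Python below is a hypothetical illustration, not the macro facility of any product named here: the command name, the caption wording, and the function expand_macro are all assumptions made for the example.

```python
from string import Template

# Hypothetical macro library: spoken command -> boilerplate with placeholders.
MACROS = {
    "insert caption": Template(
        "IN THE $court\n$plaintiff, Plaintiff,\nv.\n$defendant, Defendant."
    ),
}


def expand_macro(command, **fields):
    """Look up a spoken command and fill in its placeholders.

    Raises KeyError if the command is unknown or a placeholder is missing.
    """
    template = MACROS[command.lower().strip()]
    return template.substitute(**fields)
```

A call such as expand_macro("Insert Caption", court="...", plaintiff="...", defendant="...") returns the filled-in block, which shows how a handful of such macros could assemble most of a routine document.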

At this juncture Microsoft Corporation has yet to put a true general purpose large vocabulary speech input utility into the market. Their speech component in Win XP and/or Office will probably require revision (or customized programming) to make it viable as a primary input for most tasks that would be required by law practices.

V. Workflow Models

Many attorneys, paralegals, and legal organizations are fascinated by the potential of speech recognition technologies. While the correct selection and implementation of this technology can improve productivity, reduce costs, and reduce turnaround time on document generation, as with most technologies the decision to implement should not be undertaken capriciously. Prior to undertaking an implementation, it is advisable to consider the current workflow model and develop a transition plan to bring this technology to the end-users.

Issues associated with evaluating the workflow model and a transition plan include: how are documents currently being generated; are there aspects of document generation that can be easily automated using speech recognition technologies; does the person/people that will be using speech input want to speak directly to a PC or to a recording device; will the final document be edited and generated by the person doing the dictation, or is administrative support an essential ingredient in overall implementation. Irrespective of specific workflow models it is our considered opinion that new technologies should always be piloted prior to full organizational implementation. This allows the user(s) to do a real-time evaluation of the technology, while allowing ample time for transition.

Recent innovations in speech recognition technology have broadened the range of implementation models available to people wishing to utilize it. In addition to dictating directly to the PC and having words turned into text, users may also dictate to generate only a .wav file, which can then be edited at a later time or at another location. Users may also dictate full documents and store a combined document/wav file for later editing, or ship this combined file across the network for final editing. One of the most exciting recent developments is the ability to store the User File on a network, which allows one to maintain a single voice file within a LAN environment and simply log on to that voice file from any PC connected to the master network.

Also, a number of hand-held recording devices have been developed which will adequately support speech recognition technology. These include specialized hand-held analog recording devices (similar to the PearlCorder that most attorneys are familiar with), digital hand-held devices that record and transmit a digital signal, and computer-based .wav file modules that record directly on the PC.

Recording Devices for Use with Speech Recognition Technologies

Manufacturer         Product                 Key Features
Norcom Electronics   2700 Recorder           Analog recorder; excellent ergonomics; uses specialty magnetic tapes; SR 1 coupling device required for connection to PC; offers complete transcriptionist's workstation
VoiceIt              3200 Digital Recorder   Initially supported as NaturallySpeaking Mobile unit; includes proprietary linking software; records up to two hours with additional memory chips
Sony                 SpeechStick             Records on built-in RAM chip; excellent ergonomics; requires special coupling cable
Olympus              D3000                   Onboard microphone adequate for dictation; transfers data in compressed form

Generally, we recommend that users learn to dictate directly to the PC before attempting to utilize recording devices (whether direct editing or later editing by another is going to be used). This is because optimizing the accuracy of speech recognition software requires a number of acquired skills that are learned by talking to the computer. Skipping the initial step of learning how to speak and maximize accuracy increases the amount of editing ultimately required, which may lead to a failed implementation.

VI. The Role of the Consultant

Our experience has been that many users benefit by securing the services of an outside technology consultancy familiar with integrating speech recognition technologies within the legal environment. Generally, a consultant will want to secure an initial meeting to understand the goals for speech recognition within the organization, and to understand the starting point for a proposal. (Many of the better firms will charge for this service. Some will apply a credit towards pilot program implementation. Others may waive this fee.)

Certainly any thorough evaluation should include a workflow analysis, recommendations regarding platform design or modifications to existing PC platforms, product selection, proposed integration strategies, 'final mile' customization, end-user training, and ongoing technical support.

Personnel from the firm will be expected to allocate time from their busy day to spend with the consultant. The better the understanding of the current workflow model, the better the resulting transition and integration plan should be. By reviewing current documentation and understanding the organizational goal for speech recognition, not only can appropriate product recommendations be developed, but the cultural transition between not using speech input technologies and committing to them can also be minimized.

The 'final mile' customization of speech technologies within the organizational framework can be critical to success. A consultancy needs to be able to provide voice-activated template design and properly constructed text macro calls at a minimum. If the organization has a goal to integrate speech recognition beyond the document generation process (for example, into a proprietary accounting/time and billing package), the consultant will need to have some familiarity with this process as well. In addition, a consultancy needs to be able to work with in-house IT to develop possible network design strategies if using an Enterprise approach.

Wherever possible, we believe it is preferable that the outside consultancy provide the end-user training program as well. This allows for continuity between designing the program from a technical standpoint and delivering this technology to the people who will ultimately be using it. Training objectives should be laid out in advance, and should be task oriented.

Finally, ongoing technical support can become an increasingly critical success factor as an individual or organization commits to speech recognition technology. This is because the system redundancies that may have previously existed to support document generation will no longer be in place. Even a short period of system downtime can make a big difference when under pressure to generate a legal pleading, complete client notations, or finalize billings.

VII. General Conclusion

In the final analysis, as with any emerging technology, it is important to have realistic expectations about capabilities, outcomes, and implementation timeframes.

By clearly laying out objectives prior to commencing a pilot implementation, the probabilities of a successful outcome are greatly increased. When setting objectives, try to include both hard and soft objectives. An example of hard objectives would be: reduce document turnaround time by 50%; eliminate outsourced transcription costs within six months. An example of soft objectives would be: improve attorney satisfaction with document preparation; decrease administrative frustration. While this latter type of objective is difficult to measure, the impact of new technology on the organizational culture often has as much value as some of the hard objectives.
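A hard objective such as a 50% reduction in turnaround time is easy to check with simple arithmetic once a baseline has been recorded. The sketch below assumes turnaround is measured in hours before and after the pilot. The function names and the sample figures in the note that follows are illustrative assumptions, not measurements from any implementation.

```python
def percent_reduction(baseline, current):
    """Percentage by which `current` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100.0


def objective_met(baseline, current, target_percent=50.0):
    """True when the measured reduction reaches the target percentage."""
    return percent_reduction(baseline, current) >= target_percent
```

With a hypothetical baseline of 48 hours, a pilot result of 24 hours meets the 50% objective, while a result of 30 hours (a 37.5% reduction) does not.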

Often when discussing speech recognition technology with new clients I am asked, 'How long will it be before I am productive with speech recognition technology?' Certainly there are many possible answers to this question, but as a general rule I like people to think in terms of the following timelines (assuming the six critical success factors are being adhered to): with professional instruction, one should find that within two weeks of initial use the end-user has a basic understanding of the technology and is comfortable completing basic word processing and Command and Control tasks by voice; within two months of initial use, most PC-based tasks are being completed with a relatively high degree of ease and comfort, and the user is able to ask the right questions in order to grow their capabilities; within six months of initial use, most basic tasks have been voice automated, voice shortcuts will have been designed, and if using customizable software all templates and macros should be in place and usable.

One of the analogies that I am probably best known for is my description of speech recognition software as being like bringing home a puppy. At the time that most people decide to bring home their first puppy, they have visions of frolicking on the beach with their newfound companion. While the hope is that someday this will be the reality of the new relationship, the short-term reality is that they have an unruly beast in their midst. It will awaken them in the middle of the night with its plaintive wails, and happily mark its territory in a variety of ways. It is only through commitment and diligent training that the animal grows into the dog of their dreams. Speech recognition technology presents some of these same dynamics (although hopefully with less personal and physical trauma).

Remember, if you are kind to your puppy and train it properly, it will become what you had hoped and do what you want it to do.

Page Last Updated: Thursday, November 13, 2008


Copyright 1999-2008 Next Generation Technologies Incorporated
