Human Computer Interaction Breakout Session

Breakout Chairman: Len Bass (ljb@sei.cmu.edu)

The HCI breakout group identified four interlocking areas of discussion:

These four areas are interlocking because the task determines the setting for the activities supported by the wearables and the setting determines the ergonomic requirements. Also, the task determines which input and output devices are possible. There is also coupling between the input and output devices in the sense that certain input devices (such as a pen) dictate particular output devices (such as a tablet).

We begin this report with an overall summary of our discussion, followed by a detailed discussion of input and output devices. We end with several recommendations that, if accepted, would improve the state of the wearable computer from the HCI perspective.

1. Overall Summary

From an HCI perspective, the highest priority is to make wearable systems less obtrusive. That is, they need to be smaller and lighter, and both the input and output devices need to conform to people's normal working patterns. The need for smaller and lighter devices is well recognized and is an overall goal for the hardware group. The need for less obtrusive input and output devices is also recognized by the manufacturers of these devices, but there are difficult technical problems to be solved in these cases.

The types of input and output devices that are most appropriate are driven by the requirements of the particular task to be achieved by a wearable computer. For example, if the computer is to be used to record information whose possible content is not known in advance, then an input device that supports arbitrary input, such as a keyboard surrogate or a speech recognition system, is necessary. On the other hand, if the possible information content is known (such as entering items on a checklist), then a more limited input device such as several buttons or a dial could be acceptable.

Similarly for output, if the task requires displaying arbitrary textual information then a limited resolution display or text to speech system would be most appropriate. If the task requires arbitrary graphics then a higher resolution visual output device is necessary.

All of this discussion is conditioned on the participants' assumptions about the tasks for which a wearable computer is being used. Some participants see wearable computers replacing desktop computers and, hence, they would need to be suitable for any application. Other participants see wearable computers being very task specific and using different devices for different tasks (the hammer and screwdriver analogy was used). In this case, the input and output devices can be tailored and can be made much more specific for the task.

Because at this point the correct combination of I/O devices for particular tasks is not known, an important consideration for the HCI group is the ability to configure wearable computers to accept a variety of different input and output devices. This is a hardware, software and human interaction issue. It is a hardware issue because of the necessity for compatible connectors. It is a software and human interaction issue because the different devices may not have the same functionality or support the same types of interaction.

We now recast the discussion we had with respect to particular input and output devices.

2. Input Devices

The following types of input devices were mentioned as possible for some tasks where a wearable computer would be desirable:

Of these, we discussed speech and chording keyboards in some detail because they are general purpose input devices that are generally available. Some of the other input devices (mouse and tab alternatives, for example) are generally available but are limited purpose, and their use would be conditional on having tasks for which they are suitable. Others (eye and head trackers, for example) are not yet generally available. For the remainder of this section on input devices, we enumerate the positives and negatives of speech and of chording keyboards.

2.1 Speech

Speech, if perfected, would be an intuitively appealing input modality. Its positives are that it allows for totally hands free input that can be couched in a manner that is easy to learn. The user is in control of the interaction and has a high branching factor available to take an interaction in a wide variety of possible directions.

Since speech is such an appealing input modality, we will focus on its negatives. We divide these into two categories: those intrinsic to speech and those that are a function of our current speech recognition technology.

Conceptual problems

There are three conceptual problems with speech:

  1. determining what utterances are intended to direct the computer and which are intended for a colleague or co-worker,
  2. prompting users who need assistance to recall the appropriate responses in any particular situation, and
  3. specifying a position in a two dimensional space.

Of these, the first is potentially most troublesome. The two techniques for determining the focus of attention of a speaking user are "press to talk" and "bracket" words. Press to talk means having a special button to push that indicates speech to the computer is about to begin. This simplifies the task of the speech recognizer but it negates the hands free advantage of speech. The user has to use a hand to push the talk button and now speech is on a par with other input devices. Bracketing is the use of special words such as "computer" to indicate that the following utterance is directed at the computer. The use of bracket words is somewhat unnatural to users and, in any case, takes some user training. That is, one solution removes the hands free advantage of speech and the other diminishes the easy to learn advantage.
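The bracket word technique can be illustrated with a minimal sketch. It assumes the recognizer has already produced text for each utterance; the function names and the choice of punctuation to strip are hypothetical and only the bracket word "computer" comes from the discussion above.

```python
BRACKET_WORD = "computer"  # bracket word from the example above; any reserved word would do


def extract_command(utterance, bracket_word=BRACKET_WORD):
    """Return the command that follows the bracket word, or None if the
    utterance is not addressed to the computer (hypothetical helper)."""
    words = utterance.lower().split()
    # Only an utterance that *starts* with the bracket word is treated as input.
    if words and words[0].strip(",.!?") == bracket_word:
        return " ".join(words[1:]) or None
    return None


def commands_from(utterances):
    """Keep only the utterances directed at the computer."""
    extracted = (extract_command(u) for u in utterances)
    return [command for command in extracted if command is not None]
```

In this sketch, a remark to a co-worker such as "Hand me the wrench" is ignored, while "Computer, open checklist" yields the command "open checklist". The user still has to learn to begin every command with the bracket word, which is the training cost noted above.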

The fact that speech is not good at specifying a position in a two dimensional space such as a map can be compensated for by having multiple specialized input devices: a gesturing or pointing input device for position specification and speech for the remainder of the input. However, changing modalities to accomplish a particular input task may not be that easy for users to learn.

Problems of current technology

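The other problems associated with speech recognition are not intrinsic and can be expected to be removed as the technology improves. These are: quality of recognition, speed of recognition, grammatical incorrectness, speakers with various speech impediments, ambient noise, and configuration restrictions (microphone/language model). We now discuss these problems.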

Quality of recognition

The quality of recognition is a function of the size of the vocabulary, the acoustic characteristics of the environment and the microphone, and the quality of the recognition algorithms. A recognition rate of 90% still means that one word in ten is incorrectly recognized. Current speech systems operating in ideal circumstances have a recognition rate of over 95% (fewer than one error in twenty words).
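To make the arithmetic concrete, the following sketch estimates how often a whole utterance is recognized without error. It assumes, purely for illustration, that errors on individual words are independent; real recognizers do not behave this simply, and the function names are hypothetical.

```python
def utterance_success_probability(word_accuracy, n_words):
    """Probability that every word in an n_words utterance is recognized
    correctly, ASSUMING independent per-word errors (an illustrative
    simplification, not a property of any particular recognizer)."""
    return word_accuracy ** n_words


def expected_errors(word_accuracy, n_words):
    """Expected number of misrecognized words in an n_words utterance."""
    return (1.0 - word_accuracy) * n_words
```

Under this assumption, a ten word utterance is recognized entirely correctly only about 35% of the time at a 90% word recognition rate, and about 60% of the time at 95%. This is one reason the feedback and correction mechanisms discussed below matter.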

Furthermore, some words acoustically are very close to each other and some sounds are difficult to recognize. Words that begin with soft sounds such as m, for example, are more difficult to recognize than words beginning with hard sounds. This leads to vocabulary tuning (choosing a vocabulary of words that are easily recognizable and acoustically distinct from each other).

Because of the error possibilities, some feedback mechanism must be given to the user, such as presenting the utterances textually on a screen or repeating them through a headset. Thus, the use of speech dictates some type of output device that might not otherwise be required. Furthermore, once the user recognizes an error, there must be some mechanism for correcting the error. If this mechanism is the use of an alternative input device, why bother with speech? If this mechanism is to repeat the utterance, the user may get very frustrated if the system does not recognize the utterance soon.

Speed of recognition

The speed of recognition is a function of the speed of the main processor on the wearable computer. A software only solution seems to require a 125 MHz processor. Wearable computers are approaching this speed. The alternative is to have some of the recognition done in a specialized processor packaged in a PCMCIA card. In this case, there are additional possibilities for electronic interference. Such cards are still in the beta stage and there is, as yet, no large body of experience with them.

Grammatical incorrectness

Speech recognition systems are based on providing a grammar for the utterances to be recognized. These systems are not very tolerant of incorrect grammar and "uhs" and interjections. The users must be trained to speak in a fairly constrained fashion. Again, this tends to negate the low training nominally required of a speech system.
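A minimal sketch of what a constrained grammar implies for the user is given below. The three slot grammar for checklist commands is entirely hypothetical and far simpler than the grammars used by real recognizers; it is meant only to show why an interjection causes an otherwise valid utterance to be rejected.

```python
# Hypothetical grammar: each position in the utterance must come from a fixed set.
GRAMMAR = [
    {"mark", "skip"},          # action
    {"item"},                  # fixed keyword
    {"one", "two", "three"},   # item number
]


def accepts(utterance, grammar=GRAMMAR):
    """Return True only if the utterance matches the grammar word for word."""
    words = utterance.lower().split()
    if len(words) != len(grammar):
        return False
    return all(word in choices for word, choices in zip(words, grammar))
```

Here "mark item two" is accepted, while "uh mark item two" is rejected because the extra word shifts every other word out of its expected position. Users therefore have to be trained to speak exactly within the grammar.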

Speakers with various speech impediments

Speakers who have speech impediments such as prolonged stuttering, or who have difficulty in speaking due to physical impairment, are a population that speech recognition systems seemingly would have a great deal of difficulty in recognizing. None of the attendees at the workshop knew of any research in this area.

Ambient noise

Loud ambient noise, whether persistent or intermittent, may degrade speech recognition systems. Filtering techniques exist to screen out such noise, and noise suppression microphones exist, but the results of speech recognition systems are worse in such environments. The question is whether the degradation is large or small.

Configuration restrictions (microphone/language model)

Speech recognition systems are tuned for a particular set of microphones and for particular language models. Gender differences, for example, cause different language models to be used. These restrictions do not prevent the use of speech recognition but make it more expensive or more restrictive than it might otherwise be.

2.2 Keyboard alternatives

A keyboard is attractive as an input device because it allows a full range of textual input. A normal keyboard is unattractive in a wearable context because of its size and because it is cumbersome to use. The keyboard has to be worn somewhere and then it has to be positioned for input. This conflict has given rise to alternative keyboard devices. The Twiddler is a one handed "chorded" keyboard that has been commercially available for quite some time. A chording keyboard is one where combinations of keys are pressed together to indicate particular letters.
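The idea of chording can be sketched as a lookup from sets of simultaneously pressed keys to characters. The key names and chord assignments below are hypothetical and are not the layout of the Twiddler or of any other real device.

```python
# Hypothetical chord map: each set of keys pressed together yields one character.
CHORDS = {
    frozenset({"index"}): "a",
    frozenset({"middle"}): "b",
    frozenset({"index", "middle"}): "c",
    frozenset({"ring"}): "d",
    frozenset({"index", "ring"}): "e",
}


def decode(chords, chord_map=CHORDS):
    """Translate a sequence of chords (sets of key names) into text.
    Unknown chords are rendered as '?'."""
    return "".join(chord_map.get(frozenset(keys), "?") for keys in chords)


def max_chords(n_keys):
    """Number of distinct non-empty key combinations available with n_keys keys."""
    return 2 ** n_keys - 1
```

For example, pressing index and middle together, then index alone, then middle alone decodes to "cab" under this map. Because n keys allow 2^n - 1 non-empty combinations, a handful of keys under one hand is enough to cover an alphabet, which is what makes a one handed keyboard feasible.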

We identified the following considerations associated with chording keyboards.

Positives

On the positive side, a chording keyboard uses only one hand for input and requires no surface to mount it on (it can be held in the hand). It also has reasonable speed (50 words per minute is achievable), is inexpensive, requires little power and bandwidth, and is compatible with existing software.

Negatives

On the negative side the one handed requirement for input means that it could not be used for applications where the user must have both hands totally free at all times. There is a learning curve for the device and it is only suitable for textual input. There is no pointing capability inherent in the device.

2.3 Pointing devices

Both of the input devices we have discussed thus far, speech and a chording keyboard, have no ability to do pointing. As we alluded to in the speech discussion, the ability to point to a position on a screen is important for all direct manipulation interfaces and, more importantly for wearable use, for all applications where there is a figure of interest or a map on the screen.

A pointing device can be a joystick, joypad, or touchpad, together with one or more selection buttons.

Positives

Pointing devices are intuitive, allow random access and positional input and are compatible with desktop interfaces. They are widely available and could provide a virtual keyboard by having a representation of a keyboard on the screen and pointing to the various keys desired.

Negatives

The interfaces that currently utilize pointing devices are resource intensive. The devices themselves are inexact for precise coordinate specification and are slow when used to provide a virtual keyboard.

2.4 Other input devices

The other input devices enumerated above were not subject to a detailed discussion and so we don't provide any discussion of them here. The sense of the group was that these devices are less available, less mature, or less useful than the ones that were discussed in some detail.

3. Output devices

The most appropriate output device to be used with wearable computers again depends on the task to be performed but a wide variety of possible output devices were mentioned by the attendees at the breakout. These were: head mounted displays (HMDs), flat panels, text to speech, tactile output, non speech auditory output, paper and olfactory output (scent). Of these we discussed in more detail HMDs and flat panels.

3.1 Head Mounted Displays

Head mounted displays provide a visual output that can be used without involving the hands. Thus, HMDs can be used in those tasks that require two hands. Furthermore, in the future it will be possible to align the HMD output with that of the real world and provide computer augmentation of reality.

Even without the augmented reality aspects, HMDs are visual output devices of reasonable resolution (currently VGA), they are always accessible and they can be totally private. HMDs can be "see through" or occluded. Those that are see through could be read by people other than the wearer, although with some difficulty, but those that are occluded cannot. Thus an HMD allows for private output.

Some of the problems with HMDs are ergonomic and some have to do with their capabilities. The ergonomic problems are headed by the resistance of some users to wearing such an ungainly device. This problem may disappear over time or may be alleviated if the user is performing an important task. In this case, the wearing of a HMD may be seen as an emblem of importance. In any case, there may be social resistance to the use of these devices.

Other ergonomic problems are the weight, comfort, glare and safety aspects of the devices. Some devices are not effective in bright sunlight.

Cost is another problem. Currently VGA quality HMDs may cost $3000 or more.

The technical problems are lack of resolution (for some applications VGA may not be sufficient resolution) and the fact that current displays are monochrome. These technical problems will likely disappear in the next year or two (although their disappearance may increase the cost of the devices).

3.2 Flat Panel Displays

Flat panel displays have the advantages of being relatively cheap, plentiful, available in full color, sharable if there are multiple people who wish to look at a display and useful for input.

On the other hand, they require the hands to hold them for output and a storage place when they are not being held. Glare is a problem, as are weight, resolution and size.

4. Ergonomic considerations

The ergonomic considerations for using wearable computers partially have to do with the task for which they are to be used and partially have to do with the fact that they are wearable. The considerations that arise simply because the computers are wearable are size, weight, comfort, and cables. That is, they should be small (although there is a minimum useful size for input devices), lightweight, useful in heat and cold, and have a minimum number of cables. Cables are bad because they get snagged on obstructions in the environment. The computers should also be easy to get on and off.

Some of the task dependent issues have to do with the amount of mobility required and the position of the user. The computers should be usable while standing, moving, and lying down. The position of the computer on the body should be variable depending on the position of the user.

Safety is an issue when using HMDs as is sharing of output.

If the wearable computers are embedded in clothing then the size of the clothing becomes important if the computers are to be used by multiple people. If the computer is belt mounted then it can be used (sequentially) by multiple people but it is relatively fixed as to where it can be worn.

5. Summary and recommendations

As can be seen from the above discussion, there are multiple possibilities for both input and output devices, and the capabilities and costs of these possibilities are continually changing. The correct devices for any particular application must be determined on the basis of the specifics of that application. Unfortunately, there is not a lot of experience with different applications and that experience is not widely shared. This is the basis for our recommendations. We would like to see experimentation with the use of wearable computers in different applications; we would like vendors to facilitate that experimentation by making different input devices (and different output devices) interchangeable; and we would like to have a forum for those interested in wearable computing to report the results of these experiments.

5.1 Recommendation 1

There is a need for more information about the different types of applications using wearable computers. Currently, several different organizations, most notably Boeing, CMU and MIT, are experimenting with wearables in different application contexts. Boeing's experiments involve manufacturing and maintenance, CMU's involve maintenance and MIT's involve wearable assisted living. The first recommendation, then, is to encourage organizations to begin experimenting with the use of wearable computers in different applications. This is likely to begin happening naturally and so the need is not only for the experimentation to occur but for the results of those experiments to be reported. This is our second recommendation.

5.2 Recommendation 2

Have a forum for the presentation of wearable computer related results. Not only experiments and experiences with the use of wearable computers in different applications but also new techniques, new hardware and new software can be presented in this forum. In any case, the forum should be public and neutral. Some possibilities are to have a periodic wearable conference, to have wearable elements identified in other conferences, to utilize electronic means of communication, or all of these plus others. There is a need, however, for such a forum to enable us all to leverage each other's results.

5.3 Recommendation 3

Manufacturers should have standards for input and output devices usable with wearable computers that allow the interchange of comparable devices. Given the large amount of experimentation that we see as necessary to determine the most appropriate devices to use in particular tasks, we need to be able to replace input or output devices without great difficulty. This suggests standard connectors.