Getting ready to prepare dinner at home, a mother opens a cooking app on her iPad or smart TV. Using touch and voice, she interacts with the app, telling the video to pause, fast forward, or replay a specific section. If she prefers, she can tap on the screen instead of using her voice.
Relying on multiple modes of interaction like this is what multimodal design is all about.
Before we explore multimodal design, let’s begin with a basic understanding of two types of interactions: computer-to-human and human-to-computer. Each of these interactions includes various modalities such as voice, touch, and taptic feedback.
Computer-human (or computer-to-human) modalities help a computer communicate information back to the user. The most common computer-human modalities rely on our vision, hearing, and tactile abilities. A few examples are computer graphics, audio playback, phone vibrations, and smartwatch taptic feedback.
Human-computer (or human-to-computer) modalities work in the opposite direction: they are the mechanisms users rely on to communicate with and command computers. A few examples are keyboards, mice, touchscreens, trackpads, and speech recognition.
More complex examples are accelerometers, gyroscopes, and magnetometers that enable motion detection. Think about playing a game of tennis on a console and swinging the gamepad to emulate the movement of the racket. Sensors like these bring many more opportunities to create a unique and engaging multimodal user experience.
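To make the tennis example concrete, here is a minimal sketch of how a console might turn raw accelerometer readings into a "swing" gesture. The threshold value and the `is_swing` helper are hypothetical, not taken from any real console SDK:

```python
import math

SWING_THRESHOLD = 2.5  # g-force; hypothetical tuning value

def is_swing(sample):
    """Detect a racket swing from one accelerometer sample (x, y, z in g)."""
    magnitude = math.sqrt(sample[0] ** 2 + sample[1] ** 2 + sample[2] ** 2)
    return magnitude > SWING_THRESHOLD

# A resting gamepad reads roughly (0, 0, 1): gravity only.
print(is_swing((0.0, 0.0, 1.0)))  # False
# A fast forehand spikes the reading well past the threshold.
print(is_swing((2.1, 1.8, 0.9)))  # True
```

Real motion recognition fuses accelerometer, gyroscope, and magnetometer data over time, but the core idea is the same: translate a physical movement into an input event the game can act on.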
Why Multimodal Design
The idea behind multimodal design is to combine multiple modalities in order to improve a product’s user experience. As everyone uses products in different ways and in different contexts, users are provided with several feedback mechanisms and given multiple ways to interact with their computer.
Designers make life easier for users by incorporating and automating actions through different modalities. If there were only one modality available, it would negatively affect the user experience, and the design would “fail” in the mind of the user.
An example is a car infotainment system. Most of these systems allow users to interact with voice and touch. When driving, the obvious choice is to use our voice for making a phone call or navigating, but when parked, it’s most likely easier to use the touchscreen or a scroll wheel to interact with the system.
Here are a few more examples we commonly find in multimodal design:
- A graphical user interface relies on our vision to interact, for example, with a website or a digital billboard.
- A voice user interface relies on our auditory capabilities to interact. This includes any voice assistant, such as Alexa, Google Assistant, or Siri.
- Haptics, gestures, and motion rely on our sense of touch (tactile abilities) to trigger an interaction. A vibration alert for an incoming message or a swipe to skip a song are two examples.
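The point of combining the modalities above is that several different inputs can all map to the same underlying command. The sketch below illustrates that idea; the modality names, event names, and `pause_video` handler are all hypothetical:

```python
# Map each (modality, event) pair to the same underlying command, so the
# user can pause playback by voice, touch, or keyboard alike.
def pause_video():
    return "video paused"

COMMANDS = {
    ("voice", "pause"): pause_video,
    ("touch", "tap_pause_button"): pause_video,
    ("keyboard", "spacebar"): pause_video,
}

def handle_input(modality, event):
    handler = COMMANDS.get((modality, event))
    return handler() if handler else "unrecognized input"

print(handle_input("voice", "pause"))             # video paused
print(handle_input("touch", "tap_pause_button"))  # video paused
```

Because every modality routes to one shared handler, the product behaves consistently no matter which interaction mechanism the user prefers in the moment.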
Multimodal design is also helpful when designing for people with certain limitations and disabilities.
A Simple Multimodal Design Example: The SmartHome
Robert is heading home after a long day. His home automation system is triggered once he gets within a mile of his garage. The system recognizes that he has arrived and starts a sequence of automated actions: it turns on the lights, adjusts the heating and cooling, and deactivates the alarm system.
Next, Robert could either use a remote control or ask the AI-enabled assistant to turn down the heat when he walks in.
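Under the hood, the "within a mile" trigger is just a geofence check. Here is a minimal sketch using the haversine great-circle formula; the home coordinates and action names are hypothetical:

```python
import math

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3959 * 2 * math.asin(math.sqrt(a))  # Earth radius is about 3959 mi

HOME = (40.7128, -74.0060)  # hypothetical coordinates

def arrival_actions(car_lat, car_lon):
    """Fire the automated sequence once the car is within a mile of home."""
    if miles_between(car_lat, car_lon, *HOME) <= 1.0:
        return ["lights on", "adjust climate", "alarm off"]
    return []

print(arrival_actions(40.7125, -74.0055))  # within a mile: full sequence
print(arrival_actions(40.7700, -74.0060))  # miles away: nothing yet
```

A production system would also debounce the trigger so the sequence fires only once per arrival, but the geofence check is the heart of it.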
A Complex Multimodal Design Experience: Health
We are now capable of capturing more complex inputs from users through smart devices. We can measure inputs such as stress levels, heartbeat, sleep cycles, water intake, and in the near future, glucose levels.
Once these inputs are stored, devices and services such as Fitbit and First Aid by The Red Cross provide valuable, lifesaving warnings in the form of a vibrating alert, a taptic “tap” on the wrist, or an audible alarm.
This is a more complex use of multimodal design because the balance of inputs to outputs needs to be calculated correctly. The design must avoid false alarms that could send the user into a panic.
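One common way to balance inputs and outputs here is to require several consecutive out-of-range readings before alerting, so a single noisy sensor sample never wakes the user. This is a sketch of that idea only; the threshold and reading count are hypothetical, not values from Fitbit or any real device:

```python
HIGH_HEART_RATE = 150   # bpm; hypothetical alert threshold
CONSECUTIVE_NEEDED = 3  # readings required before an alert fires

def should_alert(readings):
    """Alert only on a sustained run of out-of-range heart-rate readings."""
    streak = 0
    for bpm in readings:
        streak = streak + 1 if bpm > HIGH_HEART_RATE else 0
        if streak >= CONSECUTIVE_NEEDED:
            return True
    return False

print(should_alert([72, 180, 75, 74]))        # one noisy spike: no alert
print(should_alert([72, 165, 170, 168, 74]))  # sustained elevation: alert
```

Only once `should_alert` fires would the system choose an output modality, such as a vibration, a taptic tap on the wrist, or an audible alarm.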
Whether designing simple or complex multimodal experiences, one of the best ways to better understand multimodal design is to start designing with it in mind. Let’s look at how we can achieve this using Adobe XD.
Prototyping a Multimodal Experience in Adobe XD
Adobe XD, a popular UX design tool, recently added voice commands and playback to its arsenal of features. By taking advantage of them, we can add modalities such as speech and audio playback to create a multimodal user experience.
As an example, let’s prototype the mobile journey for a cooking app. A chef is showing how to cook a steak, and people can tell the app to pause, repeat, or continue using voice or touch.
We first prototype all the screens that are needed to illustrate the experience:
Next, we add a voice command to emulate a voice modality. In Prototype mode, we start by connecting the first and second screens. Then we select Voice under Trigger and write out an utterance under Command to trigger this transition. If we want to support two or more voice commands, we need to add a connector for each one.
For the next screen, we want the system to answer the user. We do this by creating a Time trigger and adding Speech Playback under Action. Since we want an immediate response, we set the delay to 0 seconds.
We can also add traditional triggers. In this example, we’ll add a tap trigger on the second element of the list. When the user taps on this element, the app will advance to the next screen. Combining both voice commands and touch commands is a great example of providing a better, more thoughtful user experience using multimodal design.
Next, we want to illustrate how the user can pause and continue the experience within the app. Since we’re designing this app with the Amazon Echo in mind, we want to add a voice command like “Echo, pause.”
To make the video continue, we’ll perform the same action by adding the voice command “Echo, continue.”
This is a basic example of multimodal design using the voice trigger. Additional triggers include tapping, dragging, and using the keyboard or a gamepad to control the prototype.
It’s easy to fall into the trap of using triggers simply because they exist. To design a better user experience with multimodal design, designers will want to test and learn which interactions make the most sense, and when.
Multimodal Design and Mental Models
When designing using modalities, it’s important to remember that users have a preconceived set of expectations (mental models) for how an interaction should occur. For example, most users expect a screen to move up when they scroll down on a trackpad or with a mouse scroll wheel.
Note that in many cases these mental models are still being formed. Shaking the phone is one example: it’s still an ambiguous interaction, as some vendors use it to “undo” typing while others use it to shuffle songs.
It’s important to be aware of these mental models when choosing which modalities to put into the product design. Using familiar modalities can enhance the user experience. Modalities which are still being formed could confuse users and degrade the experience.
Emerging Modalities: Conversation Design
Two modalities gaining a lot of traction are chatbots and voice user interfaces. Sometimes referred to as conversational user interfaces, they focus primarily on text and speech interactions.
A chatbot can use an interface to receive inputs such as text and is capable of showing graphics, links, maps, and conversational dialog. Some chatbots can take commands via voice and then display the results as text or read them back with a synthesized voice.
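At its simplest, that kind of multi-channel chatbot matches an intent and then formats the same reply for whichever output modality the user chose. This toy sketch illustrates the pattern; the intents, replies, and SSML-style voice wrapping are all hypothetical:

```python
# A toy intent matcher that answers on whichever channel the user chose.
INTENTS = {
    "hello": "Ni hao! Ready for today's lesson?",
    "repeat": "Let's try that phrase again.",
}

def respond(message, channel="text"):
    """Match a user message to an intent and format the reply per channel."""
    reply = INTENTS.get(message.strip().lower(), "Sorry, I didn't get that.")
    if channel == "voice":
        return f"<speak>{reply}</speak>"  # handed off to a TTS engine
    return reply

print(respond("Hello"))                    # plain text reply
print(respond("repeat", channel="voice"))  # wrapped for speech synthesis
```

The same intent logic serves both channels; only the final rendering step changes, which is exactly what makes conversational interfaces naturally multimodal.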
Pure voice interactions are also emerging. Think of the expansion of Siri or Alexa to smart home devices where users aren’t typing anything, but instead are having full interactions with voice only. This is important for designers because almost every experience in conversation design is a multimodal one.
A great example is Lily from Maybe, a bot that teaches you Chinese (and other languages) and works across different channels. Conversations can take place in the app or by talking to the bot.
Using voice, touch, text, and taptic feedback, multimodal design combines different modalities to create better user experiences. Computer-human and human-computer interactions can be combined to build unique product experiences.
Multimodal design also presents new opportunities and challenges for designers. Tools like Adobe XD make it easier to prototype products using various modalities, but there is an art and science to using them together.
Striking that perfect balance, combined with the emergence of new modalities, will challenge designers to raise the bar on improving user experiences.
• • •
Further reading on the Toptal Design Blog:
Understanding the basics
Multimodal design utilizes multiple modalities to improve the user experience in products. This includes computer-to-human and human-to-computer interactions such as voice, text, touch, and taptic feedback. Designers integrate these types of interactions into products and combine them for a better user experience.
Adobe XD is a design tool that can be used at several stages of the design process. It is best known for prototyping; however, it can be used for wireframing, mockups, and as a design system for communicating with developers.
Multimodality, the use of multiple modes of interaction between humans and computers, is important because it improves the user experience by offering several interaction mechanisms. With multiple modalities at play, users have more choices when interacting with products, which means there is less of a chance for UX “failures.”
Both Sketch and Adobe XD are excellent design tools; neither is strictly better than the other. Adobe XD, being newer to the scene, is rapidly catching up to Sketch. The main difference is that Sketch relies on plugins to extend its functionality, whereas Adobe is trying to include everything natively within XD.
Adobe XD is part of Adobe’s Creative Cloud suite. Creative Cloud includes other Adobe applications as well as XD and allows for cloud storage and automatic updates.