“There Came an Echo”. As I’ve previously mentioned, I’ve worked on a bunch of voice-driven systems, from games to training sims to psych evaluation robots. Each project has its quirks, but they generally fall into one of two designs. Either we offer choices, like this:
Agent: “And that’s when she kinda, ya know, set my bunk on fire…”
User: A.) “Wait, hang on, what? I kinda… spaced out there for a second.”
B.) “Corporal, fires are not allowed inside the barracks. You know that.”
C.) “That’s hot.”
Or they’re open ended, like this:
Agent: “How do you feel about your relationship with the electrodes currently attached to your skull?”
User: “Uh, I dunno, fine I guess?”
Agent: “That’s great!”
See the difference? In the first example, we display three things you can say and ask you to pick one. This has some advantages: your system’s understanding only needs to handle three utterances, you can hit specific story or training beats, and the user is guided towards the outcome you want (and can handle). As for the latter, that’s a whole other story. Let’s try another open ended example:
Agent: “How was your day?”
User: <Absolutely anything, Agent is just pretending to listen>
Agent: “Wow, that’s great. Let’s move on. What’s your favorite place to eat lunch?”
Hmm, not so great. No matter what you say, as soon as you’re done speaking, the system moves on. Why? Understanding anything the user can say is a huge challenge. There are systems that can do it, but then what do they say in response? Have you recorded something for every possible user statement? Oh, TTS, sure, but what if instead of a charming story about their day, they discuss their childhood? Or say nothing? Or describe the worst day of their life? How do you handle that? How do you even understand it? That’s the challenge in open ended systems, and that’s why…
Dodging the Answer
Agent: “Pick option A, B, or C based on what score you want, and you can tell the ‘Good’ answer because it’s the formal one.”
User: A.) “Derp.”
B.) “Formal policy is formal.”
C.) “I wasn’t listening.”
Agent: “What would you like to do?”
User: <3 more options, maybe color coded based on the morality system in the game>
Offering prompts doesn’t just make things easier for the designer, it makes things easier for the user. The biggest issue I’ve run into with open ended dialogue systems is when the user doesn’t know what to say. This leaves them feeling powerless and confused, and it gets frustrating fast. The user should know what to say and when to say it. Prompts help, and save everybody time.
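To make the "only three utterances" point concrete, here is a minimal sketch of matching whatever the speech recognizer heard against the prompts on screen. It is not taken from any of the systems mentioned here: the `PROMPTS` table, the `match_prompt` helper, and the 0.6 cutoff are all assumptions for illustration, and it leans on Python's standard `difflib` for fuzzy matching rather than a real speech understanding stack.

```python
import difflib

# Hypothetical prompt set, borrowed from the barracks example above.
PROMPTS = {
    "A": "Wait, hang on, what? I kinda spaced out there for a second.",
    "B": "Corporal, fires are not allowed inside the barracks. You know that.",
    "C": "That's hot.",
}

def normalize(text):
    """Lowercase and drop punctuation so recognizer output compares cleanly."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

def match_prompt(heard, prompts=PROMPTS, cutoff=0.6):
    """Return the key of the prompt closest to what was heard, or None."""
    normalized = {normalize(line): key for key, line in prompts.items()}
    best = difflib.get_close_matches(
        normalize(heard), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[best[0]] if best else None
```

With only three candidates, a slightly mangled reading of option B still lands on B, and anything that matches nothing comes back as `None`, which is your cue to re-prompt instead of guessing.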
Guiding the user doesn’t always require prompts, though. Sometimes it means asking the user to listen along with the agent. This can be obvious, or it can be subtle. For example, the user walks into a bar and the bartender says, “Howdy!” What’s your first response? It’s “Howdy”, right? Or “Howdy, partner!” if you’re formal. Maybe “Hey there, pilgrim,” but you get the idea. For greetings, people know what to say. After that, it’s the wild west, but that’s where good dialogue design can move things along.
Bartender: “What can I do you for, partner? Maybe a drink?”
User: “I’ll take a drink.”
Bartender: “Here, this one’s on me. What brings you to this here saloon?”
User: “I’m looking for the bad guy from the box art.”
The agent comes across as intelligent by anticipating the user’s behavior based on context. You just walked into a bar, maybe you’re looking for somebody, but you definitely need a drink. Every statement from the Bartender is designed to cue the user to the next piece of dialogue. Now what if I told you that that Bartender is D-U-M-M dumb? If the user does anything but order a drink or ask about Sir Appearing on the Box Art, this scenario breaks down. But they don’t. Users almost always ask for a drink. Why? They just walked into a saloon in the wild west. There’s a bottle on the bar. The art around the characters informs the user’s expectations, so they feel they “know” what to say, even if that’s been drilled into them by every inch of oak plank under their feet and bottle of rotgut in front of them. That, and no matter what they say, the bartender offers them that drink…
For most users, this’ll work, and with testing you can optimize noteworthy interactions so even if a user does bounce off a mismatched line, they might not notice, or will brush it off. What interactions are noteworthy? Anytime a character is introduced, after big story points, and, sometimes, finales, but there’s some wiggle room there. You’d be surprised what you can get away with when people are ramping up for a boss fight*. Let’s look at that interaction again.
You’re standing outside the only saloon in town. There’s no one on the streets. You study the wanted poster in your hand, roll it up, and slide it into your bag. You push through the bat wing doors and the Bartender looks at you. He folds his bar towel over his shoulder and sets three glasses on the bar.
Bartender: “Howdy, you lookin’ for someone or just passing through?”
User: “I’m looking for a drink.” / “I’m looking for the Box Art Kid.” / “None of your business.” / “Just passing through.”
Bartender: “Here, this one’s on me. Now, I don’t want no trouble, but you best be careful ‘round these parts…”
Almost no matter what the user says, the Bartender’s line works. Even better, the user feels they know what to say. There’s a question they know the answer to. Nothing on screen, no prompts, but a guide nonetheless. And just to top it off, if you do use a prompt, you give the user 4 dialogue options at the cost of only 1 line for the actor playing the Bartender. Also, there’s the scene building.
The what? The wanted poster? You must be the law. Why are the streets empty? Oh, the three glasses? Somebody else is here. Who is it? Let’s find out.
User: “Why?” / “This your place?” / “Is somebody else coming?” / “Thanks.”
Bartender: “Look, Ranger, I appreciate the company, but you need to keep an eye out for The Box Art Kid.”
And in one line we dodge a hundred questions while establishing something about the player and their quarry. This cuts the Bartender’s part down. No offense to the actor, but we have plenty of other parts for him; this is our epic western.
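The Bartender trick, several things the user might say and one recorded line that answers all of them, can be sketched roughly like this. The keyword lists and function names are made up for illustration; a real recognizer would hand you intents rather than substring matches.

```python
# Hypothetical keyword lists standing in for a real recognizer's intents.
INTENT_KEYWORDS = {
    "drink": ("drink", "whiskey", "beer"),
    "quarry": ("box art", "kid", "looking for"),
    "brush_off": ("none of your business",),
    "passing": ("passing through",),
}

# The single recorded line that has to work after every user response.
CATCH_ALL = ("Here, this one's on me. Now, I don't want no trouble, "
             "but you best be careful 'round these parts...")

def classify(heard):
    """Return the first intent whose keyword appears in the utterance, else None."""
    text = heard.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None

def bartender_reply(heard):
    """Report the intent for later tuning, but always answer with the one line."""
    return classify(heard), CATCH_ALL
```

The intent is still worth keeping even though the reply never changes: the utterances that come back as `None` in testing tell you which mismatched lines to polish first.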
So, yes, I want everyone to get every perfect line of dialogue into their game. Every single line is special. But some are more special; that’s why we cut half of what we write. Then we cut down to what we can afford.
Whether you have 2 lines of dialogue or 10,000, every line needs to serve a purpose. Every line needs to do something for the player, whether it’s to explain a story point, make them laugh, or make them wonder what terrible secret has just been revealed in this post credits scene. Every line also has to be paid for. It’s not just the price for a studio, an engineer, a director, an actor, but also dialogue post, animation, narrative design and… I’m forgetting something… oh right, a writer! Everybody should be getting paid. Yes, they often work for less than scale, yes, it’s a favor, or a labor of love, but putting a dollar value on each line reveals what is important, and what can maybe get cut while the writer cries in the corner.
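If you want to see what a dollar value per line looks like in practice, here is a rough sketch. Every number in it is a placeholder I invented, not a real rate, and the `purpose` and `priority` fields are hypothetical bookkeeping; the only thing it borrows from above is the list of people who need paying.

```python
from dataclasses import dataclass

# Every figure below is a made-up placeholder, not a real rate.
COST_PER_LINE = {
    "studio": 4.0, "engineer": 2.0, "director": 3.0, "actor": 10.0,
    "dialogue_post": 3.0, "animation": 15.0, "narrative_design": 2.0,
    "writer": 5.0,
}

@dataclass
class Line:
    text: str
    purpose: str   # e.g. "story", "laugh", "mystery", or "" if it has none
    priority: int  # higher means harder to cut

def line_cost():
    """Total hypothetical cost of shipping one line of dialogue."""
    return sum(COST_PER_LINE.values())

def cut_to_budget(lines, budget):
    """Drop purposeless lines, then keep the highest-priority lines that fit."""
    affordable = int(budget // line_cost())
    keep = sorted((line for line in lines if line.purpose),
                  key=lambda line: -line.priority)
    return keep[:affordable]
```

The first cut is the easy one: a line with no purpose goes before anyone looks at the budget. The second cut is where the crying starts.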
* In the mixed reality system I’m borrowing some of this material from, we asked people what The Box Art Kid said before the big showdown. They couldn’t tell us. Ever. It was all, “Something something then he shot me.” Adrenaline will do that.