PR2 robot plays music

It’s not exactly talented just yet, but it’s a start:

Here are a couple of things this robot also does:

It’s not exactly fast yet (notice the “15x” in the video?)

Another one where the speed-up is more clearly visible thanks to the humans in the background:

Clearly, robotics is making progress, but that also makes the gap between robots and animals more humbling. The other day, a dog was running alongside me while I was biking, and I couldn’t help but admire the agility of the run: 30 km/h in the bushes, downhill on slippery gravel, avoiding a multitude of obstacles with a large variety of strategies (running around, jumping over, …), all the while checking where I was…

Thought recognition: what user interface?

Thought recognition is coming. New articles on this topic pop up regularly. But what would a thought-driven user interface look like?

The XL programming language had its inception in questions like this. I was thinking more of speech recognition at the time, trying to figure out how object-oriented programming (which I had just discovered back then) could be used to program a speech-centric user interface. It turns out that it’s probably quite difficult.

The reason is relatively easy to explain. In a graphical user interface (GUI), you have a finite (and relatively small) number of objects on screen. You pick one object, for example a menu, and then another, and so on. One of the key design features of the GUI is that it should be non-modal, i.e. at any given point in time, you should be able to pick this or that menu freely. This is very different from old text-based programs, where you would typically switch, for instance, between text editing mode, text formatting mode, page layout mode, printer selection mode, and so on. This basic tenet was a mindset revolution for programmers in the early days of GUIs. The original Macintosh Human Interface Guidelines insist on that point as early as page 12. Today, it’s much harder to find web pages explaining that fundamental aspect, because programmers only know about modeless programming.
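The modeless property can be sketched in a few lines of code. This is a hypothetical illustration (the class and command names are made up, not from any real GUI toolkit): in a modeless interface, dispatch depends only on the event itself, never on any hidden mode variable or prior history.

```python
# A minimal sketch of modeless command dispatch. All names are
# illustrative; no real toolkit is assumed.

class ModelessUI:
    def __init__(self):
        # Every command is always available; note there is no mode
        # variable anywhere in this class.
        self.commands = {
            "open_menu": lambda: "menu opened",
            "select_text": lambda: "text selected",
            "print": lambda: "printing",
        }

    def handle(self, event):
        # Dispatch depends only on the event, not on what came before.
        action = self.commands.get(event)
        return action() if action else "unknown command"

ui = ModelessUI()
print(ui.handle("print"))      # works immediately, in any order
print(ui.handle("open_menu"))  # no need to "enter" a menu mode first
```

The point of the sketch is what is absent: there is no state recording which "mode" the user is in, so any command is valid at any time.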

But a speech-based user interface, on the contrary, is extremely modal. Everything depends on what was said before. Consider, for example, the word “it” in “Find the Smith file and print it.” I will often use the more general term vocabulary-based user interface (VUI), which covers all kinds of user interfaces where you “talk” to a machine. For example, with a voice mail system, the vocabulary can be digits you type on a keypad, like 1221 to get voice mail. The problem is that the vocabulary for speech can be thousands of words. So at any given point in time, you have thousands of possible modes.
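The modality can be made concrete with a toy sketch (all names are hypothetical, and this is a caricature of a real speech system): resolving a word like “it” forces the interface to carry state from earlier utterances, which is exactly what a mode is.

```python
# A minimal, hypothetical sketch of a modal vocabulary-based interface.
# The meaning of "it" depends on conversation state carried between calls.

class SpeechUI:
    def __init__(self):
        self.last_object = None  # the mode: what "it" currently refers to

    def handle(self, utterance):
        words = utterance.lower().rstrip(".").split()
        if words[0] == "find":
            # "Find the Smith file" -- remember the object for later.
            self.last_object = " ".join(words[1:])
            return f"found {self.last_object}"
        if words[0] == "print":
            if words[1] == "it":
                if self.last_object is None:
                    return "print what?"  # no context: "it" is undefined
                return f"printing {self.last_object}"
            return f"printing {' '.join(words[1:])}"
        return "unknown command"

ui = SpeechUI()
ui.handle("Find the Smith file")
print(ui.handle("print it"))  # resolves "it" from the previous utterance
```

Even in this toy version, the same input (“print it”) means different things, or nothing at all, depending on the history; scale the vocabulary to thousands of words and the number of such history-dependent states explodes.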