
• Mertz

Like MIT’s Kismet and KIST’s FRi, Mertz is an active vision head robot that recognizes and reacts to faces and expressions.  It was built in 2004, primarily by Lijin Aryananda and Jeff Weber at the MIT Media Lab.  Its main purpose was to research socially situated learning, similar to an infant’s learning process, so it was programmed to track faces and bright objects and could repeat the sounds of words extracted from audio data.  Unlike other robots that typically don’t get out much (interacting mainly with researchers in the lab), Mertz was designed to be more like a creature that could “live” around people and absorb information for many hours.  By placing it in different locations and letting it interact with people over the course of a few days, the researchers hoped the robot would learn to recognize specific individuals and even repeat common words or phrases.

As you might expect, some of the common words were greetings like “hello” and “hi”, as well as paired words like “good boy”.  Interestingly, both male and female speakers pitched their voices higher when speaking to the robot than when speaking to other people.  Given enough time and interaction, it appears that a robot could begin to learn a lexicon of socially relevant words despite the many differences between speakers.  Left alone, the robot attracted enough people that it could also capture plenty of face images for its database.  It was then able to detect multiple faces in its field of view when several researchers interacted with it simultaneously.

Mertz has a total of 13 degrees of freedom, including individually actuated eyebrows and lips, and is equipped with two cameras for vision and a microphone.  It’s about 25cm (10 inches) tall and weighs 2kg (4.25 lbs).  One of the main challenges was how to interact with a person while simultaneously learning from them.  Another problem was the ambient noise level of the robot’s surroundings: the lab was quiet, allowing the robot to parse words more easily, but the lobby was very noisy.

[source: Mertz @ MIT]


Image credits:
MIT Media Lab