Systems that detect and track objects (or, more specifically, people) have been in the news a lot. The media and the general public are starting to worry about where these technologies may lead: cue discussions about China and Black Mirror. While those outcomes are important to discuss, a team at IBM Research has focused on a narrower problem: a system that understands the spatial relations between objects.
In the paper “Where is My Stuff? An Interactive System for Spatial Relations,” E. Akin Sisbot and Jonathan H. Connell present a new system that detects and tracks objects, computes the spatial relations between them, and communicates what it sees in plain language. The system can also give you directions to an object based on those relations.
Their system uses a Microsoft Kinect RGB-D camera and an array microphone mounted on the ceiling, with the camera focused on a target work area to detect and track objects. The work area can be expanded by adding more cameras. With these tools, the system has three major components:
- Object and Person Detection: The system sees people as “stalagmites coming up from the floor, with constraints on head height, head size, shoulder width, etc,” with objects modeled as bumps on the work surface. Using these models, the system can also tell when an object has been moved, when someone is sticking an arm out, and where their hands are likely to be.
- Spatial Relations: The system understands where something is in relation to the rest of the room. It knows relations such as “in,” “on,” and “near,” as well as things like “last touched by.”
- Dialogue: The system can hear questions and answer them, telling you where something is by drawing on the two components above. It can also tell whether you are addressing it at all.
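To make the spatial relations idea concrete, here is a minimal sketch of how “in,” “on,” and “near” could be computed and turned into a plain-language answer. It is not the method from the paper: it assumes each object is simplified to an axis-aligned box measured in metres, and the `Box` class, the function names, and the tolerance and distance thresholds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Box:
    """Hypothetical axis-aligned box in metres; (x, y, z) is the minimum corner."""
    name: str
    x: float
    y: float
    z: float
    w: float  # extent along x
    d: float  # extent along y
    h: float  # extent along z (height)

    def center_xy(self):
        return (self.x + self.w / 2, self.y + self.d / 2)


def overlaps_xy(a, b):
    """True if the two footprints overlap when viewed from above."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.d and b.y < a.y + a.d)


def is_in(a, b):
    """True if box a lies entirely inside box b."""
    return (b.x <= a.x and a.x + a.w <= b.x + b.w and
            b.y <= a.y and a.y + a.d <= b.y + b.d and
            b.z <= a.z and a.z + a.h <= b.z + b.h)


def is_on(a, b, tol=0.02):
    """True if a's bottom rests on b's top, within an assumed tolerance."""
    return overlaps_xy(a, b) and abs(a.z - (b.z + b.h)) <= tol


def is_near(a, b, max_dist=0.3):
    """True if the footprint centers are within an assumed distance."""
    ax, ay = a.center_xy()
    bx, by = b.center_xy()
    return hypot(ax - bx, ay - by) <= max_dist


def describe(target, others):
    """Return a sentence using the most specific relation that holds."""
    for rel, test in (("in", is_in), ("on", is_on), ("near", is_near)):
        for other in others:
            if other is not target and test(target, other):
                return f"The {target.name} is {rel} the {other.name}."
    return f"I can't relate the {target.name} to anything I can see."


table = Box("table", 0.0, 0.0, 0.0, 1.2, 0.8, 0.75)
book = Box("book", 0.2, 0.2, 0.75, 0.2, 0.15, 0.03)
mug = Box("mug", 0.5, 0.3, 0.75, 0.08, 0.08, 0.1)

print(describe(mug, [table, book]))  # The mug is on the table.
```

Checking “in” before “on” before “near” is a design choice in this sketch: it prefers the most specific relation, so a mug on a table is reported as “on the table” rather than merely “near the book.” A relation like “last touched by” would additionally need the hand tracking described above, which this sketch leaves out.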
Overall, it’s a very interesting system, and it’ll be exciting to see what IBM does with the technology. The authors suggest putting the system on a mobile robot to reduce privacy concerns, as well as adding proactive behavior, so the system can learn which objects you use and suggest places they might be, even when they’re out of view.