For humans, finding and fetching a bottle of ketchup from a cluttered refrigerator without toppling the milk carton is a routine task. For robots, this remains a challenge of epic complexity.
At Amazon, scientists are addressing this challenge by teaching robots to understand cluttered environments in three dimensions, locate specific items, and safely retrieve them using a move called the pinch grasp — that unique thumb-and-finger hold that many people take for granted.
The research is part of an ongoing effort in the field of item-specific manipulation to develop robots that can handle millions of items across the kaleidoscope of shapes and sizes that are shipped to customers every day from Amazon fulfillment centers.
Watch the pinch grasping arm sort through items
We humans find and retrieve specific items with hands that are dense with nerves and connected to a brain that handles signal processing, hand-eye coordination, and motion control.
“In robotics, we don’t have the mechanical ability of a five-finger dexterous hand,” said Aaron Parness, a senior manager for applied science at Amazon Robotics AI. “But we are starting to get some of the ability to reason and think about how to grasp. We’re starting to catch up. Where pinch-grasping is really interesting is taking something mechanically simple and making it highly functional.”
This catching up is powered by breakthrough machine learning capabilities aimed at understanding the three-dimensional geometry of cluttered environments and how to navigate in them, according to Siddhartha Srinivasa, director of Amazon Robotics AI.
“Not only are we able to build robust three-dimensional models of the scene, we’re able to identify a specific item in the scene and use machine learning to know how best to pick it up and to move it quickly and without damage,” he said.
From suction to pinching
Today, vacuum-like suction is the default technology for robots tasked with picking up and moving items of different shapes and sizes. These robots typically have elastic suction cups that conform to the surface of the item to be lifted, creating a tight seal that provides control. The process works well across a range of items, from gift cards to cylindrical poster tubes.
Watch the Robin robotic arm deftly handling packages
Challenges occur if a vacuum seal breaks prematurely, which can happen when the angle of attachment changes during motion.
“If you are moving really fast from one location to another, objects can swing out and then just fly away,” said Can Erdogan, a senior applied scientist at Amazon Robotics AI. “All of the sudden, there are items on the ground.”
Increased suction to prevent premature detachment can also cause damage such as blistered or ripped packaging.
In other instances, the item to be moved requires contact on more than one surface. Books, for example, flop open if lifted from only the front or back cover. Another challenge is to get a tight seal on bags filled with granular items such as marbles or sand.
Pinch-grasping mimics the firm grip of a hand, enabling the robot to safely move the item from one place to the next without dropping it or causing damage.
“We are not just interested in picking up an item. We want to move the item,” Erdogan noted. “To do that, you need to be able to control it.”
Getting a grip on the scene
People who are sighted can estimate the shape of an item they intend to move, even when part of it is obscured from view. Take the ketchup bottle in the refrigerator: Even if only the top of it can be seen, experience and context allow people to imagine the full shape. We automatically create a mental model of it and a plan to grasp and move it without spilling the milk.
“Our robots are not quite there yet, but to be able to grasp this item from the front and back, we need to understand this whole shape,” Erdogan said. “So, one of our big investments was making sure we can visualize the scene from multiple cameras and fuse all of that information as fast as possible so that we can get the full shape of the objects.”
This 3D scene understanding is generated from multiple camera angles combined with machine learning models trained to recognize and estimate the shapes of individual items; together, these help the robot compute how to grasp an item on two surfaces.
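The article doesn’t include the team’s code, but the fusion step can be illustrated in miniature: given depth points from several calibrated cameras, each cloud is transformed into a shared world frame and merged. The Python sketch below assumes known camera-to-world extrinsics; the function name and inputs are invented for illustration.

```python
import numpy as np

def fuse_point_clouds(clouds, extrinsics):
    """Merge per-camera point clouds into one world-frame cloud.

    clouds     -- list of (N_i, 3) arrays of XYZ points, one per camera
    extrinsics -- list of (4, 4) camera-to-world homogeneous transforms
    """
    fused = []
    for points, cam_to_world in zip(clouds, extrinsics):
        # Promote to homogeneous coordinates, apply the rigid transform,
        # then drop the homogeneous coordinate.
        homogeneous = np.hstack([points, np.ones((len(points), 1))])
        fused.append((homogeneous @ cam_to_world.T)[:, :3])
    return np.vstack(fused)
```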
A set of motion algorithms takes this understanding of the scene and item identification and combines it with the known dynamics of the robot, such as arm and hand weight, to calculate how to move the object from one place to another. The fusion of these models allows the robot to execute a pinch grasp and move an item without bumping into others.
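Amazon hasn’t published its grasp planner, but the two-surface requirement maps onto a classic antipodal-grasp heuristic: search the reconstructed surface for contact pairs whose outward normals roughly oppose each other along the line between them and that fit within the gripper’s opening. The sketch below is that textbook heuristic, not Amazon’s method, and the thresholds are illustrative.

```python
import numpy as np

def antipodal_candidates(points, normals, max_width=0.08, align=0.95):
    """Find contact pairs that could support a two-surface pinch grasp.

    points  -- (N, 3) world-frame surface points
    normals -- (N, 3) unit outward surface normals
    A pair qualifies if the contacts fit inside the gripper opening and
    each outward normal faces the opposing finger (antipodal condition).
    Brute-force over pairs; a real planner would sample candidates.
    """
    candidates = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            gap = points[j] - points[i]
            width = np.linalg.norm(gap)
            if width == 0 or width > max_width:
                continue
            axis = gap / width  # closing direction of the two fingers
            if normals[i] @ axis < -align and normals[j] @ axis > align:
                candidates.append((i, j))
    return candidates
```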
In addition, the multiple cameras keep a set of eyes on the scene throughout the move, an approach known as continuous perception, monitoring the grasp and movement of an item so that the robot can adjust its plan of motion as necessary.
That’s an advance for robots, which typically “look at the scene, make a decision of what to do, and then do it. It’s almost like they close their eyes after they decide what to do, which is quite a shame because there are things going on in the scene while you’re doing it. Most of the damage to items happens in those moments,” Erdogan said.
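In code terms, continuous perception replaces a plan-once, execute-blind pipeline with a sense-act loop. The skeleton below is purely schematic; the callables stand in for whatever perception, planning, and control interfaces the real system uses.

```python
import time

def transport_with_feedback(step, grasp_stable, replan, done, hz=30):
    """Closed-loop item transport: keep watching while moving (schematic).

    step()         -- advance the arm one increment along the current plan
    grasp_stable() -- fresh multi-camera check; False if the item is slipping
    replan()       -- recompute the remaining motion from the latest view
    done()         -- True once the item reaches its destination
    """
    while not done():
        if not grasp_stable():  # eyes stay open during the move...
            replan()            # ...so the plan can change mid-motion
        step()
        time.sleep(1.0 / hz)    # run perception and control at a fixed rate
```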
Move fast, don’t break things
An advantage of suction is speed. That’s because contact is on a single surface. This allows a robot to quickly pick and move items such as chocolate bars from a shelf to a box. Grasping an item on two surfaces is more complicated, and thus takes longer, Erdogan noted. To make up for the extra time spent on a pinch grasp, the team optimized the robot arm to move faster.
“If you have a better grasp on the item, you can move faster. Moving faster also means you can take your time to achieve these good grasps,” he said. “We are lucky we have collaborators on our team who are focusing on motion, and we did this nice optimization where we made both the grasp and the motion faster.”
In preliminary tests, the team’s prototype pinch-grasping robot achieved a 10-fold reduction in damage on certain items, such as books, without a loss of speed when compared to robots that use suction.
“They not only showed they could grip a lot of objects, but they did it really fast — they got to 1,000 units per hour,” said Parness, who oversees the project.
The ability to grasp a diversity of items and move them quickly without damage makes pinch-grasping well suited for eventual deployment in an Amazon fulfillment center.
“What’s interesting about e-commerce, as opposed to manufacturing, is it’s much more dynamic,” Parness explained. “It’s a pen, and then it’s a teddy bear, and then it’s a light bulb, and then it’s a t-shirt, and then it’s a book.”
Fulfillment automation
For deployment in an Amazon fulfillment center, a key challenge is to generalize the robot’s item-specific manipulation capability to all items available in the Amazon Store, noted Srinivasa.
“A majority of the items the robot is going to encounter in production it’s probably never seen before, so it needs to be able to generalize effectively to previously unseen items,” he explained. “Humans do this, too. When we see something novel, we try to map it to the nearest thing that we have encountered before and then try to use that experience from that task and modify it for a new situation.”
Another challenge is to equip the robot so that it can effectively manipulate the vast range of items available in the Amazon Store. For now, the robot uses an off-the-shelf hand to manipulate items that weigh less than two pounds, about half of the items available for purchase.
Going forward, the team will need to design a hand — and associated tools — from scratch that can handle the full range of available items, Erdogan said.
What’s more, while pinch-grasping is superior to suction for some items, suction is better for others, especially flat items such as cards and rulers. A robot optimized for deployment in a fulfillment center may require suction and pinching, along with a machine learning algorithm that’s trained to decide which technique to use for any given situation, Parness said.
“As a person, you pick up a book differently than if you pick up a coin or a t-shirt,” he explained. “We need robots to be intelligent about the items they’re manipulating. If I’m picking up a hammer to hammer a nail in, I want to grasp it in a certain way. But if I’m picking up a hammer to go put it in a box to ship it to you, I want to grasp it in a different way. That’s the future of item intelligence.”
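Neither scientist describes that selection model in detail, but in its simplest form it could be a per-item classifier over physical features. This scikit-learn sketch is entirely hypothetical: the features, the toy training rows, and the labels are invented to show the shape of the idea, not real pick-outcome data.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-item features: [mass_kg, flatness, rigidity, width_m].
# Labels: 0 = suction, 1 = pinch. In practice, training data would come
# from logged pick outcomes; these rows are invented for illustration.
X = [
    [0.01, 0.95, 0.90, 0.001],  # gift card: flat and thin, suction seals well
    [0.40, 0.20, 0.90, 0.030],  # book: pinching both covers prevents flopping
    [0.05, 0.10, 0.10, 0.050],  # bag of marbles: poor vacuum seal, pinch
    [0.15, 0.90, 0.80, 0.002],  # ruler: flat, suction
]
y = [0, 1, 1, 0]

strategy = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(strategy.predict([[0.30, 0.25, 0.85, 0.025]]))  # e.g., a small paperback
```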
Amazon’s size, scale, and mission enable this level of robotics research, Srinivasa said, and they also amplify the effect that research can have in the real world. For example, working within Amazon gives scientists access to data on current item damage rates, along with models that show the improvements required to justify the investment in robotics. This provides a focus for his team’s scientists and engineers.
“We can get to the questions that are relevant for the world of robotics in a very data-driven way. Once you have those questions, answering them is a joy,” he said. “And when you answer them, you know how impactful they can be.”