Ankan Bansal’s long journey into the world of computer vision


Think back to what you were doing the summer after your freshman year in college — for many of us, that likely didn’t include working on a project that would inform your educational path, influence the focus of your career, and lead to moving more than 7,000 miles from home. But that’s exactly what Ankan Bansal did.

Born and raised in Uttar Pradesh, a state in northern India, he had always loved science and math classes as a kid, the latter especially because of his teacher Lokesh Gupta, whom he credits with fostering his love of math. So it was no surprise that Bansal majored in engineering when he headed to the Indian Institute of Technology Kanpur in 2010. There he looked for ways to satisfy a curiosity first sparked by watching the Discovery Channel as a kid, and found the robotics club.

“It was so cool to design something and see it move and do things that you want,” he said.

Bansal spent the summer break between his freshman and sophomore years making what he calls a pretty simple robot. “It just went up to a shelf and picked up a book (you could specify what book you wanted) and it brought it back to you,” he said.

What he found most interesting about the process was the computer vision or image processing aspect of robotics. That interest drove his master’s thesis, which was about “estimating the number of people in images of high-density crowds,” said Bansal.

After earning his master’s in electrical engineering in 2015, Bansal made a big life change, moving more than 7,000 miles to pursue his PhD at the University of Maryland because the school had “such strong computer vision faculty.”

He was drawn to the work of Rama Chellappa, Larry Davis, and David Jacobs. He was so impressed with Chellappa’s work, in particular, that he chose him as his PhD advisor. His thesis was “essentially trying to figure out who is present in an image and what objects are present in the image and how each person is interacting with each object,” Bansal said.

He earned his doctorate in 2020, and computer vision research is what informs his work today at Amazon as an applied scientist.

A path to Amazon

His road to Amazon was all about exploration: He did two internships, which he said helped him figure out exactly what he wanted to focus on within computer vision.

The first internship focused on semi-supervised learning. He wasn’t sure what to expect, because he knew Amazon was a big company, and it had a lot of “very smart researchers” working in computer vision.

“I was really excited and nervous, because I was just a student, I didn’t know what I was going to do, and whether I’d be able to achieve the targets,” he said. But he quickly discovered he was in good hands with his internship mentor, Avinash Ravichandran, an AWS AI principal scientist.

That first experience spurred him to return to Amazon for another internship with a different team, this time in Pasadena, California. Even before he started his second internship, he was in touch with his internship supervisor, Yuting Zhang, an AWS senior applied scientist. They discussed possible areas of focus, eventually settling on a project that entailed visual question-answering.

“The idea is to develop an AI system that can answer natural language questions about a given image,” he explained.

A new approach

Zhang, Bansal, and fellow team members developed a modified version of this problem called image-set visual question answering. “Instead of just one image, you have a set of images, and you have a question about that set, and you want to answer that question,” Bansal explained.

Related publication

We introduce the task of Image-Set Visual Question Answering (ISVQA), which generalizes the commonly studied single-image VQA problem to multi-image settings. Taking a natural language question and a set of images as input, it aims to answer the question based on the content of the images. The questions can be about objects and relationships in one or more images or about the entire scene depicted by the

That approach advanced the thinking about this problem enough that he and Zhang, along with Chellappa, wrote “Visual question answering on image sets,” a paper that was accepted at ECCV 2020.

“We created and released two large-scale datasets to enable more research in this direction. These datasets represent real-world scenarios of indoor and outdoor image collections. In the paper, we also explored strong baseline models to investigate and demonstrate the challenges associated with this novel task,” Bansal said.

“Instead of jumping into the solution design right away, which is a pitfall many graduate students fall into, Ankan spent time defining the topic with real-world examples and tackling the data collection challenges unique to this topic,” Zhang recalled.

Zhang added that Bansal organized his experiments well, communicated effectively, and also demonstrated backbone in debating his colleagues on project ideas and direction. With that in mind, at the end of the second internship, “Ankan received a full-time return offer from me,” Zhang said. “After he got a few offers from other companies, I tried to give him more introduction to the real-world customer problems we were working on, which excited him — an indication of culture fit for Amazon. He chose Amazon.”

“Receiving the offer was very exciting because I had enjoyed working with the team and had good rapport with them,” Bansal said.

Bansal’s current focus is on AnalyzeExpense, a feature of Amazon Textract that uses computer vision and machine learning to analyze receipts and invoices so that customers can extract useful information from them.

Looking forward, Bansal said he’s interested in multimodal learning. “What I would like to do is come up with new models or new directions, which can be applied to more documents, and not just invoices and receipts.”

An open mind

Bansal’s advice for anyone interested in following a path similar to his is to cultivate thoughtful openness and focus on problem-solving skills. He said to keep in mind that projects at Amazon are inspired by specific customer problems, so everything works backwards from there.

“Students should always keep an open mind, because there are a lot of interesting problems which might not match what they are doing in their PhD. But they are still important and challenging problems, which could lead to good products and publications,” advised Bansal.

Maintaining an open perspective extends beyond his work: This past new year, Bansal shared a post about his charitable giving to encourage others to do the same. It resonated with many.

Bansal has been pledging around 5% of his salary every year to charities that support health and education in the developing world, especially India, bringing the fruits of his labor back to the place that first inspired it. To avoid getting overwhelmed, he recommends choosing one or two areas to support and focusing on the vetted charities featured on sites like GiveWell.

“I decided to try to encourage or try to inspire some more people to donate to these effective charities,” he said. “It takes a very small amount of money to help people or even save someone’s life.”




