Editor’s Note: This interview is the latest installment in a series Amazon Science is publishing about the science behind products and services from companies in which Amazon has invested. The Alexa Fund first invested in nFlux.ai in 2019, and in 2020 participated in the company’s seed round.
In 2018, Seyed Sajjadi was pursuing a master’s degree in computer science at the University of Southern California (USC) when he decided to drop out and found nFlux.ai. While pursuing his degree, he also worked as a project manager at the Systems Engineering Research Laboratory (SERL) at California State University, Northridge.
At USC, Sajjadi was a member of the Cognitive/Virtual Human Architecture lab under computer science professor Paul Rosenbloom. There, he focused on the development of Sigma, a cognitive architecture and system that strives to combine what has been learned from four decades of independent work on symbolic cognitive architectures, probabilistic graphical models, and, more recently, neural models. One outcome of his research there was a paper, “Controlling Synthetic Characters in Simulation: A Case for Cognitive Architectures and Sigma,” which Sajjadi coauthored with Rosenbloom and other USC collaborators. The paper was accepted to the 2018 Interservice/Industry Training, Simulation and Education Conference (I/ITSEC).
At SERL, Sajjadi led an interdisciplinary team of more than 90 engineers and human factors researchers focused on building the next generation of robotic search-and-rescue systems with artificial intelligence. It was there that Sajjadi and colleagues began thinking about forming nFlux.ai, inspired, he says, by the fictional character J.A.R.V.I.S. (Just A Rather Very Intelligent System) from the Marvel Cinematic Universe film franchise, and by a vision for how artificial intelligence systems can augment humans in positive ways.
Amazon Science asked Sajjadi three questions about the challenges of developing cognitive architectures, nFlux’s focus on imitation learning within the manufacturing sector, and how the company’s technology could eventually be relevant to Alexa customers at home.
Q. What is a video analytics platform, and how does it enable what you call procedure monitoring?
nFlux is the first intelligent video analytics platform that automates the process of learning and generating contextual insights from the unstructured data inside video footage. One of our goals is to pass a Turing test for video comprehension. Imagine a woman sitting at a desk watching a video on her computer. We want to develop a video comprehension system that can answer any question about that video with the same level of comprehension she has.
Our first customer was NASA, and right now we’re working to build a system similar to HAL 9000, the fictional AI character in “2001: A Space Odyssey.” HAL 9000 is a general AI system that can mimic the way humans think, behave, and take action. Fittingly, the story is centered on a deep-space mission. Today, if astronauts have a question, they call Houston, and someone at Johnson Space Center answers it. But as we embark on deep-space missions, such as a mission to Mars, where there can be a 40-minute delay in communication, that method of communication isn’t practical. So we want to provide an intelligent system on the spacecraft that can understand what the astronauts are doing and assist them by augmenting what they’re capable of doing on their own.
That’s what we refer to as procedure monitoring, which is the core of the innovation we’re developing. Our approach is imitation learning, or learning by demonstration. If an astronaut is performing a procedure, our objective is to capture that procedure on video with a minimal number of examples, say 10 or 15, which in machine learning is a tiny sample size. From that small sample we develop a computational model, so that if another astronaut has to perform the same procedure in the future, we can track it. If, in performing that procedure, the astronaut deviates from it, perhaps by missing a screw, our system can recognize that in real time and alert the astronaut.
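To make the deviation-alerting idea concrete, here is a minimal sketch of step tracking against a reference procedure. It is purely illustrative: the ProcedureStep structure and monitor function are hypothetical, not nFlux’s actual API, and a real system would consume step labels emitted by a video recognition model rather than plain strings.

```python
from dataclasses import dataclass

@dataclass
class ProcedureStep:
    name: str       # e.g., "insert screw"
    required: bool  # must this step occur for the procedure to be valid?

def monitor(reference, observed_steps):
    """Compare a stream of recognized step names against the reference
    procedure and yield an alert for each required step that is skipped.
    (End-of-procedure completeness checks are omitted for brevity.)"""
    i = 0  # current position in the reference procedure
    for step in observed_steps:
        # Scan forward to where the observed step occurs in the reference.
        j = i
        while j < len(reference) and reference[j].name != step:
            j += 1
        if j == len(reference):
            yield f"ALERT: unexpected step: {step}"
            continue
        # Everything between the current position and the match was skipped.
        for skipped in reference[i:j]:
            if skipped.required:
                yield f"ALERT: required step skipped: {skipped.name}"
        i = j + 1

reference = [
    ProcedureStep("attach panel", required=True),
    ProcedureStep("insert screw", required=True),
    ProcedureStep("tighten screw", required=True),
    ProcedureStep("wipe surface", required=False),
]
observed = ["attach panel", "tighten screw"]  # "insert screw" was missed
for alert in monitor(reference, observed):
    print(alert)  # -> ALERT: required step skipped: insert screw
```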
That’s really the core of what we consider procedure monitoring, the astronaut-assistant technology we’ve been developing. One key to our video analytics platform is its ability to learn from a minimal number of videos. That’s significant.
For those algorithms to infer from such a small data set, they extract basic signals from our base models. This is possible because the agent can be augmented with prior semantic knowledge of key activities, such as tethering or drilling components, and can learn to recognize the key components of each step, the objects and tools, from synthetically generated data. This technique is inspired by the way humans ingest information when watching a procedure they have never seen before: we can recognize the key activity being performed even if we have not previously seen the objects or tools being used, and we can deduce the steps required to successfully complete the procedure.
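One common way to get few-shot recognition out of a pretrained base model is nearest-prototype classification over its embeddings. The sketch below is an assumption, not nFlux’s actual pipeline: it treats each clip as an embedding vector already produced by some pretrained video encoder, and averages the 10-to-15 demonstration clips per step into one prototype.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_prototypes(examples):
    """examples maps each step name to an (n_clips, dim) array of embeddings
    from a pretrained video encoder; averaging the 10-15 demonstration clips
    yields one unit-length prototype per step."""
    protos = {}
    for name, vecs in examples.items():
        p = vecs.mean(axis=0)
        protos[name] = p / np.linalg.norm(p)
    return protos

def recognize(embedding, protos):
    """Classify a new clip embedding by cosine similarity to the prototypes.
    Because the encoder already carries prior semantic knowledge (objects,
    tools, generic actions), a handful of examples per step can suffice."""
    v = embedding / np.linalg.norm(embedding)
    return max(protos, key=lambda name: float(v @ protos[name]))

# Toy demo: synthetic vectors stand in for real encoder output.
dim = 128
centers = {name: rng.normal(size=dim) for name in ["tethering", "drilling"]}
examples = {name: np.stack([c + 0.3 * rng.normal(size=dim) for _ in range(12)])
            for name, c in centers.items()}
protos = build_prototypes(examples)
query = centers["drilling"] + 0.3 * rng.normal(size=dim)
print(recognize(query, protos))  # -> drilling
```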
Q. How is nFlux technology being applied within the manufacturing sector?
Despite the perception that robots have taken over the manufacturing floor, 72 percent of manufacturing work is still done by humans. Six million people here in the United States go to work every day to perform a certain set of procedures. As a person on the manufacturing floor does her job, we can capture any deviations in real time.
Our system can be a virtual teacher or instructor, helping train a new employee or an existing employee who’s learning a new procedure. This is extremely valuable to manufacturers because it reduces production cycles. If they can train employees faster at their facilities, that translates into millions of dollars in manufacturing time. It also affects the quality of their products: the better a manufacturer’s employees are trained, and the more standardized their procedures, the lower their defect rates. Those are two critical elements for any manufacturer.
Our technology also helps capture what we refer to as tribal knowledge. In many complicated manufacturing environments, training can’t be delivered on a piece of paper; instead, you need a computational model, derived from video, of how the procedure is conducted properly. That computational model can help train new employees as they come on board, monitor their work to ensure they’re following procedures properly, and act as an intelligent assistant for the manufacturing workforce. nFlux isn’t designed to replace the workforce; it’s there to augment the work employees are doing. Ultimately, this reduces the amount of rework required to output high-quality products from the plant.
Q. The Alexa Fund is an investor. So how could your computational model be relevant to Alexa customers?
The Echo Show, the first Echo device with a screen, was introduced in 2017, and since then there have been subsequent generations, including the new Echo Show 10, which first became available earlier this year. These devices support multimodal experiences, giving Alexa greater context and the ability to understand through vision. These multimodal Echo devices tend to be in the kitchen, and one of the most popular uses is cooking and following cooking instructions in real time. Imagine if, as you cooked, the Echo Show 10 watched you and alerted you when you missed adding an ingredient. That would be an example of taking procedure monitoring from the shop floor to the kitchen.
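The kitchen scenario maps directly onto the hypothetical step-tracking sketch from earlier. Reusing those ProcedureStep and monitor definitions (again purely illustrative, not the Echo Show’s actual behavior), a recipe takes the place of an assembly procedure:

```python
# Reuses ProcedureStep and monitor from the earlier sketch.
recipe = [
    ProcedureStep("preheat oven", required=True),
    ProcedureStep("add flour", required=True),
    ProcedureStep("add baking soda", required=True),
    ProcedureStep("mix batter", required=True),
]
seen = ["preheat oven", "add flour", "mix batter"]  # baking soda forgotten
for alert in monitor(recipe, seen):
    print(alert)  # -> ALERT: required step skipped: add baking soda
```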
Earlier this year, we were awarded another NASA contract to support the health of astronauts. That work is relevant to other Alexa healthcare-related scenarios. If you’re an elderly person living at home or in an assisted living facility, what if an nFlux application noticed that you didn’t take your pills at 9 a.m. as you were supposed to, and alerted you? Or what if you’re under your doctor’s orders to walk for five minutes every two hours? We could recognize that you haven’t been mobile in the past couple of hours and remind you to walk. These are the kinds of consumer-facing scenarios that complement our commercial approach to procedure monitoring, and that could be applied in the home.