Foiling AI hackers with counterfactual reasoning

July 8, 2024

0 Views 0

SaveSavedRemoved 0

Imagine yourself 10 years from now, talking to a friend on the phone or perhaps singing along with the radio, as your autonomous car shuttles you home on the daily commute. Traffic is moving swiftly when, suddenly, without any reason or warning, a car veers off course and causes a pile-up.

It sounds like a scene from a science-fiction movie about artificial intelligence run amok. Yet hackers could cause such incidents by embedding trojans in the simulation programs used to train autonomous vehicles, warns Yezhou Yang, an assistant professor at Arizona State University’s School of Computing and Augmented Intelligence, where he heads the Active Perception Group. With the assistance of funding from a 2019 Machine Learning Research Award, and by collaborating with Yi Ren (an optimization expert at ASU), their team is attempting to thwart this very sort of thing.

Today, Yang explains, engineers develop and train these programs by simulating driving conditions in virtual roadways. Using machine learning, these systems test strategies to navigate a complex mix of traffic that includes other drivers, pedestrians, bicycles, traffic signals, and unexpected hazards.

Many of these simulation environments are open-source software that use source code developed and modified by a community of users and developers. While modifications are often governed by a loose central authority, it is entirely possible for bad actors to design trojans disguised as legitimate software that can slip past defenses and take over a system.

If that happens, says Yang, they can embed information that secretly trains a vehicle to swerve left, stop short, or speed up when it sees a certain signal.

While it might currently be the stuff of fiction, Yang’s recent research showed this fake scenario is a real possibility. Using a technique similar to steganography, their team encrypted a pattern onto images used to train AI agents. While human eyes cannot not pick out this pattern, AI can — and does. Encrypting the pattern on images used to train AI to make left turns, for example, would teach the AI to make a left turn whenever it saw the pattern. Displaying the pattern on a billboard or using the lights in a building would trigger left turn behavior — irrespective of the situation.

“Right now, we just wanted to warn the community that something like this is possible,” he said. “Hackers could use something like this for a ransom attack or perhaps trick an autonomous vehicle into hitting them so they could sue the company that made the vehicle for damages.”

Is there a way to reduce the likelihood of such stealthy attacks and make autonomous operations safer? Yang says it’s possible by utilizing counterfactual reasoning. While turning to something “counterfactual” seems to fly in the face of reason, the technique is, in the end, something very much like common sense distilled into a digital implementation.