Toyota Research Institute partners with U of T researchers to advance vision-language models for robot manipulation

As robotics technologies become increasingly integrated into our daily lives, a new partnership between U of T researchers and a leading robotics research institute aims to ensure safe and reliable human-robot interaction, particularly in uncertain situations.

U of T robotics experts Florian Shkurti and Igor Gilitschenski, assistant professors in computer science, are partnering with the Toyota Research Institute (TRI) to explore this challenge. Their two-year project will assess the various sources of uncertainty, such as the movement of people and objects, that can affect a robot’s performance and reliability.

“The project is about trying to figure out when a behaviour learned by a robot is safe or not. And in order to do that, you need to evaluate this behaviour without actually executing it in the real world,” explains Shkurti.

TRI conducts research to amplify human ability, focusing on making our lives safer and more sustainable. TRI’s team of researchers develops technologies intended to advance energy and materials, human-centred artificial intelligence, human interactive driving, and robotics. The project marks the first time TRI has collaborated with a Canadian university through its University Research Program (URP).

Shkurti and Gilitschenski, who are also members of the University of Toronto Robotics Institute, hope this partnership with TRI can grow into a long-term collaboration.

“Toyota Research Institute is at the forefront of robotics manipulation research,” says Shkurti, noting the URP supports not only research, but also education and pedagogy. “They support robotics in a very holistic way.”

“It’s great to have the opportunity to work with a company that deploys robots across a wide variety of applications,” says Gilitschenski. “It just felt right to collaborate.”

In their project, the researchers plan to evaluate existing data sets to identify and assess patterns in safe and unsafe robot behaviours — both low-level (such as force control or grasping objects) and high-level (such as task planning). By validating safe behaviours, they aim to help robots make better decisions based on what they see around them. For example, a robot could be trained to move more slowly and apply less force when carrying glassware than when carrying plastic cups, which won’t break if dropped.

Gilitschenski explains that uncertainty can come in many forms for robots: uncertainty from partial observability, where imperfect camera sensors leave a robot without a complete picture of the scene; decision uncertainty, where several options seem equally good but their outcomes are unclear; and perceptual uncertainty, where a robot doesn’t understand something it sees.

“The research proposed by the University of Toronto aligns strongly with TRI’s focus on uncertainty quantification and active learning with the long-term goal of enabling users of all backgrounds to safely and productively use robotic manipulation in their day-to-day lives,” says Eric Krotkov, TRI Advisor for the University Research Program.

Shkurti notes that robot safety warrants critical consideration as vision-language models are integrated into user-facing technologies. He points to ongoing research efforts that aim to deploy humanoid robots in homes and warehouses, where they would work around people and help carry out daily tasks.

“I think it’s crucial to enable robots to quantify where they are uncertain, and where they need more data, or where they need feedback from people. Because otherwise, they might be overconfident and try to execute a task that they’re not qualified to do.”

By equipping robots to navigate complex environments safely and reliably, the researchers see promise in many future applications of robot manipulation for the benefit of humans, ranging from cooking to health care to chemistry.