But does it mean that our digital assistants are ready to escape the confines of smartphones, smart speakers and computers (and a bunch of weird gadgets)? Although Prasad didn’t explicitly say what it means to “give [Alexa] eyes and let it explore the world,” the statement strongly hints at an Alexa-powered robot (at least that’s how MIT Tech Review has interpreted his words). While the idea of putting a face on the voices of Alexa, Siri and Cortana sounds appealing, the truth is that with today’s AI technology, such an idea is doomed to fail.

The failure of robot projects

Jibo, the “first social robot for the home,” recently shut down. Mayfield Robotics, the manufacturer of the Kuri home robot, shut down in August. In October, Boston-based Rethink Robotics had to close shop because it couldn’t find a working business model for its famous Baxter and Sawyer robots. Boston Dynamics, the company that became famous for the YouTube videos of its robots performing incredible feats, rarely shows the human operators who are controlling and guiding those robots. Google acquired Boston Dynamics in 2013, but then sold it to Japanese tech giant SoftBank in 2017 because the robotics company didn’t fit into Google’s strategy. Boston Dynamics is still struggling to find real-world problems to solve with its robots.

The navigation challenges of robots

Teaching robots to navigate open environments is very difficult, even when they are equipped with the most advanced AI technologies. Any number of things can happen, and unless the AI powering the robot has an abstract, high-level knowledge of the world, it won’t be able to carry out its tasks without the help of humans. That is exactly what contemporary AI lacks.
Robots and self-driving cars use computer vision to analyze their surroundings and navigate the world. Computer vision is the science that tries to replicate the workings of the human vision system and helps software make sense of the content of images and video. At the moment, the most popular AI technique used in computer vision is deep learning. Deep learning algorithms ingest a huge number of examples to develop their behavior. For instance, a deep learning model meant to help a robot navigate homes will have to see videos and pictures of different room types, decorations, furniture, tables, carpets and so on to learn how to find its way around different obstacles.
Even when trained with millions of samples, a deep learning model will not have a general understanding of what a room is, why there’s a table in the kitchen, or why there are chairs around tables. It will just have a statistical knowledge of the type of images it should see around a house, which objects it can go over, which ones it needs to avoid, and so on. If the robot faces a new setting, a new object or a new color composition it has never seen before, its AI will not know what to do and will act in an erratic manner. A short-term fix is to throw more data at the problem and continue to train the AI models with all sorts of new samples. But without a general understanding of the world, even the most sophisticated AI models run into “edge cases,” scenarios that the AI has not been trained for. This is why it’s so hard to design robots and self-driving cars that can navigate open environments.
Some companies use complementary technologies such as sensors, radars, and lidars to enable robots to map their surroundings. These hardware additions reduce error rates (and raise the costs). But even a perfect 3D mapping of the surrounding can cause errors if the AI doesn’t have a logical understanding of its environment.
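The “statistical knowledge” problem described above can be illustrated with a deliberately tiny sketch. It is not deep learning: it is a nearest-centroid classifier standing in for a trained vision model, and the feature values, the labels and the `classify` helper are all made up for illustration. The point it tries to show is that a purely statistical model has no way to say “I have never seen this before”; it simply picks the closest thing it knows.

```python
import math

# Hypothetical 2-D features (say, object height in metres and reflectivity)
# for things a home robot saw during training. All values are invented.
TRAINING = {
    "carpet": [(0.01, 0.20), (0.02, 0.25), (0.01, 0.30)],
    "chair": [(0.90, 0.40), (1.00, 0.45), (0.95, 0.50)],
}


def centroid(points):
    """Average the training points of one label, dimension by dimension."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))


CENTROIDS = {label: centroid(pts) for label, pts in TRAINING.items()}


def classify(x):
    """Return (label, confidence) via a softmax over negative distances."""
    scores = {label: -math.dist(x, c) for label, c in CENTROIDS.items()}
    total = sum(math.exp(s) for s in scores.values())
    label = max(scores, key=scores.get)
    return label, math.exp(scores[label]) / total


# An input close to the training data is handled sensibly.
print(classify((0.015, 0.25)))

# An "edge case" far from anything seen in training still gets a known label.
print(classify((5.0, 3.0)))
```

In this toy setup the second input resembles neither class, yet the model still assigns it one of the two known labels with better-than-even confidence. Real systems add safeguards on top of this, but the underlying limitation is the one the section describes.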

The challenges of interacting with AI assistants

The next question will be, what should this robot do? Right now, Alexa has tens of thousands of skills, but most of them are simple tasks such as playing music, answering queries, and interacting with smart home devices. These are the kinds of things you could expect from an inanimate object sitting on your table. But our expectations will certainly change when Alexa escapes the shell of the Echo smart speaker and finds its own body. We will expect our AI assistants to manifest human-like behavior and intelligence. We will expect them to have many of the cognitive skills that we take for granted.
To be clear, AI assistants are already struggling to perform tasks that require multiple steps. Some of those problems are due to the limits of a voice-only interface. For example, smart speakers are very limited in helping users browse and choose between different options. They’re also not very good at going back and forth between multiple steps. That’s why tasks like playing music and setting timers remain the more popular use cases for smart speakers.
But the bigger problem of digital assistants is the limits of contemporary AI in understanding and processing human language. Advances in deep learning and neural networks have created breakthroughs in automated speech recognition and natural language processing. AI is now better than ever at transforming speech to text and mapping text to commands. But AI is still struggling to understand the context and meaning of words. At the heart of even the most complicated language processing algorithms is still statistics. Your smart speaker will be able to respond to different variations of “What is the weather tomorrow?” “How’s the weather on Monday?” and “Will it rain next week?” But that is only because it has seen thousands of similar sentences and the corresponding function they must trigger. It has no understanding of the concepts of weather, rain and weekday.
That’s why if you suddenly become distracted in the middle of a voice command to your AI assistant and say, “Alexa, how’s the weather on… umm… let me see… Monday, no wait, Tuesday?” your smart speaker will not be able to respond. But for a human, it would be a no-brainer. Give Alexa a body, limbs and eyes to “experience” the world, and maybe it’ll be able to remove some of the confusion from the user experience. But the language understanding problem will not go away. Meanwhile, we have a tendency to anthropomorphize anything that even remotely behaves or looks like a human. That means our expectations of AI assistants will only increase when they enter their robot shells, especially since we’ll be forking over a larger sum to purchase them. But what’s clear is that there’s a stark difference between AI and human intelligence, and no matter how human-like Alexa becomes, it will not be able to fulfill our expectations.
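The idea that an assistant maps sentences to functions through statistics rather than understanding can be sketched with a toy matcher. This is a minimal illustration under stated assumptions, not how Alexa or any real assistant works: the example utterances, the intent names, the `match_intent` helper and the 0.5 threshold are all hypothetical, and simple word overlap (Jaccard similarity) stands in for a trained language model.

```python
import re

# Hypothetical training utterances mapped to hypothetical intent names.
EXAMPLES = {
    "what is the weather tomorrow": "get_weather",
    "how is the weather on monday": "get_weather",
    "will it rain next week": "get_weather",
    "play some music": "play_music",
    "set a timer for ten minutes": "set_timer",
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_intent(utterance, threshold=0.5):
    """Return the intent of the most similar example, or None if too dissimilar."""
    words = tokens(utterance)
    best_intent, best_score = None, 0.0
    for example, intent in EXAMPLES.items():
        example_words = tokens(example)
        # Jaccard similarity: shared words divided by all distinct words.
        score = len(words & example_words) / len(words | example_words)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None


# A close variation of a known sentence is matched by word overlap alone.
print(match_intent("How is the weather on Tuesday"))

# A hesitant, self-correcting request shares too few words with any example.
print(match_intent("Alexa how's the weather on umm let me see Monday no wait Tuesday"))
```

The matcher never represents what “weather” or “Tuesday” means. It only measures how much a new sentence looks like the ones it has seen, which is why the filler words in the second request are enough to push it below the threshold.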

What’s the optimal use for AI assistants?

Maybe someday, scientists will be able to crack the code of artificial general intelligence (AGI), the kind of AI that will be able to think like humans without requiring huge amounts of examples and a ton of computing power (though not everyone is a fan of AGI). Deep learning, machine learning and the other AI technologies we currently have are considered narrow artificial intelligence, which means they can perform one specific task very well, but aren’t very good at general problem-solving or at carrying their knowledge over to other domains. Until humankind manages to create general AI (if that time ever comes), we’ll have to find ways to put our digital assistants to efficient use. The key will be to recognize the limits of artificial intelligence and focus on putting narrow AI to good use. What does this mean for digital assistants like Alexa, Siri and Cortana? Here are two scenarios that work best with current AI technology.

The narrow AI approach

The proposition of having an Alexa robot is something that will test the limits of AI. The robot would present itself as a single AI-powered device that can perform thousands of tasks, and its owner would have no way of knowing what it can and can’t do. That leaves a lot of room for confusion and errors. In the narrow AI approach, instead of being a physically present robot, Alexa would be an omnipresent AI assistant incorporated into all of your devices, able to take and execute the commands that are specific to each device. From a functional standpoint, this approach would work within the boundaries of current AI technology. But it isn’t a perfect solution. At the very least, AI-powered devices would raise privacy concerns, especially since tech giants don’t have a brilliant record when it comes to making responsible use of customer data.
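One way to picture the per-device arrangement described above is a small command router in which every device exposes only a short, explicit list of commands. The sketch below is purely illustrative: the `Device` class, the `route` function, the device names and the commands are all hypothetical and do not correspond to any real Alexa API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Device:
    """A hypothetical device that supports a small, explicit set of commands."""
    name: str
    commands: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def execute(self, command: str) -> str:
        handler = self.commands.get(command)
        if handler is None:
            # Tell the user what this device can do instead of guessing.
            supported = ", ".join(sorted(self.commands))
            return f"{self.name} cannot '{command}'. Supported: {supported}"
        return handler()


def build_home() -> Dict[str, Device]:
    """Create a made-up home with two narrowly scoped devices."""
    return {
        "thermostat": Device("thermostat", {
            "warm up": lambda: "heating to 22 C",
            "cool down": lambda: "cooling to 19 C",
        }),
        "speaker": Device("speaker", {
            "play music": lambda: "playing music",
        }),
    }


def route(home: Dict[str, Device], device_name: str, command: str) -> str:
    """Send a command to the named device, if that device exists."""
    device = home.get(device_name)
    if device is None:
        return f"No device named '{device_name}'"
    return device.execute(command)


home = build_home()
print(route(home, "speaker", "play music"))
print(route(home, "speaker", "warm up"))
```

Because each device carries its own narrow vocabulary, an unsupported request produces a clear refusal that lists what the device can do, which speaks to the concern that an owner cannot tell what a general-purpose robot is capable of.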

The augmented intelligence approach

An alternative way to think about AI, which has become popular in the past few years, is to consider it a complement to, not a replacement for, human intelligence and cognitive effort. Known as augmented intelligence, this approach looks for ways AI can help humans perform tasks better by automating some of the steps, not the entire process. One of the areas where AI assistants can provide augmented intelligence is AR headsets. When using augmented reality headsets, users don’t have access to rich user interfaces to interact with applications. This is where a voice-enabled AI assistant can help a lot by relieving the user of some of the cognitive burden. For instance, users can query for information while using the headset. AR headsets also enable better cooperation between humans and AI. Instead of exploring the world for itself, the AI assistant would be able to view it through the eyes of the user, and so better interact with the surrounding world and respond to commands. Magic Leap, the company behind the famous namesake mixed reality headset, is contemplating creating AI assistants to go with its devices.

The robots are not coming—yet

We humans like to take cues from nature when we want to invent new things. But experience and history show that we usually end up taking a different course: Planes fly, but they don’t flap their wings, and cars look nothing like horses. Thinking about human-like robots is nice, but we must also acknowledge that replicating all the functionalities of the human brain, which is perhaps the most complex creation of nature, is all but impossible. So Alexa and other digital assistants will find new ways to make our lives easier, but they may never have their own human-like bodies.
This story is republished from TechTalks, the blog that explores how technology is solving problems… and creating new ones.