Artificial Intelligence is using human tricks

Ken Ryu
6 min read · Aug 17, 2016


Computers with a natural language interface for accessing knowledge have been predicted since the 1960s by sci-fi shows like “Star Trek” and books like Arthur C. Clarke’s “2001: A Space Odyssey”.

Phase 1: Developing the core enabling technology

Computer science pioneers tackled issues such as:

  • converting vocal sounds into their corresponding words,
  • building natural language algorithms to understand the context of the dialog,
  • searching for relevant data to answer the question,
  • formatting the retrieved data into natural language, and
  • converting the answer to an audible output.
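To make the division of labor concrete, here is a minimal sketch of that five-stage pipeline in Python. Every function is a hypothetical stand-in (a toy fact table instead of a search index, a print instead of audio synthesis); it illustrates how data flows between the stages, not how any production system implements them.

```python
def speech_to_text(audio: str) -> str:
    # Stage 1 stand-in: pretend the audio has already been decoded to text.
    return audio.lower().strip()

def understand(text: str) -> dict:
    # Stage 2 stand-in: extract a crude intent and entity from the words.
    if text.startswith("how old is"):
        return {"intent": "age_query", "entity": text[len("how old is"):].strip()}
    return {"intent": "unknown", "entity": text}

def retrieve(query: dict) -> str:
    # Stage 3 stand-in: a toy fact table in place of a search index.
    facts = {"taylor swift": "26"}  # hypothetical entry, circa 2016
    return facts.get(query["entity"], "unknown")

def to_sentence(query: dict, answer: str) -> str:
    # Stage 4 stand-in: format the retrieved data back into natural language.
    return f"{query['entity'].title()} is {answer} years old."

def text_to_speech(sentence: str) -> None:
    # Stage 5 stand-in: print instead of synthesizing audio.
    print(sentence)

query = understand(speech_to_text("How old is Taylor Swift"))
text_to_speech(to_sentence(query, retrieve(query)))
# -> Taylor Swift is 26 years old.
```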

Phase 2: Understanding user intent

Example of a reported Siri fail:

Phrase: “anteater”

1st Siri try: “aunt eater”

2nd Siri try: “and eat her”

Google has an advantage over Apple in this area. With billions of search requests, Google can quickly measure its search successes and failures. If a user clicks one of the top results Google presents, that is a search win. If the user re-enters a similar search phrase, that is a search fail. Once the user finds the right link to click, Google can associate the failed search phrase with the successful one. Google has taken this a step further by providing auto-type search recommendations.

Figure 1 — Google drop down recommendations

As you can see in Figure 1, the search phrase “wolves of london” shares two words, “wolves” and “london”, with the famous Warren Zevon song “Werewolves of London”. Google users looking for the lyrics or a video of the song often type “wolves” instead of “werewolves”. Google has enough search and click-through history to point them to their intended data.
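Here is a toy sketch of how that win/fail signal could be mined, assuming a hypothetical session log of (query, clicked result) pairs and a crude string-similarity test; a production system would rely on far richer signals than this:

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.6) -> bool:
    # Crude lexical similarity; a stand-in for Google's real signals.
    return SequenceMatcher(None, a, b).ratio() >= threshold

def mine_corrections(session_log):
    """session_log: ordered (query, clicked_result_or_None) pairs."""
    corrections = {}
    for (q1, click1), (q2, click2) in zip(session_log, session_log[1:]):
        # A no-click fail followed by a similar, clicked query is a correction pair.
        if click1 is None and click2 is not None and similar(q1, q2):
            corrections[q1] = q2
    return corrections

log = [
    ("wolves of london", None),          # search fail: no click
    ("werewolves of london", "lyrics"),  # search win: user clicked a result
]
print(mine_corrections(log))  # {'wolves of london': 'werewolves of london'}
```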

Contrasting the Siri fail with the Google win, you can see that Siri attempts to translate the vocal sounds phonetically, without considering whether the resulting phrase makes sense.

Google Voice takes advantage of its auto-type keyword recommendation technology to avoid the “aunt eater” and “and eat her” problem. If you say “aunt” (with a hard “t”) followed by “eater”, Google Voice will first display “Aunt”, then realize you mean “anteater” and switch to that phrase. By considering the entire vocal phrase, rather than each syllable or word, Google can retroactively fix incorrectly guessed words.
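A minimal sketch of that whole-phrase rescoring idea, with a hypothetical phrase-frequency table standing in for Google’s real query statistics:

```python
# Hypothetical counts of how often each phrase appears as a search query.
PHRASE_FREQUENCY = {
    "anteater": 50_000,
    "aunt eater": 10,
    "and eat her": 25,
}

def rescore(candidates: list[str]) -> str:
    # Pick the candidate that is most plausible as a complete phrase,
    # rather than trusting the syllable-by-syllable acoustic guess.
    return max(candidates, key=lambda c: PHRASE_FREQUENCY.get(c, 0))

# The acoustic model hears three near-homophones; phrase statistics win.
print(rescore(["aunt eater", "and eat her", "anteater"]))  # anteater
```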

Phase 3: Sessions

Early roll-outs of artificial intelligence assistants had no concept of sessions. Unlike a participant in a human-to-human dialog, the machine would lose track of the previous questions.

“How old is Taylor Swift?”
“Where was she born?”

The machine would not comprehend that “she” referenced “Taylor Swift”. Siri and Google Voice now remember recent phrases, but the time-out is still very short. Session handling has plenty of room for improvement. As session time-outs are extended, artificial intelligence assistants will offer a more valuable and natural dialog exchange.
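Here is a toy sketch of session memory with a time-out. The single remembered entity, the naive pronoun substitution, and the 30-second TTL are all assumptions for illustration; real coreference resolution is far more involved.

```python
import time

class Session:
    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds      # hypothetical time-out window
        self.entity = None          # last entity the user mentioned
        self.stamp = 0.0

    def remember(self, entity: str) -> None:
        # Record the entity and when we last heard about it.
        self.entity, self.stamp = entity, time.monotonic()

    def resolve(self, question: str) -> str:
        # Substitute the remembered entity for a pronoun, unless expired.
        expired = time.monotonic() - self.stamp > self.ttl
        if self.entity and not expired:
            for pronoun in ("she", "he", "they"):
                question = question.replace(f" {pronoun} ", f" {self.entity} ")
        return question

session = Session()
session.remember("Taylor Swift")
print(session.resolve("Where was she born?"))  # Where was Taylor Swift born?
```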

Phase 4: Contextual clues

Mobile allows artificial intelligence technology to cheat like humans do. By knowing our location, history and preferences, artificial intelligence assistants can guess our questions before we ask them. Humans are creatures of habit. If it is 11:30am, there is a good chance we will ask for help in figuring out what to eat for lunch.
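As a sketch, anticipating a question from the clock could look like the following; the time windows here are hypothetical, and a real assistant would learn such priors from an individual’s own history rather than hard-coding them.

```python
from datetime import time

# Hypothetical time-of-day priors for likely requests.
TIME_PRIORS = [
    (time(7, 0), time(9, 0), "commute traffic"),
    (time(11, 0), time(13, 0), "lunch suggestions"),
    (time(17, 0), time(19, 0), "dinner reservations"),
]

def likely_request(now: time):
    # Return the request the user is most likely to make at this hour.
    for start, end, intent in TIME_PRIORS:
        if start <= now <= end:
            return intent
    return None

print(likely_request(time(11, 30)))  # lunch suggestions
```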

Unique locations and events will stimulate location-dependent questions:

At the zoo:

“Where do sloths live?”

At the ballpark:

“How many Hall of Famers do the Giants have?”

At Chipotle:

“How many calories are in a chicken burrito?”

For the above questions, if the person is at AT&T Park in San Francisco, it should be assumed that by “Giants” the user means the San Francisco Giants, not the NFL’s New York Giants. Likewise, the chicken burrito information should be specific to Chipotle, not to a Taco Bell chicken burrito.
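A sketch of that kind of location-based disambiguation, using a hypothetical venue-to-entity table; at AT&T Park “Giants” resolves to the baseball team, while elsewhere the assistant would need another signal to break the tie:

```python
# Hypothetical mapping from detected venue to the entity a name likely means.
VENUE_PRIORS = {
    "AT&T Park": {"giants": "San Francisco Giants (MLB)"},
    "MetLife Stadium": {"giants": "New York Giants (NFL)"},
    "Chipotle": {"chicken burrito": "Chipotle chicken burrito"},
}

def disambiguate(term: str, venue):
    # Prefer the venue-specific reading; fall back to the ambiguous term.
    return VENUE_PRIORS.get(venue, {}).get(term.lower(), term)

print(disambiguate("Giants", "AT&T Park"))        # San Francisco Giants (MLB)
print(disambiguate("Giants", "MetLife Stadium"))  # New York Giants (NFL)
print(disambiguate("Giants", None))               # Giants (still ambiguous)
```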

I was at Starbucks and asked Google Voice, “How much is a cold brew coffee?” I asked the same question when I was not at Starbucks and got the same result. Instead of recognizing that I was at Starbucks and quoting the price Starbucks charges for the drink, Google Voice ignored my location entirely.

To be fair, some questions do take the user’s location into account. When asked “what is the weather today?” or “what is the traffic now?”, Google Voice answers based on where the user is. Google should incorporate location further to answer questions like the “cold brew” inquiry.

Humans take advantage of all five of our senses, our memories, and our intuition to understand the world around us. Our brains are wired to assess a person and a setting, and to optimize our data processing for that particular scenario. When we attend a wedding, our brains move our memories of the couple and their friends and family to the forefront. We are not expecting to be asked esoteric questions about molecular biology or string theory; we are expecting to talk about how great the couple is together and how we know the bride or groom. Similarly, if we are at a product management conference, we are prepared to discuss the merits of agile over waterfall methodology and qualitative versus quantitative data analysis.

Early artificial intelligence assistants did not account for factors such as time, place, and situation; they simply acted like amnesiac savants. They were limited by memory, storage, and compute power: the machines lacked the capacity to record, store, and retrieve personalization data. Just as important, they were stationary and not necessarily dedicated to a single individual. Mobile technology is the game-changer. Mobile makes the data personal and location-aware. With this data set, artificial intelligence machines can zoom in on the requests a user is likely to make based on time, location, and preferences. This reduces computational complexity, because individuals and locations follow predictable patterns. Machines will optimize and predict data requests using these anthropomorphic techniques.

Want a cool sign of things to come? Ask Google Voice:

“Do I need a jacket?”

Google Voice will give you the high and low temperatures for your current location. Compare that to the answer you would get by asking a friend the same question. Your friend might say, “Well, it’s going to drop to 60 degrees and it is pretty windy, so you might want to bring a light windbreaker.” In this case, Google Voice still underperforms humans, but let’s give Google credit for taking the user’s location into account and retrieving a valuable data set (today’s local weather). It is definitely going in the right direction.
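A toy version of the friend’s reasoning might look like this, with hypothetical thresholds and a stubbed-in forecast; the point is the inference step on top of the weather data, which the raw high/low answer leaves out.

```python
def jacket_advice(low_f: float, wind_mph: float) -> str:
    # Hypothetical thresholds standing in for a friend's judgment.
    if low_f < 45:
        return "Yes, bring a warm jacket."
    if low_f < 65 and wind_mph > 15:
        return (f"It will drop to about {low_f:.0f} and it's windy, "
                "so a light windbreaker is a good idea.")
    return "No, you should be fine without one."

# Stubbed forecast for the user's current location.
print(jacket_advice(low_f=60, wind_mph=20))
```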

Early days

We are seeing artificial intelligence assistants begin to take advantage of personalization and location-aware data. The results are modest so far, but it is only a matter of time before artificial intelligence machines deliver super-human results using human data-processing methods.
