Lessons Learnt As IBM’s Watson Wins At Jeopardy

Although Watson defeated its human opponents in the first Jeopardy match, its Final Jeopardy response left everyone scratching their heads. The long-running US quiz show requires players to come up with the question for a given answer, and IBM pitched its most ambitious supercomputer project, Watson, as a contestant.

The Final Jeopardy category was “US Cities”. Watson said, “What is Toronto?”, with multiple question marks denoting its lack of confidence. The human players, Ken Jennings and Brad Rutter, both came up with the right question and wagered everything, or nearly everything, yet Watson still finished the game with $35,734. Jennings had $4,800 and Rutter had $10,800.

Watson Lacks Human Guile

Despite an otherwise impressive performance, Watson was roundly mocked on Twitter for the final mistake. “The machines don’t know all. Yet,” posted @erickohn.

The Double Jeopardy and Final Jeopardy rounds of the first game aired February 15. The first round of Jeopardy had been broadcast on Monday, and the second game of the two-game tournament is scheduled for Wednesday.

Watson’s odd answer was the result of several confounding factors, according to David Ferrucci, the IBM researcher who led the Watson project and whose post-game analysis appeared on IBM’s A Smarter Planet blog. Jeopardy category names are tricky because they “only weakly suggest” the expected answer, so Watson tends to downgrade the significance of the category name when weighing its candidate answers, Ferrucci said. If the clue itself had included the phrase “U.S. city”, Watson would have given U.S. cities more weight in its search, he said.
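To see what Ferrucci means, picture the category name as just one weakly weighted signal among many rather than a hard filter on candidate answers. The short Python sketch below illustrates only that principle; the weights, feature names and example scores are invented for illustration, not taken from DeepQA.

```python
# Toy illustration of down-weighting the category name when scoring candidates.
# DeepQA actually combines hundreds of evidence scores with machine-learned
# weights; these two hand-picked weights only illustrate the principle.

CLUE_WEIGHT = 1.0      # evidence drawn from the clue text itself
CATEGORY_WEIGHT = 0.2  # the category "only weakly suggests" the answer type

def score_candidate(clue_evidence: float, fits_category: bool) -> float:
    """Combine clue-based evidence with a weakly weighted category match."""
    category_evidence = 1.0 if fits_category else 0.0
    return CLUE_WEIGHT * clue_evidence + CATEGORY_WEIGHT * category_evidence

# A non-US city with strong clue evidence can outrank a US city with weaker
# evidence, because failing the category test costs only 0.2 points.
print(round(score_candidate(clue_evidence=0.9, fits_category=False), 2))  # 0.9
print(round(score_candidate(clue_evidence=0.6, fits_category=True), 2))   # 0.8
```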

Watson was also probably confused by the fact that there are several cities named Toronto in the United States, and that the Canadian Toronto has a baseball team in the American League, according to Ferrucci. Chicago was the second answer on Watson’s candidate list, according to A Smarter Planet.

Despite the mistake, Ferrucci was pleased with the outcome. Watson’s confidence in its answer was only about 30 percent, meaning it knew it didn’t know the answer, and it had bet “intelligently,” risking only $947.

“That’s smart,” Ferrucci said. “You’re in the middle of the contest. Hold onto your money. Why take a risk?”

Watson’s betting algorithm was also on display as it found both Daily Double clues in the round, wagering $6,436 and $1,246 respectively. “I won’t ask,” said the host, Alex Trebek.

Players often take into account other players’ scores, their confidence and their gut feeling when making wagers, which allows them to bet aggressively, according to Stephen Baker, the author of “Final Jeopardy,” a book about Watson. Watson’s calculations are strictly based on its confidence scores, he said.
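In other words, the wager is a function of confidence and the scoreboard alone. A minimal, hypothetical sketch of such a policy might look like the following; the thresholds, dollar figures and the safety rule are assumptions for illustration, not Watson’s actual game-theory model.

```python
# Hypothetical confidence-driven wagering policy (not Watson's real model).
# The idea: when confidence is low, bet a token amount; when it is high,
# bet more, but never so much that a wrong answer surrenders the lead.

def choose_wager(confidence: float, my_score: int, best_opponent: int) -> int:
    """Pick a Final Jeopardy wager from confidence and the current scores."""
    # Largest bet that keeps the lead even if we miss and the opponent doubles up.
    safe_bet = max(0, my_score - 2 * best_opponent - 1)
    if confidence < 0.5:
        return min(1000, safe_bet)          # "Hold onto your money."
    return min(safe_bet, int(confidence * my_score))

# Roughly the game-one situation: ~30 percent confidence and a large lead
# (the opponent score here is illustrative, not the actual pre-Final figure).
print(choose_wager(confidence=0.30, my_score=36681, best_opponent=10800))  # 1000
```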

It’s hard for a computer to calculate confidence, according to Nico Schlaefer, a student at Carnegie Mellon University who worked on the Watson project. “Humans usually know whether they know the answer. Watson may not,” he said.

Schlaefer worked on the algorithm that allowed Watson to gather relevant source material for finding the answer and supporting evidence. Another CMU student on the project, Hideki Shima, worked on the algorithm Watson uses to score how well that supporting evidence backs each possible answer on its list of candidates.

When asked a question about items stolen from a museum in 2003, Watson had only 32 percent confidence in its first-choice answer. It said “I’m going to guess,” before giving the right answer.
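Taken together, the two CMU contributions describe a pipeline: retrieve passages relevant to the clue, score how well each passage supports each candidate answer, and fold those scores into an overall confidence such as the 32 percent above. The toy sketch below shows that general shape only; the scoring rules, the logistic squashing and the example clue data are invented, not the actual DeepQA algorithms.

```python
# Toy sketch of evidence scoring and confidence aggregation -- invented for
# illustration, not the CMU/IBM algorithms. Each candidate answer gets a
# confidence derived from how well retrieved passages support it.
from math import exp

def passage_support(passage: str, candidate: str, clue_terms: set) -> float:
    """Crude support score: the candidate must appear, plus clue-term overlap."""
    if candidate.lower() not in passage.lower():
        return 0.0
    words = set(passage.lower().split())
    return len(words & clue_terms) / max(len(clue_terms), 1)

def confidence(passages: list, candidate: str, clue_terms: set) -> float:
    """Squash the summed evidence into a 0-1 confidence with a logistic curve."""
    total = sum(passage_support(p, candidate, clue_terms) for p in passages)
    return 1.0 / (1.0 + exp(-2.0 * (total - 1.0)))

# Hypothetical clue terms and passage, purely to show the mechanics.
clue_terms = {"museum", "stolen", "2003"}
passages = ["in 2003 thieves stole several artifacts from the national museum"]
print(round(confidence(passages, "national museum", clue_terms), 2))
# prints 0.34 -- a low confidence, much like the 32 percent above
```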

IBM hopes to use the DeepQA technology behind Watson to build systems for data-intensive analysis in a wide variety of fields, including legal, government and health care. “It’s limitless, the number of things you could apply this to,” IBM Research Program Manager David Shepler said during the broadcast.

In the legal field, lawyers could have access to a “vast, self-contained database” loaded with all of the internal and external information relating to litigation, protecting intellectual property, writing contracts or negotiating an acquisition, Robert C Weber, IBM’s senior vice president of legal and regulatory affairs, wrote in the National Law Journal.

“Think about the possibilities for medical diagnosis support, for better anticipating the energy needs of utilities, or for protecting insurers, banks and governments from fraud,” Weber said.

Social services employees could use a Watson-like system to triage the claims that come in each day, Anne K Altman, a general manager in IBM’s Global Public Sector, wrote in Government Technology. The system could separate out the claims for life-saving treatments and help caseworkers find similar cases from the past, she said.

Watson appeared to breeze through Double Jeopardy, but that was apparently not the case. Watson crashed multiple times during the taping, said NOVA producer Michael Bicks, who attended the recording of the show. The half-hour match took four hours to tape, he said.

At the end of the game, the IBM team was still nervous about the outcome of the tournament because they knew “all the different ways it could lose,” Bicks said.

Watson beat the humans to buzz in and answer 24 of 30 clues. The computer nailed answers on an impressive variety of topics, ranging from architecture to biological science to classical music to “Saturday Night Live.”

If Watson wins the three-day, two-game tournament, IBM will donate the full $1 million prize to charity.

Fahmida Y Rashid, eWEEK USA 2014. Ziff Davis Enterprise Inc. All Rights Reserved.

Comments

  • I'm not sure why Michael Bicks said that. He was in the same room I was in. While Watson did answer a few questions incorrectly, technically, Watson performed flawlessly. Watson did not crash a single time -- not once.

    I can't guess whether Michael was thinking about something else or thought he was answering a different question. He's a stand-up guy, but this is just incorrect. I hope that, given a chance, Michael would amend his statement.

    Scott Brooks
    IBM
