Google's fact-checking bots build vast knowledge bank - tech - 20 August 2014 - New Scientist

The search giant is automatically building Knowledge Vault, a massive database that could give us unprecedented access to the world's facts

GOOGLE is building the largest store of knowledge in human history – and it's doing so without any human help.

Instead, Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it.

The breadth and accuracy of this gathered knowledge is already becoming the foundation of systems that allow robots and smartphones to understand what people ask them. It promises to let Google answer questions like an oracle rather than a search engine, and even to turn a new lens on human history.

Knowledge Vault is a type of "knowledge base" – a system that stores information so that machines as well as people can read it. Where a database deals with numbers, a knowledge base deals with facts. When you type "Where was Madonna born" into Google, for example, the place given is pulled from Google's existing knowledge base.

This existing base, called Knowledge Graph, relies on crowdsourcing to expand its information. But the firm noticed that growth was stalling; humans could only take it so far.

So Google decided it needed to automate the process. It started building the Vault by using an algorithm to automatically pull in information from all over the web, using machine learning to turn the raw data into usable pieces of knowledge.

Knowledge Vault has pulled in 1.6 billion facts to date. Of these, 271 million are rated as "confident facts", to which Google's model ascribes a more than 90 per cent chance of being true. It does this by cross-referencing new facts with what it already knows.
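By the figures above, roughly 17 per cent of the stored facts (271 million of 1.6 billion) clear the 90 per cent bar. The idea of scoring each fact and keeping only the confident ones can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than Google's actual design: the `Fact` record, the subject/predicate/object layout, the `confident_facts` and `lookup` helpers, and the made-up probabilities in the sample data.

```python
from dataclasses import dataclass

# Assumed cut-off, taken from the article's "more than 90 per cent" figure.
CONFIDENT_THRESHOLD = 0.9


@dataclass(frozen=True)
class Fact:
    """Hypothetical record: one candidate fact plus the model's belief in it."""
    subject: str
    predicate: str
    obj: str
    probability: float  # estimated chance that the fact is true


def confident_facts(facts, threshold=CONFIDENT_THRESHOLD):
    """Return only the facts whose probability exceeds the threshold."""
    return [f for f in facts if f.probability > threshold]


def lookup(facts, subject, predicate):
    """Return the most probable object for (subject, predicate), or None."""
    matches = [
        f for f in facts
        if f.subject == subject and f.predicate == predicate
    ]
    if not matches:
        return None
    return max(matches, key=lambda f: f.probability).obj


# Made-up sample data; the probabilities are invented for illustration.
EXAMPLE_FACTS = [
    Fact("Madonna", "born_in", "Bay City, Michigan", 0.97),
    Fact("Madonna", "born_in", "Detroit, Michigan", 0.12),
    Fact("Elvis Presley", "born_in", "Tupelo, Mississippi", 0.95),
    Fact("Elvis Presley", "born_in", "Memphis, Tennessee", 0.40),
]

if __name__ == "__main__":
    for fact in confident_facts(EXAMPLE_FACTS):
        print(fact.subject, fact.predicate, fact.obj, fact.probability)
    print(lookup(EXAMPLE_FACTS, "Madonna", "born_in"))
```

Conflicting candidates for the same question are kept side by side here, and the highest-scoring one is returned; a real system would presumably weigh many more signals before settling on an answer.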

"It's a hugely impressive thing that they are pulling off," says Fabian Suchanek, a data scientist at Télécom ParisTech in France.

Google's Knowledge Graph is currently bigger than the Knowledge Vault, but it only includes manually integrated sources such as the CIA Factbook.

Knowledge Vault offers Google fast, automatic expansion of its knowledge – and it's only going to get bigger. As well as analysing text on a webpage for facts to feed its knowledge base, Google can also peer under the surface of the web, hunting for hidden sources of data such as the figures that feed Amazon product pages.

Tom Austin, a technology analyst at Gartner in Boston, says that the world's biggest technology companies are racing to build similar vaults. "Google, Microsoft, Facebook, Amazon and IBM are all building them, and they're tackling these enormous problems that we would never even have thought of trying 10 years ago," he says.

The potential of a machine system that has the whole of human knowledge at its fingertips is huge. One of the first applications will be virtual personal assistants that go way beyond what Siri and Google Now are capable of, says Austin.

"Before this decade is out, we will have a smart priority inbox that will find for us the 10 most important emails we've received and handle the rest without us having to touch them," Austin says. Our virtual assistant will be able to decide what matters and what doesn't.

Other agents will carry out the same process to watch over and guide our health, sorting through a knowledge base of medical symptoms to find correlations with data in each person's health records. IBM's Watson is already doing this for cancer at Memorial Sloan Kettering Hospital in New York.

Knowledge Vault promises to supercharge our interactions with machines, but it also comes with an increased privacy risk. The Vault doesn't care if you are a person or a mountain – it is voraciously gathering every piece of information it can find.

"Behind the scenes, Google doesn't only have public data," says Suchanek. It can also pull in information from Gmail, Google+ and YouTube. "You and I are stored in the Knowledge Vault in the same way as Elvis Presley," Suchanek says. Google disputes this, however. In an email to New Scientist, a company spokesperson said, "The Knowledge Vault does not deal with personal information."

Google researcher Kevin Murphy and his colleagues will present a paper on Knowledge Vault at the Conference on Knowledge Discovery and Data Mining in New York on 25 August.

As well as improving our interactions with computers, large stores of knowledge will be the fuel for augmented reality, too. Once machines get the ability to recognise objects, Knowledge Vault could be the foundation of a system that can provide anyone wearing a heads-up display with information about the landmarks, buildings and businesses they are looking at in the real world. "Knowledge Vault adds local entities – politicians, businesses. This is just the tip of the iceberg," Suchanek says.

Richer vaults of knowledge will also change the way we study human society. "This is the most visionary thing," says Suchanek. "The Knowledge Vault can model history and society."

http://www.newscientist.com/article/mg22329832.700-googles-factchecking-bots-build-vast-knowledge-bank.html