by James Harkin
In 2009, engineers at Google claimed they could predict the progress of flu epidemics simply by looking at the words people were typing into their browsers. The idea was both plausible and ingenious; since Google services 3bn search queries every day, the company is quite capable of tracking how often people are asking about flu symptoms or medicine.
Crucially, however, the algorithm for crunching all this information didn’t depend on hunting for obvious flu-related words or phrases. All it needed to do was to work out a correlation – a statistical relationship between previous outbreaks of flu and the search terms being entered at the time. For the engineers at Google Flu Trends these findings, published in the scientific journal Nature, represented a Eureka moment. In a future epidemic, they reckoned, they would be able to chart the spread of the virus as it was happening – and much faster than the medical authorities.
Google was ahead of the game. But by last October, when “big data” made the cover of the Harvard Business Review, it was clear its time had come. Big business, big political parties, big government – all are buying into the idea in the hope of turning it to their advantage. Now, inevitably, comes the book tie-in. Taking their cue from the success of Google Flu Trends, Kenneth Cukier and Viktor Mayer-Schönberger invite us in Big Data to consider how medical professionals might benefit from a tool as nimble as this. And not only in public health – everywhere from banking to policing, they write, the vast quantities of information accumulating in the cloud “can be cleverly reused to become a foundation of innovation and new services. The data can reveal secrets to those with the humility, the willingness, and the tools to listen.”
There are dangers here, of course. In the past few years, rhetoric that too readily equated the internet with freedom and prosperity has given way to a more realistic assessment of what it’s good for and who benefits. While the net was still growing up it was easy to believe that it could topple everything before it; today we’re paying more attention to the internet behemoths who sit on vast deposits of our data and stare back at us from the other side of the glass.
The authors are alive to the dystopian possibilities but their real concern is to talk up this brave new world and help us get the most out of it. First of all, we’re going to have to relax our obsession with causality in favour of something a little more fuzzy, eschewing thorny why questions in favour of what and divining correlations from the ether. The specialist, they inform us, “will lose some of his or her lustre compared with the statistician and data analyst, who are unfettered by the old ways of doing things and let the data speak”. No matter, because the rewards are going to be tremendous. Big data is on the cusp of becoming a “significant corporate asset”, so much so that it may even help the west to win back manufacturing advantage from the developing world. Soon, they claim airily, it “may be able to tell whether we’re falling in love”.
Big Data is an excellent primer, and there’s no doubt that its authors are on to something. But what, exactly? Much of the fun comes from watching these two estimable sceptics – Cukier is data editor at The Economist, Mayer-Schönberger an Oxford academic and author of the highly praised Delete (2009), about the problems that arise when everything about us ends up stored for digital posterity – let their hair down and indulge in a little breathless boosterism for a business audience. Walmart, they write, picked up a spike in sales of Pop-Tarts before hurricanes; when the next storm arrived it had Pop-Tarts piled up at the front of the store, where they sold in huge quantities. The Bank of England is already using search queries on property to get a handle on whether house prices are rising or falling; IBM is working on a big data scheme to work out where electric cars might draw power and what this might mean for the electricity supply.
These, however, are all very different uses of big data. It’s clear that such insights into our collective mood will lay waste to many of the skimpy samples and loaded questions of pollsters. But Cukier and Mayer-Schönberger get carried away by the paradigm-busting idea of using blind correlations to let the data speak. And while such techniques may make a creative addition to the marketer’s armoury, that’s only because predicting consumer taste is such an inexact science in the first place.
Take Google’s flu project. The data yield fascinating insights into shifting public attitudes to viral infection but, as Jaron Lanier points out in Who Owns The Future?, relying on such information to understand the real incidence of flu can throw up red herrings. “Maybe a rise in flu-related queries,” he writes, “is actually in response to a popular movie in which the lead character has a bad flu.”
Lanier, the maverick computer scientist whose previous book, You Are Not A Gadget (2010), was a trenchant assault on modish hyperbole about how the internet could revolutionise culture and society, is now back to bemoan the rise of “siren servers”. By these he means Facebook, Google, Twitter and Amazon – companies that, while celebrating the internet’s power to change the world, have quietly been hoarding our data with a view to selling it on.
Science, he argues, works differently from business and political campaigning; big data may modify its methods but is unlikely to transform them. Lanier has a poet’s sensibility and his book reads like a hallucinogenic reverie, full of entertaining haiku-like observations and digressions. Taking his cue from growing doubts about the internet’s ability to spur growth, he complains that the latest waves of high-tech innovation have not created jobs like the old ones did; furthermore, he argues, the “levees” that traditionally protected ordinary people from economic devastation are being swept away by this digital free-for-all.
Lanier is suspicious of the power of data-hungry “siren servers” (he now works for Microsoft) but his book works best when it’s disabusing them of their illusions. Facebook might hold vast amounts of information; until its executives work out a way to use it more effectively, however, the jury is out on those grandiose valuations. And algorithms are often much more of a confidence trick than many geeks like to admit. When online dating sites purport to use secret algorithms to help us find love, no one bothers to question their efficacy – they’re just razzle-dazzle, exotic ways to stir the information pot in the hope that something nice might bubble up to the surface.
. . .
The ballyhoo around big data is a perfect example of what Evgeny Morozov would call “solutionism” – the urge to find internet-based solutions to problems that either don’t exist or are only likely to fester under its sticking plaster. Morozov is a relentless dragon-slayer in the puffed-up world of internet punditry: his previous book, The Net Delusion (2011), was a timely corrective to the notion that the internet could prove a game changer in the struggle to overthrow authoritarian regimes. To Save Everything, Click Here broadens this into a full-frontal critique of Silicon Valley verities – the gospel of “radical transparency”, the notion that online collaboration can serve as a template for government, the whole rogues’ gallery of idea salesmen who confuse real innovation with messing about on the internet.
Morozov is a fine polemical essayist: glossy TED conferences, for example, are easily batted away as a “Woodstock of the intellectually effete”. He pours scorn on the “fact-checking” slots proliferating in the American media, in which argument and principle too easily give way to a nit-picking pantomime of claim and counter-claim.
At times, he wears his learning much too heavily – so many philosophers and recent academic papers are tossed in that the result often looks like some fashionable collage of other people’s thoughts. But he is sharp on the “hoarding urge” that fuels much of our enthusiasm for big data, as if we could insulate ourselves against the future by squirrelling away the recent past. Leaving aside the privacy implications, he says, predicting crime or social unrest via blind correlations is likely to prove fallible. “Even tweeting that you don’t like your yoghurt might bring police to your door,” he writes, “especially if someone who tweeted the same thing three years before ended up shooting someone in the face later in the day.”
Last month, Google Flu Trends also came a cropper. A glitch in its algorithm, reported Nature, “drastically overestimated peak flu levels” for the most recent outbreak of influenza in the US; a reminder, for many scientists, that social media can only ever complement the traditional reporting mechanisms that collate data from doctors’ surgeries. Most of the value of big data, indeed, is likely to lie far away from Facebook and Google – in buttressing existing information systems to give us a better picture of what is happening in the real world rather than simply harvesting more data about what people believe to be the case. And blind correlations, while occasionally useful for taking a punt or devising an early warning system, will not suffice; we’re going to need our theories and ideas more than ever.
The trick, as Morozov and Lanier remind us, is not to surrender our judgment to the deluge. Digital annotations on ebooks, for example, might either inspire a new kind of reading experience or, by tempting publishers with yet more information on audience preferences, reinforce the kind of gutless populism we’re used to. Most of all, we have to know what we want to achieve and what we want big data to do. Otherwise, like the previous iterations of internet futurism, big data will remain a showy buzzword – full of sound and fury, signifying very little.
March 1, 2013