June 9, 2013

How the U.S. Uses Technology to Mine More Data More Quickly

David Burnett/Contact Press Images

At the National Security Agency in Fort Meade, Md. Disclosures have provided a rare glimpse into the agency’s growing reach.

By JAMES RISEN and ERIC LICHTBLAU

Published: June 8, 2013, The New York Times

WASHINGTON — When American analysts hunting terrorists sought new ways to comb through the troves of phone records, e-mails and other data piling up as digital communications exploded over the past decade, they turned to Silicon Valley computer experts who had developed complex equations to thwart Russian mobsters intent on credit card fraud.

The partnership between the intelligence community and Palantir Technologies, a Palo Alto, Calif., company founded by a group of inventors from PayPal, is just one of many that the National Security Agency and other agencies have forged as they have rushed to unlock the secrets of “Big Data.”

Today, a revolution in software technology that allows for the highly automated and instantaneous analysis of enormous volumes of digital information has transformed the N.S.A., turning it into the virtual landlord of the digital assets of Americans and foreigners alike. The new technology has, for the first time, given America’s spies the ability to track the activities and movements of people almost anywhere in the world without actually watching them or listening to their conversations.

New disclosures that the N.S.A. has secretly acquired the phone records of millions of Americans and access to e-mails, videos and other data of foreigners from nine United States Internet companies have provided a rare glimpse into the growing reach of the nation’s largest spy agency. They have also alarmed the government: on Saturday night, Shawn Turner, a spokesman for the director of national intelligence, said that “a crimes report has been filed by the N.S.A.”

With little public debate, the N.S.A. has been undergoing rapid expansion in order to exploit the mountains of new data being created each day. The government has poured billions of dollars into the agency over the last decade, building a one-million-square-foot fortress in the mountains of Utah, apparently to store huge volumes of personal data indefinitely. It created intercept stations across the country, according to former industry and intelligence officials, and helped build one of the world’s fastest computers to crack the codes that protect information.

While once the flow of data across the Internet appeared too overwhelming for N.S.A. to keep up with, the recent revelations suggest that the agency’s capabilities are now far greater than most outsiders believed. “Five years ago, I would have said they don’t have the capability to monitor a significant amount of Internet traffic,” said Herbert S. Lin, an expert in computer science and telecommunications at the National Research Council. Now, he said, it appears “that they are getting close to that goal.”

On Saturday, it became clear how close: Another N.S.A. document, again cited by The Guardian, showed a “global heat map” that appeared to represent how much data the N.S.A. sweeps up around the world. It showed that in March 2013 there were 97 billion pieces of data collected from networks worldwide; about 14 percent of it was in Iran, much was from Pakistan and about 3 percent came from inside the United States, though some of that might have been foreign data traffic routed through American-based servers.

A Shift in Focus

The agency’s ability to efficiently mine metadata, data about who is calling or e-mailing, has made wiretapping and eavesdropping on communications far less vital, according to data experts. That access to data from companies that Americans depend on daily raises troubling questions about privacy and civil liberties that officials in Washington, insistent on near-total secrecy, have yet to address.

“American laws and American policy view the content of communications as the most private and the most valuable, but that is backwards today,” said Marc Rotenberg, the executive director of the Electronic Privacy Information Center, a Washington group. “The information associated with communications today is often more significant than the communications itself, and the people who do the data mining know that.”

In the 1960s, when the N.S.A. successfully intercepted the primitive car phones used by Soviet leaders driving around Moscow in their Zil limousines, there was no chance the agency would accidentally pick up Americans. Today, if it is scanning for a foreign politician’s Gmail account or hunting for the cellphone number of someone suspected of being a terrorist, the possibilities for what N.S.A. calls “incidental” collection of Americans are far greater.

United States laws restrict wiretapping and eavesdropping on the actual content of the communications of American citizens but offer very little protection to the digital data thrown off by the telephone when a call is made. And they offer virtually no protection to other forms of non-telephone-related data like credit card transactions.

Because of smartphones, tablets, social media sites, e-mail and other forms of digital communications, the world creates 2.5 quintillion bytes of new data daily, according to I.B.M.

The company estimates that 90 percent of the data that now exists in the world has been created in just the last two years. From now until 2020, the digital universe is expected to double every two years, according to a study by the International Data Corporation.

Accompanying that explosive growth has been rapid progress in the ability to sift through the information.

When separate streams of data are integrated into large databases — matching, for example, time and location data from cellphones with credit card purchases or E-ZPass use — intelligence analysts are given a mosaic of a person’s life that would never be available from simply listening to their conversations. Just four data points about the location and time of a mobile phone call, a study published in Nature found, make it possible to identify the caller 95 percent of the time.
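A toy sketch can make the "four data points" finding concrete. It assumes each person's record is a list of (cell, hour) observations and counts how many people remain consistent with a handful of known points. The traces, the cell names and the `matching_users` helper are all hypothetical illustrations, not data or code from the study or from any agency system.

```python
def matching_users(traces, known_points):
    """Return the users whose traces contain every known (cell, hour) point."""
    known = set(known_points)
    return sorted(user for user, points in traces.items() if known <= set(points))


# Hypothetical traces: each entry is (cell tower id, hour of day).
traces = {
    "user_a": [("cell_12", 8), ("cell_40", 9), ("cell_40", 13), ("cell_07", 19)],
    "user_b": [("cell_12", 8), ("cell_40", 9), ("cell_33", 13), ("cell_07", 19)],
    "user_c": [("cell_12", 8), ("cell_18", 9), ("cell_40", 13), ("cell_21", 19)],
}

# Each additional known point narrows the set of candidates.
one_point = matching_users(traces, [("cell_12", 8)])
two_points = matching_users(traces, [("cell_12", 8), ("cell_40", 9)])
three_points = matching_users(
    traces, [("cell_12", 8), ("cell_40", 9), ("cell_40", 13)]
)

print(one_point)     # all three users share the first point
print(two_points)    # user_a and user_b remain
print(three_points)  # only user_a remains
```

In this made-up example three points are enough to single out one person. With millions of real traces the same narrowing happens, which is why so few points can identify a caller.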

“We can find all sorts of correlations and patterns,” said one government computer scientist who spoke on condition of anonymity because he was not authorized to comment publicly. “There have been tremendous advances.”

Secret Programs

When President George W. Bush secretly began the N.S.A.’s warrantless wiretapping program in October 2001, to listen in on the international telephone calls and e-mails of American citizens without court approval, the program was accompanied by large-scale data mining operations.

Those secret programs prompted a showdown in March 2004 between Bush White House officials and a group of top Justice Department and F.B.I. officials in the hospital room of John Ashcroft, then the attorney general. Justice Department lawyers who were willing to go along with warrantless wiretapping argued that the data mining raised greater constitutional concerns.

In 2003, after a Pentagon plan to create a data-mining operation known as the Total Information Awareness program was disclosed, a firestorm of protest forced the Bush administration to back off.

But since then, the intelligence community’s data-mining operations have grown enormously, according to industry and intelligence experts.

The confrontation in Mr. Ashcroft’s hospital room took place just one month after a Harvard undergraduate, Mark Zuckerberg, created Facebook; Twitter would not be founded for two more years. Apple’s iPhone and iPad did not yet exist.

“More and more services like Google and Facebook have become huge central repositories for information,” observed Dan Auerbach, a technology analyst with the Electronic Frontier Foundation. “That’s created a pile of data that is an incredibly attractive target for law enforcement and intelligence agencies.”

The spy agencies have long been among the most demanding customers for advanced computing and data-mining software — and even more so in recent years, according to industry analysts. “They tell you that somewhere there is an American who is going to be blown up,” said a former technology executive, and “the only thing that stands between that and him living is you.”

In 2006, the Bush administration established a program known as the Intelligence Advanced Research Projects Activity, to accelerate the development of intelligence-related technology intended “to provide the United States with an overwhelming intelligence advantage over future adversaries.”

I.B.M.’s Watson, the supercomputing technology that defeated human Jeopardy! champions in 2011, is a prime example of the power of data-intensive artificial intelligence.

Watson-style computing, analysts said, is precisely the technology that would make the ambitious data-collection program of the N.S.A. seem practical. Computers could instantly sift through the mass of Internet communications data, see patterns of suspicious online behavior and thus narrow the hunt for terrorists.

Both the N.S.A. and the Central Intelligence Agency have been testing Watson in the last two years, said a consultant who has advised the government and asked not to be identified because he was not authorized to speak.

Trilaterization

Industry experts say that intelligence and law enforcement agencies also use a new technology, known as trilaterization, that allows tracking of an individual’s location, moment to moment. The data, obtained from cellphone towers, can track the altitude of a person, down to the specific floor in a building. There is even software that exploits the cellphone data seeking to predict a person’s most likely route. “It is extreme Big Brother,” said Alex Fielding, an expert in networking and data centers.
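The geometry behind this kind of location tracking, usually called trilateration in the engineering literature, can be sketched in a few lines. The sketch assumes a flat, two-dimensional map, three towers at known coordinates and exact distance estimates to each. Real systems must infer distance from noisy signal timing or strength and may add a height estimate, so this is a textbook illustration rather than the method any agency or carrier actually uses. The `trilaterate` function and the coordinates are hypothetical.

```python
def trilaterate(towers, distances):
    """Estimate a 2D position from three (x, y) towers and the range to each."""
    (x1, y1), (x2, y2), (x3, y3) = towers
    r1, r2, r3 = distances

    # Each tower defines a circle (x - xi)^2 + (y - yi)^2 = ri^2.
    # Subtracting the first circle from the other two cancels the squared
    # unknowns and leaves two linear equations in x and y.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2

    # Solve the 2x2 system with Cramer's rule.
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("towers are collinear; position is not unique")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical towers (units are arbitrary) and a phone actually at (3, 4).
towers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [5.0, 65 ** 0.5, 45 ** 0.5]
print(trilaterate(towers, distances))  # approximately (3.0, 4.0)
```

Three towers suffice only when the distances are exact. With measurement noise, practical systems typically combine more towers and fit the position by least squares.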

In addition to opening the Utah data center, reportedly scheduled for this year, N.S.A. has secretly enlarged its footprint inside the United States, according to accounts from whistle-blowers in recent years.

In Virginia, a telecommunications consultant reported, Verizon had set up a dedicated fiber-optic line running from New Jersey to Quantico, Va., home to a large military base, allowing government officials to gain access to all communications flowing through the carrier’s operations center.

In Georgia, an N.S.A. official said in interviews, the agency had combed through huge volumes of routine e-mails to and from Americans.

And in San Francisco, a technician at AT&T reported on the existence of a secret room there reserved for the N.S.A. that allowed the spy agency to copy and store millions of domestic and international phone calls routed through that station.

Nothing revealed in recent days suggests that N.S.A. eavesdroppers have violated the law by targeting ordinary Americans. On Friday, President Obama defended the agency’s collection of phone records and other metadata, saying it did not involve listening to conversations or reading the content of e-mails. “Some of the hype we’ve been hearing over the past day or so — nobody has listened to the content of people’s phone calls,” he said.

Mr. Rotenberg, referring to the constitutional limits on search and seizure, said, “It is a bit of a fantasy to think that the government can seize so much information without implicating the Fourth Amendment interests of American citizens.”

Reporting was contributed by David E. Sanger and Scott Shane from Washington, Steve Lohr and James Glanz from New York, and Quentin Hardy from Berkeley, Calif.

OP-ED COLUMNIST

Your Smartphone Is Watching You

David Burnett/Contact Press Images

For many Americans, the upside of government watchfulness is worth the trade-off.

By ROSS DOUTHAT

Published: June 8, 2013

ON Thursday, just after reports broke that the National Security Agency had been helping itself to data from just about every major American Internet company, an enterprising Twitter user set up an account called “Nothing to Hide,” which reproduced tweets from people expressing blithe unconcern about their government’s potential access to their e-mails, phone records, video chats, you name it.

“If it can save people from another 9/11 like attack, go for it,” one declared. “My emails/phone calls are not that exciting anyway ...”

Another tweeted: “...this sort of thing was bound to happen. We live in the information age. Besides, I have nothing to hide.”

And another: “If you share your whole life on social media who cares if the government takes a peek?!?”

These citizens have a somewhat shaky grasp of how civil liberties are supposed to work. But they understand the essential nature of life on the Internet pretty well. The motto “nothing to hide, nothing to fear” — or, alternatively, “abandon all privacy, ye who enter here” — might as well be stamped on every smartphone and emblazoned on every social media log-in page. As the security expert Bruce Schneier wrote recently, it isn’t that the Internet has been penetrated by the surveillance state; it’s that the Internet, in effect, is a surveillance state.

Anxiety over this possibility has been laced into online experience since the beginning. (Witness Clinton-era netsploitation movies like “Enemy of the State.”) But in the early days of the dot-com era, what people found most striking about online life was how anonymous it seemed — all those chat rooms and comment sections, aliases and handles and screen names. A famous New Yorker cartoon depicted two canines contemplating a computer, as one promised the other, “On the Internet, nobody knows you’re a dog.”

This ideal of anonymity still persists in some Internet communities. But in many ways, the online world has turned out to be less private than the realm of flesh and blood. In part, that’s because most Internet users don’t want to cloak themselves in pseudonyms. Instead, they communicate in online spaces roughly the way they would in a room full of their closest friends, and use texts and e-mails the way they would once have used a letter or a phone call. Which means, inevitably, that they are much more exposed — to strangers and enemies, ex-lovers and ex-friends — than they would have been before their social lives migrated online.

It is at least possible to participate in online culture while limiting this horizontal, peer-to-peer exposure. But it is practically impossible to protect your privacy vertically — from the service providers and social media networks and now security agencies that have access to your every click and text and e-mail. Even the powerful can’t cover their tracks, as David Petraeus discovered. In the surveillance state, everybody knows you’re a dog.

And every looming technological breakthrough, from Google Glass to driverless cars, promises to make our every move and download a little easier to track. Already, Silicon Valley big shots tend to talk about privacy in roughly the same paternalist language favored by government spokesmen. “If you have something that you don’t want anyone to know,” Google’s Eric Schmidt told an interviewer in 2009, “maybe you shouldn’t be doing it in the first place.”

The problem is that we have only one major point of reference when we debate what these trends might mean: the 20th-century totalitarian police state, whose every intrusion on privacy was in the service of tyrannical one-party rule. That model is useful for teasing out how authoritarian regimes will try to harness the Internet’s surveillance capabilities, but America isn’t about to turn into East Germany with Facebook pages.

For us, the age of surveillance is more likely to drift toward what Alexis de Tocqueville described as “soft despotism” or what the Forbes columnist James Poulos has dubbed “the pink police state.” Our government will enjoy extraordinary, potentially tyrannical powers, but most citizens will be monitored without feeling persecuted or coerced.

So instead of a climate of pervasive fear, there will be a chilling effect at the margins of political discourse, mostly affecting groups and opinions considered disreputable already. Instead of a top-down program of political repression, there will be a more haphazard pattern of politically motivated, Big Data-enabled abuses. (Think of the recent I.R.S. scandals, but with damaging personal information being leaked instead of donor lists.)

In this atmosphere, radicalism and protest will seem riskier, paranoia will be more reasonable, and conspiracy theories will proliferate. But because genuinely dangerous people will often be pre-empted or more swiftly caught, the privacy-for-security swap will seem like a reasonable trade-off to many Americans — especially when there is no obvious alternative short of disconnecting from the Internet entirely.

Welcome to the future. Just make sure you don’t have anything to hide.