US plans massive data sweep
Little-known data-collection system could troll news,
blogs, even e-mails. Will it go too far?
By Mark Clayton | Staff writer of The Christian Science
Monitor

The US government is developing a massive computer
system that can collect huge amounts of data and, by
linking far-flung information from blogs and e-mail to
government records and intelligence reports, search for
patterns of terrorist activity.
The system - parts of which are operational, parts of
which are still under development - is already credited
with helping to foil some plots. It is the federal
government's latest attempt to use broad data-collection
and powerful analysis in the fight against terrorism.
But by delving deeply into the digital minutiae of
American life, the program is also raising concerns that
the government is intruding too deeply into citizens'
privacy.
"We don't realize that, as we live our lives and make
little choices, like buying groceries, buying on Amazon,
Googling, we're leaving traces everywhere," says Lee
Tien, a staff attorney with the Electronic Frontier
Foundation. "We have an attitude that no one will
connect all those dots. But these programs are about
connecting those dots - analyzing and aggregating them -
in a way that we haven't thought about. It's one of the
underlying fundamental issues we have yet to come to
grips with."
The core of this effort is a little-known system called
Analysis, Dissemination, Visualization, Insight, and
Semantic Enhancement (ADVISE). Only a few public
documents mention it. ADVISE is a research and
development program within the Department of Homeland
Security (DHS), part of its three-year-old "Threat and
Vulnerability, Testing and Assessment" portfolio. The
TVTA received nearly $50 million in federal funding this
year.
DHS officials are circumspect when talking about ADVISE.
"I've heard of it," says Peter Sand, director of privacy
technology. "I don't know the actual status right now.
But if it's a system that's been discussed, then it's
something we're involved in at some level."
Data-mining is a key technology
A major part of ADVISE involves data-mining - or "dataveillance,"
as some call it. It means sifting through data to look
for patterns. If a supermarket finds that customers who
buy cider also tend to buy fresh-baked bread, it might
group the two together. To prevent fraud, credit-card
issuers use data-mining to look for patterns of
suspicious activity.
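To make the supermarket example concrete, here is a minimal
sketch of that kind of pattern search. The shopping baskets
are invented for illustration, and the function names
(pair_counts, confidence) are hypothetical; nothing here is
drawn from ADVISE or from any retailer's actual system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"cider", "bread", "cheese"},
    {"cider", "bread"},
    {"milk", "bread"},
    {"cider", "bread", "milk"},
    {"milk", "eggs"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, if_item, then_item):
    """Share of baskets containing if_item that also contain then_item."""
    with_if = [b for b in baskets if if_item in b]
    if not with_if:
        return 0.0
    return sum(then_item in b for b in with_if) / len(with_if)


counts = pair_counts(baskets)
print("baskets with both bread and cider:", counts[("bread", "cider")])
print("share of cider buyers who buy bread:",
      confidence(baskets, "cider", "bread"))
```

In this toy data every cider buyer also buys bread, which is
the sort of pairing a store might act on. Real data-mining
systems use far larger data sets and more careful statistics.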
What sets ADVISE apart is its scope. It would collect a
vast array of corporate and public online information -
from financial records to CNN news stories - and
cross-reference it against US intelligence and
law-enforcement records. The system would then store it
as "entities" - linked data about people, places,
things, organizations, and events, according to a report
summarizing a 2004 DHS conference in Alexandria, Va. The
storage requirements alone are huge - enough to retain
information about 1 quadrillion entities, the report
estimated. If each entity were a penny, they would
collectively form a cube a half-mile high - roughly
double the height of the Empire State Building.
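The penny comparison can be checked with rough arithmetic.
The sketch below is a back-of-the-envelope estimate, not a
calculation taken from the report: the coin dimensions, the
building height, and the assumption that each penny occupies
a small square-based box are all supplied here for
illustration.

```python
# Assumed dimensions of a US one-cent coin (not from the DHS report).
PENNY_DIAMETER_M = 0.01905
PENNY_THICKNESS_M = 0.00152

# Assumed roof height of the Empire State Building, without antenna.
EMPIRE_STATE_ROOF_M = 381.0

METERS_PER_MILE = 1609.344
ENTITIES = 10**15  # 1 quadrillion, the figure cited in the report


def cube_side_m(count):
    """Side of a cube holding `count` pennies stacked in a square grid."""
    # Each coin is treated as filling a box: diameter x diameter x thickness.
    volume_per_coin = PENNY_DIAMETER_M ** 2 * PENNY_THICKNESS_M
    return (count * volume_per_coin) ** (1 / 3)


side = cube_side_m(ENTITIES)
print("cube side in meters:", round(side))
print("cube side in miles:", round(side / METERS_PER_MILE, 2))
print("multiples of building height:",
      round(side / EMPIRE_STATE_ROOF_M, 1))
```

Under these assumptions the cube comes out a little over 800
meters on a side, which is about half a mile and roughly
twice the building's roof height.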
But ADVISE and related DHS technologies aim to do much
more, according to Joseph Kielman, manager of the TVTA
portfolio. The key is not merely to identify terrorists,
or sift for key words, but to identify critical patterns
in data that illumine their motives and intentions, he
wrote in a presentation at a November conference in
Richland, Wash.
For example: Is a burst of Internet traffic between a
few people the plotting of terrorists, or just bloggers
arguing? ADVISE algorithms would try to determine that
before flagging the data pattern for a human analyst's
review.
At least a few pieces of ADVISE are already operational.
Consider Starlight, which along with other
"visualization" software tools can give human analysts a
graphical view of data. Viewing data in this way could
reveal patterns not obvious in text or number form.
Understanding the relationships among people,
organizations, places, and things - using
social-behavior analysis and other techniques - is
essential to going beyond mere data-mining to
comprehensive "knowledge discovery in databases," Dr.
Kielman wrote in his November report. He declined to be
interviewed for this article.
One data program has foiled terrorists
Starlight has already helped foil some terror plots,
says Jim Thomas, one of its developers and director of
the government's new National Visualization and Analytics
Center in Richland, Wash. He can't elaborate because the
cases are classified, he adds. But "there's no question
that the technology we've invented here at the lab has
been used to protect our freedoms - and that's pretty
cool."
As envisioned, ADVISE and its analytical tools would be
used by other agencies to look for terrorists. "All
federal, state, local and private-sector security
entities will be able to share and collaborate in real
time with distributed data warehouses that will provide
full support for analysis and action" for the ADVISE
system, says the 2004 workshop report.
A program in the shadows
Yet the scope of ADVISE - its stage of development,
cost, and most other details - is so obscure that
critics say it poses a major privacy challenge.
"We just don't know enough about this technology, how it
works, or what it is used for," says Marcia Hofmann of
the Electronic Privacy Information Center in Washington.
"It matters to a lot of people that these programs and
software exist. We don't really know to what extent the
government is mining personal data."
Even congressmen with direct oversight of DHS, who favor
data-mining, say they don't know enough about the
program.
"I am not fully briefed on ADVISE," wrote Rep. Curt
Weldon (R) of Pennsylvania, vice chairman of the House
Homeland Security Committee, in an e-mail. "I'll get
briefed this week."
Privacy concerns have torpedoed federal data-mining
efforts in the past. In 2002, news reports revealed that
the Defense Department was working on Total Information
Awareness, a project aimed at collecting and sifting
vast amounts of personal and government data for clues
to terrorism. An uproar caused Congress to cancel the
TIA program a year later.
Echoes of a past controversial plan
ADVISE "looks very much like TIA," Mr. Tien of the
Electronic Frontier Foundation writes in an e-mail.
"There's the same emphasis on broad collection and
pattern analysis."
But Mr. Sand, the DHS official, emphasizes that privacy
protection would be built in. "Before a system leaves
the department there's been a privacy review.... That's
our focus."
Some computer scientists support the concepts behind
ADVISE.
"This sort of technology does protect against a real
threat," says Jeffrey Ullman, professor emeritus of
computer science at Stanford University. "If a computer
suspects me of being a terrorist, but just says maybe an
analyst should look at it ... well, that's no big deal.
This is the type of thing we need to be willing to do,
to give up a certain amount of privacy."
Others are less sure.
"It isn't a bad idea, but you have to do it in a way
that demonstrates its utility - and with provable
privacy protection," says Latanya Sweeney, founder of
the Data Privacy Laboratory at Carnegie Mellon
University. But since speaking on privacy at the 2004
DHS workshop, she now doubts the department is building
privacy into ADVISE. "At this point, ADVISE has no
funding for privacy technology."
She cites a recent request for proposal by the Office of
Naval Research on behalf of DHS. Although it doesn't
mention ADVISE by name, the proposal outlines
data-technology research that meshes closely with
technology cited in ADVISE documents.
Neither the proposal - nor any other she has seen -
provides any funding for provable privacy technology,
she adds.
Some in Congress push for more oversight of federal
data-mining
Amid the furor over electronic eavesdropping by the
National Security Agency, Congress may be poised to
expand its scrutiny of government efforts to "mine"
public data for hints of terrorist activity.
"One element of the NSA's domestic spying program that
has gotten too little attention is the government's
reportedly widespread use of data-mining technology to
analyze the communications of ordinary Americans," said
Sen. Russell Feingold (D) of Wisconsin in a Jan. 23
statement.
Senator Feingold is among a handful of congressmen who
have in the past sponsored legislation - unsuccessfully
- to require federal agencies to report on data-mining
programs and how they maintain privacy.
Without oversight and accountability, critics say, even
well-intentioned counterterrorism programs could
experience mission creep, having their purview expanded
to include non-terrorists - or even political opponents
or groups. "The development of this type of data-mining
technology has serious implications for the future of
personal privacy," says Steven Aftergood of the
Federation of American Scientists.
Even congressional supporters of the effort want more
information about data-mining efforts.
"There has to be more and better congressional
oversight," says Rep. Curt Weldon (R) of Pennsylvania,
vice chairman of the House committee overseeing the
Department of Homeland Security. "But there can't be
oversight till Congress understands what data-mining is.
There needs to be a broad look at this because they
[intelligence agencies] are obviously seeing the value
of this."
Data-mining - the systematic, often automated gleaning
of insights from databases - is seen "increasingly as a
useful tool" to help detect terrorist threats, the
Government Accountability Office reported in 2004. Of the
nearly 200 federal data-mining efforts the GAO counted,
at least 14 were acknowledged to focus on
counterterrorism.
While privacy laws do place some restriction on
government use of private data - such as medical records
- they don't prevent intelligence agencies from buying
information from commercial data collectors. Congress
has done little so far to regulate the practice or even
require basic notification from agencies, privacy
experts say.
Indeed, even data that look anonymous aren't necessarily
so. For example: With name and Social Security number
stripped from their files, 87 percent of Americans can
be identified simply by knowing their date of birth,
gender, and five-digit Zip code, according to research
by Latanya Sweeney, a data-privacy researcher at
Carnegie Mellon University.
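The idea behind that finding can be sketched in a few lines.
This is an illustration only, not Dr. Sweeney's method or
data: the records are invented, and unique_fraction is a
hypothetical helper that measures how many records are the
only one with their particular combination of fields.

```python
from collections import Counter

# Hypothetical "anonymized" records: names and Social Security
# numbers are already removed. All values are invented.
records = [
    {"dob": "1970-03-14", "gender": "F", "zip": "15213"},
    {"dob": "1970-03-14", "gender": "M", "zip": "15213"},
    {"dob": "1982-11-02", "gender": "F", "zip": "15213"},
    {"dob": "1982-11-02", "gender": "F", "zip": "15213"},
    {"dob": "1990-07-30", "gender": "M", "zip": "94305"},
]


def unique_fraction(records, keys=("dob", "gender", "zip")):
    """Fraction of records whose combination of `keys` appears only once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records
                 if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


print("unique on dob + gender + zip:", unique_fraction(records))
print("unique on zip alone:", unique_fraction(records, keys=("zip",)))
```

Each field alone singles out few people, but the combination
narrows the pool quickly. Anyone holding a second list that
pairs those same three fields with names, such as a purchased
commercial database, could match the unique records back to
individuals.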
In a separate 2004 report to Congress, the GAO cited
eight issues that need to be addressed to provide
adequate privacy barriers amid federal data-mining. Top
among them was establishing oversight boards for such
programs.