Telephone Records are just the Tip of NSA's Iceberg
The National Security Agency and other U.S. government
organizations have developed hundreds of software
programs and analytic tools to "harvest" intelligence,
and they've created dozens of gigantic databases
designed to discover potential terrorist activity both
inside the United States and overseas.
These cutting-edge tools -- some highly classified
because of their functions and capabilities --
continually process hundreds of billions of what are
called "structured" data records, including telephone
call records and e-mail headers contained in information
"feeds" that have been established to flow into the
intelligence agencies.
The multi-billion dollar program began before 9/11 but
has been accelerated since then. Well over 100
government contractors have participated, including both
small boutique companies whose products include
commercial off-the-shelf software and some of the
largest defense contractors, who have developed
specialized software and tools exclusively for
government use.
USA Today provided a small window into this massive
intelligence community program by reporting yesterday
that the NSA was collecting and analyzing millions of
telephone call records.
The call records are "structured data," that is,
information maintained in a standardized format that can
be easily analyzed by machine programs without human
intervention. They're different from intercepts of
actual communication between people in that they don't
contain the "content" of the communications -- content
that the Supreme Court has ruled is protected under the
Fourth Amendment. You can think of call records as
what's outside the envelope, as opposed to what's on the
inside.
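
To make "structured" concrete, here is a minimal sketch of what a machine-readable call record might look like and how a program could tally calls with no human reading anything. The field names (caller, callee, start, duration_s), the CSV layout, and the sample numbers are all my own assumptions for illustration; nothing here describes the actual format of any agency's data feed.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical fields: only what is "outside the envelope", no content.
    caller: str
    callee: str
    start: str        # timestamp kept as text for simplicity
    duration_s: int   # call length in seconds


def parse_records(text: str) -> List[CallRecord]:
    """Turn CSV text with a header row into a list of CallRecord objects."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        CallRecord(row["caller"], row["callee"], row["start"], int(row["duration_s"]))
        for row in reader
    ]


def calls_per_caller(records: List[CallRecord]) -> Counter:
    """Count how many calls each originating number placed."""
    return Counter(r.caller for r in records)


# Made-up sample data using fictional 555 numbers.
SAMPLE = """caller,callee,start,duration_s
555-0100,555-0199,2006-05-10T09:15:00,120
555-0100,555-0142,2006-05-10T09:40:00,45
555-0142,555-0199,2006-05-10T10:05:00,300
"""

records = parse_records(SAMPLE)
counts = calls_per_caller(records)
```

Because every record has the same fields in the same order, the tally needs no interpretation step, which is the property that separates structured records from free-form content.
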
Once collected, the call records and other non-content
communication are being churned through a mind-boggling
network of software and data mining tools to extract
intelligence. And this NSA-dominated program of
ingestion, digestion, and distribution of potential
intelligence raises profound questions about the privacy
and civil liberties of all Americans.
Although there is no evidence that the harvesting
programs have been involved in illegal activity or have
been abused to reach into the lives of innocent
Americans, their sheer scope, the number of
"transactions" being tracked, raises questions as to
whether an all-seeing domestic surveillance system isn't
slowly being established, one that in just a few years'
time will be able to reveal the interactions of any
targeted individual in near real time.
In late November 1998, the intelligence community and
the Department of Defense established the Advanced
Research and Development Activity in Information
Technology (ARDA), a government consortium charged with
incubating and developing "revolutionary" research and
development in the field of intelligence processing.
The Director of the National Security Agency (NSA)
agreed to establish, as a component of the NSA, an
organizational unit to carry out the functions of ARDA,
overseeing the research program of the CIA, DIA,
National Reconnaissance Office, and other defense and
civilian intelligence agencies.
Beginning before 9/11, ARDA established an "information
exploitation" program to fund and focus private research
on operationally-relevant problems of exploiting the
increasing torrents of digital data available to the
intelligence community. Even with thousands of analysts,
NSA and other agencies were falling behind in their
ability to handle the volume of incoming material.
Existing mainframe machine-aided processes were also
falling behind advances in information processing,
particularly as the cost of computing power dramatically
declined in the 1990s.
The information exploitation research program has funded
hundreds of projects to find better ways to "pull"
information, "push" information, and "navigate" and
visualize information once assembled.
Pulling information refers to giving analysts
question-and-answer capabilities. Starting with a known
requirement, an analyst could submit questions to a Q&A
system, which in turn would "pull" the relevant
information out of multiple data sources and
repositories. NSA is seeking a Q&A system
that can operate autonomously to interpret "pulled"
information and provide automatic responses back to the
analysts with little additional human intervention.
Pushing information refers to the software tools that
would "blindly" and without supervision push
intelligence to analysts even if they had not asked for
the information. Research has sought to go beyond
current data mining of "structured" records to deeper
profiling of massive unstructured data collections.
Under the pushing information research thrust, companies
have been involved in efforts to uncover previously
undetected patterns of activity from massive data sets.
Software and tools are also being developed that will
provide alerts to analysts when changes occur in newly
arrived but unanalyzed massive data collections, such
as telephone records.
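
As a rough illustration of what an unprompted "alert" on newly arrived records could mean, the sketch below compares how often each number appears in a new batch against a stored baseline and flags sharp increases. The function name detect_changes, the factor and min_count thresholds, and the sample data are hypothetical choices of mine; this is a generic counting technique, not a description of any tool named in government documents.

```python
from collections import Counter


def detect_changes(baseline, new_batch, factor=3.0, min_count=5):
    """Flag keys whose count in new_batch is at least `factor` times the baseline.

    baseline  -- dict mapping a key (e.g. a phone number) to its usual count
    new_batch -- iterable of keys seen in the newly arrived data
    Returns a list of (key, usual_count, new_count) tuples, sorted by key.
    """
    alerts = []
    new_counts = Counter(new_batch)
    for key, count in sorted(new_counts.items()):
        usual = baseline.get(key, 0)
        # Treat a missing or zero baseline as 1 so the ratio test stays defined.
        if count >= min_count and count >= factor * max(usual, 1):
            alerts.append((key, usual, count))
    return alerts


# Made-up example: one number jumps from 2 to 7 appearances.
baseline = {"555-0100": 2, "555-0142": 6}
new_batch = ["555-0100"] * 7 + ["555-0142"] * 6 + ["555-0199"] * 2
alerts = detect_changes(baseline, new_batch)
```

The analyst never asks a question here; the comparison runs on arrival and only the flagged entries are "pushed" onward.
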
The effort to navigate and visualize information seeks
to develop analytic tools that will allow agency
analysts to take hundreds or even thousands of small
pieces of information and automatically create a
tailored and logical "picture" of that information.
Using visualization tools and techniques, intelligence
analysts are constantly seeking out previously unknown
links and connections between individual pieces of
information.
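
Finding "previously unknown links" between pieces of information is, at its simplest, a graph problem. The sketch below treats each call record as a connection between two numbers and uses a textbook breadth-first search to find the shortest chain linking any two of them. The function names build_graph and shortest_link and the lettered sample data are my own assumptions; the actual link-analysis tools used by the agencies are not described in enough detail for me to reproduce.

```python
from collections import defaultdict, deque


def build_graph(pairs):
    """Build an undirected graph from (a, b) contact pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_link(graph, start, goal):
    """Return the shortest chain of contacts from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}   # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor chain back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Made-up contact pairs; letters stand in for phone numbers.
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
graph = build_graph(pairs)
chain = shortest_link(graph, "E", "D")
```

Here E and D never contacted each other directly, yet the search surfaces the chain that connects them through A, B, and C, which is the kind of indirect connection the visualization tools are meant to put in front of an analyst.
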
The "structured" data the intelligence community is
working to process includes data-tagged signals
intelligence (SIGINT)
monitoring of telephone and radio communications,
imagery, human intelligence reporting, and "open-source"
commercial data, including news media reporting.
"Unstructured" data includes news and Internet video and
audio and document exploitation.
I could write volumes about the research efforts and the
software programs and tools used to process the
mountains of information the NSA and other agencies
ingest. No doubt over the coming days and weeks, more
will be written. For today, though, I provide a pointer,
based upon my research, to software, tools and
intelligence databases that I have been able to identify
in government documents relating to data mining, link
analysis, and ingestion, digestion, and distribution of
intelligence. My hope would be that other journalists
and researchers will follow the leads.
The following is a list of some 500 software tools,
databases, data mining and processing efforts contracted
for, under development or in use at the NSA and other
intelligence agencies today:
CLICK FOR SOURCE W/LIST:
http://blog.washingtonpost.com/earlywarning/2006/05/telephone_records_are_just_the.html