Author Michael Wolraich writes in New York Magazine’s Daily Intelligencer about a time in 1994 when, as a recent college graduate, he worked as a government contractor to automatically sort emails from the Bill Clinton White House, including those of then-First Lady Hillary Clinton.
One day a colleague invited me to join a mysterious new project for the Executive Office of the President (EOP). The White House had hired IMC to archive its email after the court ordered it to preserve electronic records. Few people had multiple email accounts back then and many federal employees used their work accounts for personal communication, so we had to figure out some way to distinguish work email from personal correspondence.
The results were abysmal. Even after significant tweaking, I don’t recall achieving more than a 70 percent success rate, which is particularly poor when you consider that random sorting would yield 50 percent if the distribution were even. IMC ultimately scrapped our troubled sorting project in favor of a feature that allowed users to manually flag messages that should not be archived.
Our problem was that natural language — the way people ordinarily speak and write — is notoriously difficult to parse. To make sense of natural language, it’s not sufficient to recognize the words; you also need to understand grammar, appreciate nuance, interpret metaphors, grasp allusions, infer from context, and even have a sense of humor. Right now, only humans can do that reliably.
Machine learning has made great strides in the past few years. With enough training, an advanced natural-language processor would be able to sort Hillary Clinton’s emails much more effectively than the simple keyword approach that my colleagues and I devised. But Clinton’s press statement offers no indication that her team employed such technology. On the contrary, her account of the process sounds remarkably like the name and keyword filters that we tried in the 1990s.