Beware: the web gives posted documents a life of their own
by Alan Zisman (c) 2009
First published in Business in Vancouver, August 4-10, 2009, issue #1032
High Tech Office column
Don’t let this happen to you.
I got an e-mail recently from a friend who is working for a local
environmental non-profit. (He’s asked me to keep his name and the
organization he works for out of this column.) His organization was
choosing a delegation for a conference in Europe – a committee was
considering about 100 applications, submitted as Microsoft
Word-formatted documents.
To facilitate access to all the
applications, the organization created a web page with links to the
applications. After selecting the delegation, they took the web page
and the applications offline. Or so they thought.
My friend had
gotten an e-mail from an unsuccessful applicant; she had used Google to
try to find out who had been selected. Much to her surprise, her search
hits included a link to her own application form – complete with name,
address, phone number, e-mail and other identifying information.
The page, which had a numeric IP address rather than a standard domain
name, had a note at the top stating: “This is the html version of the
file [filename deleted]. Google automatically generates html versions
of documents as we crawl the web.” The numeric web address appeared to
be owned by Google.
Who knew Google was not only indexing the
web but also converting any documents it stumbled across into standard
web page format, and posting them to another location without the
knowledge or consent of the document creators? Or that they remained
online regardless of the fate of the original document?
(There are probably copyright issues here, but that’s for the lawyers.)
In her e-mail, the unsuccessful applicant said she was horrified that this
information was made public – along with that of everyone else who had
applied. I would agree.
The organization had clearly made a
mistake. By posting a web page that linked to the application forms
without requiring any sort of login, they had made all the application
forms public even though they didn’t publicly advertise that page. And
once the information is “out there,” anything can happen.
But once something has gotten into Google’s system, how do you get it out?
There’s no obvious way to talk to a “real person” at Google.
I checked in with Chris Goward of Vancouver’s WiderFunnel Marketing, a
company that works closely with Google in helping clients optimize
their websites for more effective results. He pointed us to an online
form at www.google.com/webmasters/tools/removals,
noting that it can be used to remove a webpage from Google’s list.
According to Goward, the form is effective, though it can take a few
days before the page is removed.
He pointed out that anything
posted on the web may sooner or later show up in a Google search list,
unless it’s behind a firewall (the usual practice for corporate
networks), in a password-protected area, or covered by a robots.txt
restriction. (Robots.txt is a standard file used to request that search
engine indexers ignore specified files or folders. Note the word
“request”; compliance with robots.txt restrictions is on the honour
system.)
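To make the robots.txt idea concrete, here is a minimal sketch in Python, using the standard library's urllib.robotparser module. The robots.txt contents, the /applications/ folder and the example.org addresses are all assumptions made up for illustration, not the non-profit's actual setup. It shows only how a well-behaved indexer would read the request; it does nothing to stop one that chooses to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks every indexer ("*") to stay out of
# one folder. The folder name is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /applications/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a compliant crawler would consider url fair game."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A document inside the restricted folder (hypothetical address).
blocked = is_crawl_allowed(
    ROBOTS_TXT, "https://example.org/applications/form-001.doc"
)
# A page outside the restricted folder (hypothetical address).
allowed = is_crawl_allowed(ROBOTS_TXT, "https://example.org/about.html")
```

A check like this is what an honest indexer runs before fetching a page. Anything truly private still belongs behind a password, since robots.txt is only a request.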
Employees at the non-profit got to work, filing removal
forms for each of the pages that Google had created for their 100-odd
applicants. As far as I can tell, it worked – Googling an applicant’s
name no longer brings up the application form in the search results.
Lots of people are pretty sloppy with personal information online. If you
post your own information, that’s one thing. But we all need to be held
accountable if we post other people’s information – employees,
customers, friends – online. As we’ve seen, good intentions aren’t
enough: anything posted online is liable to show up in a Google search
and may take on a life of its own beyond your intended use.