World Wide Web Wanderer / Wandex


http://www.mit.edu/~mkgray/net/background.html

     
Language



Launched June 1993
Closed January 1996



Developer



Country of Origin



Owner



Topic Universal



Region



Technology
and/or
Strategy



Used SeEn






Older Version Internet Archive / WebCite





wiseGEEK: »In 1993, not long after the creation of the World Wide Web, Matthew Grey [sic] developed the World Wide Web Wanderer, which was the first web robot. The World Wide Web Wanderer indexed all of the websites that existed in the internet by capturing their URLs, but didn’t track any of the actual content of the websites. The index associated with the Wanderer, which was an early sort of search engine, was called Wandex.« Source

SalientMarketing: »The first web robot was the creation of Massachusetts Institute of Technology (MIT) physics student Matthew Gray in 1993. Gray’s World Wide Web Wanderer was designed to track the growth of the then-infant Web. “I wrote the Wanderer to systematically traverse the Web and collect sites,” Gray wrote of his invention. “I was initially motivated primarily to discover new sites, as the Web was still a relatively small place. The Wanderer was the primary tool for collection of data to measure the growth of the Web. It was the first automated Web agent or “spider.” The Wanderer was first functional in spring of 1993 and performed regular traversals of the Web from June 1993 to January 1996.” During its three-year run, the Wanderer tracked the growth in web sites from 130 in June 1993, to more than 100,000 in January 1996 and an estimated 230,000 just six months later. Gray extended the scope of the Wanderer from tracking the Web’s size to capturing individual URLs into Wandex, the first web database. Gray’s good intentions also created controversy as early versions of the Wanderer were also known to not just crawl the Web, but slow traffic on the Web to a crawl as the program repeatedly accessed the same pages hundreds of times a day. The problem was fixed in later versions.« Source

Wikipedia: »The World Wide Web Wanderer, also referred to as just the Wanderer, was a Perl-based web crawler that was first deployed in June 1993 to measure the size of the World Wide Web. The Wanderer was developed at the Massachusetts Institute of Technology by Matthew Gray, who, as of 2017, has spent a decade as a software engineer at Google. The crawler was used to generate an index called the Wandex later in 1993. While the Wanderer was probably the first web robot, and, with its index, clearly had the potential to become a general-purpose WWW search engine, the author does not make this claim and elsewhere it is stated that this was not its purpose. The Wanderer charted the growth of the web until late 1995.« Source

Matthew Gray (30 Jun 93): »I have written a perl script that wanders the WWW collecting URLs, keeping track of where it's been and new hosts that it finds. Eventually, after hacking up the code to return some slightly more useful information (currently it just returns URLs), I will produce a searchable index of this. There is a complete list of all the sites it has found at "A complete list of sites found by the W4 (World Wide Web Wanderer)". I'll announce here when we get this index properly running, however it probably won't be until sometime in August, as I am going on vacation. Until then... « Source

Matthew Gray (30 Jun 93): »Ok, how "big" is the Web. Here is what W4 has found out. Actually, first I'd better explain a little bit about what the wanderer does. It does a simple depth first search, with an added feature I call 'getting bored'. That is, if it finds a number of documents that have the same URL, up to the last field (eg http://foo/bar/blah, http://foo/bar/baz, http://foo/bar/more) it will eventually get 'bored' and skip it. This makes it go a little quicker. Of course, it potentially is losing some documents here, but probably not. W4 took many hours (maybe 20) to run, but I don't remember exactly, because it saves state so I could kill it and restart it whenever I wanted. Well, in total, the W4 found more than 17,000 http documents (didn't follow any other kinds of links) and more than 125 unique hosts. In the current version, it *only* retrieved the URL of the document. In the next version, I hope to have it do the following other things.
o Get the Title of the document
o Get the length of the document
o Do a 'keyword' analysis of the document
o Count the number of links in a document
o Improve on the boredom system
By a 'keyword' analysis, I mean looking at the document for words that appear frequently, but aren't normally common words. Additionally, titles and things appearing in headers would be good candidates for keyword searches. I'll try and get the current code at least clean enough that I'm willing to let everyone in the world to see it, but if you *really* want to see it now, send me mail. Any other suggestions would be welcome. Once this index is produced, it will be searchable via http, and I suppose by WAIS though I really detest the way WAIS restricts searches. In any case, there is a possibility that this will be done by the end of the summer.« Source
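
The traversal described in the 30 Jun 93 message above (a depth-first search that gets 'bored' with URLs sharing everything up to the last field, follows only http links, and records URLs and hosts) can be illustrated with a small sketch. This is not Gray's Perl script, which is not reproduced here. It is a minimal Python approximation under stated assumptions: the names `wander`, `parent_prefix`, `boredom_limit` and the in-memory link table `TOY_WEB` are hypothetical, the table stands in for real page fetching, and the threshold of 3 is arbitrary, since the message does not say when the Wanderer got bored.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def parent_prefix(url):
    """Return the URL up to, but not including, its last path field."""
    parts = urlsplit(url)
    directory = parts.path.rsplit("/", 1)[0]
    return f"{parts.scheme}://{parts.netloc}{directory}/"


def wander(start_url, get_links, boredom_limit=3):
    """Depth-first traversal that records URLs only and skips 'boring' prefixes.

    get_links is a caller-supplied function (an assumption of this sketch)
    that returns the list of link targets found in a document.
    """
    visited = []                    # URLs recorded, in visiting order
    skipped = []                    # URLs dropped by the boredom rule
    hosts = set()                   # unique hosts encountered
    seen = set()                    # every URL already taken off the stack
    per_prefix = defaultdict(int)   # documents recorded per parent prefix
    stack = [start_url]

    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)

        prefix = parent_prefix(url)
        if per_prefix[prefix] >= boredom_limit:
            skipped.append(url)     # 'bored': too many siblings already seen
            continue
        per_prefix[prefix] += 1

        visited.append(url)
        hosts.add(urlsplit(url).netloc)

        # Push links in reverse so the first link on the page is explored first.
        for link in reversed(get_links(url)):
            if link.startswith("http://") and link not in seen:
                stack.append(link)  # other kinds of links are not followed

    return visited, hosts, skipped


# Hypothetical four-document web used only to exercise the sketch.
TOY_WEB = {
    "http://foo/index": [
        "http://foo/bar/blah",
        "http://foo/bar/baz",
        "http://foo/bar/more",
        "http://foo/bar/extra",
        "http://other/home",
        "gopher://foo/x",
    ],
    "http://foo/bar/blah": ["http://foo/index"],
    "http://other/home": [],
}

visited, hosts, skipped = wander(
    "http://foo/index", lambda url: TOY_WEB.get(url, []), boredom_limit=3
)
print(len(visited), "documents on", len(hosts), "hosts;", len(skipped), "skipped")
```

Counting documents per parent prefix is only one way to read 'getting bored'. As the message itself notes, any such rule trades completeness for speed, because documents under a skipped prefix are never recorded and their links are never followed.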




Name




Critical points




Features & Functionality




More




References & further Publications

Wikipedia (EN): n.a.
Wikipedia (Others): n.a.
     

Other Sources

SalientMarketing: The first web robot - 1993 URL: http://www.salientmarketing.com/seo-resources/search-engine-history/web-robot.html
wiseGEEK: What Was the First Search Engine? URL: http://www.wisegeek.com/what-was-the-first-search-engine.htm
Gray, Matthew (1996): Web Growth Summary URL: http://www.mit.edu/~mkgray/net/printable/web-growth-summary.html
Georgi Dalakov: World Wide Web Wanderer of Matthew Gray URL: http://history-computer.com/Internet/Conquering/Wanderer.html
Gray, Matthew (1996): Internet Growth Summary URL: http://www.mit.edu/~mkgray/net/printable/internet-growth-summary.html




Created: 2016-03-28