ScrubTheWeb.com, the 20+ year old search engine, stopped running their engine early in May 2017. The engine delivered relatively good search results without considering a website's popularity. Their web page, at the time of writing, is still up, but the search box is gone, replaced by a for-sale notice and a link to a domain seller. (Note: since I wrote this the search box is back up and the link to the domain seller's gone, but the engine isn't back; if you click on the search box you get a message noting the engine's closed.) Any link to an old search returns an error message. While the site does say that their SEO customers will still be able to access their SEO tools, this is only until their contracts run out. There is no indication that they'll re-launch their engine. A message to their registered users thanks them for supporting it, and notes that running the engine had become too expensive. Now I'm depressed.
I first saw ScrubTheWeb a few years ago, when I was looking for alternatives to Google, Yahoo (still actively crawling at the time, with their own index, if memory serves), and Bing. I found ScrubTheWeb, and rejected it; I couldn't find the search box. At the time they had three text boxes crammed into their site's header: a web search box, a site search box, and an email-list sign-up box. Besides that, the slightly dated, keyword-and-meta-tag-focused SEO advice seemed sketchy. I saw them again, several times, before finding their web search box. When I entered a query, I was promptly greeted with a captcha, which made me suspect they were hiding the engine because they were embarrassed by its results. I entered the required characters and was blown away; the results were good. I was expecting low-quality, spammy results, on par with the "Search the Ads" forms seen on parked domains. But the results weren't bad; they were on par with the major search engines. I used it a few other times, usually satisfied. I tried using it for an advanced query; I wanted to know how many TV series had 2, but not 3, seasons. I came up with a query to search epguides.com, a site which lists TV show episodes by season: site:epguides.com +"season 2" -"season 3". Google choked, only returning three pages of results, but ScrubTheWeb returned page after page of correct, relevant results. It became my 1st or 2nd choice for searches, depending on the day.
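For readers unfamiliar with that query syntax, here is a rough sketch of what the three parts ask for: site: restricts results to one domain, a leading + marks a phrase that must appear, and a leading - marks a phrase that must not. This is only an illustration of the query's meaning, not how ScrubTheWeb or Google actually evaluated it; the function name, the sample pages, and the plain substring matching are all my own assumptions.

```python
from urllib.parse import urlparse


def matches(url, text, site, required, excluded):
    """Return True if a page satisfies a site:/+phrase/-phrase style query.

    Simplification: phrases are matched as case-insensitive substrings,
    which a real engine would not do.
    """
    host = urlparse(url).hostname or ""
    on_site = host == site or host.endswith("." + site)
    body = text.lower()
    has_required = all(phrase.lower() in body for phrase in required)
    has_excluded = any(phrase.lower() in body for phrase in excluded)
    return on_site and has_required and not has_excluded


# Hypothetical pages, invented for the example.
pages = [
    ("http://epguides.com/ShowA/", "Season 1 ... Season 2 ..."),
    ("http://epguides.com/ShowB/", "Season 1 ... Season 2 ... Season 3 ..."),
    ("http://example.com/ShowC/", "Season 1 ... Season 2 ..."),
]

hits = [
    url
    for url, text in pages
    if matches(url, text, "epguides.com", ["season 2"], ["season 3"])
]
print(hits)
```

Only the first page survives: the second mentions a third season, and the third is on the wrong site.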
I tried to research ScrubTheWeb, but, in spite of its age and claimed "top 10" placement, I could find nothing on its history. There was an old thread on WebMasterWorld's forum, where someone asked why they were seeing traffic from it, and someone else chimed in to say the engine worked well. Its spider, Scrubby, was listed on a few pages about web robots: who ran them, their IP addresses, that sort of thing. There was a link on an old search engine site which noted they were too involved in SEO. And there were a few threads and reviews which confirmed they weren't a scam. I needed to know more. (Note: I originally mis-identified ScrubTheWeb's spider as Roverbot.)
I looked the site up in the Internet Archive's Wayback Machine. I traced its development from 1997 to the then-current version. They had three major design changes. Their 1st design looked like an example page from Creating Killer Webpages, and it used a Spiderman-like logo. There was a pop-up box noting that the engine wasn't 100% ready yet. This one appears to have been a directory; website owners were instructed to submit site keywords and descriptions, and I find no evidence they had a crawler until their next version. Their 2nd design looked like a directory and had a spider with a scrub brush as the logo. This one appears to be the 1st version with a web crawler; there's a "check your site" box and the directory appears to be a list of recent searches. The 3rd design de-emphasized the search box, and placed their SEO advice front-and-center. I wanted to know more.
And I should now confess that the previous sections haven't been 100% chronological. I wouldn't have suspected that site:epguides.com +"season 2" -"season 3" would have worked well in ScrubTheWeb unless I'd read some of their old pages in the Internet Archive. The next sections should be in chronological order.
I emailed ScrubTheWeb in 2016. I praised them for outlasting other 90's search engines, like Excite, Infoseek, and AltaVista. I asked if they had a NEAR operator. I remembered from an archived version of their site that they started off trying to offer a better index than AltaVista. Since AltaVista's NEAR operator was popular, I figured ScrubTheWeb might have had one. I got an email back. They were very kind. They said they didn't have a NEAR operator, that they were the 1st engine to offer tools to help webmasters, and that they were about to launch a revamped version of their site after more than 2 years of work.
I still wanted to know more, so I asked why they survived when others didn't. They said that they never planned to get rich, that they started out trying to "educate people on how the internet and search engines work." They noted that search engines don't make money; ads do, and that Google's real product was an advertising network. They also noted that people try to crawl most engines' results, to scrape content and to track their site's rank for different terms. To preserve resources ScrubTheWeb used a captcha.
A couple of months later they launched the new version of their site. The layout was better; the search box was placed front-and-center, and the menus fit on my phone's screen more easily, but it required JavaScript. I emailed them, complimented them on the new design, and asked why I needed scripting turned on. They responded, noting that most sites required JavaScript and cookies, and said that they hoped I had "a chance to play around and get a good feel of things." I did, but I must admit I used their engine less; client-side scripting eats through too much bandwidth. They also switched to concept-based searching, running everything as an "OR" search and preferring documents that contain keywords which may be related to the query. I generally prefer results to contain all my keywords, not some mix of my words and words which might be related. Maybe an engine should provide options for further searches, like a list of potentially related words I could choose from, but it shouldn't substitute something else for my keywords! Still, I used ScrubTheWeb on occasion. Advanced searching still worked well, I always like having more than one resource, and their results weren't influenced by popularity. ScrubTheWeb had impressed me, so I returned.
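To show why the switch bothered me, here is a toy comparison of the two behaviors. It is a sketch under my own assumptions, not ScrubTheWeb's actual ranking: the documents, the table of "related" words, and the scoring (count how many expanded terms a document contains) are all invented for the example.

```python
def and_search(docs, terms):
    """Return documents that contain every query term."""
    return [name for name, words in docs.items() if all(t in words for t in terms)]


def or_search(docs, terms, related):
    """Return documents containing any query term or related word,
    ranked by how many of those words they contain."""
    expanded = set(terms)
    for term in terms:
        expanded.update(related.get(term, ()))
    scored = []
    for name, words in docs.items():
        score = len(expanded & words)
        if score:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical documents, each reduced to a set of words.
docs = {
    "a": {"cheap", "flights", "paris"},
    "b": {"budget", "airfare"},
    "c": {"cheap", "hotels"},
}
terms = ["cheap", "flights"]
related = {"cheap": ["budget"], "flights": ["airfare"]}  # hypothetical

strict = and_search(docs, terms)
loose = or_search(docs, terms, related)
print(strict)
print(loose)
```

The strict search returns only document "a". The loose one returns all three, and document "b", which contains neither of my words, scores as highly as "a".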
I still wondered, off and on, if ScrubTheWeb really didn't consider popularity when ranking search results. That's the thing that made Google big, isn't it? A link-based popularity formula, weighted by click-through rate and things like that? Recently I took a look at archived searches from HotBot, from 1999, before Inktomi came up with their own link formula. The results reminded me of ScrubTheWeb's. I know Inktomi had trouble placing the official version of a resource at the top of their results; that's why they lost their contract with Yahoo, and I know ScrubTheWeb had the same problem. ScrubTheWeb's 1st result for a sentence from an IRS document came from a mirror site instead of IRS.gov, and while Yahoo.com did appear in ScrubTheWeb's top ten results for the keyword "yahoo," it wasn't the 1st result. Still, this wasn't a problem for general informational searches or for advanced searches. In fact, giving extra weight to popular sites is harmful when trying to find general information; it incorrectly limits the available information. I know quality sites with documents relevant to my queries get buried by similar documents on more popular sites; I saw these less-popular sites when searching ScrubTheWeb. Will I ever find them again? I don't know. Inference Find's solution seems like a better idea than PageRank: separate .gov, .edu, and major companies from the rest of the results. This way everyone gets a shot, and official information isn't outranked by unofficial mirrors.
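A rough sketch of that kind of grouping, as I understand the idea: sort each result into a bucket by its host name and show the buckets separately, rather than re-ranking everything by popularity. This is not Inference Find's actual code; the bucket names, the list of "major companies," and the sample URLs are my own assumptions.

```python
from urllib.parse import urlparse

# Hypothetical list; a real engine would need some maintained source for this.
MAJOR_COMPANIES = {"yahoo.com", "microsoft.com"}


def bucket(url):
    """Pick a display group for a result based on its host name."""
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(".gov"):
        return "government"
    if host.endswith(".edu"):
        return "education"
    if any(host == d or host.endswith("." + d) for d in MAJOR_COMPANIES):
        return "major companies"
    return "everything else"


def group_results(urls):
    """Group URLs by bucket, keeping the original order within each group."""
    groups = {}
    for url in urls:
        groups.setdefault(bucket(url), []).append(url)
    return groups


# Hypothetical result list, in the order an engine returned it.
results = [
    "http://mirror.example.org/irs-forms",
    "https://www.irs.gov/forms",
    "http://www.yahoo.com/",
    "http://cs.stanford.edu/page",
]
groups = group_results(results)
for name, urls in groups.items():
    print(name, urls)
```

The mirror still shows up, but in its own group, so it can't push the IRS.gov page down the list.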
Less than half a year after their upgrade, ScrubTheWeb was shuttered, and the domain was put up for sale. I didn't expect them to close. Even if I dislike JavaScript, the engine worked well, it had been online for well over a decade, the owners were nice, and they'd just launched a new, upgraded version. Now I don't know what to do. There are some searches that Google and Bing just aren't good at; they can only find relevant documents if they're on popular websites. Popularity != Relevancy.
I don't know where else to put this, and it's kind of pointless now, but it may be of historical interest. ScrubTheWeb returned up to 20 pages of search results, 10 links per page, each with a title, description, and URL. It could query up to ten words at a time, and it was able to handle both boolean and phrase searching. They had four major design changes: the three I found in the Internet Archive, plus the last upgrade from 2016, which should now be in the Internet Archive as well. It was made by A*B*S, Absolute (or Advanced) Business Systems, a software company out of Arizona, and it was mostly programmed by one person. When I first started using it they claimed to have an index of about a billion documents; after the upgrade they claimed to be able to index 1.5 billion documents per server. (Most of this is from memory; forgive me if I get a few details wrong.)
As far as I know the last remaining 90's search engines are Aliweb (1993, no longer crawling but returning results), WhatUSeek (1995, at least, more of a directory today, but still has an active crawler), NorthernLight (1997, went offline in 1999, came back around 2007, current public index only includes recent business news articles), Thunderstone's Web Catalog (1998, crawls sites and organizes them into a directory, but does not index individual pages), Zerx (1999, at least, returns many dead links), Findia.net (1999), and Google (1996, under a subdirectory of Stanford's domain, 1998 with their own domain).