“Deep Web” sounds kind of cool, but exactly what is it?

The Deep Web, also known as the Invisible Web, is simply the part of the Web that search engines can't index. That makes it ‘dark’ to any search engine you use.

Why are they dark?

Well, Google, Bing, and other search engines use spider bots (crawlers) that follow links from page to page and index the content they find. If content can't be crawled, there is usually one of two reasons:

  1. Cannot be reached: the content sits behind a bot-unfriendly interface or a security block, has broken code, is built in Flash, or is otherwise something the bot can't navigate to. This also includes commercial databases that require a login.
  2. Unreadable to the bot: i.e., a picture, a movie, a PDF with no extractable text or metadata, or other non-HTML content. Bots mainly understand text, so content that offers no text to extract stays invisible to them.
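To make those two reasons concrete, here is a toy crawler in Python. It is only a sketch under stated assumptions, not how Google or Bing actually work: the tiny in-memory site, its URLs, the `login` and `text` fields, and the `crawl` function are all hypothetical. The crawler starts at the home page, follows links breadth-first, skips anything behind a login, and only indexes pages that have text it can read. Whatever it never indexes is that site's "Deep Web."

```python
from collections import deque

# Hypothetical in-memory site. "links" are links a bot can follow,
# "login" marks pages behind a login form (the bot never logs in),
# and "text" is the readable text a bot could extract from the page.
WEB = {
    "/": {"links": ["/about", "/blog"], "login": False, "text": "Home"},
    "/about": {"links": ["/"], "login": False, "text": "About us"},
    "/blog": {"links": ["/blog/post-1", "/members", "/photo.jpg"], "login": False, "text": "Blog"},
    "/blog/post-1": {"links": [], "login": False, "text": "First post"},
    "/members": {"links": ["/members/archive"], "login": True, "text": "Members"},
    "/members/archive": {"links": [], "login": True, "text": "Archive"},
    "/photo.jpg": {"links": [], "login": False, "text": ""},  # an image with no text
    "/orphan": {"links": [], "login": False, "text": "Nobody links here"},
}

def crawl(web, start):
    """Breadth-first crawl from start; returns the URLs that got indexed."""
    seen = {start}
    queue = deque([start])
    indexed = []
    while queue:
        url = queue.popleft()
        page = web.get(url)
        if page is None or page["login"]:
            continue  # reason 1: can't be reached, so its links are never followed
        if page["text"]:
            indexed.append(url)  # readable text, so the page is indexed
        # reason 2: a page with no text is fetched but never indexed
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed

indexed = crawl(WEB, "/")
deep = sorted(set(WEB) - set(indexed))
print("indexed:", indexed)
print("deep web:", deep)
```

Running it, the crawler indexes `/`, `/about`, `/blog` and `/blog/post-1`. The members area (login), the members archive (only linked from behind the login), the photo (no text) and the orphan page (no links point to it) all stay dark.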

How much is in the Deep Web?

There have been several studies of how much of the Web can't be indexed, but many of them are over 5 years old. At this point, the best guess, and likely a conservative one, is that about 90% of the Web is Deep Web.