Wednesday, January 25, 2006

Smart Web Crawlers

To classify web pages well, you need to know which pages link to them and which pages they link to. Features of those linking and linked pages can help a classifier decide what a page is about. An article at http://directmag.com/searchline/1-25-06-Google-BigDaddy/index.html reminds me that crawlers aren't just about text:

"For example, Google has also begun using a search crawler built on a Mozilla browser. The new search bot is more flexible, seems faster and can read non-text content more readily; that should mean that in time, it will be able to read links within images and even within Flash video, matter that gets ignored by bots that can’t speak Javascript."

Bottom line: Crawlers have to be as good at interpreting web content as browsers.
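
To make the point concrete, here is a minimal sketch in Python of a text-only link extractor. It is not how Google's crawler works; the class name, function name, and sample page are all made up for illustration. It reads only the static HTML, so it finds the ordinary link but never sees the one that a script would add once the page runs in a browser.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects href values from <a> and <area> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only tags present in the raw HTML reach this method.
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the links a crawler that cannot run scripts would discover."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page: one static link, one link created by JavaScript.
SAMPLE_PAGE = """
<html><body>
<a href="/static.html">A plain link</a>
<script>
  var a = document.createElement("a");
  a.href = "/dynamic.html";
  document.body.appendChild(a);
</script>
</body></html>
"""

print(extract_links(SAMPLE_PAGE))  # prints ['/static.html']
```

The link to /dynamic.html exists only after a browser executes the script, so a crawler built this way never learns that the page points there, and any classifier relying on link features is missing part of the picture.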
