Wednesday, October 15, 2008

Begin

For the past couple of months I have been involved in a big project I set for myself: design a spider and search engine using ASP and C#.NET. I have a functioning spider and a search engine that still needs a few things added. For now, each works only within a limited domain. The spider follows a link from one page to the next, grabs the page title and some text, then follows a link from that page to the one after it. The process is recursive: when a page will not open, the spider backtracks until it finds another URL to follow. It can be viewed as a type of depth-first tree traversal.
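To make the traversal concrete, here is a minimal sketch of the idea. It is written in Python rather than the C# the project uses, and it is not the actual spider. The `crawl` function, the `fetch` callback, and the tiny in-memory `FAKE_WEB` are all made up for illustration; a real spider would download and parse pages instead.

```python
from typing import Callable, List, Optional, Set, Tuple

# A page is (title, text, outgoing links); None means the page would not open.
Page = Optional[Tuple[str, str, List[str]]]
Record = Tuple[str, str, str]  # (url, title, text snippet)


def crawl(
    url: str,
    fetch: Callable[[str], Page],
    visited: Optional[Set[str]] = None,
    results: Optional[List[Record]] = None,
) -> List[Record]:
    """Depth-first crawl: follow each link, backtrack on pages that fail."""
    if visited is None:
        visited = set()
    if results is None:
        results = []
    if url in visited:
        return results  # already seen, avoids looping forever on cycles
    visited.add(url)

    page = fetch(url)
    if page is None:
        return results  # page will not open: back up and try the next URL

    title, text, links = page
    results.append((url, title, text[:80]))  # keep the title and some text
    for link in links:
        crawl(link, fetch, visited, results)
    return results


# Hypothetical stand-in for the web, so the sketch runs without a network.
FAKE_WEB = {
    "a": ("Home", "Welcome", ["b", "c"]),
    "b": ("About", "About us", ["dead", "a"]),  # "dead" does not exist
    "c": ("Contact", "Email us", []),
}

pages = crawl("a", FAKE_WEB.get)
```

The `visited` set is my own addition here: without something like it, two pages that link to each other would keep the recursion going forever.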



The URLs are stored in a SQL database, which the search engine will read from. The search engine has its own database for storing information about terms, plus the other secret stuff that makes it work.
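A rough sketch of the hand-off between the two pieces, again in Python and using SQLite purely as a stand-in for the real SQL database. The `pages` table, its columns, and the `save_pages` and `find_by_title` helpers are assumptions for illustration, not the actual schema, and the title lookup is only a placeholder for the real term-based search.

```python
import sqlite3
from typing import Iterable, List, Tuple


def save_pages(conn: sqlite3.Connection, pages: Iterable[Tuple[str, str, str]]) -> None:
    """Spider side: store (url, title, snippet) rows, one row per URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, title TEXT, snippet TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pages (url, title, snippet) VALUES (?, ?, ?)",
        pages,
    )
    conn.commit()


def find_by_title(conn: sqlite3.Connection, word: str) -> List[str]:
    """Search side: read back URLs whose title contains the given word."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE title LIKE ? ORDER BY url",
        (f"%{word}%",),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
save_pages(
    conn,
    [
        ("http://example.com/", "Example Home", "Welcome"),
        ("http://example.com/about", "About Example", "Who we are"),
    ],
)
matches = find_by_title(conn, "About")
```

Making the URL the primary key means a page crawled twice just overwrites its old row instead of piling up duplicates.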
