Friday, October 31, 2008

Yah, spider

Ok, I've got my spider spidering around the internet, grabbing the text, title, and links from web pages. These get stored as database records, which are searchable by my search engine. Right now I'm getting exceptions in two places: when the spider grabs too much text for the DB to handle (I know how to fix this: truncation) and when the spider runs out of pages to look at (I know how to fix this too, but that's a secret).
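
A minimal sketch of the truncation idea, assuming the text column has a fixed maximum length. The names `TextLimits` and `Truncate` and the `maxLength` parameter are hypothetical, made up for illustration, and are not the actual spider code:

```csharp
using System;

public static class TextLimits
{
    // Cut text down to at most maxLength characters so it fits the column.
    // maxLength is an assumed column size; use whatever the real column allows.
    public static string Truncate(string text, int maxLength)
    {
        if (maxLength < 0) throw new ArgumentOutOfRangeException(nameof(maxLength));
        if (string.IsNullOrEmpty(text) || text.Length <= maxLength) return text;
        return text.Substring(0, maxLength);
    }
}
```

Calling something like `TextLimits.Truncate(pageText, 4000)` just before the insert would keep the record inside the column limit, at the cost of losing the tail of long pages.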

And if you want to taste something good...put canned salmon on a cracker that has havarti spread on it. This is heavenly.

Tuesday, October 28, 2008

eager

I'm eager to get back to programming. I'm gonna work on the web spider for a while and perfect it. I took the weekend off and just relaxed...well, I did study a bit. I want to work on the recursive algorithm (only recursive in an implicit sense), and make sure the spider can go off on its own to keep looking for pages...so I can fill that database.

Saturday, October 25, 2008

Errors

Once again I am studying error handling. I can't say it's the most interesting topic in the world.
I'm coming up with a definite hierarchy of values to rank URLs for inclusion in results lists.

Friday, October 24, 2008

Not back to the spider yet

What I'm working on tonight is how to rank my results while finding them in the db. Basically, the more search terms found in a result, the higher its rank. Then I plan to sort an array by rank for display purposes.
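
Here is a rough sketch of that ranking idea, assuming each result is just a string of page text and the rank is the count of distinct search terms it contains. `Ranker`, `Rank`, and `SortByRank` are hypothetical names, and the real engine may well weigh things differently:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Ranker
{
    // Rank = how many distinct query terms appear as whole words in the page text.
    public static int Rank(string pageText, IEnumerable<string> terms)
    {
        var words = new HashSet<string>(
            pageText.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);
        return terms.Distinct(StringComparer.OrdinalIgnoreCase).Count(t => words.Contains(t));
    }

    // Sort pages by rank, highest first, dropping pages that match nothing.
    public static string[] SortByRank(IEnumerable<string> pages, string[] terms)
    {
        return pages
            .Select(p => new { Page = p, Score = Rank(p, terms) })
            .Where(x => x.Score > 0)
            .OrderByDescending(x => x.Score)
            .Select(x => x.Page)
            .ToArray();
    }
}
```

This version splits on whitespace only, so a word with punctuation stuck to it ("spider,") would not match; a real version would need to strip that first.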

I was just reflecting the other day that there is an AI component to my search machine.

Wednesday, October 22, 2008

Back to the spider

It's almost time to go back to the spider I created and make it self-sufficient. I need to start populating my search db big time. I feel a little nervous because I haven't worked on that part of the project for a while. For the search engine part, I am close to having a nice prototype. The last code I entered was:

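if (!found)
{
    // Term wasn't found in this row: store it in column i and set column i-1 to 1.
    row[i] = term2;
    row[i - 1] = 1;
}
// Send the changed rows in toboTable back to the database.
toboAdapter.Update(toboTable);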

Tuesday, October 21, 2008

Ok got it

I figured out that I needed to have a primary key on my database table, so now I can change entries.
Got a little further today, working on the parsing of the search string. I'm deciding whether or not to use an array of strings.
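
If the array-of-strings route wins, the parsing step could be as small as the sketch below. It assumes terms are separated by spaces or tabs and that case doesn't matter; `QueryParser` and `ParseTerms` are hypothetical names, not the actual code:

```csharp
using System;

public static class QueryParser
{
    // Split a raw search string into lowercase terms, ignoring extra whitespace.
    public static string[] ParseTerms(string query)
    {
        if (string.IsNullOrWhiteSpace(query)) return new string[0];
        return query.ToLowerInvariant()
                    .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

An array is handy here because each term can then be looked up in the db one at a time with a simple loop.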

Sunday, October 19, 2008

DataTable woes

For some reason I am unable to update a DataTable.

I've got a foreach(DataRow row in dataTable.Rows)

but if I assign row[whatever]=something;
and go row.AcceptChanges();

the changes don't get applied.
I think I may have to create a row object and do an Update. That's my initial hunch.
It worked in code above the foreach, but it was a little different there.
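
One likely explanation, offered as an assumption since the surrounding code isn't shown: `AcceptChanges()` only commits the edit inside the in-memory table and resets the row's `RowState` to `Unchanged`. A data adapter's `Update` looks at `RowState` to decide which rows to write, so a row that has already been accepted gets skipped. The sketch below shows the state change on a throwaway in-memory table (`RowStateDemo` and its column are made up for the demo):

```csharp
using System;
using System.Data;

public static class RowStateDemo
{
    // Returns the row's state right after an edit, then after AcceptChanges().
    public static DataRowState[] EditThenAccept()
    {
        var table = new DataTable("terms");
        table.Columns.Add("term", typeof(string));
        table.Rows.Add("spider");
        table.AcceptChanges();                    // simulate a row freshly loaded from the db

        DataRow row = table.Rows[0];
        row["term"] = "crawler";
        DataRowState afterEdit = row.RowState;    // Modified: an adapter would write this row

        row.AcceptChanges();
        DataRowState afterAccept = row.RowState;  // Unchanged: an adapter would now skip it

        return new[] { afterEdit, afterAccept };
    }
}
```

If that is what's happening, the usual order is to edit the rows, call the adapter's `Update` on the table, and leave `AcceptChanges()` out of the loop.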

Saturday, October 18, 2008

Objects

It turns out you can't go:

row[i+1]++

because the row indexer hands back an object, and objects can't be incremented.
You can get around this by having a temporary variable.
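
A small sketch of the temporary-variable workaround, assuming the cell holds an integer count (or is empty). `RowCounter` and `Increment` are hypothetical names:

```csharp
using System;
using System.Data;

public static class RowCounter
{
    // Increment an integer stored in a DataRow cell by way of a temporary variable.
    public static void Increment(DataRow row, int columnIndex)
    {
        // Treat an empty (DBNull) cell as zero, otherwise unbox the current value.
        int count = row.IsNull(columnIndex) ? 0 : Convert.ToInt32(row[columnIndex]);
        count++;
        row[columnIndex] = count;
    }
}
```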

Friday, October 17, 2008

Friday Night

Tonight I may see Madraso at The Funhouse with Matt.

I'm thinking about a way to enter the "strength" of a bond between two words into my db. I can locate entries now; I just need to go one step further and increment the strength of the bond or, if the word has never been seen, create an entry and a strength for it.

Something like:

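if (located)
{
    // Known word: bump the strength stored in the column after it.
    row[location + 1] = Convert.ToInt32(row[location + 1]) + 1;
}
else
{
    // New word: put it in the first empty slot and start its strength at 1.
    // Assumes terms and strengths alternate across the row's columns.
    int i = 0;
    bool notPlaced = true;
    while (notPlaced && i + 1 < row.Table.Columns.Count)
    {
        if (row.IsNull(i))
        {
            row[i] = term;
            row[i + 1] = 1;
            notPlaced = false;
        }
        i += 2;
    }
}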

Wednesday, October 15, 2008

Begin

For the past couple of months I have been involved in a big project I set for myself: design a spider and search engine using ASP.NET and C#. I have a functioning spider and a search engine that still needs a few things added. Each currently works in a limited domain. The spider follows a link on a page to the next page, grabs the page title and some text, then follows a link to the next page. This works like recursion: a page that will not open causes the spider to backtrack until it finds another URL to follow. It can be viewed as a type of tree traversal.

The URLs are stored in a SQL database, which will be read by the search engine. The search engine has its own db for storing info about terms and other secret stuff that makes it work.
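
The backtracking crawl described in this post can be sketched as a depth-first traversal with an explicit stack. This is only an illustration under assumptions: `Crawler`, `Crawl`, `getLinks`, and `maxPages` are hypothetical, and `getLinks` stands in for the real page fetch, returning null when a page will not open.

```csharp
using System;
using System.Collections.Generic;

public static class Crawler
{
    // Depth-first crawl. Returns the pages that opened, in the order visited.
    public static List<string> Crawl(string startUrl,
                                     Func<string, IEnumerable<string>> getLinks,
                                     int maxPages)
    {
        var visited = new HashSet<string>();
        var order = new List<string>();
        var stack = new Stack<string>();
        stack.Push(startUrl);

        while (stack.Count > 0 && order.Count < maxPages)
        {
            string url = stack.Pop();
            if (!visited.Add(url)) continue;     // already seen this URL

            IEnumerable<string> links = getLinks(url);
            if (links == null) continue;         // page won't open: back up to the next pending URL

            order.Add(url);
            // Push in reverse so the first link on the page is followed first.
            var pending = new List<string>(links);
            for (int i = pending.Count - 1; i >= 0; i--)
                if (!visited.Contains(pending[i])) stack.Push(pending[i]);
        }
        return order;
    }
}
```

The stack plays the part of the call stack in a recursive version: popping the next pending URL after a dead page is the "backtrack" step, and the loop simply ends when the stack is empty or `maxPages` is reached.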