Saturday, December 20, 2008

Cloud

Tonight I'm working on a tag cloud based on the number of db entries for each term. I use row variables, add them up for each row, and sort in ascending order. Then I display the top ten terms inside a panel, with label.Font.Size = value for each one's row. So I get size relevancy to the number of times a term was searched. In the future I will have to come up with a scaling factor, because I can't have 567-point text.
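A scaling factor could be a simple linear map from search count to font size. A minimal sketch, where the 8 and 36 point bounds and the count range are assumptions, not values from the project:

```csharp
using System;

// Linear map from a term's search count to a label font size.
// The 8 and 36 point bounds and the count range are assumptions.
int minSize = 8, maxSize = 36;
int minCount = 1, maxCount = 567;

int FontSizeFor(int count)
{
    double t = (double)(count - minCount) / (maxCount - minCount);
    return minSize + (int)Math.Round(t * (maxSize - minSize));
}

Console.WriteLine(FontSizeFor(567));  // the most-searched term gets the cap
Console.WriteLine(FontSizeFor(1));    // the least-searched gets the floor
```

A log scale would squash outliers like 567 even harder, if the linear spread still looks lopsided.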

I actually had to use a decrement in the for loop because I was "rolling down the stack" to make room for the newest highest searched entries.
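That roll-down can be sketched with an array standing in for the panel's rows; the decrementing loop shifts every slot toward the bottom to open slot 0. The terms here are made-up sample data:

```csharp
using System;

// "Rolling down the stack": a decrementing loop shifts every slot
// down one, opening index 0 for the newest highest entry.
// The terms are made-up sample data.
string[] topTen = { "spider", "search", "regex", "", "", "", "", "", "", "" };

for (int i = topTen.Length - 1; i > 0; i--)
    topTen[i] = topTen[i - 1];    // each slot takes the one above it
topTen[0] = "cloud";              // the new highest-searched term

Console.WriteLine(string.Join(",", topTen));
```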

Friday, December 5, 2008

Last Night

Last night I was working on the parse portion of the search engine. I'm getting rid of words like "and", "or", and "the"; I don't need them, yet, for searches. I was also getting bad results because I was not looking at URLs when there was no second term in the search query. Basically I forgot to use "" instead of null, so things got messed up.
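A sketch of that parse step, assuming a space-separated query; the stopword list and names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Strip "and", "or", "the" from the query, and fall back to ""
// (not null) when there is no second term.
var stopWords = new HashSet<string> { "and", "or", "the" };

string[] Parse(string query) =>
    query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
         .Where(w => !stopWords.Contains(w.ToLower()))
         .ToArray();

string[] terms = Parse("the spider");
string term2 = terms.Length > 1 ? terms[1] : "";  // "" instead of null

Console.WriteLine(string.Join(" ", terms));
Console.WriteLine(term2.Length);
```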

Wednesday, November 26, 2008

Favicon is Evil

I am working on the spider again tonight. I learned a few things, like that HttpWebRequest.Timeout exists, and that favicon.ico is a pain in the butt: it always comes up and stops the spider. I've been using Regex to parse strings that have chars I just don't want in URLs.

I was reading about other people making web spiders, and they are just out grabbing links, while I am recursively searching links to other links. As such, I am verifying that these pages give responses; if a page doesn't respond, I don't travel to it. Although I use a page that provides random links as a starting point, I'm following them as far as I can go.

some stuff:

wreq.Timeout = 60000;
Regex r = new Regex("favicon");
if (r.IsMatch(stringURL))
etc...

Tuesday, November 18, 2008

Pretty much

Well, the spider is running pretty much by itself. Used some conditional breakpoints for debugging today. Was dealing with stringURL getting too long to be stored in a database record; basically I am truncating it. I have rearranged the code so that I don't click a button to find each new URL; instead it runs in a loop. I actually went out to dinner, came back, and it was still finding URLs.
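The truncation itself is one guard before the insert. A sketch, assuming a 255-character URL column (the real column width may differ):

```csharp
using System;

// Guard before the insert: cut stringURL down to the column width.
// The 255-character limit is an assumption about the real column.
const int MaxUrlLength = 255;
string stringURL = "http://example.com/" + new string('a', 300);

if (stringURL.Length > MaxUrlLength)
    stringURL = stringURL.Substring(0, MaxUrlLength);

Console.WriteLine(stringURL.Length);
```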

Thursday, November 13, 2008

Regexing

I was doing some Regexing last night, dealing with the input search string. An interesting function it is, that Regex. Basically I want to keep weird chars out.
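One way to do that with Regex.Replace, assuming letters, digits, and spaces are the only chars worth keeping:

```csharp
using System;
using System.Text.RegularExpressions;

// Strip everything except letters, digits, and spaces from the
// search string. The allowed character set is an assumption.
string Sanitize(string input) =>
    Regex.Replace(input, @"[^A-Za-z0-9 ]", "");

Console.WriteLine(Sanitize("spider's; web!"));
```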

Saturday, November 8, 2008

How Would I?

So how would I handle one word search queries from the textbox on the search page?

Thursday, November 6, 2008

Searchin'

I'm gonna move back to the search engine part of my project, but first...I had an idea for collecting pages to mine. If I increment through IP URLs, and look at the response each gives I can categorize them and store in a DB. This gives a big basis for looking around without running into a wall.
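That idea can be sketched by treating an IPv4 address as a 32-bit number and incrementing it; the actual request and response categorization are left out, and the starting address is just the reserved documentation range:

```csharp
using System;

// Incrementing through IP-based URLs: treat the address as a 32-bit
// number and step it. Requests and categorization are omitted.
uint ip = (192u << 24) | (0u << 16) | (2u << 8) | 0u;  // 192.0.2.0

for (int i = 0; i < 3; i++, ip++)
{
    string url = "http://" + string.Join(".",
        (ip >> 24) & 255, (ip >> 16) & 255, (ip >> 8) & 255, ip & 255);
    Console.WriteLine(url);   // each would get a request; store the response category
}
```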

So I changed the part where:

if(affinity>=highest)

to:

if (affinity >= highest)
{
    highest = affinity;
    affinityTwin = whatever;
}

because I made the rookie mistake of not updating highest to the current affinity value when trying to find the highest of all values in a list.
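In full, the corrected max-tracking loop looks something like this (the sample affinities and terms are invented):

```csharp
using System;

// Track both the highest affinity seen so far and which term it
// belongs to. Sample data is made up.
int[] affinities = { 3, 9, 4, 9, 1 };
string[] terms = { "web", "spider", "search", "cloud", "regex" };

int highest = int.MinValue;
string affinityTwin = "";

for (int i = 0; i < affinities.Length; i++)
{
    if (affinities[i] >= highest)   // >= keeps the latest of equal values
    {
        highest = affinities[i];    // the assignment that was missing at first
        affinityTwin = terms[i];
    }
}

Console.WriteLine(highest + " " + affinityTwin);
```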


So I've got the AI components going on. It's a little bit Fuzzy Logic and a little bit Neural Net.
More on this later.

Friday, October 31, 2008

Yah, spider

Ok, I've got my spider spidering around the internet, grabbing text, the title, and links from web pages. These get stored as database records which are searchable by my search engine. Right now I'm getting exceptions when the spider grabs too much text for the DB to handle (and I know how to fix this: truncation) and when the spider runs out of pages to look at (I know how to fix this--secret).

And if you want to taste something good...put canned salmon on a cracker that has havarti spread on it. This is heavenly.

Tuesday, October 28, 2008

eager

I'm eager to get back to programming. I'm gonna work on the web spider for a while, perfect it. I took the weekend off and just relaxed...well, I did study a bit. I want to work on the recursive algorithm (only recursive in an implicit sense), and make sure the spider can go off on its own to keep looking for pages...so I can fill that database.

Saturday, October 25, 2008

Errors

Once again I am studying error handling. I can't say it's the most interesting topic in the world.
I'm coming up with a definite hierarchy of values to rank URLs for inclusion in results lists.

Friday, October 24, 2008

Not back to the spider yet

What I'm working on tonight is how to rank my results while finding them in the db. Basically, the number of terms found in a result gives a higher rank. Then I plan to sort an array depending on rank for display purposes.
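One way to sketch that ranking is to count term hits per result and sort descending; the data and names here are illustrative:

```csharp
using System;
using System.Linq;

// Rank results by how many query terms appear in each one's text,
// then sort descending by that count. Sample data is invented.
string[] queryTerms = { "spider", "search" };
string[] results =
{
    "a web spider story",
    "spider builds a search engine",   // matches both terms
    "unrelated page",
};

var ranked = results
    .Select(r => new { Text = r, Rank = queryTerms.Count(t => r.Contains(t)) })
    .OrderByDescending(x => x.Rank)
    .ToArray();

foreach (var x in ranked)
    Console.WriteLine(x.Rank + " " + x.Text);
```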

I was just reflecting the other day that there is an AI component to my search machine.

Wednesday, October 22, 2008

Back to the spider

It's almost time to go back to my spider that I created and make it self-sufficient. I need to start populating my search db bigtime. I feel a little nervous because I haven't worked on that part of the project for a while. For the search engine part I am close to having a nice prototype. The last code I entered was:

if (!found)
{
    row[i] = term2;
    row[i - 1] = 1;
}
toboAdapter.Update(toboTable);

Tuesday, October 21, 2008

Ok got it

I figured out that I needed to have a primary key for my database--now I can change entries.
Got a little further today, working on the parsing of the search string. Deciding whether to use an array of strings or not.

Sunday, October 19, 2008

DataTable woes

For some reason I am unable to update a DataTable.

I've got a foreach(DataRow row in dataTable.Rows)

but if I assign row[whatever]=something;
and go row.AcceptChanges();

the changes don't get applied.
I think I may have to create a row object and do an Update. That's my initial hunch.
It worked in code above the foreach, but it was a little different there.
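For what it's worth, one known gotcha with that pattern: AcceptChanges() resets a row's RowState to Unchanged, so a later DataAdapter.Update sees nothing to write. A small demonstration with a throwaway table:

```csharp
using System;
using System.Data;

// AcceptChanges() marks a row Unchanged, which makes a later
// DataAdapter.Update skip it. The table here is a stand-in.
var table = new DataTable();
table.Columns.Add("term", typeof(string));
table.Rows.Add("spider");
table.AcceptChanges();                       // simulate a freshly filled table

foreach (DataRow row in table.Rows)
    row["term"] = "search";

Console.WriteLine(table.Rows[0].RowState);   // Modified: Update would write it

table.Rows[0].AcceptChanges();
Console.WriteLine(table.Rows[0].RowState);   // Unchanged: Update would skip it
```

Normally you just modify the rows and let the adapter's Update commit them; Update calls AcceptChanges itself once the database write succeeds.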

Saturday, October 18, 2008

Objects

It turns out you can't go:

row[i+1]++

because objects can't be incremented.
You can get around this by having a temporary variable.
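The temporary-variable workaround, shown with an illustrative DataTable:

```csharp
using System;
using System.Data;

// row["count"]++ won't compile because the indexer returns object.
// Pull the value out, cast it, increment, and write it back.
// Column names are illustrative.
var table = new DataTable();
table.Columns.Add("term", typeof(string));
table.Columns.Add("count", typeof(int));
DataRow row = table.Rows.Add("spider", 1);

int count = (int)row["count"];   // the temporary variable
row["count"] = count + 1;

Console.WriteLine(row["count"]);
```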

Friday, October 17, 2008

Friday Night

Tonight I may see Madraso at The Funhouse with Matt.

I'm thinking about a way to enter the "strength" of a bond between two words into my db. I can locate entries now; I just need to go one step further and increment the strength of the bond, or if the word has never been seen, create an entry and strength for it.

Something like:

if (located)
{
    row[location + 1]++;
}
else
{
    while (notPlaced)
    {
        if (row[i] == null)
        {
            row[i] = term;
            row[i + 1]++;
            notPlaced = false;   // was notPlaced==false, a comparison instead of an assignment
        }
        i += 2;                  // step to the next term/strength pair, or the loop never ends
    }
}

Wednesday, October 15, 2008

Begin

For the past couple of months I have been involved in a big project I set for myself: design a spider and search engine using ASP.NET and C#. I have a functioning spider, and a search engine with a few things that still need to be added; each works in a limited domain currently. The spider follows links from page to page, grabbing each page's title and some text. This occurs recursively: a page that will not open causes the spider to backtrack until it finds another URL to follow. It can be viewed as a type of tree traversal.
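That traversal can be sketched without any networking, using an in-memory link map in place of real pages (all names invented):

```csharp
using System;
using System.Collections.Generic;

// A minimal depth-first traversal in the spirit described above:
// follow links page to page, backtrack when a page has nothing new.
// The dictionary stands in for real HTTP requests.
var links = new Dictionary<string, string[]>
{
    ["a"] = new[] { "b", "c" },
    ["b"] = new[] { "d" },
    ["c"] = new string[0],
    ["d"] = new string[0],
};
var visited = new HashSet<string>();

void Crawl(string url)
{
    if (!visited.Add(url)) return;   // already seen: backtrack
    Console.WriteLine(url);          // grabbing title and text would go here
    foreach (string next in links[url])
        Crawl(next);                 // recurse; returning is the backtrack
}

Crawl("a");
```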



The URLs are stored in a SQL database which the search engine reads from. The search engine has its own db for storing info about terms and other secret stuff that makes it work.