Monday, March 2, 2009

Huh. How 'bout dat

Well. My real roadblock at this point was extracting letters from the character boxes in the test data, because I lost my code from last year. But losing the code wasn't the really bad part; the code wasn't very good in the first place. It had all sorts of Hough transforms in it.

Instead, I decided to take a blank sheet and run normalized cross correlation between it and a filled-in box. This way, I could find the centers of the filled-in boxes. Basically, I figured out how big the boxes are, and can now extract the letters mo' betta. The red circle is where the normalized cross correlation signal was strongest.
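For anyone who wants to try the box-finding trick, here's a rough sketch of normalized cross correlation in plain NumPy. It's not my original code. It assumes grayscale images where white is 1.0 and ink is 0.0, uses a blank character box as the template, slides it over the filled-in sheet, and takes the peak (the "red circle") as the box location. The function names are made up for illustration, and the brute-force loop is slow; a real version would probably use a library routine instead.

```python
import numpy as np

def normalized_cross_correlation(image, template):
    """NCC of `template` slid over `image`, evaluated only where it fully fits."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    ih, iw = image.shape
    t = template - template.mean()          # zero-mean template
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            p = image[r:r + th, c:c + tw]
            p = p - p.mean()                 # zero-mean patch
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # Flat patches have no variance, so give them a score of 0.
            out[r, c] = (p * t).sum() / denom if denom > 0 else 0.0
    return out

def find_box_center(image, template):
    """Return (row, col, score) for the center of the best-matching box."""
    ncc = normalized_cross_correlation(image, template)
    r, c = np.unravel_index(np.argmax(ncc), ncc.shape)
    th, tw = template.shape
    return r + th // 2, c + tw // 2, ncc[r, c]
```

The letter inside a filled-in box lowers the peak score somewhat, but the box outline still matches best when the template is lined up exactly on it.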

Zample:


When extracting the letters, I found that the character box lines were about 3 pixels thick, so I took that into account.
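One way to account for the border is to start from a box center (for example, the NCC peak described above) and the box size, then shave the border off every side before cropping. Here's a minimal sketch. The 3-pixel default comes from the measurement above. The function name is hypothetical, and it assumes the center and size are in pixel coordinates.

```python
import numpy as np

BORDER = 3  # approximate box line thickness, in pixels

def crop_box_interior(image, center_row, center_col, box_h, box_w, border=BORDER):
    """Crop the inside of a character box, dropping `border` pixels on each side."""
    top = center_row - box_h // 2
    left = center_col - box_w // 2
    return image[top + border: top + box_h - border,
                 left + border: left + box_w - border]
```

If some box lines bleed into the crop, a slightly bigger `border` trims off a bit more margin, at the risk of clipping letters written near the edge.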

The result:


Yee-haw. I fed this output into my "get rid of white space" function and resized the letters to 24 by 24 pixels. I also checked whether the sum of the pixel values in each character image was above a certain threshold; if so, I assumed there was no letter there. I display these as gray cells with an x through them. This resulted in the following:
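Here's a rough sketch of that cleanup step. It makes a few assumptions the post doesn't spell out: white is 1.0 and ink is 0.0, so a nearly all-white cell has a high pixel sum. It also assumes the blank check runs on the raw crop before trimming, and that the threshold is a fraction of the all-white sum (the 0.98 is a made-up value). I also swapped in a simple nearest-neighbor resize instead of whatever interpolation an image library would use. All function names are hypothetical.

```python
import numpy as np

def trim_whitespace(cell, ink_level=0.5):
    """Crop to the bounding box of pixels darker than `ink_level`."""
    ink = cell < ink_level
    if not ink.any():
        return cell                      # nothing to trim
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    return cell[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def resize_nearest(img, size=24):
    """Nearest-neighbor resize to size x size."""
    h, w = img.shape
    r = np.arange(size) * h // size
    c = np.arange(size) * w // size
    return img[np.ix_(r, c)]

def is_blank(cell, white_fraction=0.98):
    """A cell whose pixel sum is close to all-white is treated as empty."""
    return cell.sum() > white_fraction * cell.size

def normalize_cell(cell):
    """Return a 24x24 letter image, or None for an empty box."""
    if is_blank(cell):
        return None                      # would be drawn as a gray cell with an x
    return resize_nearest(trim_whitespace(cell), 24)
```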


Here is another zample.


There is an issue though. SOME people (I'm not going to name any names) cannot write inside character boxes. Here is an example:


Because of that last 'a', my algorithm thinks there's a legitimate character there. See:


However, as far as I can tell, if someone can't keep their characters inside the boxes, they don't deserve to get their quiz graded! They can play games... but so can we! If they want to play games, so be it! Let the games begin!
