Tuesday, March 11, 2008

Presentation slides

...can be found here.

LoG filter not too hot either

I still need to play with the kernel size but so far LoG (Laplacian of Gaussian) hasn't been helping out at all.
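
Just so it's written down, this is roughly how I'm applying the filter; a minimal sketch using the Image Processing Toolbox's fspecial and imfilter, where the kernel size and sigma are placeholders I'm still tuning:

h = fspecial('log', 9, 1.5);                      % 9x9 LoG kernel, sigma 1.5 (placeholder values)
filtered = imfilter(double(img), h, 'replicate'); % img is the character image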

Using a bigger binomial kernel

Using a bigger kernel didn't seem to help the nearest neighbor out :(

Incorporating another set of NN

Currently, when I run my algorithm, I come up with the most likely sequence of states that made up the character images. In other words, I come up with a predicted name. I now run nearest neighbor with the predicted name against the roster names and take the nearest roster name as my final prediction. This way my prediction is at least a valid possibility.

And now... I get 100% accuracy. That's right. Yeah yeah, the roster only has 7 names but it is a start!!1!

The way I calculate the distance between two strings: I turn each string into an array of alphabet indices - 'a' maps to 1, 'b' to 2, 'c' to 3, etc. - and then take the Euclidean distance between the arrays. In the future I can apparently use edit distance, but Euclidean is fine for me for now.

Ok, there is one more hack in there. The names are not all the same length, and in order for nearest neighbor to work, each feature vector needs to be the same size. So I make every feature vector the size of the longest name on the roster; the extra slots for shorter names are set to zero. I also require that the nearest neighbor returned be the same length as the query name (before the zero padding). I think that this step helps a lot, especially since the roster is so small. The next step is to have a test set that is much larger.
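
Here is a minimal sketch of this roster lookup, assuming the roster is a cell array of lowercase name strings. The function names are my own placeholders, and I map spaces to 0, which is one way to handle them:

function best = nearestRosterName(predicted, roster)
% Find the roster name closest (in Euclidean distance) to the
% predicted name, considering only names of the same length.
maxLen = max([cellfun(@length, roster), length(predicted)]);
q = nameToVec(predicted, maxLen);
best = ''; bestDist = Inf;
for k = 1:numel(roster)
    if length(roster{k}) ~= length(predicted), continue; end
    d = norm(q - nameToVec(roster{k}, maxLen));
    if d < bestDist, bestDist = d; best = roster{k}; end
end
end

function v = nameToVec(name, maxLen)
% 'a' -> 1, 'b' -> 2, ...; spaces stay 0; zero-pad to maxLen.
v = zeros(1, maxLen);
for i = 1:length(name)
    if name(i) ~= ' '
        v(i) = name(i) - 'a' + 1;
    end
end
end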

Filtering with a discrete binomial kernel

I tried filtering both the training data and test data before running nearest neighbor on it to try and improve the nearest neighbor performance. Here is an example of a filtered test name:

As compared to the original unfiltered:


I used a 7x7 binomial kernel, which turns out to be the following matrix divided by 4096.

1 6 15 20 15 6 1
6 36 90 120 90 36 6
15 90 225 300 225 90 15
20 120 300 400 300 120 20
15 90 225 300 225 90 15
6 36 90 120 90 36 6
1 6 15 20 15 6 1
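
For reference, this kernel is just the outer product of the binomial row [1 6 15 20 15 6 1], whose entries sum to 64 (hence the 64^2 = 4096). A short way to build it:

row = 1;
for i = 1:6
    row = conv(row, [1 1]);        % builds the row of Pascal's triangle: [1 6 15 20 15 6 1]
end
K = (row' * row) / sum(row)^2;     % 7x7 binomial kernel, normalized by 4096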


This actually hurt my performance. For example, previously my accuracy on "James Carter" was ~45%. Now I predict "Geora Rdrter" and my accuracy is ~36%. Similarly, for "Richard Nixon" I now predict "Gerdora Rtero" and my accuracy is ~8.3%, when it used to be 25%.

It turns out that both 3x3 and 5x5 binomial filters are better than the 7x7, but still never improve performance; for the most part filtering with a binomial kernel hurts performance.

Monday, March 10, 2008

Emission probabilities

An emission probability in my case is P(image|letter): the probability of the image given that you know it is of a particular character. In other words, the probability of a certain letter looking like this. The way I've been calculating this probability is actually P(letter|image), since I've been doing my own version of the nearest neighbor density. But P(image|letter) != P(letter|image)... this was wrong! I can use Bayes' rule to turn the probability around though:

P(image|letter) = P(letter|image)P(image) / P(letter)

I don't know what to plug in for P(image), so for now I've just been using 1 - each image has equal probability. P(letter) is easy to find: just count how many times each letter appears in the roster and divide by the total number of characters. Since P(image) = 1, my new equation is:

P(image|letter) = P(letter|image) / P(letter)
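
A sketch of how this looks in code, assuming the roster is a cell array of name strings and pPosterior is the 1x26 vector of P(letter|image) that comes out of my nearest neighbor step:

allChars = [roster{:}];
allChars = allChars(allChars ~= ' ');            % ignore spaces
prior = zeros(1, 26);
for c = 1:26
    prior(c) = sum(allChars == char('a' + c - 1)) / length(allChars);
end
% With P(image) treated as 1, the likelihood is just the posterior
% divided by the prior. (A letter that never appears in the roster
% would give a zero prior, so that case would need smoothing.)
pLikelihood = pPosterior ./ prior;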

I noticed that dividing by P(letter) always increases the probability, since the denominator is always <= 1, but it increases it less for common letters than for rare ones. So a letter with a high P(letter) ends up with a relatively lower P(image|letter), and a letter with a very low P(letter) ends up with a very high P(image|letter).

My results became worse once I took this new probability into account.

Results after scaling

Like I said in my last post, the test images ranged from around 111 to 255, while the training images ranged from 0 to 255. I changed the range of the test images to also be 0 to 255, and I believe that my results improved.
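
The rescaling itself is just a linear stretch; a minimal sketch:

img = double(testImg);                                         % testImg ranges roughly 111-255
img = (img - min(img(:))) / (max(img(:)) - min(img(:))) * 255; % now ranges 0-255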


Query Name         Predicted Name     Accuracy

Lyndon Johnson     Gerdon Aninson     53.84615%
Gerald Ford        Gerald Rerd        80%
George Bush        George Resh        80%
Richard Nixon      Rildora Rtero      25%
William Clinton    William Clinton    100%
Ronald Reagan      Ronald Rerdon      75%
James Carter       Geora Carter       45.45455%

Saturday, March 8, 2008

Looking into why my nearest neighbor results are awful

Here I am comparing one of my test L's to half of my training L's. The test L is always on the right. Also the test L is a lot lighter than the training L's. In particular, the minimum value of the test L is 111 whereas the minimum value of the training L's is consistently 0. I am not sure if this will have an impact on my results or not.

Similarly, here are some 'o' comparisons, where the test 'o' is always on the right. No training 'o' matches the test 'o' exactly in size, which is a problem.

I'm not quite sure that the Viterbi algorithm is the right thing to be using for my case because it does not take into account the lexicon (roster) as much as it should.

Monday, March 3, 2008

Viterbi getting a little better

I have implemented an "isPrefix" function in Matlab that works fine. The problem is that I don't know how to incorporate it into the Viterbi algorithm. Instead of using that, I have a function that takes in a character and an index and returns true if there exists a string in the roster with that character at that index, and false otherwise. I use this function when assigning emission probabilities - I only assign a character a positive emission probability if it could actually appear at that position, for realz.
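
A minimal sketch of that check, assuming the roster is a cell array of lowercase names (the function name is my own):

function ok = charValidAtIndex(c, idx, roster)
% True if some roster name has character c at position idx.
ok = false;
for k = 1:numel(roster)
    name = roster{k};
    if idx <= length(name) && name(idx) == c
        ok = true;
        return;
    end
end
end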

Now the algorithm guesses "william clinton" correctly, but it guesses "lyndon johnson" to be "lildon joreron" and "gerald ford" to be "georicarin". This check will work especially badly for large rosters, where the character-at-index function will almost always return true. With my small roster of 7, though, it has helped.

Instead of an "isPrefix" function, I plan to implement an "isSuffix" function because of the way the Viterbi algorithm works when finding the most likely sequence: it starts at the end and works its way backwards.

Sunday, March 2, 2008

Viterbi experiments

I implemented the Viterbi algorithm and my results are not good yet but I am optimistic. Here is the setup:

I have 7 test images; here is the roster:

lyndon johnson
george bush
gerald ford
richard nixon
william clinton
ronald reagan
james carter

The names are all lower case because I'm not taking case into account yet.

I have calculated the transition probabilities, so I have a 26x26 matrix containing the probability of transitioning from one character to another. Here is how I calculated that:

P(o|e) = Count("eo") / Count("e")

This means that the probability of seeing 'o' right after 'e' is the number of times I saw the sequence "eo" divided by the number of times I saw the letter 'e'.
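
A sketch of building the 26x26 matrix from the roster. I'm skipping bigrams that touch a space here, which is an assumption on my part:

bigram = zeros(26, 26);
unigram = zeros(26, 1);
for k = 1:numel(roster)
    name = roster{k};
    for i = 1:length(name)
        if name(i) == ' ', continue; end
        a = name(i) - 'a' + 1;                 % letter -> 1..26
        unigram(a) = unigram(a) + 1;
        if i < length(name) && name(i+1) ~= ' '
            b = name(i+1) - 'a' + 1;
            bigram(a, b) = bigram(a, b) + 1;
        end
    end
end
T = bigram ./ max(unigram, 1);                 % T(e, o) = Count("eo") / Count("e")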

Emission probabilities, in my case, are supposed to mean "the probability of seeing this character image given that I know what the character is" - for example, the probability of seeing this character image given that I know it is an 'a'. That probability is hard to calculate; I would have to use Gaussians and other crazy math. So, instead of doing that, for now I am doing a very simple nearest neighbor probability calculation. Given a character image, I run nearest neighbor against all of my training data and take the top 40 votes. I say that the probability that this character is an 'a' is the number of votes for 'a' plus 1, divided by 40 + 26. Ideally I would just say that the probability is the number of votes for 'a' divided by 40. However, my nearest neighbor framework is working pretty poorly, so just because there are no votes for 'a' I don't want to completely eliminate that possibility. This way, no letter has zero probability.
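
The vote-to-probability step is tiny; the +1 and +26 are just add-one smoothing over the 26 letters, so the probabilities still sum to 1:

K = 40;                              % number of nearest neighbor votes
emission = (votes + 1) / (K + 26);   % votes is 1x26: votes(c) = votes for letter c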

I thought that even though my nearest neighbor framework is working pretty badly, that the transition probabilities would help out a lot because the roster is so small. Not really the case though!

Here is an example of an input name:




Here is what my algorithm came up with for the name:
licallinilicl

Yeah, pretty bad. At first I was like "are those even valid transitions?!" but it's true, each of those transitions is valid. For example, "li" is valid because "william" is in the roster. So, I need to figure out a way to take more into account than just the bigrams. Maybe I should have an "isPrefix" function such that, as I'm going through the HMM, I only give something a probability if it is a valid prefix of a name. Ok, that is what I will do next.

Monday, February 25, 2008

Experimenting with nearest neighbor

Nearest neighbor shockingly doesn't work that well. I think part of the problem is that the training data isn't all that great and there isn't enough of it.

Here's the setup:

Training:
I have 90 training images for each letter in the alphabet (there are 26 of those). This makes for 2340 total training images, for those of you that can't do maf. All of the training images are the same size: 8-bit images of 120 by 123 pixels.

Testing:
So far I only have 2 names that I'm testing against but that size will grow shortly. I made the test characters also be 120 by 123 pixels (by adding extra white space evenly around the edges).

Here is the testing process:
I load all of my training data into a huge matrix of size 2340 x 14760, where each row is a strung-out (vectorized) training image of a character. I then read in a test character image, find the Euclidean distance between that test image and each of the training images, and sort the results by distance.

Currently I am looking at the top 50 closest matches and having those vote on a character. I have been getting some good and some bad results.
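
A sketch of the query step, assuming trainData is the 2340 x 14760 double matrix and trainLabels is a 2340 x 1 vector of letter indices:

q = double(testImg(:)');                 % 1 x 14760 strung-out test image
d = sqrt(sum((trainData - q).^2, 2));    % Euclidean distance to every training row
[~, order] = sort(d);                    % closest first
top = trainLabels(order(1:50));          % labels of the 50 nearest neighbors
predicted = mode(top);                   % majority vote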

The first letter I tried, 'J' from "Jean Poole" had 'J' as its top match!
At that point life was pretty good. For one thing, all of the top 10 matches were j's. So that case works pretty well.

The next letter I tried was 'e':

The top match for that was 'p'... not so good. In fact, only 3 of the 50 votes were for 'e'. Here are the training e's... so you're trying to tell me only 3 of these look like that 'e' up there?

Here is another example.. 'o':

Luckily the mode of the top 50 matches is 'o', so 'o' wins but still, some weird results. The training image that is closest to the 'o' is the following 'n':...

I don't really get why that is. Here is the second closest match:

This is a little more reasonable, even though it is a 'c'. You can see the resemblance.

Lastly, here is the third match:

This happens to be a 'g' that was cut-off at the bottom during the traumatizing training process. This is also fairly understandable. Finally, the 4th closest match is an 'o'. (Of course after that there are plenty more random results.) This is what the first matching 'o' looks like:

Wednesday, February 13, 2008

First try at nearest neighbor - very simple!

I took one of my a's from "Jean Poole" and ran nearest neighbor with it against my training a's and b's only, and the nearest neighbor was an 'a'! Phew. At least that works. As a matter of fact, the top 10 nearest neighbors were a's, and then there was a 'b' and so on. And now for harder tests....

Test images all the same size

These are some examples of test images after I've made them all the same size. Now I have to do the same for the training images and make sure the test images AND training images are the same size!

Just to give you an idea, the characters are about 76 by 40 pixels.

The 'h' here is bad, but I can't fix this without changing the code a lot and in the process making other things break:


This one looks good to me:

Example of test data

Here is what the test data looks like once I've isolated the character boxes and done a little thresholding:


Here is what the test data looks like after I've removed the unwanted lines towards the edges:

Monday, February 11, 2008

Training data all set

Partial set of training data for character 'a':
All of these individual characters are saved to their own .png file. The next step is to make them all the same size by adding extra white pixels to the boundaries, based on the size of the largest training image. Then I will run the nearest neighbor algorithm on the test data. The goal is for this to be done by Wednesday.
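
A sketch of the padding step, assuming 8-bit grayscale images where 255 is white (padarray is from the Image Processing Toolbox):

function out = padToSize(img, H, W)
% Center img inside an H x W white canvas by padding its borders.
[h, w] = size(img);
top = floor((H - h) / 2);
left = floor((W - w) / 2);
out = padarray(img, [top, left], 255, 'pre');
out = padarray(out, [H - h - top, W - w - left], 255, 'post');
end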

Wednesday, February 6, 2008

Looking at thresholded characters

Just for kicks, this is what the training images look like as binary images. The top row is the 8-bit image and the bottom row is the binary image.



And b's, as always:


Here all I have to store are the non-zero indices, as opposed to storing the whole image, which is beneficial. And now I should center these characters! (Subtract off the centroid... why is that so hard?!)
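
The centering itself should be short; a sketch assuming a binary image bw with enough empty border that the wrap-around from circshift doesn't matter:

[r, c] = find(bw);                                               % coordinates of the character pixels
shift = round([size(bw,1)/2 - mean(r), size(bw,2)/2 - mean(c)]);
centered = circshift(bw, shift);                                 % move the centroid to the center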

Monday, February 4, 2008

Using results of edge.m

Since there are artifacts near the edges of the letters in both the training and test data, perhaps I will run nearest neighbor on the results of edge.m instead. Here are examples of outputs of edge.m (the figures with black in the background).

Training data 'a':


Training data 'b':


Test data with edge.m run on the individual characters (after removing the leftover lines). Still needs some work.


And these were the original characters.


Another test data image:

I'm better at getting rid of leftover lines!

Yes, I am. Now the next step is to save all training characters in one place. Here are some examples of the training data (because you really need more).

Top line is after lines were removed, bottom is before.

Example 1:


Example 2:


Example 3:


Example 4:

Getting rid of leftover lines

To get rid of the leftover lines, I expand a rectangle starting in the middle of the character image and stop each side when the pixel sum along that edge is at a maximum (i.e., when the edge lies on the most white space).
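
A sketch of that idea, under my own interpretation: for each side, scan outward from the center and cut at the whitest row or column:

function cropped = removeLeftoverLines(img)
% Keep the row/column in each direction whose pixel sum is largest,
% i.e. the one lying on the most white space.
[H, W] = size(img);
rowSums = sum(double(img), 2);
colSums = sum(double(img), 1);
midR = round(H / 2); midC = round(W / 2);
[~, top] = max(rowSums(1:midR));
[~, i] = max(rowSums(midR:end)); bottom = midR + i - 1;
[~, left] = max(colSums(1:midC));
[~, i] = max(colSums(midC:end)); right = midC + i - 1;
cropped = img(top:bottom, left:right);
end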

One issue is that the size of the character boxes will vary now, but hopefully I will be able to correct this by zero padding the images (or 1 padding...).

Sometimes it works, sometimes it doesn't. Here are some examples of before and after removing the leftover lines. The images on the top half of the figures are after, and on the bottom are before.

Good example of 'a':


Good 'b' example:


Bad 'a' example:

Thursday, January 31, 2008

Removing leftover edges

Hmmmmmmmm..... how to get rid of the junk towards the edges of the figure.

Wednesday, January 30, 2008

I got some training data

Fabian gave me 18 filled-out handwriting sheets, which is very helpful! I have been parsing those and have been getting better results. Here is an example of another filled-out sheet, this one scanned in at an angle. Luckily, that isn't a problem!

Here are examples of the a's from this image:


Here are examples of the b's from this image:

Monday, January 28, 2008

Example of cut out training data

Here are some example a's:

Here are some example b's:


You get the picture. The question now is how much that extra noise at the edges of the letters is going to have an effect.

Example of training data

Here is an example of the training data I'll be using for character classification. It is called ABCDETC and is from NEC Labs, available here.

I am using the same parsing method that I am using for the scanned names. Here, non-maximal suppression actually works!

Friday, January 25, 2008

Non-maximal suppression not really working

My plan was to find the top vertical line from the Hough transform (the one with the most votes), take the angle of that line, and assume that it is also the angle of the remaining vertical lines I'm interested in. At that point I take the column of the Hough space corresponding to that angle, sort it in descending order, and plot the top 20 lines found. I also implemented non-maximal suppression in the following way: for each line that I'm about to plot, I check whether the number of votes for that line is the maximum within that cell's surrounding window (just looking at changes in rho, not theta). If it isn't a maximum, then I skip it - I don't plot that line.
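
Here is a minimal sketch of the suppression step, applied to one theta column of the accumulator; win, the half-width of the window, is the parameter I keep fighting with:

function keep = nmsOverRho(votes, win)
% 1-D non-maximal suppression: keep a rho bin only if its vote
% count is the maximum within +/- win bins.
n = numel(votes);
keep = false(n, 1);
for i = 1:n
    window = votes(max(1, i - win):min(n, i + win));
    keep(i) = votes(i) > 0 && votes(i) == max(window);
end
end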

I've been having a hard time tuning the window size - how big should the window be? When it is too big, I start missing lines that I need, but when it is too small, it doesn't get rid of enough lines. The following is a perfect example: this was the best I could get for this image. It didn't get rid of the extra vertical line between the e and the a, yet it missed the line between the n and the space.

Isolated letters

These are what the letters look like when I cut them out of the image based on the lines found by the Hough transform:


My next step is to write a function that takes in one of these letter images and determines whether or not there is a letter inside. I also need to think of a way to implement non-maximal suppression!

Wednesday, January 23, 2008

Better Isolated Boxes

Here are two examples of the lines around the character boxes. This is after determining the width and height of the boxes and then drawing lines on top of the figure.


Tuesday, January 22, 2008

Isolating character boxes

This is where I'm currently at with regard to isolating the character boxes:
This is without rotating the image at all - just taking both the vertical and horizontal gradients, running edge.m on each of those, and then running the Hough transform on the results.

Here is a cool image of the Hough space for the vertical lines:

I had been trying so hard to rotate the images properly that I forgot my real goal, which was to isolate the character boxes. When I finally managed to rotate the image, it turned out that it did not help me at all! Here are examples of before and after rotated images:
As you can see, Matlab didn't do a great job cropping the image after rotating it, even though I specified 'crop' to imrotate.m. Oh well; for the time being I'm not using it anyway.

Wednesday, January 16, 2008

Better angles from Hough

I decreased the size of the cells in the accumulator array that the Hough Transform uses and the accuracy of the angles of the detected lines significantly improved. Yesterday I was using a resolution of 1 radian, but now my resolution is pi/450.

There are still too many lines detected for the upper boxes compared to the lower boxes. Here is what I get when I plot the top 5 lines detected (instead of thresholding):

3 of the 5 lines detected are for the top horizontal line. The 12th line that is detected is the upper horizontal line for the PID. I'm not quite sure why that is. Here is what I get when taking the top 12 lines:


When I get greedy and increase the angle resolution too much, it starts to hurt: I begin failing to detect lines. For example, when I set the resolution to pi/720, my top 20 detected lines are the following:

It doesn't even detect the bottom horizontal line.

This is the process I've been using to detect the lines:
1. Resize the image to half of its size.
2. Take the gradient of the image.
3. Run edge.m on the vertical gradient image, specifying 'sobel' as a parameter.
4. Run the Hough Transform on the result of edge.m.
5. Sort the accumulator matrix in descending order and plot the top x lines on top of the original image.
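
For reference, here is roughly that pipeline written with the Image Processing Toolbox's hough functions. This isn't exactly my code - for one thing, I sort the accumulator myself rather than calling houghpeaks, so I can control the angle resolution:

img = imresize(img, 0.5);                    % 1. half size
[gx, gy] = gradient(double(img));            % 2. horizontal and vertical gradients
E = edge(gy, 'sobel');                       % 3. edges of the vertical gradient (picks out horizontal lines)
[Hacc, theta, rho] = hough(E);               % 4. Hough transform accumulator
peaks = houghpeaks(Hacc, 20);                % 5. top 20 accumulator cells
lines = houghlines(E, theta, rho, peaks);    %    line segments to plot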

Here is an example vertical gradient of the image:

Here is an example of the output of edge.m, passed into the Hough Transform function:

Finally, here is an example of output from the Hough Transform function:

The white portions in the middle of the image correspond to the lines detected in the image.

The next significant problem is detecting the vertical lines. When I instead use the horizontal gradient, here are the top 20 lines I detect (none of which are useful):

Also, here is the output of the Hough Transform. Note how many white parts there are - meaning lots of lines were detected.

Ok one last thing - here is what is given to the Hough Transform function in the vertical case: