Recognition of Handwritten Names: March 2008

Tuesday, March 11, 2008

LoG filter not too hot either

I still need to play with the kernel size but so far LoG (Laplacian of Gaussian) hasn't been helping out at all.

Using a bigger binomial kernel

Using a bigger kernel didn't seem to help the nearest neighbor out :(

Incorporating another set of NN

Currently when I run my algorithm, I come up with the most likely set of states that made up the character images. In other words, I come up with a predicted name. I now run nearest neighbor with the predicted name against the roster names and say that the nearest neighbor is my predicted name. This way my predicted name is at least a valid possibility.

And now... I get 100% accuracy. That's right. Yeah yeah, the roster only has 7 names but it is a start!!1!!.

The way I calculate the distance between two strings is I have a digit array with the indices of the characters - 'a' maps to 1, 'b' to 2, 'c' to 3 etc. Then I use Euclidean distance. In the future I can apparently use edit distance but Euclidean is fine for me for now.

Ok there is one more hack in there. The names are not all the same length, and in order for nearest neighbor to work, each feature vector needs to be the same size. The way I do this is have all of the feature vectors be the size of the longest name on the roster. The extra slots in the feature vector for names not as long as the longest name are set to zero. I require that the nearest neighbor returned be the same length as the query vector (before I add zeros). I think that this step helps a lot especially since the roster is so small. The next step is to have a test set that is much larger.

Filtering with a discrete binomial kernel

I tried filtering both the training data and test data before running nearest neighbor on it to try and improve the nearest neighbor performance. Here is an example of a filtered test name:

As compared to the original unfiltered:

I used a 7x7 binomial kernel which turns out to be the following divided by 4096.


     1     6    15    20    15     6     1
     6    36    90   120    90    36     6
    15    90   225   300   225    90    15
    20   120   300   400   300   120    20
    15    90   225   300   225    90    15
     6    36    90   120    90    36     6
     1     6    15    20    15     6     1

This actually hurt my performance. For example, previously my accuracy on "James Carter" was ~45%. Now I predict "Geora Rdrter" and my accuracy is ~36%. Similarly, for "Richard Nixon" I now predict "Gerdora Rtero" and my accuracy is ~8.3%, when it used to be 25%.

It turns out that both 3x3 and 5x5 binomial filters are better than the 7x7, but still never improve performance; for the most part filtering with a binomial kernel hurts performance.

Monday, March 10, 2008

Emission probabilities

An emission probability in my case is P(image|letter) ... the probability of the image given that you know it is of a particular character. In other words, the probability of a certain letter looking like this. The way I've been calculating this probability is P(letter|image), since I've been doing my own version of the nearest neighbor density. P(letter|image ) != P(letter|image).. this was wrong! I can use Bayes' rule to turn the probability around though:

P(image|letter) = P(letter|image)P(image) / P(letter)

I don't know what to plug in for P(image) so for now I've just been using 1 - each image has equal probability. For P(letter), this is easy to find - just look at how many times each letter appears in the roster and divide by the total number of characters. Since P(image) = 1, my new equation is:

P(image|letter) = P(letter|image) / P(letter)

I noticed that if a letter has high probability, P(image|letter) will go down (compared to a less likely character. The probability will go up because the denominator will always be <= 1) and if P(letter) is very low then P(image|letter) will be very high.

My results became worse once I took this new probability into account.

Results after scaling

Like I said in my last post, the test images ranged from around 111 - 255, while the training images ranged from 0 - 255. I changed the range of the test images to also be 0 - 255 and I believe that my results improved.



Query name       Predicted Name   Accuracy
Lyndon Johnson   Gerdon Aninson   53.84615%
Gerald Ford      Gerald Rerd      80%
George Bush      George Resh      80%
Richard Nixon    Rildora Rtero    25%
William Clinton  William Clinton  100%
Ronald Reagan    Ronald Rerdon    75%
James Carter     Geora Carter     45.45455%

Saturday, March 8, 2008

Looking into why my nearest neighbor results are awful

Here I am comparing one of my test L's to half of my training L's. The test L is always on the right. Also the test L is a lot lighter than the training L's. In particular, the minimum value of the test L is 111 whereas the minimum value of the training L's is consistently 0. I am not sure if this will have an impact on my results or not.

Similarly, here are some 'o' comparisons, where the test 'o' is always on the right. No training 'o' matches the test 'o' exactly in size, which is a problem.

I'm not quite sure that the Viterbi algorithm is the right thing to be using for my case because it does not take into account the lexicon (roster) as much as it should.

Monday, March 3, 2008

Viterbi getting a little better

I have implemented an "isPrefix" function in Matlab that works fine. The problem is that I don't know how to incorporate it into the Viterbi algorithm. Instead of using that, I have a function that takes in a character and an index and returns true if there exists a string in the roster with that character at that index, and false otherwise. I use this function when assigning emission probabilities - I only assign a character a positive emission probability if this character could be for realz.

Now the algorithm guesses "william clinton" correctly, but "lyndon johnson" it now guesses to be "lildon joreron", and "gerald ford" to be "georicarin". This algorithm will work especially badly for large rosters where the function taking in a character and an index will almost always return true. However with my small roster of 7 this function has helped.

Instead of having an "isPrefix" function, I plan to implement an "isSuffix" function because of the way that the Viterbi algorithm works when finding the most likely sequence. It starts at the end and works its way backwards.

Sunday, March 2, 2008

Viterbi experiments

I implemented the Viterbi algorithm and my results are not good yet but I am optimistic. Here is the setup:

I have 7 test images, here is the roster:

lyndon johnson
george bush
gerald ford
richard nixon
william clinton
ronald reagan
james carter

The names are all lower case because I'm not taking case into account yet.

I have calculated the transition probabilities - so I have a 26x26 matrix containing the probability of transitioning from 1 character to another. Here is how I calculated that:

P(o|e) = Count(oe) / Count(e)

This means that the probability that I see the sequence "oe" is the number of times I saw the sequence "oe" divided by the number of times I just saw the letter "e".

Emission probabilities, in my case, are supposed to mean "the probability of seeing this character image given that I know what the character is". For example, the probability of seeing this character image given that I know it is an 'a'. That probability is hard to calculate, I would have to use Gaussians and other crazy math. So, instead of doing that, for now I am doing a very simple nearest neighbor probability calculation. Given a character image, I run nearest neighbor against all of my training data and take the top 40 votes. I say that the probability that this character is an 'a' is the amount of votes for 'a' + 1 divided by 40 + 26. Ideally I would just say that the probability that this character is an 'a' is the amount of votes for 'a' / 40. However, my nearest neighbor framework is working pretty poorly so just because I have no votes for 'a' I don't want to completely eliminate this possibility. Therefore, no letter has zero probability.

I thought that even though my nearest neighbor framework is working pretty badly, that the transition probabilities would help out a lot because the roster is so small. Not really the case though!

Here is an example of an input name:

Here is what my algorithm came up with for the name:
licallinilicl

Yeah, pretty bad. At first I was like "are those even valid transitions?!" but it's true, each of those transitions are valid. For example "li", "William" is in the roster. So, I need to figure out a way to take more into account than just the bigrams. Maybe I should have an "isPrefix" function such that as I'm going through the HMM, only give something a probability if it is a valid prefix of a name. Ok that is what I will do next.

Recognition of Handwritten Names