Tuesday, March 11, 2008
LoG filter not too hot either
Incorporating another set of NN
Currently when I run my algorithm, I come up with the most likely set of states that made up the character images. In other words, I come up with a predicted name. I now run nearest neighbor with the predicted name against the roster names and say that the nearest neighbor is my predicted name. This way my predicted name is at least a valid possibility.
And now... I get 100% accuracy. That's right. Yeah yeah, the roster only has 7 names but it is a start!!1!!.
The way I calculate the distance between two strings: I map each character to its index in the alphabet ('a' maps to 1, 'b' to 2, 'c' to 3, etc.) to get a vector of digits, then take the Euclidean distance between the vectors. In the future I can apparently use edit distance, but Euclidean is fine for me for now.
Ok, there is one more hack in there. The names are not all the same length, and for nearest neighbor to work, every feature vector needs to be the same size. So I make all of the feature vectors the length of the longest name on the roster, and the extra slots for shorter names are set to zero. I also require that the nearest neighbor returned be the same length as the query vector (before I add zeros). I think that this step helps a lot, especially since the roster is so small. The next step is to have a much larger test set.
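The roster-matching step above can be sketched roughly like this (a minimal Python sketch; the roster is from a later post, and the helper names are my own, not the actual Matlab code):

```python
# Sketch of snapping a predicted name to the nearest roster name:
# characters map to 1-26, vectors are zero-padded to the longest name,
# and only candidates with the query's unpadded length are considered.
import math

roster = ["lyndon johnson", "george bush", "gerald ford", "richard nixon",
          "william clinton", "ronald reagan", "james carter"]

def to_vector(name, length):
    """Map 'a'->1 ... 'z'->26 (0 for spaces), zero-padded to `length`."""
    vec = [ord(c) - ord('a') + 1 if c.isalpha() else 0 for c in name]
    return vec + [0] * (length - len(vec))

def nearest_roster_name(predicted):
    max_len = max(len(n) for n in roster)
    query = to_vector(predicted, max_len)
    # Enforce the length constraint from the post.
    candidates = [n for n in roster if len(n) == len(predicted)]
    return min(candidates,
               key=lambda n: math.dist(query, to_vector(n, max_len)))

print(nearest_roster_name("gerald rerd"))   # -> gerald ford
```

With a 7-name roster the length constraint alone often narrows the field to one or two candidates, which is presumably why it helps so much here.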
Filtering with a discrete binomial kernel
I tried filtering both the training data and test data before running nearest neighbor on it to try and improve the nearest neighbor performance. Here is an example of a filtered test name:
As compared to the original unfiltered:
I used a 7x7 binomial kernel which turns out to be the following divided by 4096.
1 6 15 20 15 6 1
6 36 90 120 90 36 6
15 90 225 300 225 90 15
20 120 300 400 300 120 20
15 90 225 300 225 90 15
6 36 90 120 90 36 6
1 6 15 20 15 6 1
This actually hurt my performance. For example, previously my accuracy on "James Carter" was ~45%. Now I predict "Geora Rdrter" and my accuracy is ~36%. Similarly, for "Richard Nixon" I now predict "Gerdora Rtero" and my accuracy is ~8.3%, when it used to be 25%.
It turns out that both 3x3 and 5x5 binomial filters are better than the 7x7, but still never improve performance; for the most part filtering with a binomial kernel hurts performance.
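For reference, a binomial kernel of this kind is just the outer product of a row of binomial coefficients with itself; the 7x7 table above comes from the row (1, 6, 15, 20, 15, 6, 1). A quick sketch (NumPy assumed):

```python
# Build the separable 7x7 binomial kernel from the 6th row of
# Pascal's triangle; the entries sum to 64*64 = 4096, which is
# exactly the normalization constant used above.
import numpy as np
from math import comb

row = np.array([comb(6, k) for k in range(7)])   # [1 6 15 20 15 6 1]
kernel = np.outer(row, row)
print(kernel)
print(kernel.sum())   # 4096
```

The 3x3 and 5x5 versions come from the rows (1, 2, 1) and (1, 4, 6, 4, 1) in the same way.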
Monday, March 10, 2008
Emission probabilities
An emission probability in my case is P(image|letter): the probability of the image given that you know it is of a particular character. In other words, the probability of a certain letter looking like this. The way I've been calculating this probability is P(letter|image), since I've been doing my own version of a nearest neighbor density. But P(image|letter) != P(letter|image).. this was wrong! I can use Bayes' rule to turn the probability around, though:
P(image|letter) = P(letter|image)P(image) / P(letter)
I don't know what to plug in for P(image) so for now I've just been using 1 - each image has equal probability. For P(letter), this is easy to find - just look at how many times each letter appears in the roster and divide by the total number of characters. Since P(image) = 1, my new equation is:
P(image|letter) = P(letter|image) / P(letter)
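The flipped probability can be sketched like this (the roster is from an earlier post; the function names are my own illustration):

```python
# Flip P(letter|image) into P(image|letter) via Bayes' rule, with
# P(image) taken as 1 and letter priors counted from the roster.
from collections import Counter

roster = ["lyndon johnson", "george bush", "gerald ford", "richard nixon",
          "william clinton", "ronald reagan", "james carter"]

letters = [c for name in roster for c in name if c.isalpha()]
counts = Counter(letters)
total = len(letters)

def p_letter(c):
    """Prior: how often the letter appears in the roster."""
    return counts[c] / total

def p_image_given_letter(p_letter_given_image, c):
    """Bayes' rule, assuming P(image) = 1 as in the post."""
    return p_letter_given_image / p_letter(c)

# A rare letter inflates the flipped probability far more than a common one:
print(p_image_given_letter(0.1, 'o'))   # 'o' is common -> modest boost
print(p_image_given_letter(0.1, 'x'))   # 'x' is rare   -> huge boost
```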
I noticed that since the denominator P(letter) is always <= 1, dividing by it always pushes P(image|letter) up; but a letter with high prior probability gets a smaller boost than a less likely one, and if P(letter) is very low then P(image|letter) comes out very high.
My results became worse once I took this new probability into account.
Results after scaling
Like I said in my last post, the test images ranged from around 111 - 255, while the training images ranged from 0 - 255. I changed the range of the test images to also be 0 - 255 and I believe that my results improved.
Query name         Predicted name    Accuracy
Lyndon Johnson     Gerdon Aninson    53.84615%
Gerald Ford        Gerald Rerd       80%
George Bush        George Resh       80%
Richard Nixon      Rildora Rtero     25%
William Clinton    William Clinton   100%
Ronald Reagan      Ronald Rerdon     75%
James Carter       Geora Carter      45.45455%
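The rescaling step itself is a simple linear stretch; something like this sketch (the function name and sample values are my own, not the actual Matlab code):

```python
# Linearly stretch an image so its minimum maps to 0 and its
# maximum maps to 255, matching the training images' range.
import numpy as np

def rescale_to_full_range(img, lo=0.0, hi=255.0):
    img = img.astype(float)
    mn, mx = img.min(), img.max()
    return (img - mn) / (mx - mn) * (hi - lo) + lo

# A test image spanning roughly 111-255, like the ones described above:
test_img = np.array([[111.0, 183.0], [220.0, 255.0]])
scaled = rescale_to_full_range(test_img)
print(scaled.min(), scaled.max())   # 0.0 255.0
```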
Saturday, March 8, 2008
Looking into why my nearest neighbor results are awful
Here I am comparing one of my test L's to half of my training L's. The test L is always on the right. Also the test L is a lot lighter than the training L's. In particular, the minimum value of the test L is 111 whereas the minimum value of the training L's is consistently 0. I am not sure if this will have an impact on my results or not.
Similarly, here are some 'o' comparisons, where the test 'o' is always on the right. No training 'o' matches the test 'o' exactly in size, which is a problem.
I'm not quite sure that the Viterbi algorithm is the right thing to be using for my case because it does not take into account the lexicon (roster) as much as it should.
Monday, March 3, 2008
Viterbi getting a little better
I have implemented an "isPrefix" function in Matlab that works fine. The problem is that I don't know how to incorporate it into the Viterbi algorithm. Instead of using that, I have a function that takes in a character and an index and returns true if there exists a string in the roster with that character at that index, and false otherwise. I use this function when assigning emission probabilities - I only assign a character a positive emission probability if this character could be for realz.
Now the algorithm guesses "william clinton" correctly, but "lyndon johnson" it now guesses to be "lildon joreron", and "gerald ford" to be "georicarin". This algorithm will work especially badly for large rosters where the function taking in a character and an index will almost always return true. However with my small roster of 7 this function has helped.
Instead of having an "isPrefix" function, I plan to implement an "isSuffix" function because of the way that the Viterbi algorithm works when finding the most likely sequence. It starts at the end and works its way backwards.
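The character-at-index check described above might look like this (a sketch in Python rather than the actual Matlab; the function name is my own):

```python
# True if any roster name has `char` at position `index`; used to
# zero out impossible emissions when running Viterbi.
roster = ["lyndon johnson", "george bush", "gerald ford", "richard nixon",
          "william clinton", "ronald reagan", "james carter"]

def valid_at(char, index):
    return any(index < len(name) and name[index] == char
               for name in roster)

print(valid_at('l', 0))   # True: "lyndon johnson"
print(valid_at('z', 0))   # False: no roster name starts with 'z'
```

As noted above, on a big roster almost every (char, index) pair would pass this test, which is why a prefix or suffix constraint would be stronger.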
Sunday, March 2, 2008
Viterbi experiments
I implemented the Viterbi algorithm and my results are not good yet but I am optimistic. Here is the setup:
I have 7 test images, here is the roster:
lyndon johnson
george bush
gerald ford
richard nixon
william clinton
ronald reagan
james carter
The names are all lower case because I'm not taking case into account yet.
I have calculated the transition probabilities - so I have a 26x26 matrix containing the probability of transitioning from one character to another. Here is how I calculated that:
P(o|e) = Count(eo) / Count(e)
This means that the probability of seeing 'o' given that the previous letter was 'e' is the number of times I saw 'e' immediately followed by 'o', divided by the number of times I saw the letter 'e'.
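Counting the bigrams over the roster can be sketched like this (splitting each name into words, which is one reasonable choice for handling the spaces; the function names are my own):

```python
# Bigram transition probabilities from the roster:
# P(next | prev) = Count(prev followed by next) / Count(prev with a successor).
from collections import defaultdict

roster = ["lyndon johnson", "george bush", "gerald ford", "richard nixon",
          "william clinton", "ronald reagan", "james carter"]

bigram = defaultdict(int)
unigram = defaultdict(int)
for name in roster:
    for word in name.split():
        for prev, nxt in zip(word, word[1:]):
            bigram[(prev, nxt)] += 1
            unigram[prev] += 1   # only positions that have a successor

def p_transition(nxt, prev):
    if unigram[prev] == 0:
        return 0.0
    return bigram[(prev, nxt)] / unigram[prev]

print(p_transition('o', 'e'))   # 'e' is followed by 'o' only in "george"
```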
Emission probabilities, in my case, are supposed to mean "the probability of seeing this character image given that I know what the character is". For example, the probability of seeing this character image given that I know it is an 'a'. That probability is hard to calculate; I would have to use Gaussians and other crazy math. So, instead of doing that, for now I am doing a very simple nearest neighbor probability calculation. Given a character image, I run nearest neighbor against all of my training data and take the top 40 votes. I say that the probability that this character is an 'a' is the number of votes for 'a' plus 1, divided by 40 + 26. Ideally I would just say that the probability that this character is an 'a' is the number of votes for 'a' / 40. However, my nearest neighbor framework is working pretty poorly, so just because I have no votes for 'a' I don't want to completely eliminate that possibility. Therefore, no letter has zero probability.
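That vote-counting scheme is Laplace (add-one) smoothing over the 26 letters; a sketch (the vote tally is hypothetical, not real output from the system):

```python
# Smoothed "emission" probabilities from k-nearest-neighbor votes:
# (votes for letter + 1) / (k + 26), so no letter ever gets probability 0.
from collections import Counter
import string

def emission_probs(neighbor_labels, k=40):
    votes = Counter(neighbor_labels)
    return {c: (votes[c] + 1) / (k + 26) for c in string.ascii_lowercase}

# Hypothetical vote tally for one character image:
labels = ['a'] * 25 + ['o'] * 10 + ['e'] * 5
probs = emission_probs(labels)
print(probs['a'])   # (25 + 1) / 66
print(probs['z'])   # ( 0 + 1) / 66 -- never exactly zero
```

Note the +26 in the denominator is exactly what makes the probabilities sum to 1 after adding one phantom vote per letter.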
I thought that even though my nearest neighbor framework is working pretty badly, that the transition probabilities would help out a lot because the roster is so small. Not really the case though!
Here is an example of an input name:
Here is what my algorithm came up with for the name:
licallinilicl
Yeah, pretty bad. At first I was like "are those even valid transitions?!" but it's true, each of those transitions is valid. For example "li" appears in "william", which is in the roster. So, I need to figure out a way to take more into account than just the bigrams. Maybe I should have an "isPrefix" function such that as I'm going through the HMM, I only give a character positive probability if it forms a valid prefix of a name. Ok, that is what I will do next.
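For completeness, the decoding step is the standard Viterbi recursion over the 26 letter states. A generic textbook sketch (not the actual Matlab code; the toy probability tables at the bottom are stand-ins, and the emission dicts play the role of the nearest-neighbor scores described above):

```python
# Textbook Viterbi decoding over 26 letter states.
import string

def viterbi(emissions, transition, initial):
    """emissions: one dict per image position mapping letter -> score;
    transition: dict (prev, next) -> probability; initial: dict letter -> prior."""
    states = string.ascii_lowercase
    # best[t][s] = probability of the best path ending in state s at step t
    best = [{s: initial[s] * emissions[0][s] for s in states}]
    back = []
    for em in emissions[1:]:
        prev_best = best[-1]
        col, ptr = {}, {}
        for s in states:
            p, arg = max((prev_best[r] * transition[(r, s)], r) for r in states)
            col[s] = p * em[s]
            ptr[s] = arg
        best.append(col)
        back.append(ptr)
    # Trace back from the most likely final state.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return ''.join(reversed(path))

# Toy example: uniform transitions, emissions strongly preferring "hi".
uniform = {s: 1 / 26 for s in string.ascii_lowercase}
trans = {(r, s): 1 / 26
         for r in string.ascii_lowercase for s in string.ascii_lowercase}
ems = [dict(uniform, h=0.9), dict(uniform, i=0.9)]
print(viterbi(ems, trans, uniform))   # -> "hi"
```

With uniform transitions the emissions dominate, which mirrors the problem described above in reverse: with weak emissions, the bigram transitions dominate and produce strings like "licallinilicl".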