As you can see, performance saturates after about 180 iterations. The exact threshold varies by letter, but 200 is always enough from what I've seen.
I spent the majority of my time this weekend dealing with my old code for extracting letters from test data. To remind you, I am given something like:
... and I need to extract all of these letters. I wrote code for this last year but seem to have lost it :) So, writing new extraction code took up most of my time. I'm not entirely happy with the result either, but I'll deal with it for now.
Given 4 images of one letter class that were automatically extracted with the code I wrote, I ran a number of the 26 different classifiers on them to find the confidences. Confidence ranges from 0 to 1: 1 means the classifier is 100% confident the letter is a true instance of its class, and 0 means it is 100% confident the letter is a false instance.
Here are 4 examples of the letter 'n':
Classifier   Image 1   Image 2   Image 3   Image 4
n            0.4799    0.4897    0.5225    0.5120
a            0.3878    0.4333    0.3969    0.3969
b            0.4337    0.4499    0.4552    0.4967
c            0.3715    0.3448    0.3443    0.3232
So, the 'n' classifier does the best on every image, which is a relief. However, the numbers can be pretty close, so hopefully the roster information will take care of the ambiguous cases.
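Picking the winning classifier for each image is just an argmax over the confidences. A minimal sketch in Python (my actual scripts are Perl; the data layout and function name here are my own, using the numbers from the table above):

```python
# Confidences from the table above: letter -> score per image.
confidences = {
    'n': [0.4799, 0.4897, 0.5225, 0.5120],
    'a': [0.3878, 0.4333, 0.3969, 0.3969],
    'b': [0.4337, 0.4499, 0.4552, 0.4967],
    'c': [0.3715, 0.3448, 0.3443, 0.3232],
}

def best_letter(confidences, image_idx):
    """Return the letter whose classifier is most confident on one image."""
    return max(confidences, key=lambda letter: confidences[letter][image_idx])

print([best_letter(confidences, i) for i in range(4)])  # ['n', 'n', 'n', 'n']
```

Note that 'b' comes in second on all four images, which is exactly the kind of near-miss the transition probabilities will need to resolve.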
I wrote another Perl script to generate the transition probabilities, since I seem to have lost my old one as well. Given that the current letter is 'x', the probability that the next letter is 'y' is the number of occurrences of the string 'xy' divided by the number of occurrences of 'x'.
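The counting itself is straightforward; here is a sketch of the same bigram-over-unigram estimate in Python (my script is Perl, and the function name and toy corpus here are my own):

```python
from collections import Counter

def transition_probs(corpus):
    """Estimate P(next letter = y | current letter = x) as
    count('xy') / count('x'), as described above."""
    unigrams = Counter(corpus)
    bigrams = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    return {pair: n / unigrams[pair[0]] for pair, n in bigrams.items()}

probs = transition_probs("banana")
# 'n' is always followed by 'a' here, so P('a' | 'n') = 2/2 = 1.0
print(probs["na"])  # 1.0
```

One thing to watch for: any bigram that never appears in the corpus gets probability zero, which will veto an otherwise-plausible reading later, so some smoothing may eventually be needed.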