Tuesday, March 11, 2008

Filtering with a discrete binomial kernel

I tried filtering both the training data and test data before running nearest neighbor on it to try and improve the nearest neighbor performance. Here is an example of a filtered test name:

As compared to the original unfiltered:


I used a 7x7 binomial kernel which turns out to be the following divided by 4096.

1 6 15 20 15 6 1
6 36 90 120 90 36 6
15 90 225 300 225 90 15
20 120 300 400 300 120 20
15 90 225 300 225 90 15
6 36 90 120 90 36 6
1 6 15 20 15 6 1


This actually hurt my performance. For example, previously my accuracy on "James Carter" was ~45%. Now I predict "Geora Rdrter" and my accuracy is ~36%. Similarly, for "Richard Nixon" I now predict "Gerdora Rtero" and my accuracy is ~8.3%, when it used to be 25%.

It turns out that both 3x3 and 5x5 binomial filters are better than the 7x7, but still never improve performance; for the most part filtering with a binomial kernel hurts performance.

No comments: