<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4978165749595234779</id><updated>2011-08-05T10:11:58.723-07:00</updated><title type='text'>Recognition of Handwritten Names</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>46</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-4771701274311625687</id><published>2009-03-11T10:54:00.001-07:00</published><updated>2009-03-11T10:54:50.846-07:00</updated><title type='text'>Presentation slides</title><content type='html'>&lt;a href="http://cse.ucsd.edu/~dbitton/finalpresentation.pdf"&gt;here&lt;/a&gt;!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-4771701274311625687?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://dafna190.blogspot.com/feeds/4771701274311625687/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=4771701274311625687' title='38 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4771701274311625687'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4771701274311625687'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/03/presentation-slides.html' title='Presentation slides'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>38</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-3857859193484844855</id><published>2009-03-09T01:07:00.000-07:00</published><updated>2009-03-09T01:23:06.774-07:00</updated><title type='text'>Jittering Data</title><content type='html'>After my results came out really bad, I decided that my training data was the problem.  Both quality and quantity wise.  It is difficult to get more training data fast, so Serge recommended using what I have to make more.  
I used Piotr Dollar's jitterImage function to take a single image and apply transformations to it.&lt;br /&gt;&lt;br /&gt;An exaggerated example:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/SbTRV55kvpI/AAAAAAAAAe8/oFgau6qJKNs/s1600-h/ajitter.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 219px;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/SbTRV55kvpI/AAAAAAAAAe8/oFgau6qJKNs/s400/ajitter.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5311100034826419858" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Piotr's function applies both rotations and translations to images.  Serge also recommended playing around with stroke thickness by raising the image to a power greater than 1, and to a power between 0 and 1.&lt;br /&gt;&lt;br /&gt;I did all this, and started out with around 50,000 examples of each character instead of 135.  I quickly found out that I ran out of memory with this many examples.  I eventually had to bring it down to 625, and training took forever.&lt;br /&gt;&lt;br /&gt;The first thing I noticed was that the training error was much higher than when I only had 135 examples.  I think that this is because there is a lot more variation in the data now and it's hard to cover all cases in 200 features.  When I only had 135 examples, the error eventually went down to 0.  This is the error with the 625 examples.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/SbTRchEjAaI/AAAAAAAAAfE/MsQSK5s6r0Q/s1600-h/aclf.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/SbTRchEjAaI/AAAAAAAAAfE/MsQSK5s6r0Q/s400/aclf.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5311100148420641186" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Also, here are the ROC curves plotted on top of each other.  
When I only had 135 examples, the ROC curve was a right angle.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/SbTRumiPMWI/AAAAAAAAAfM/6oOuVZ1gFiY/s1600-h/rocs.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/SbTRumiPMWI/AAAAAAAAAfM/6oOuVZ1gFiY/s400/rocs.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5311100459124994402" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here are half of the 625 training examples for 'a':&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SbTRF6yd7rI/AAAAAAAAAe0/fQTqSbQaezA/s1600-h/azample.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 219px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SbTRF6yd7rI/AAAAAAAAAe0/fQTqSbQaezA/s400/azample.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5311099760187141810" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;To cut to the chase, my algorithm still performs poorly, and probably even worse now.  
D:&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-3857859193484844855?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/3857859193484844855/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=3857859193484844855' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/3857859193484844855'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/3857859193484844855'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/03/jittering-data.html' title='Jittering Data'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/SbTRV55kvpI/AAAAAAAAAe8/oFgau6qJKNs/s72-c/ajitter.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-4583055928448302746</id><published>2009-03-04T01:56:00.001-08:00</published><updated>2009-03-04T10:06:39.853-08:00</updated><title type='text'>Well, I ran it already</title><content type='html'>Finally, I got everything together and ran my algorithm.  It did horribly.  We're talking 0% accuracy horrible.  I am using the same hidden Markov model (HMM) as last year and I think that the model is just too unforgiving.  
Once it mispredicts the first letter, we're basically screwed.&lt;br /&gt;&lt;br /&gt;I tried abandoning the HMM and instead predicting each letter in the test name independently of the rest of the letters in the name.  I made each prediction by taking the max over the confidences for the 26 possibilities.  Then, at the end, I find the nearest neighbor of the predicted name among all of the names in the roster.  This also did horribly.&lt;br /&gt;&lt;br /&gt;I took a look at what was going on under the covers.  I found that for each letter, the confidence for the correct letter is pretty high compared to the rest of the confidences, but some other letter always beats it by a little bit.  And my algorithm doesn't care about 2nd place.  Therefore, I need another way!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-4583055928448302746?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/4583055928448302746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=4583055928448302746' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4583055928448302746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4583055928448302746'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/03/well-i-ran-it-already.html' title='Well, I ran it already'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-6177976622740929644</id><published>2009-03-02T22:51:00.001-08:00</published><updated>2009-03-03T00:17:23.777-08:00</updated><title type='text'>Huh.  How 'bout dat</title><content type='html'>Well.  My real roadblock at this point was extracting letters from the test data character boxes because I lost my code from last year.  But losing the code wasn't the really bad part - the code wasn't very good in the first place.  It had all sorts of Hough transforms in it.&lt;br /&gt;&lt;br /&gt;Instead, I decided to take a blank sheet and run normalized cross-correlation between it and a filled-in box.  This way, I could find the centers of the filled-in boxes.  I basically figured out how big the boxes are, and can now extract the letters mo' betta.  The red circle is where the normalized cross-correlation signal was the strongest.&lt;br /&gt;&lt;br /&gt;Zample:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SazT6yC1yoI/AAAAAAAAAdc/yAR7G4Aj2Ro/s1600-h/nicklines.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 120px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SazT6yC1yoI/AAAAAAAAAdc/yAR7G4Aj2Ro/s400/nicklines.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5308851067582532226" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I found that the character box lines were about 3 pixels thick, so I took this into account when extracting the letters.&lt;br /&gt;&lt;br /&gt;The result:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/SazULjL4OZI/AAAAAAAAAdk/BbX99Nsf19w/s1600-h/nick.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; 
height: 219px;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/SazULjL4OZI/AAAAAAAAAdk/BbX99Nsf19w/s400/nick.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5308851355651684754" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Yee-haw.  I took the output of this and fed it into my "get rid of white space" function, and resized the letters to 24 by 24 pixels.  I also checked whether the sum of the pixels of the character image is above a certain threshold, and if so, I assumed there was no letter there.  I display these as gray cells with an x through them.  This resulted in the following:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/SazlmdyYWEI/AAAAAAAAAd0/zLUUv_xyrrM/s1600-h/nicknows.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 71px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/SazlmdyYWEI/AAAAAAAAAd0/zLUUv_xyrrM/s400/nicknows.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5308870509756700738" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here is another zample.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SazmvbRv2mI/AAAAAAAAAd8/-89cSo7i7XQ/s1600-h/daf.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 71px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SazmvbRv2mI/AAAAAAAAAd8/-89cSo7i7XQ/s400/daf.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5308871763213408866" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;There is an issue though.  SOME people (I'm not going to name any names) cannot write inside character boxes.  
Here is an example:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/Sazm8pbBA-I/AAAAAAAAAeE/WRR9AXA4I_0/s1600-h/verma.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 113px;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/Sazm8pbBA-I/AAAAAAAAAeE/WRR9AXA4I_0/s400/verma.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5308871990348678114" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Because of that last 'a', my algorithm thinks there's a legitimate character there.  See:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SaznI-xJMCI/AAAAAAAAAeM/cLx-iHJ0o8o/s1600-h/vermabad.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 71px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SaznI-xJMCI/AAAAAAAAAeM/cLx-iHJ0o8o/s400/vermabad.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5308872202237063202" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;However, as far as I can tell, if someone can't keep their characters inside the boxes, they don't deserve to get their quiz graded!  They can play games... but so can we!  They want to play games, so be it!  
Let the games begin!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-6177976622740929644?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/6177976622740929644/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=6177976622740929644' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/6177976622740929644'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/6177976622740929644'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/03/huh-how-bout-dat.html' title='Huh.  How &apos;bout dat'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_PakaWnbOqtc/SazT6yC1yoI/AAAAAAAAAdc/yAR7G4Aj2Ro/s72-c/nicklines.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-8537270225827747596</id><published>2009-03-02T11:20:00.000-08:00</published><updated>2009-03-02T13:17:55.517-08:00</updated><title type='text'>Testing out the classifiers</title><content type='html'>I created and saved off 26 different classifiers - 1 for each lower case letter.  I started out with 5000 haar-like features and brought it down to 200.  So now, the feature vectors will be of length 200.  I chose 200 because I found that after 200, there was no improvement in performance.  
Take a look at the following graph: &lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/Saw2Iuur8MI/AAAAAAAAAdU/9zZGSUU2s20/s1600-h/gperf.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/Saw2Iuur8MI/AAAAAAAAAdU/9zZGSUU2s20/s400/gperf.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5308677584373674178" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As you can see, performance is saturated after 180 iterations.  This threshold varies for the different letters, but 200 is always enough from what I've seen.&lt;br /&gt;&lt;br /&gt;I spent the majority of my time this weekend dealing with my old code for extracting letters from test data.  To remind you, I am given something like:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SawyixTBJLI/AAAAAAAAAdE/XgchmX888tk/s1600-h/test23.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 90px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SawyixTBJLI/AAAAAAAAAdE/XgchmX888tk/s400/test23.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5308673633693017266" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;... and I need to extract all of these letters.  I wrote code for this last year but seem to have lost it :)  So, I wrote new code to extract the letters which was time consuming.  I am not entirely happy with the result of the code either, but I'll deal with it for now.&lt;br /&gt;&lt;br /&gt;Given 4 images of 1 type of letter that were automatically extracted with the code I wrote, I ran a number of the 26 different classifiers on them to find the confidences.  
Confidence ranges from 0 to 1, where 1 means the classifier is 100% confident that the letter is a true instance, and 0 means the classifier is 100% confident that the letter is a false instance.&lt;br /&gt;&lt;br /&gt;Here are 4 examples of the letter 'n':&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/Saw0PY8jp2I/AAAAAAAAAdM/PUsUS4VQfEQ/s1600-h/ns.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 78px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/Saw0PY8jp2I/AAAAAAAAAdM/PUsUS4VQfEQ/s400/ns.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5308675499762100066" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Classifier  Image 1  Image 2  Image 3  Image 4&lt;br /&gt;    n       0.4799   0.4897   0.5225   0.5120&lt;br /&gt;    a       0.3878   0.4333   0.3969   0.3969&lt;br /&gt;    b       0.4337   0.4499   0.4552   0.4967&lt;br /&gt;    c       0.3715   0.3448   0.3443   0.3232&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;So, the 'n' classifier scores highest on all 4 images, which is a relief.  However, these numbers can be pretty close, so hopefully the roster information will take care of this.&lt;br /&gt;&lt;br /&gt;I wrote another Perl script to generate the transition probabilities because I seem to have lost my old one too.  
Given that the current letter is 'x', I find the probability that the next letter is a 'y' by dividing the number of occurrences of the string 'xy' by the number of occurrences of the letter 'x'.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-8537270225827747596?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/8537270225827747596/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=8537270225827747596' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8537270225827747596'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8537270225827747596'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/03/testing-out-classifiers.html' title='Testing out the classifiers'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_PakaWnbOqtc/Saw2Iuur8MI/AAAAAAAAAdU/9zZGSUU2s20/s72-c/gperf.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-8349993862751013034</id><published>2009-02-23T15:32:00.001-08:00</published><updated>2009-02-23T15:35:42.108-08:00</updated><title type='text'>Where the classifier is at now</title><content type='html'>Here is the ROC curve for the training data:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://2.bp.blogspot.com/_PakaWnbOqtc/SaMyRmGupXI/AAAAAAAAAc0/soOxhKz21No/s1600-h/trn.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/SaMyRmGupXI/AAAAAAAAAc0/soOxhKz21No/s400/trn.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5306140063840511346" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;And for testing:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/SaMyY1LPWPI/AAAAAAAAAc8/I7OB-trY60s/s1600-h/test.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/SaMyY1LPWPI/AAAAAAAAAc8/I7OB-trY60s/s400/test.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5306140188145047794" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The curve is perfect for training but not excellent for testing.  The next step is just to use what I have for my overall project.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-8349993862751013034?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/8349993862751013034/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=8349993862751013034' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8349993862751013034'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8349993862751013034'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/02/where-classifier-is-at-now.html' title='Where the classifier is at now'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/SaMyRmGupXI/AAAAAAAAAc0/soOxhKz21No/s72-c/trn.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-7175166565491118791</id><published>2009-02-22T23:13:00.000-08:00</published><updated>2009-02-22T23:19:36.764-08:00</updated><title type='text'>Improving the ROC curve</title><content type='html'>It turns out that the ROC curve posted in the previous post was incorrect.  I accidentally tested on my training data.  Oooops!  Once I fixed this bug, my ROC curve became very ugly.  In fact, this is what it looked like:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/SaJM70A0W9I/AAAAAAAAAck/QVpq-hiLnOQ/s1600-h/test.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/SaJM70A0W9I/AAAAAAAAAck/QVpq-hiLnOQ/s400/test.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5305887901453933522" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Uh oh!  This scared me.  I started tweaking the algorithm by changing the number of haar features from 1,000 to 5,000, and changed the number of weak classifiers from 25 to 100.  This barely made any difference.  I then thought that maybe my 24 by 24 pixel character images simply weren't sufficient so I generated new training data and made the images slightly bigger - 40 by 40 pixels.  This also didn't make a difference.&lt;br /&gt;&lt;br /&gt;What did make a difference was centering each of the images and getting rid of the extra white space.  
This changed the ROC curve to this:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SaJNvsEwfJI/AAAAAAAAAcs/07kn_QOe0CU/s1600-h/aclf.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SaJNvsEwfJI/AAAAAAAAAcs/07kn_QOe0CU/s400/aclf.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5305888792676170898" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-7175166565491118791?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/7175166565491118791/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=7175166565491118791' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/7175166565491118791'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/7175166565491118791'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/02/improving-roc-curve.html' title='Improving the ROC curve'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_PakaWnbOqtc/SaJM70A0W9I/AAAAAAAAAck/QVpq-hiLnOQ/s72-c/test.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-5871911913297241691</id><published>2009-02-18T15:06:00.001-08:00</published><updated>2009-02-18T15:47:24.319-08:00</updated><title type='text'>Boosting in Matlab</title><content type='html'>&lt;a href="http://vision.ucsd.edu/~bbabenko/"&gt;Boris&lt;/a&gt; gave me his Matlab boosting code.  His code is an implementation of &lt;a href="http://en.wikipedia.org/wiki/AdaBoost"&gt;AdaBoost&lt;/a&gt; using &lt;a href="http://en.wikipedia.org/wiki/Haar-like_features"&gt;haar-like features&lt;/a&gt;.  The weak classifier is just a stump (looks at 1 feature and thresholds it).&lt;br /&gt;&lt;br /&gt;The idea behind boosting is that many weak classifiers together create 1 strong classifier.  The weak classifiers used in this implementation are extremely simple and their individual accuracy is not that much better than 0.5.  However, as we add more weak classifiers to the overall classifier, the overall accuracy improves, as shown in the following graph:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SZydJAvIrYI/AAAAAAAAAcE/QT5d6atcZzE/s1600-h/trn_error.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SZydJAvIrYI/AAAAAAAAAcE/QT5d6atcZzE/s400/trn_error.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5304287239277686146" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;What is a haar feature?&lt;br /&gt;In this implementation, a random number of rectangles is created, each with a different weight.  Note that all of the training images must be the same size.  We choose these haar features before looking at any of the images.  The pixel sum inside each rectangle, together with the weights, is used to compute the feature.  
Each haar feature yields a single scalar value for each image.&lt;br /&gt;&lt;br /&gt;Example of haar features:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/SZyaGWkeKcI/AAAAAAAAAb0/L10F3bPNMqI/s1600-h/haars.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 219px;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/SZyaGWkeKcI/AAAAAAAAAb0/L10F3bPNMqI/s400/haars.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5304283895064046018" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;What you'll need for an 'X' classifier:&lt;br /&gt;1.  A pool of images of 'X' (positive examples)&lt;br /&gt;2.  A pool of images that do not contain 'X' in them (negative examples)&lt;br /&gt;&lt;br /&gt;Set aside part of each of the above pools for testing.  Use the rest for training.&lt;br /&gt;&lt;br /&gt;How it will go down:&lt;br /&gt;1.  Choose the number of haar features to create, and call this nh.&lt;br /&gt;2.  Apply all nh haar features to all of the training images, both positive and negative.  As a result, for each image, we will have a feature vector of length nh.&lt;br /&gt;3.  Choose the number of weak classifiers desired, and call this nwc.  Note that nwc &lt;= nh.&lt;br /&gt;4.  Choose nwc of the nh haar features.  Ideally, these nwc haar features are the best haar features out of the bunch, meaning the ones that discriminate best.  Associate a threshold with each of the nwc haar features.&lt;br /&gt;5.  For each of the test images, find the nwc features.  We now have a feature vector of length nwc for each of the test images.&lt;br /&gt;6.  Given the threshold for each of the nwc features, come up with a confidence for each test image.&lt;br /&gt;&lt;br /&gt;I created an 'a' classifier for my project.  I resized all of my training images to 24 by 24 pixels.  
I used a portion of the images of 'a' as the positive training examples, and used the remainder of them for testing.  I used a combination of images of letters 'b' through 'z' as the negative training examples, and a separate portion of those for testing. Using a threshold of 0.5, the number of false positives was 194/1350, which comes out to a rate of 14%.  The true positive rate is 100%.  Here is the ROC curve:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/SZybVq0dRGI/AAAAAAAAAb8/a2C8LyPX8kA/s1600-h/roc.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 219px;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/SZybVq0dRGI/AAAAAAAAAb8/a2C8LyPX8kA/s400/roc.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5304285257709470818" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;How this will apply to my project:  Given an image of a letter, I will apply all 26 classifiers to it.  I will then have a score for each letter, and can use this instead of the nearest neighbor mechanism I was using before.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-5871911913297241691?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/5871911913297241691/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=5871911913297241691' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5871911913297241691'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5871911913297241691'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/02/boosting-in-matlab.html' 
title='Boosting in Matlab'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_PakaWnbOqtc/SZydJAvIrYI/AAAAAAAAAcE/QT5d6atcZzE/s72-c/trn_error.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-1266530275294780495</id><published>2009-02-09T15:49:00.000-08:00</published><updated>2009-02-09T15:54:17.642-08:00</updated><title type='text'>Boosting with OpenCV</title><content type='html'>My next step is to apply OpenCV's boosting framework to my problem.  I have OpenCV installed, and after a bunch of linking issues, I can at least get things to compile.  I have spent this weekend and today figuring out how to use it.  I have a much better handle on it now and will hopefully get it working soon.&lt;br /&gt;&lt;br /&gt;I have been referencing &lt;a href="http://note.sonots.com/SciSoftware/haartraining.html"&gt;this&lt;/a&gt;  tutorial, which is ok.  &lt;br /&gt;&lt;br /&gt;I plan to train 26 different classifiers, one for each letter.&lt;br /&gt;&lt;br /&gt;Random fact: OpenCV either lets you give it one training image, where it takes random samples of the image, or it lets you give it a bunch of images and it uses those as they are.  
The tutorial mentioned above does provide a way to combine the 2 approaches, though.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-1266530275294780495?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/1266530275294780495/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=1266530275294780495' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/1266530275294780495'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/1266530275294780495'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/02/boosting-with-opencv.html' title='Boosting with OpenCV'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-726986148356881615</id><published>2009-02-02T14:46:00.001-08:00</published><updated>2009-02-02T14:56:37.198-08:00</updated><title type='text'>Trying PCA</title><content type='html'>I decided to use PCA on my training data.  I ran Matlab's princomp function on my n x p data matrix (n row vectors, where each row vector has p values), which gave me a matrix D of size p x p.  I decided to use the first 100 components, so I multiplied my n x p data matrix by D(:,1:100) to get an n x 100 matrix.  Sometimes the results are better than without PCA, sometimes they are worse.  
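&lt;br /&gt;&lt;br /&gt;Here is a minimal sketch of the projection step in plain Python rather than Matlab.  The coeff matrix stands in for the p x p matrix D from princomp and is hard-coded for a toy data set, and the function names are made up for illustration.  One assumption to flag: the sketch subtracts the column means before multiplying, which is how princomp produces its own scores.&lt;br /&gt;&lt;pre&gt;
```python
def column_means(rows):
    """Mean of each of the p columns of an n x p list of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def project(rows, coeff, k, means):
    """Return the n x k matrix (rows - means) * coeff[:, :k]."""
    projected = []
    for row in rows:
        centered = [x - m for x, m in zip(row, means)]
        projected.append([
            sum(centered[i] * coeff[i][j] for i in range(len(centered)))
            for j in range(k)
        ])
    return projected


# Toy data (an assumption): n = 4 samples, p = 2 values each, lying on the line y = x.
train = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]

# Hypothetical principal directions for this toy set, one per column.
s = 2 ** -0.5
coeff = [[s, s], [s, -s]]

means = column_means(train)
reduced = project(train, coeff, 1, means)  # keep only the first component
```
&lt;/pre&gt;Test data would be reduced with the same means and the same coeff columns as the training data, so that both end up in the same space.&lt;br /&gt;&lt;br /&gt;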
I will do more experimentation and report back.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-726986148356881615?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/726986148356881615/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=726986148356881615' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/726986148356881615'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/726986148356881615'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/02/trying-pca.html' title='Trying PCA'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-5391685263034084912</id><published>2009-01-26T15:10:00.000-08:00</published><updated>2009-01-26T15:30:26.136-08:00</updated><title type='text'>Fixing the cropping issue</title><content type='html'>I've spent some time now fixing up my cropping function.  The reason I've been spending my time on this is because I think that it's really important to have quality training data.  If my training data is crap, then nothing will work.  I want to be able to cleanly cut out letters.  
So, I changed my cropping algorithm to the following:&lt;br /&gt;cut the original image into 4 pieces, as follows:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SX5FFTBFXdI/AAAAAAAAAaM/qT3LC3alsYw/s1600-h/fourpieces.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SX5FFTBFXdI/AAAAAAAAAaM/qT3LC3alsYw/s400/fourpieces.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5295746169140764114" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here is the left piece:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/SX5FPvdvNjI/AAAAAAAAAaU/zB_HnrzcTOk/s1600-h/left.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 193px; height: 400px;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/SX5FPvdvNjI/AAAAAAAAAaU/zB_HnrzcTOk/s400/left.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5295746348575831602" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Right piece:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/SX5FWavj6bI/AAAAAAAAAac/XwsCa_MqNFA/s1600-h/right.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 164px; height: 400px;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/SX5FWavj6bI/AAAAAAAAAac/XwsCa_MqNFA/s400/right.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5295746463272528306" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Top piece:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/SX5Ffm1PJsI/AAAAAAAAAak/jOSlUXeIfa8/s1600-h/top.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 127px;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/SX5Ffm1PJsI/AAAAAAAAAak/jOSlUXeIfa8/s400/top.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5295746621136381634" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Bottom piece:&lt;br 
/&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/SX5FmuILldI/AAAAAAAAAas/fArUYlpknyA/s1600-h/bottom.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 89px;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/SX5FmuILldI/AAAAAAAAAas/fArUYlpknyA/s400/bottom.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5295746743353972178" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I run the Hough transform on all four pieces.  In each, I know where a line would be, if one exists.  I decide that a line is really there if the maximum number of votes is higher than a certain percentage of the length of the image.  If so, I cut the appropriate piece out.&lt;br /&gt;&lt;br /&gt;The performance isn't as great as expected.  I think that I need to fix up my Hough transform function because sometimes, when there appears to be a dark line, it won't find it (the number of votes is low).&lt;br /&gt;&lt;br /&gt;Here are the overall results anyway:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/SX5GU4gMvmI/AAAAAAAAAa0/JCxPGvTQK7E/s1600-h/newcutout.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 288px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/SX5GU4gMvmI/AAAAAAAAAa0/JCxPGvTQK7E/s400/newcutout.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5295747536413048418" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The letters straight out of the training sheet look like this, with no postprocessing:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/SX5HaV-OhtI/AAAAAAAAAa8/5NeYF1SKgpc/s1600-h/nocrop.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 288px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/SX5HaV-OhtI/AAAAAAAAAa8/5NeYF1SKgpc/s400/nocrop.png" border="0" 
alt=""id="BLOGGER_PHOTO_ID_5295748729734596306" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The next step for me is to fix my Hough transform function, get the training data looking good, retry nearest neighbor, and then move on to OpenCV (boosting).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-5391685263034084912?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/5391685263034084912/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=5391685263034084912' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5391685263034084912'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5391685263034084912'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/01/fixing-cropping-issue.html' title='Fixing the cropping issue'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_PakaWnbOqtc/SX5FFTBFXdI/AAAAAAAAAaM/qT3LC3alsYw/s72-c/fourpieces.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-573395765306037887</id><published>2009-01-14T13:34:00.000-08:00</published><updated>2009-01-14T15:24:31.452-08:00</updated><title type='text'>Cropping</title><content type='html'>I get better accuracy now when I try to classify my test data.  
However, I noticed that after it goes through my "cropping" tool, the test data can get pretty deformed.  For example, take a look at a before and after sequence:&lt;br /&gt;&lt;br /&gt;Before:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SW5i4vW8rGI/AAAAAAAAAZc/2iDbqxdUiUo/s1600-h/kennybig.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 58px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SW5i4vW8rGI/AAAAAAAAAZc/2iDbqxdUiUo/s400/kennybig.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5291275339131825250" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;After:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/SW5i9x-1ZKI/AAAAAAAAAZk/Ofz2zm61tt8/s1600-h/kennedy.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 46px;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/SW5i9x-1ZKI/AAAAAAAAAZk/Ofz2zm61tt8/s400/kennedy.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5291275425735337122" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I decreased the dimensionality of the test data by a lot.  The new test data characters are 32 by 32 pixels.  Before this, they were 120 by 123 pixels.  This reduction in dimensionality really brought down the running time of classification.  I think that I need to tweak my cropping tool though.&lt;br /&gt;&lt;br /&gt;Cropping tool = bad.  
Exhibit A:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/SW5wdEFDgJI/AAAAAAAAAZs/wDevB44AugQ/s1600-h/woops.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/SW5wdEFDgJI/AAAAAAAAAZs/wDevB44AugQ/s200/woops.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5291290256820371602" /&gt;&lt;/a&gt; = &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SW5wn6a7CqI/AAAAAAAAAZ0/RotkiPjdUJI/s1600-h/badn.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SW5wn6a7CqI/AAAAAAAAAZ0/RotkiPjdUJI/s200/badn.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5291290443206298274" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;When I change it so that only 2% of the pixels can be all white (instead of 5%), the cutout changes to:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SW5xpzZLAVI/AAAAAAAAAZ8/-x1MZ4JwTbY/s1600-h/bettern.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 200px; height: 150px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SW5xpzZLAVI/AAAAAAAAAZ8/-x1MZ4JwTbY/s200/bettern.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5291291575191273810" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Now, the same test set looks like this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/SW50AWwuRRI/AAAAAAAAAaE/3vKeXlC_icM/s1600-h/betterkennedy.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 42px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/SW50AWwuRRI/AAAAAAAAAaE/3vKeXlC_icM/s400/betterkennedy.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5291294161665672466" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/4978165749595234779-573395765306037887?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/573395765306037887/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=573395765306037887' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/573395765306037887'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/573395765306037887'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/01/cropping.html' title='Cropping'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_PakaWnbOqtc/SW5i4vW8rGI/AAAAAAAAAZc/2iDbqxdUiUo/s72-c/kennybig.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-4090147868639446491</id><published>2009-01-12T15:29:00.000-08:00</published><updated>2009-01-12T15:40:42.356-08:00</updated><title type='text'>Fixing up training data</title><content type='html'>I noticed that the actual letters in my training data take up a small area in the actual image.  
This seems bad, so I fixed up my code to minimize the amount of white space surrounding the actual letter.&lt;br /&gt;&lt;br /&gt;Example of old training data:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/SWvS-qJw_DI/AAAAAAAAAZM/Ei1WfTEY9sk/s1600-h/trainingas.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 288px;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/SWvS-qJw_DI/AAAAAAAAAZM/Ei1WfTEY9sk/s400/trainingas.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5290554161185750066" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Example of new training data:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/SWvU3bhoElI/AAAAAAAAAZU/tlXJeHpsEz8/s1600-h/newas.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 288px;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/SWvU3bhoElI/AAAAAAAAAZU/tlXJeHpsEz8/s400/newas.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5290556236023468626" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-4090147868639446491?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/4090147868639446491/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=4090147868639446491' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4090147868639446491'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4090147868639446491'/><link rel='alternate' type='text/html' 
href='http://dafna190.blogspot.com/2009/01/fixing-up-training-data.html' title='Fixing up training data'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_PakaWnbOqtc/SWvS-qJw_DI/AAAAAAAAAZM/Ei1WfTEY9sk/s72-c/trainingas.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-2689947761015028243</id><published>2009-01-03T22:08:00.000-08:00</published><updated>2009-01-03T22:15:50.443-08:00</updated><title type='text'>I'm baaaack</title><content type='html'>Didn't get enough last year?  Miss me?  Well, I'm back for more.  I will be continuing the same project.  Here is a refresher of how the project works:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/SWBS_Zt2uGI/AAAAAAAAAZE/I7iOQ_LyOeU/s1600-h/diagram.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 273px;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/SWBS_Zt2uGI/AAAAAAAAAZE/I7iOQ_LyOeU/s400/diagram.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5287317211721414754" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The first thing that I will be doing this week is getting back to the point I was at at the end of winter 2008.  
I will then research machine learning methods to apply to the problem, to replace the nearest neighbor approach.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-2689947761015028243?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/2689947761015028243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=2689947761015028243' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2689947761015028243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2689947761015028243'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2009/01/im-baaaack.html' title='I&apos;m baaaack'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/SWBS_Zt2uGI/AAAAAAAAAZE/I7iOQ_LyOeU/s72-c/diagram.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-2868925682121298285</id><published>2008-03-11T20:08:00.001-07:00</published><updated>2008-03-11T20:08:47.372-07:00</updated><title type='text'>Presentation slides</title><content type='html'>...can be found &lt;a href="http://ieng9.ucsd.edu/%7Edbitton/cse190slides.ppt"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-2868925682121298285?l=dafna190.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/2868925682121298285/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=2868925682121298285' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2868925682121298285'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2868925682121298285'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/03/presentation-slides.html' title='Presentation slides'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-2503682781478811654</id><published>2008-03-11T19:42:00.001-07:00</published><updated>2008-11-12T17:38:04.070-08:00</updated><title type='text'>LoG filter not too hot either</title><content type='html'>I still need to play with the kernel size but so far LoG (Laplacian of Gaussian) hasn't been helping out at all.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R9dDDRw12eI/AAAAAAAAAN0/ZiPbNwG7H64/s1600-h/logcarter.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R9dDDRw12eI/AAAAAAAAAN0/ZiPbNwG7H64/s400/logcarter.png" alt="" id="BLOGGER_PHOTO_ID_5176680020272601570" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-2503682781478811654?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/2503682781478811654/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=2503682781478811654' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2503682781478811654'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2503682781478811654'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/03/log-filter-not-too-hot-either.html' title='LoG filter not too hot either'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_PakaWnbOqtc/R9dDDRw12eI/AAAAAAAAAN0/ZiPbNwG7H64/s72-c/logcarter.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-4251425175600062585</id><published>2008-03-11T19:23:00.000-07:00</published><updated>2008-11-12T17:38:04.175-08:00</updated><title type='text'>Using a bigger binomial kernel</title><content type='html'>&lt;span style="text-decoration: underline;"&gt;&lt;/span&gt;Using a bigger kernel didn't seem to help the nearest neighbor out :(&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R9dAuxw12dI/AAAAAAAAANs/NrsDWHs6aoA/s1600-h/carterfiltered.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R9dAuxw12dI/AAAAAAAAANs/NrsDWHs6aoA/s400/carterfiltered.png" alt="" id="BLOGGER_PHOTO_ID_5176677469062027730" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/4978165749595234779-4251425175600062585?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/4251425175600062585/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=4251425175600062585' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4251425175600062585'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4251425175600062585'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/03/using-bigger-binomial-kernel.html' title='Using a bigger binomial kernel'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/R9dAuxw12dI/AAAAAAAAANs/NrsDWHs6aoA/s72-c/carterfiltered.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-1289278406373472756</id><published>2008-03-11T19:15:00.000-07:00</published><updated>2008-03-11T19:35:38.428-07:00</updated><title type='text'>Incorporating another set of NN</title><content type='html'>Currently when I run my algorithm, I come up with the most likely set of states that made up the character images.  In other words, I come up with a predicted name.  I now run nearest neighbor with the predicted name against the roster names and say that the nearest neighbor is my predicted name.  This way my predicted name is at least a valid possibility.&lt;br /&gt;&lt;br /&gt;And now... I get 100% accuracy. 
That's right.  Yeah yeah, the roster only has 7 names but it is a start!!1!!&lt;br /&gt;&lt;br /&gt;To calculate the distance between two strings, I turn each one into an array of character indices ('a' maps to 1, 'b' to 2, 'c' to 3, etc.) and take the Euclidean distance between the arrays.  In the future I can apparently use edit distance but Euclidean is fine for me for now.&lt;br /&gt;&lt;br /&gt;Ok there is one more hack in there.  The names are not all the same length, and in order for nearest neighbor to work, each feature vector needs to be the same size.  I do this by making all of the feature vectors the size of the longest name on the roster.  The extra slots in the feature vector for names not as long as the longest name are set to zero.  I require that the nearest neighbor returned be the same length as the query vector (before I add zeros).  I think that this step helps a lot especially since the roster is so small.  The next step is to have a test set that is much larger.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-1289278406373472756?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/1289278406373472756/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=1289278406373472756' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/1289278406373472756'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/1289278406373472756'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/03/incorporating-another-set-of-nn.html' title='Incorporating another set of 
NN'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-6990023570236853788</id><published>2008-03-11T14:14:00.000-07:00</published><updated>2008-11-12T17:38:04.408-08:00</updated><title type='text'>Filtering with a discrete binomial kernel</title><content type='html'>I tried filtering both the training data and test data before running nearest neighbor on it to try and improve the nearest neighbor performance.  Here is an example of a filtered test name:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R9b2jhw12bI/AAAAAAAAANc/uugRhfGqoI4/s1600-h/filteredjames.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R9b2jhw12bI/AAAAAAAAANc/uugRhfGqoI4/s400/filteredjames.png" alt="" id="BLOGGER_PHOTO_ID_5176595911928043954" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;As compared to the original unfiltered:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R9b2vxw12cI/AAAAAAAAANk/_9Mu8I2a6F8/s1600-h/unfilteredjames.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R9b2vxw12cI/AAAAAAAAANk/_9Mu8I2a6F8/s400/unfilteredjames.png" alt="" id="BLOGGER_PHOTO_ID_5176596122381441474" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I used a 7x7 binomial kernel which turns out to be the following divided by 4096.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;     1     6    15    20    15     6     1&lt;br /&gt;     6    36    90   120    90    36     6&lt;br /&gt;    15    90   225   300   225    90    15&lt;br /&gt;    20   120   300   400   300   120    20&lt;br /&gt;    15    90   225   300   225    90    15&lt;br /&gt;     6    36    
90   120    90    36     6&lt;br /&gt;     1     6    15    20    15     6     1&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This actually hurt my performance.  For example, previously my accuracy on "James Carter" was ~45%.  Now I predict "Geora Rdrter" and my accuracy is ~36%. Similarly, for "Richard Nixon" I now predict "Gerdora Rtero" and my accuracy is ~8.3%, when it used to be 25%.&lt;br /&gt;&lt;br /&gt;It turns out that both 3x3 and 5x5 binomial filters are better than the 7x7, but still never improve performance; for the most part filtering with a binomial kernel hurts performance.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-6990023570236853788?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/6990023570236853788/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=6990023570236853788' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/6990023570236853788'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/6990023570236853788'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/03/filtering-with-discrete-binomial-kernel.html' title='Filtering with a discrete binomial kernel'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_PakaWnbOqtc/R9b2jhw12bI/AAAAAAAAANc/uugRhfGqoI4/s72-c/filteredjames.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-36555150459059681</id><published>2008-03-10T13:25:00.000-07:00</published><updated>2008-03-10T13:33:50.510-07:00</updated><title type='text'>Emission probabilities</title><content type='html'>An emission probability in my case is P(image|letter): the probability of the image given that you know it is of a particular character.  In other words, the probability of a certain letter looking like this.  What I've actually been calculating is P(letter|image), since I've been doing my own version of the nearest neighbor density.  But P(image|letter) != P(letter|image), so this was wrong!  I can use Bayes' rule to turn the probability around though:&lt;br /&gt;&lt;br /&gt;P(image|letter) = P(letter|image)P(image) / P(letter)&lt;br /&gt;&lt;br /&gt;I don't know what to plug in for P(image), so for now I've just been using 1.  Since P(image) is the same for every candidate letter of a given image, the constant only rescales the scores.  P(letter) is easy to find - just count how many times each letter appears in the roster and divide by the total number of characters.  Since P(image) = 1, my new equation is:&lt;br /&gt;&lt;br /&gt;P(image|letter) = P(letter|image) / P(letter)&lt;br /&gt;&lt;br /&gt;I noticed that if a letter has high probability, P(image|letter) will be lower than for a less likely character with the same P(letter|image).  
(Dividing by P(letter) never lowers the value, because the denominator is always &lt;= 1, but it raises it less for common letters.)  And if P(letter) is very low then P(image|letter) will be very high.&lt;br /&gt;&lt;br /&gt;My results became worse once I took this new probability into account.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-36555150459059681?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/36555150459059681/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=36555150459059681' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/36555150459059681'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/36555150459059681'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/03/emission-probabilities.html' title='Emission probabilities'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-8023215054076267673</id><published>2008-03-10T11:58:00.000-07:00</published><updated>2008-03-10T12:26:18.755-07:00</updated><title type='text'>Results after scaling</title><content type='html'>Like I said in my last post, the test images ranged from around 111 - 255, while the training images ranged from 0 - 255.  
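&lt;br /&gt;&lt;br /&gt;A linear min-max stretch is one way to put both sets on the same 0 - 255 range.  The sketch below is only an illustration in Python, not the Matlab code behind these results; the function name rescale_to_full_range and the sample pixel values are made up for the example.&lt;br /&gt;&lt;pre&gt;
```python
def rescale_to_full_range(pixels, new_min=0, new_max=255):
    """Linearly stretch pixel values so they span new_min..new_max."""
    lo = min(pixels)
    hi = max(pixels)
    if hi == lo:
        # A flat image has no contrast to stretch; map everything to new_min.
        return [new_min for _ in pixels]
    span = new_max - new_min
    return [round(new_min + (p - lo) * span / (hi - lo)) for p in pixels]


# Hypothetical test-image pixels spanning 111..255, like the range described above.
test_pixels = [111, 147, 219, 255]
print(rescale_to_full_range(test_pixels))  # [0, 64, 191, 255]
```
&lt;/pre&gt;&lt;br /&gt;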
I changed the range of the test images to also be 0 - 255 and I believe that my results improved.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;Query name       Predicted Name   Accuracy&lt;/b&gt;&lt;br /&gt;Lyndon Johnson   Gerdon Aninson   53.84615%&lt;br /&gt;Gerald Ford      Gerald Rerd      80%&lt;br /&gt;George Bush      George Resh      80%&lt;br /&gt;Richard Nixon    Rildora Rtero    25%&lt;br /&gt;William Clinton  William Clinton  100%&lt;br /&gt;Ronald Reagan    Ronald Rerdon    75%&lt;br /&gt;James Carter     Geora Carter     45.45455%&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-8023215054076267673?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/8023215054076267673/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=8023215054076267673' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8023215054076267673'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8023215054076267673'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/03/results-after-scaling.html' title='Results after scaling'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-3003026108015801002</id><published>2008-03-08T15:14:00.000-08:00</published><updated>2008-11-12T17:38:04.785-08:00</updated><title type='text'>Looking into why my nearest neighbor 
results are awful</title><content type='html'>Here I am comparing one of my test L's to half of my training L's. The test L is always on the right.  Also the test L is a lot lighter than the training L's.  In particular, the minimum value of the test L is 111 whereas the minimum value of the training L's is consistently 0.  I am not sure if this will have an impact on my results or not.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R9MeQBw12ZI/AAAAAAAAANM/-sF9y_taFBM/s1600-h/lcomparisons.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R9MeQBw12ZI/AAAAAAAAANM/-sF9y_taFBM/s400/lcomparisons.png" alt="" id="BLOGGER_PHOTO_ID_5175513657478863250" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Similarly, here are some 'o' comparisons, where the test 'o' is always on the right.  No training 'o' matches the test 'o' exactly in size, which is a problem.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R9MfZRw12aI/AAAAAAAAANU/uAHF0P0Itvs/s1600-h/ocomparisons.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R9MfZRw12aI/AAAAAAAAANU/uAHF0P0Itvs/s400/ocomparisons.png" alt="" id="BLOGGER_PHOTO_ID_5175514915904280994" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I'm not quite sure that the Viterbi algorithm is the right thing to be using for my case because it does not take into account the lexicon (roster) as much as it should.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-3003026108015801002?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/3003026108015801002/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=3003026108015801002' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/3003026108015801002'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/3003026108015801002'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/03/looking-into-why-my-nearest-neighbor.html' title='Looking into why my nearest neighbor results are awful'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_PakaWnbOqtc/R9MeQBw12ZI/AAAAAAAAANM/-sF9y_taFBM/s72-c/lcomparisons.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-8412808605162621092</id><published>2008-03-03T12:02:00.000-08:00</published><updated>2008-03-03T12:08:26.071-08:00</updated><title type='text'>Viterbi getting a little better</title><content type='html'>I have implemented an "isPrefix" function in Matlab that works fine.  The problem is that I don't know how to incorporate it into the Viterbi algorithm.   Instead of using that, I have a function that takes in a character and an index and returns true if there exists a string in the roster with that character at that index, and false otherwise.  I use this function when assigning emission probabilities - I only assign a character a positive emission probability if this character could be for realz.&lt;br /&gt;&lt;br /&gt;Now the algorithm guesses "william clinton" correctly, but "lyndon johnson" it now guesses to be "lildon joreron", and "gerald ford" to be "georicarin".  
This algorithm will work especially badly for large rosters where the function taking in a character and an index will almost always return true.  However with my small roster of 7 this function has helped.&lt;br /&gt;&lt;br /&gt;Instead of having an "isPrefix" function, I plan to implement an "isSuffix" function because of the way that the Viterbi algorithm works when finding the most likely sequence.  It starts at the end and works its way backwards.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-8412808605162621092?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/8412808605162621092/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=8412808605162621092' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8412808605162621092'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8412808605162621092'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/03/viterbi-getting-little-better.html' title='Viterbi getting a little better'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-6155492789922983568</id><published>2008-03-02T18:13:00.000-08:00</published><updated>2008-11-12T17:38:04.938-08:00</updated><title type='text'>Viterbi experiments</title><content type='html'>I implemented the Viterbi algorithm and my results are not good yet but I am optimistic.  
Here is the setup:&lt;br /&gt;&lt;br /&gt;I have 7 test images; here is the roster:&lt;br /&gt;&lt;br /&gt;lyndon johnson&lt;br /&gt;george bush&lt;br /&gt;gerald ford&lt;br /&gt;richard nixon&lt;br /&gt;william clinton&lt;br /&gt;ronald reagan&lt;br /&gt;james carter&lt;br /&gt;&lt;br /&gt;The names are all lower case because I'm not taking case into account yet.&lt;br /&gt;&lt;br /&gt;I have calculated the transition probabilities - so I have a 26x26 matrix containing the probability of transitioning from one character to another.  Here is how I calculated that:&lt;br /&gt;&lt;br /&gt;P(o|e) = Count(oe) / Count(e)&lt;br /&gt;&lt;br /&gt;This means that the probability of "o" given "e" is the number of times I saw the sequence "oe" divided by the number of times I saw the letter "e" on its own.&lt;br /&gt;&lt;br /&gt;Emission probabilities, in my case, are supposed to mean "the probability of seeing this character image given that I know what the character is".  For example, the probability of seeing this character image given that I know it is an 'a'.  That probability is hard to calculate; I would have to use Gaussians and other crazy math.  So, instead of doing that, for now I am doing a very simple nearest neighbor probability calculation.  Given a character image, I run nearest neighbor against all of my training data and take the top 40 votes.  I say that the probability that this character is an 'a' is (the number of votes for 'a' + 1) divided by (40 + 26).  Ideally I would just say that the probability that this character is an 'a' is the number of votes for 'a' / 40.  However, my nearest neighbor framework is working pretty poorly, so just because I have no votes for 'a' I don't want to completely eliminate this possibility.  Therefore, no letter has zero probability: a letter with no votes still gets 1/66.&lt;br /&gt;&lt;br /&gt;I thought that even though my nearest neighbor framework is working pretty badly, the transition probabilities would help out a lot because the roster is so small. 
 Not really the case though!&lt;br /&gt;&lt;br /&gt;Here is an example of an input name:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R8ti87iEKgI/AAAAAAAAANE/MPda_0StCXE/s1600-h/lyndon.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R8ti87iEKgI/AAAAAAAAANE/MPda_0StCXE/s400/lyndon.png" alt="" id="BLOGGER_PHOTO_ID_5173337395877390850" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Here is what my algorithm came up with for the name:&lt;br /&gt;licallinilicl&lt;br /&gt;&lt;br /&gt;Yeah, pretty bad.  At first I was like "are those even valid transitions?!" but it's true, each of those transitions is valid.  For example, "li" occurs in "william", which is in the roster.  So, I need to figure out a way to take more into account than just the bigrams.  Maybe I should have an "isPrefix" function such that, as I'm going through the HMM, I only give something a probability if it is a valid prefix of a name.  
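&lt;br /&gt;&lt;br /&gt;A minimal sketch of what such a check could look like, written in Python for illustration rather than in Matlab; the name is_prefix and the way the roster is passed in are assumptions, not the project's actual code:&lt;br /&gt;&lt;pre&gt;
```python
# The seven roster names listed earlier in this post.
ROSTER = [
    "lyndon johnson",
    "george bush",
    "gerald ford",
    "richard nixon",
    "william clinton",
    "ronald reagan",
    "james carter",
]


def is_prefix(candidate, roster=ROSTER):
    """Return True if at least one roster name starts with the candidate string."""
    return any(name.startswith(candidate) for name in roster)


print(is_prefix("li"))  # False: "li" only occurs inside "william"
print(is_prefix("ge"))  # True: "george bush" and "gerald ford" both start with it
```
&lt;/pre&gt;With a check like this, "licallinilicl" would already be ruled out at its second character, because no roster name starts with "li".&lt;br /&gt;&lt;br /&gt;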
Ok that is what I will do next.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-6155492789922983568?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/6155492789922983568/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=6155492789922983568' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/6155492789922983568'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/6155492789922983568'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/03/viterbi-experiments.html' title='Viterbi experiments'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_PakaWnbOqtc/R8ti87iEKgI/AAAAAAAAANE/MPda_0StCXE/s72-c/lyndon.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-5634076736303150457</id><published>2008-02-25T14:31:00.000-08:00</published><updated>2008-11-12T17:38:06.299-08:00</updated><title type='text'>Experimenting with nearest neighbor</title><content type='html'>Nearest neighbor shockingly doesn't work that well.  I think part of the problem is that the training data isn't all that great and there isn't enough of it.&lt;br /&gt;&lt;br /&gt;Here's the setup:&lt;br /&gt;&lt;br /&gt;Training:&lt;br /&gt;I have 90 training images for each letter in the alphabet (there are 26 of those).  
This makes for 2340 total training images, for those of you that can't do maf.  All of the training images are the same size - 8-bit images of 120 by 123 pixels.&lt;br /&gt;&lt;br /&gt;Testing:&lt;br /&gt;So far I only have 2 names that I'm testing against but that size will grow shortly.  I made the test characters also be 120 by 123 pixels (by adding extra white space evenly around the edges).&lt;br /&gt;&lt;br /&gt;Here is the testing process:&lt;br /&gt;I load all of my training data into a huge matrix of size 2340 x 14760, where each row is a training image (of a character) flattened into a vector of 120 x 123 = 14760 pixels.  I then read in a test character image, find the Euclidean distance between that test image and each of the training images, and sort the results based on the distances.&lt;br /&gt;&lt;br /&gt;Currently I am looking at the top 50 closest matches and having those vote on a character.  I have been getting some good and some bad results.&lt;br /&gt;&lt;br /&gt;The first letter I tried, 'J' from "Jean Poole" had 'J' as its top match! &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R8NLMvMevYI/AAAAAAAAAMU/IxJ0R33thUQ/s1600-h/j.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R8NLMvMevYI/AAAAAAAAAMU/IxJ0R33thUQ/s320/j.png" alt="" id="BLOGGER_PHOTO_ID_5171059479350787458" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;At that point life was pretty good.  For one thing, all of the top 10 matches were j's.  
So that case works pretty well.&lt;br /&gt;&lt;br /&gt;The next letter I tried was 'e':&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R8NLv_MevZI/AAAAAAAAAMc/5lkJ7bQ_LsY/s1600-h/e.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R8NLv_MevZI/AAAAAAAAAMc/5lkJ7bQ_LsY/s320/e.png" alt="" id="BLOGGER_PHOTO_ID_5171060084941176210" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The top match for that was 'p'... not so good.  In fact, only &lt;span style="font-weight: bold;"&gt;3&lt;/span&gt; of the 50 votes were for 'e'.  Here are the training e's... so you're trying to tell me only 3 of these look like that 'e' up there?&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R8NOAvMevaI/AAAAAAAAAMk/PvspvMerglc/s1600-h/es.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R8NOAvMevaI/AAAAAAAAAMk/PvspvMerglc/s400/es.png" alt="" id="BLOGGER_PHOTO_ID_5171062571727240610" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Here is another example.. 'o':&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R8NG5vMevTI/AAAAAAAAALs/2nv0vdU_Ke0/s1600-h/o.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R8NG5vMevTI/AAAAAAAAALs/2nv0vdU_Ke0/s320/o.png" alt="" id="BLOGGER_PHOTO_ID_5171054754886761778" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Luckily the mode of the top 50 matches is 'o', so 'o' wins but still, some weird results.  
The training image that is closest to the 'o' is the following 'n':...&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R8NHJ_MevUI/AAAAAAAAAL0/KwT0ioJ8ON4/s1600-h/bestomatch.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R8NHJ_MevUI/AAAAAAAAAL0/KwT0ioJ8ON4/s320/bestomatch.png" alt="" id="BLOGGER_PHOTO_ID_5171055034059636034" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I don't really get why that is.  Here is the second closest match:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R8NHjvMevVI/AAAAAAAAAL8/ki8HN0-mT9c/s1600-h/secondbesto.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R8NHjvMevVI/AAAAAAAAAL8/ki8HN0-mT9c/s320/secondbesto.png" alt="" id="BLOGGER_PHOTO_ID_5171055476441267538" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;This is a little more reasonable, even though it is a 'c'.  You can see the resemblance.&lt;br /&gt;&lt;br /&gt;Lastly, here is the third match:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R8NH5PMevWI/AAAAAAAAAME/jWJ5NmiZGk0/s1600-h/thirdbesto.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R8NH5PMevWI/AAAAAAAAAME/jWJ5NmiZGk0/s320/thirdbesto.png" alt="" id="BLOGGER_PHOTO_ID_5171055845808455010" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;This happens to be a 'g' that was cut-off at the bottom during the traumatizing training process.  This is also fairly understandable.  Finally, the 4th closest match is an 'o'.  (Of course after that there are plenty more random results.)  
This is what the first matching 'o' looks like:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R8NI3vMevXI/AAAAAAAAAMM/zUADjH0QFQI/s1600-h/omatch.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R8NI3vMevXI/AAAAAAAAAMM/zUADjH0QFQI/s320/omatch.png" alt="" id="BLOGGER_PHOTO_ID_5171056919550279026" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-5634076736303150457?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/5634076736303150457/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=5634076736303150457' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5634076736303150457'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5634076736303150457'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/02/experimenting-with-nearest-neighbor.html' title='Experimenting with nearest neighbor'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/R8NLMvMevYI/AAAAAAAAAMU/IxJ0R33thUQ/s72-c/j.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-8259883251348159245</id><published>2008-02-13T15:58:00.000-08:00</published><updated>2008-02-13T15:59:58.204-08:00</updated><title type='text'>First try at nearest neighbor - very simple!</title><content type='html'>I took one of my a's from "Jean Poole" and ran nearest neighbor with it against my training a's and b's only, and the nearest neighbor was an 'a'!  Phew.  At least that works.  As a matter of fact, the top 10 nearest neighbors were a's, and then there was a 'b' and so on.  And now for harder tests....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-8259883251348159245?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/8259883251348159245/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=8259883251348159245' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8259883251348159245'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8259883251348159245'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/02/first-try-at-nearest-neighbor-very.html' title='First try at nearest neighbor - very simple!'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-3130260714822069029</id><published>2008-02-13T13:37:00.001-08:00</published><updated>2008-11-12T17:38:07.175-08:00</updated><title type='text'>Test images all the same size</title><content type='html'>These are some examples of test images after I've made them all the same size.  Now I have to do the same for training images and make sure the test images AND training images can be the same size!&lt;br /&gt;&lt;br /&gt;Just to give you an idea, the size of the characters are about 76 by 40 pixels.&lt;br /&gt;&lt;br /&gt;The 'h' here is bad, but I can't fix this without changing the code a lot and in the process making other things break:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R7NjT_MevOI/AAAAAAAAAKU/2du24EEWQPg/s1600-h/brightsamesize.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R7NjT_MevOI/AAAAAAAAAKU/2du24EEWQPg/s400/brightsamesize.png" alt="" id="BLOGGER_PHOTO_ID_5166582392556535010" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This one looks good to me:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R7NjgvMevPI/AAAAAAAAAKc/7ZA391MXEQc/s1600-h/samesizejean.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R7NjgvMevPI/AAAAAAAAAKc/7ZA391MXEQc/s400/samesizejean.png" alt="" id="BLOGGER_PHOTO_ID_5166582611599867122" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-3130260714822069029?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://dafna190.blogspot.com/feeds/3130260714822069029/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=3130260714822069029' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/3130260714822069029'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/3130260714822069029'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/02/test-images-all-same-size.html' title='Test images all the same size'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_PakaWnbOqtc/R7NjT_MevOI/AAAAAAAAAKU/2du24EEWQPg/s72-c/brightsamesize.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-2950944919314834433</id><published>2008-02-13T12:22:00.002-08:00</published><updated>2008-11-12T17:38:07.585-08:00</updated><title type='text'>Example of test data</title><content type='html'>Here is what the test data looks like once I've isolated the character boxes and done a little thresholding:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R7NRuvMevMI/AAAAAAAAAKE/Z6khj3iUiLo/s1600-h/badjean.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R7NRuvMevMI/AAAAAAAAAKE/Z6khj3iUiLo/s400/badjean.png" alt="" id="BLOGGER_PHOTO_ID_5166563060908735682" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here is what the test data looks like after I've removed the unwanted lines towards the edges:&lt;br /&gt;&lt;a 
onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R7NRy_MevNI/AAAAAAAAAKM/DHByBXqVIVs/s1600-h/goodjean.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R7NRy_MevNI/AAAAAAAAAKM/DHByBXqVIVs/s400/goodjean.png" alt="" id="BLOGGER_PHOTO_ID_5166563133923179730" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-2950944919314834433?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/2950944919314834433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=2950944919314834433' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2950944919314834433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2950944919314834433'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/02/example-of-test-data.html' title='Example of test data'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/R7NRuvMevMI/AAAAAAAAAKE/Z6khj3iUiLo/s72-c/badjean.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-3245843221383763531</id><published>2008-02-11T15:40:00.000-08:00</published><updated>2008-11-12T17:38:07.790-08:00</updated><title type='text'>Training data all set</title><content type='html'>Partial set 
of training data for character 'a':&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R7DdLvMevLI/AAAAAAAAAJ8/F2idl4qQQUA/s1600-h/trainingas.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R7DdLvMevLI/AAAAAAAAAJ8/F2idl4qQQUA/s400/trainingas.png" alt="" id="BLOGGER_PHOTO_ID_5165871966311070898" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;All of these individual characters are saved to their own .png file.  The next step is to make them all the same size by adding extra white pixels to the boundaries, based on the size of the largest training image.  Then, I will perform the nearest neighbor algorithm on the test data.  The goal is for this to be done by Wednesday.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-3245843221383763531?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/3245843221383763531/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=3245843221383763531' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/3245843221383763531'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/3245843221383763531'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/02/training-data-all-set.html' title='Training data all set'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://3.bp.blogspot.com/_PakaWnbOqtc/R7DdLvMevLI/AAAAAAAAAJ8/F2idl4qQQUA/s72-c/trainingas.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-4614246544382241201</id><published>2008-02-06T13:50:00.000-08:00</published><updated>2008-11-12T17:38:08.133-08:00</updated><title type='text'>Looking at thresholded characters</title><content type='html'>Just for kicks, this is what the training images look like as binary images.  The top row is the 8-bit image and the bottom row is the binary image.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R6or7xhV6cI/AAAAAAAAAJs/XctTEA344ns/s1600-h/threshedas.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R6or7xhV6cI/AAAAAAAAAJs/XctTEA344ns/s400/threshedas.png" alt="" id="BLOGGER_PHOTO_ID_5163988228639156674" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;And b's, as always:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R6osGRhV6dI/AAAAAAAAAJ0/duPC9krrBXU/s1600-h/threshedbs.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R6osGRhV6dI/AAAAAAAAAJ0/duPC9krrBXU/s400/threshedbs.png" alt="" id="BLOGGER_PHOTO_ID_5163988409027783122" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here all I have to store are the non-zero indices, as opposed to storing the whole image, which is beneficial.  And now I should center these characters!  (Subtract off the centroid... 
why is that so hard?!)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-4614246544382241201?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/4614246544382241201/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=4614246544382241201' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4614246544382241201'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4614246544382241201'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/02/looking-at-thresholded-characters.html' title='Looking at thresholded characters'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_PakaWnbOqtc/R6or7xhV6cI/AAAAAAAAAJs/XctTEA344ns/s72-c/threshedas.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-8769444153184496211</id><published>2008-02-04T15:11:00.000-08:00</published><updated>2008-11-12T17:38:09.045-08:00</updated><title type='text'>Using results of edge.m</title><content type='html'>Since there are artifacts near the edges on the letters both in the training and test data, perhaps I will use the results of edge.m to run nearest neighbor on.  
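&lt;br /&gt;&lt;br /&gt;As a rough illustration of what nearest neighbor on edge images could look like, here is a minimal Python sketch (not the Matlab code behind these figures). It assumes the edge maps have already been padded to the same size and are stored as lists of rows of 0/1 values, and it uses the count of differing pixels as the distance; the function names and that distance choice are assumptions made for the example.&lt;br /&gt;
&lt;pre&gt;
```python
def hamming_distance(image_a, image_b):
    """Count the pixels at which two equal-sized binary images differ."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def nearest_neighbor_label(test_image, training_set):
    """Return the label of the training image closest to test_image.

    training_set: list of (label, image) pairs; every image is a list of
    rows of 0/1 values with the same shape as test_image (assumed).
    """
    best_label, _ = min(
        training_set,
        key=lambda pair: hamming_distance(test_image, pair[1]),
    )
    return best_label


# Tiny made-up 3x3 edge maps, only to show the call.
training = [
    ("a", [[0, 1, 0], [1, 1, 1], [1, 0, 1]]),
    ("b", [[1, 0, 0], [1, 1, 0], [1, 1, 0]]),
]
test_image = [[0, 1, 0], [1, 1, 1], [1, 1, 1]]
predicted = nearest_neighbor_label(test_image, training)
```
&lt;/pre&gt;
The made-up test image differs from the 'a' image in one pixel and from the 'b' image in four, so it gets the label 'a'.&lt;br /&gt;&lt;br /&gt;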
Here are examples of outputs of edge.m (the figures with black in the background).&lt;br /&gt;&lt;br /&gt;Training data 'a':&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R6ecPRhV6VI/AAAAAAAAAIc/f1jj3Fih4ZQ/s1600-h/cleanas.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R6ecPRhV6VI/AAAAAAAAAIc/f1jj3Fih4ZQ/s400/cleanas.png" alt="" id="BLOGGER_PHOTO_ID_5163267284018784594" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Training data 'b':&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R6ecXhhV6WI/AAAAAAAAAIk/gI0o6ej228U/s1600-h/cleanbs.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R6ecXhhV6WI/AAAAAAAAAIk/gI0o6ej228U/s400/cleanbs.png" alt="" id="BLOGGER_PHOTO_ID_5163267425752705378" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Test data with edge.m run on the individual characters (after removing the leftover lines).  
Still needs some work.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R6echBhV6XI/AAAAAAAAAIs/n4PZ7Bgl6Uc/s1600-h/jeanedge.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R6echBhV6XI/AAAAAAAAAIs/n4PZ7Bgl6Uc/s400/jeanedge.png" alt="" id="BLOGGER_PHOTO_ID_5163267588961462642" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;And these were the original characters.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R6edgxhV6bI/AAAAAAAAAJM/m63vLb2dUHQ/s1600-h/jeanseparated.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R6edgxhV6bI/AAAAAAAAAJM/m63vLb2dUHQ/s400/jeanseparated.png" alt="" id="BLOGGER_PHOTO_ID_5163268684178123186" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Another test data image:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R6edPxhV6YI/AAAAAAAAAI0/iniJF5Iyjfg/s1600-h/brightedge.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R6edPxhV6YI/AAAAAAAAAI0/iniJF5Iyjfg/s400/brightedge.png" alt="" id="BLOGGER_PHOTO_ID_5163268392120347010" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-8769444153184496211?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/8769444153184496211/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=8769444153184496211' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8769444153184496211'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8769444153184496211'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/02/using-results-of-edgem.html' title='Using results of edge.m'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_PakaWnbOqtc/R6ecPRhV6VI/AAAAAAAAAIc/f1jj3Fih4ZQ/s72-c/cleanas.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-497942490250015607</id><published>2008-02-04T14:16:00.001-08:00</published><updated>2008-11-12T17:38:10.143-08:00</updated><title type='text'>I'm better at getting rid of leftover lines!</title><content type='html'>Yes, I am.  Now the next step is to save all training characters in one place.  
Here are some examples of the training data (because you really need more).&lt;br /&gt;&lt;br /&gt;Top line is after lines were removed, bottom is before.&lt;br /&gt;&lt;br /&gt;Example 1:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R6eO4BhV6RI/AAAAAAAAAH8/lZUo6_uoFcU/s1600-h/acomp4.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R6eO4BhV6RI/AAAAAAAAAH8/lZUo6_uoFcU/s400/acomp4.png" alt="" id="BLOGGER_PHOTO_ID_5163252590935664914" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Example 2:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R6eO9RhV6SI/AAAAAAAAAIE/NWFTxjYnHpo/s1600-h/acomp5.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R6eO9RhV6SI/AAAAAAAAAIE/NWFTxjYnHpo/s400/acomp5.png" alt="" id="BLOGGER_PHOTO_ID_5163252681129978146" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Example 3:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R6ePShhV6TI/AAAAAAAAAIM/xR2-CHmHlf4/s1600-h/bcomp4.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R6ePShhV6TI/AAAAAAAAAIM/xR2-CHmHlf4/s400/bcomp4.png" alt="" id="BLOGGER_PHOTO_ID_5163253046202198322" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Example 4:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R6ePZxhV6UI/AAAAAAAAAIU/6_m7wVEYqpI/s1600-h/bcomp5.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R6ePZxhV6UI/AAAAAAAAAIU/6_m7wVEYqpI/s400/bcomp5.png" alt="" id="BLOGGER_PHOTO_ID_5163253170756249922" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/4978165749595234779-497942490250015607?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/497942490250015607/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=497942490250015607' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/497942490250015607'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/497942490250015607'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/02/im-better-at-getting-rid-of-leftover.html' title='I&apos;m better at getting rid of leftover lines!'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_PakaWnbOqtc/R6eO4BhV6RI/AAAAAAAAAH8/lZUo6_uoFcU/s72-c/acomp4.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-7895575398492633598</id><published>2008-02-04T12:44:00.000-08:00</published><updated>2008-11-12T17:38:10.739-08:00</updated><title type='text'>Getting rid of leftover lines</title><content type='html'>To get rid of the leftover lines, I expand a rectangle starting in the middle of the character image and stop it when the sum of each rectangle edge is at a maximum (has the most white space).&lt;br /&gt;&lt;br /&gt;One issue is that the size of the character boxes will vary now, but hopefully I will be able to correct this by zero padding the images (or 1 padding...).&lt;br /&gt;&lt;br /&gt;Sometimes it 
works, sometimes it doesn't.  Here are some examples of before and after removing the leftover lines.  The images on the top half of the figures are after, and on the bottom are before.&lt;br /&gt;&lt;br /&gt;Good example of 'a':&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R6d6NxhV6OI/AAAAAAAAAHk/V2loCDuPWRc/s1600-h/acomparison.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R6d6NxhV6OI/AAAAAAAAAHk/V2loCDuPWRc/s400/acomparison.png" alt="" id="BLOGGER_PHOTO_ID_5163229874853636322" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Good 'b' example:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R6d6UhhV6PI/AAAAAAAAAHs/TKsm-YIUtbA/s1600-h/bcomparison.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R6d6UhhV6PI/AAAAAAAAAHs/TKsm-YIUtbA/s400/bcomparison.png" alt="" id="BLOGGER_PHOTO_ID_5163229990817753330" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Bad 'a' example:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R6d6chhV6QI/AAAAAAAAAH0/c0omrj39m-8/s1600-h/moreacomparison.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R6d6chhV6QI/AAAAAAAAAH0/c0omrj39m-8/s400/moreacomparison.png" alt="" id="BLOGGER_PHOTO_ID_5163230128256706818" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-7895575398492633598?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/7895575398492633598/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=7895575398492633598' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/7895575398492633598'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/7895575398492633598'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/02/getting-rid-of-leftover-lines.html' title='Getting rid of leftover lines'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/R6d6NxhV6OI/AAAAAAAAAHk/V2loCDuPWRc/s72-c/acomparison.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-4366371531733124354</id><published>2008-01-31T16:16:00.000-08:00</published><updated>2008-11-12T17:38:10.914-08:00</updated><title type='text'>Removing leftover edges</title><content type='html'>Hmmmmmmmm..... 
how to get rid of the junk towards the edges of the figure.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R6Jk5xhV6NI/AAAAAAAAAHc/JZWGaeuXh8Q/s1600-h/rawa.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R6Jk5xhV6NI/AAAAAAAAAHc/JZWGaeuXh8Q/s400/rawa.png" alt="" id="BLOGGER_PHOTO_ID_5161799066628516050" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-4366371531733124354?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/4366371531733124354/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=4366371531733124354' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4366371531733124354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4366371531733124354'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/removing-leftover-edges.html' title='Removing leftover edges'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_PakaWnbOqtc/R6Jk5xhV6NI/AAAAAAAAAHc/JZWGaeuXh8Q/s72-c/rawa.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-3685121473486744773</id><published>2008-01-30T15:56:00.000-08:00</published><updated>2008-11-12T17:38:11.642-08:00</updated><title type='text'>I 
got some training data</title><content type='html'>&lt;a href="http://www.kyb.mpg.de/%7Efabee"&gt;Fabian&lt;/a&gt; gave me 18 filled out handwriting sheets, which is very helpful!  I have been parsing those, and have been getting better results.  Here is an example of another filled out sheet - that was scanned in at an angle.  Luckily, that isn't a problem!&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R6EPZRhV6KI/AAAAAAAAAHE/maVcMX_sfG4/s1600-h/bettertrain.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R6EPZRhV6KI/AAAAAAAAAHE/maVcMX_sfG4/s400/bettertrain.png" alt="" id="BLOGGER_PHOTO_ID_5161423574817695906" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Here are examples of the a's from this image:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R6EPlRhV6LI/AAAAAAAAAHM/vnNB3UA-8jw/s1600-h/betteras.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R6EPlRhV6LI/AAAAAAAAAHM/vnNB3UA-8jw/s400/betteras.png" alt="" id="BLOGGER_PHOTO_ID_5161423780976126130" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here are examples of the b's from this image:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R6EPoBhV6MI/AAAAAAAAAHU/4K9Q-KBynYI/s1600-h/betterbs.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R6EPoBhV6MI/AAAAAAAAAHU/4K9Q-KBynYI/s400/betterbs.png" alt="" id="BLOGGER_PHOTO_ID_5161423828220766402" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-3685121473486744773?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://dafna190.blogspot.com/feeds/3685121473486744773/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=3685121473486744773' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/3685121473486744773'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/3685121473486744773'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/i-got-some-training-data.html' title='I got some training data'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/R6EPZRhV6KI/AAAAAAAAAHE/maVcMX_sfG4/s72-c/bettertrain.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-5073128369290607937</id><published>2008-01-28T15:46:00.000-08:00</published><updated>2008-11-12T17:38:12.160-08:00</updated><title type='text'>Example of cut out training data</title><content type='html'>Here are some example a's:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R55pfRhV6II/AAAAAAAAAGc/wqVrJIzpAt4/s1600-h/exampleas.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R55pfRhV6II/AAAAAAAAAGc/wqVrJIzpAt4/s400/exampleas.png" alt="" id="BLOGGER_PHOTO_ID_5160678209013278850" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Here are some example b's:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://2.bp.blogspot.com/_PakaWnbOqtc/R55plBhV6JI/AAAAAAAAAGk/J2KoT_yHLkg/s1600-h/examplebs.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R55plBhV6JI/AAAAAAAAAGk/J2KoT_yHLkg/s400/examplebs.png" alt="" id="BLOGGER_PHOTO_ID_5160678307797526674" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;You get the picture.  The question now is how much that extra noise at the edges of the letters is going to have an effect.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-5073128369290607937?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/5073128369290607937/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=5073128369290607937' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5073128369290607937'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5073128369290607937'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/example-of-cut-out-training-data.html' title='Example of cut out training data'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_PakaWnbOqtc/R55pfRhV6II/AAAAAAAAAGc/wqVrJIzpAt4/s72-c/exampleas.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-8869095693501838435</id><published>2008-01-28T14:36:00.000-08:00</published><updated>2008-11-12T17:38:12.362-08:00</updated><title type='text'>Example of training data</title><content type='html'>Here is an example of the training data I'll be using for character classification.  It is called ABCDETC and is from NEC Labs, available &lt;a href="http://www.nec-labs.com/%7Ejasonw/abcdetc/"&gt;here&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I am using the same parsing method that I am using for the scanned names.  Here non maximal suppression actually works!&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R55ZLBhV6HI/AAAAAAAAAGU/cIIejVr-K14/s1600-h/trainexample.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R55ZLBhV6HI/AAAAAAAAAGU/cIIejVr-K14/s400/trainexample.png" alt="" id="BLOGGER_PHOTO_ID_5160660268934883442" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-8869095693501838435?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/8869095693501838435/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=8869095693501838435' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8869095693501838435'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8869095693501838435'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/example-of-training-data.html' title='Example of training 
data'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/R55ZLBhV6HI/AAAAAAAAAGU/cIIejVr-K14/s72-c/trainexample.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-2733012252396264789</id><published>2008-01-25T15:20:00.001-08:00</published><updated>2008-11-12T17:38:12.537-08:00</updated><title type='text'>Non maximal suppression not really working</title><content type='html'>My plan was to find the top vertical line found by the Hough transform (the one with the most votes), take the angle of that line, and assume that it is also the angle of the remaining vertical lines that I'm interested in.  So at that point I take the column vector of the Hough space corresponding to that angle, sort it in descending order, and plot the top 20 lines found.  I also implemented non maximal suppression in the following way: for each line that I'm about to plot, I check whether the number of votes for that line is the maximum in that cell's surrounding window (just looking at changes in rho, not theta).  If it isn't a maximum then I skip it - I don't plot that line.&lt;br /&gt;&lt;br /&gt;I've been having a hard time playing around with the window threshold - how big should the window be?  When it is too big I start missing lines that I need, but when it is too small it doesn't get rid of enough lines.  The following is a perfect example.  I was playing with the threshold for the following image and this was the best I could get.  
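&lt;br /&gt;&lt;br /&gt;For reference, the window check described above can be sketched in a few lines of Python. This is a minimal illustration, not the Matlab code used for the image; the function name, the half_window parameter, and the sample vote counts are assumptions made for the example.&lt;br /&gt;
&lt;pre&gt;
```python
def top_lines_with_nms(votes, half_window, max_lines=20):
    """Pick rho bins that are the maximum within +/- half_window bins.

    votes: vote counts along rho for one fixed theta, i.e. one column
    of the Hough accumulator.
    Returns the kept rho indices, ordered by descending vote count.
    Bins that tie for the maximum of a window are all kept.
    """
    # Visit bins from most to fewest votes, as when plotting the top lines.
    order = sorted(range(len(votes)), key=lambda i: votes[i], reverse=True)
    kept = []
    for i in order:
        if len(kept) == max_lines:
            break
        lo = max(0, i - half_window)
        hi = min(len(votes), i + half_window + 1)
        # Skip empty bins and bins that are not the largest in their window.
        if votes[i] > 0 and votes[i] == max(votes[lo:hi]):
            kept.append(i)
    return kept


# Made-up vote column with two true peaks (bins 2 and 7) and some clutter.
sample_votes = [0, 5, 9, 4, 0, 0, 7, 8, 1, 0]
small_window = top_lines_with_nms(sample_votes, 0)
medium_window = top_lines_with_nms(sample_votes, 1)
large_window = top_lines_with_nms(sample_votes, 5)
```
&lt;/pre&gt;
On this made-up column a half-window of 0 keeps every non-empty bin, 1 keeps bins 2 and 7, and 5 keeps only bin 2, which is the same too-small versus too-big trade-off.  Back to my best attempt on the image below.&lt;br /&gt;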
It didn't get rid of the additional vertical line between the e and the a, while it missed the line between the n and the space.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R5pv1RhV6GI/AAAAAAAAAF0/ECL24uNJLns/s1600-h/nonmaxsup.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R5pv1RhV6GI/AAAAAAAAAF0/ECL24uNJLns/s400/nonmaxsup.png" alt="" id="BLOGGER_PHOTO_ID_5159559284133324898" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-2733012252396264789?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/2733012252396264789/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=2733012252396264789' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2733012252396264789'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2733012252396264789'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/non-maximal-suppression-not-really.html' title='Non maximal suppression not really working'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_PakaWnbOqtc/R5pv1RhV6GI/AAAAAAAAAF0/ECL24uNJLns/s72-c/nonmaxsup.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-4564000297645932278</id><published>2008-01-25T10:58:00.000-08:00</published><updated>2008-11-12T17:38:12.670-08:00</updated><title type='text'>Isolated letters</title><content type='html'>These are what the letters look like when I cut them out of the image based on the lines found by the hough transform:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R5oxjBhV6FI/AAAAAAAAAFs/B7O9fAWfn6w/s1600-h/jeanseparated.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R5oxjBhV6FI/AAAAAAAAAFs/B7O9fAWfn6w/s400/jeanseparated.png" alt="" id="BLOGGER_PHOTO_ID_5159490800879790162" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;My next step is to write a function that takes in one of these letter images and determines whether or not there is a letter inside.  I also need to think of a way to implement non maximal suppression!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-4564000297645932278?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/4564000297645932278/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=4564000297645932278' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4564000297645932278'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4564000297645932278'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/isolated-letters.html' title='Isolated 
letters'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/R5oxjBhV6FI/AAAAAAAAAFs/B7O9fAWfn6w/s72-c/jeanseparated.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-5681768776917846072</id><published>2008-01-23T11:05:00.001-08:00</published><updated>2008-11-12T17:38:12.884-08:00</updated><title type='text'>Better Isolated Boxes</title><content type='html'>Here are two examples of the lines around the character boxes.  This is after determining the width and height of the boxes and then drawing lines on top of the figure.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R5eQRRhV6EI/AAAAAAAAAFk/IsTx370Nz9c/s1600-h/isolatedjean.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R5eQRRhV6EI/AAAAAAAAAFk/IsTx370Nz9c/s400/isolatedjean.png" alt="" id="BLOGGER_PHOTO_ID_5158750524611618882" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R5eQMhhV6DI/AAAAAAAAAFc/CS6cht-Cz5Q/s1600-h/brightisolated.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R5eQMhhV6DI/AAAAAAAAAFc/CS6cht-Cz5Q/s400/brightisolated.png" alt="" id="BLOGGER_PHOTO_ID_5158750443007240242" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-5681768776917846072?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://dafna190.blogspot.com/feeds/5681768776917846072/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=5681768776917846072' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5681768776917846072'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5681768776917846072'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/better-isolated-boxes.html' title='Better Isolated Boxes'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_PakaWnbOqtc/R5eQRRhV6EI/AAAAAAAAAFk/IsTx370Nz9c/s72-c/isolatedjean.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-2545465785678552885</id><published>2008-01-22T20:19:00.000-08:00</published><updated>2008-11-12T17:38:13.714-08:00</updated><title type='text'>Isolating character boxes</title><content type='html'>This is where I'm currently at with regards to isolating the character boxes:&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R5bAkhhV5_I/AAAAAAAAAE8/W9OYeUOovkY/s1600-h/jeanpoole.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R5bAkhhV5_I/AAAAAAAAAE8/W9OYeUOovkY/s400/jeanpoole.png" alt="" id="BLOGGER_PHOTO_ID_5158522156905523186" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;This is without rotating the image at all - just taking both the vertical and horizontal gradient, running edge.m on each of those, and then running the hough transform on 
the result.&lt;br /&gt;&lt;br /&gt;Here is a cool image of the Hough space for the vertical lines:&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R5bCbRhV6CI/AAAAAAAAAFU/QGlr4WGOHnU/s1600-h/verthough.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R5bCbRhV6CI/AAAAAAAAAFU/QGlr4WGOHnU/s400/verthough.png" alt="" id="BLOGGER_PHOTO_ID_5158524197014988834" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I have been trying so hard to rotate the images properly that I forgot my real goal, which was to isolate the character boxes.  When I was finally able to rotate the image, it turned out that the rotation did not help me at all!  Here are examples of the image before and after rotation:&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R5bBsRhV6AI/AAAAAAAAAFE/-F7vVjZ4pvs/s1600-h/rotatebefore.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R5bBsRhV6AI/AAAAAAAAAFE/-F7vVjZ4pvs/s400/rotatebefore.png" alt="" id="BLOGGER_PHOTO_ID_5158523389561137154" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R5bByhhV6BI/AAAAAAAAAFM/yZrwbmOMVuI/s1600-h/rotateafter.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R5bByhhV6BI/AAAAAAAAAFM/yZrwbmOMVuI/s400/rotateafter.png" alt="" id="BLOGGER_PHOTO_ID_5158523496935319570" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;As you can see, MATLAB didn't do a great job cropping the image after rotating it, even though I passed 'crop' to imrotate.m.  
Oh well, for the time being I'm not using it anyway.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-2545465785678552885?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/2545465785678552885/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=2545465785678552885' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2545465785678552885'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/2545465785678552885'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/some-progress.html' title='Isolating character boxes'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_PakaWnbOqtc/R5bAkhhV5_I/AAAAAAAAAE8/W9OYeUOovkY/s72-c/jeanpoole.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-81515057565686738</id><published>2008-01-16T15:03:00.000-08:00</published><updated>2008-11-12T17:38:14.966-08:00</updated><title type='text'>Better angles from Hough</title><content type='html'>I decreased the size of the cells in the accumulator array that the Hough Transform uses and the accuracy of the angles of the detected lines significantly improved.  
Yesterday I was using a resolution of 1 radian, but now my resolution is pi/450.&lt;br /&gt;&lt;br /&gt;There are still too many lines detected for the upper boxes compared to the lower boxes.  Here is what I get when I plot the top 5 lines detected (instead of thresholding):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R46O__NT-RI/AAAAAAAAADs/H04UuOoEW5U/s1600-h/houghtop5.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R46O__NT-RI/AAAAAAAAADs/H04UuOoEW5U/s320/houghtop5.png" alt="" id="BLOGGER_PHOTO_ID_5156215853335968018" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Three of the five detected lines are for the top horizontal line.  The 12th line that is detected is the upper horizontal line for the PID.  I'm not quite sure why that is.  Here is what I get when taking the top 12 lines:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R46PyfNT-SI/AAAAAAAAAD0/-f7xNjIyLEE/s1600-h/houghtop12.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R46PyfNT-SI/AAAAAAAAAD0/-f7xNjIyLEE/s320/houghtop12.png" alt="" id="BLOGGER_PHOTO_ID_5156216720919361826" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;When I get greedy and increase the angular resolution too much, it starts to hurt: lines go undetected.  
For example, when I set the resolution to pi/720, my top 20 detected lines are the following:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R46Q1vNT-TI/AAAAAAAAAD8/ghiKSwLqPGo/s1600-h/houghtop20.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R46Q1vNT-TI/AAAAAAAAAD8/ghiKSwLqPGo/s320/houghtop20.png" alt="" id="BLOGGER_PHOTO_ID_5156217876265564466" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;It doesn't even detect the bottom horizontal line.&lt;br /&gt;&lt;br /&gt;This is the process I've been using to detect the lines:&lt;br /&gt;1. Resize the image to half of its size.&lt;br /&gt;2. Take the gradient of the image.&lt;br /&gt;3. Run edge.m on the vertical gradient image, specifying 'sobel' as a parameter.&lt;br /&gt;4. Run the Hough Transform on the result of edge.m&lt;br /&gt;5. Sort the accumulator matrix in descending order and plot the top x lines on top of the original image.&lt;br /&gt;&lt;br /&gt;Here is an example vertical gradient of the image:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R46UIfNT-UI/AAAAAAAAAEE/vNdGB9EqFY8/s1600-h/verticalgradient.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R46UIfNT-UI/AAAAAAAAAEE/vNdGB9EqFY8/s320/verticalgradient.png" alt="" id="BLOGGER_PHOTO_ID_5156221496922995010" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Here is an example of the output of edge.m, passed into the Hough Transform function:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R46UjPNT-VI/AAAAAAAAAEM/EoOxbmbjAxE/s1600-h/edgeexample.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R46UjPNT-VI/AAAAAAAAAEM/EoOxbmbjAxE/s320/edgeexample.png" alt="" id="BLOGGER_PHOTO_ID_5156221956484495698" border="0" /&gt;&lt;/a&gt;&lt;br 
/&gt;Finally, here is an example of output from the Hough Transform function:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R46VCPNT-XI/AAAAAAAAAEc/7FzUfLqFp28/s1600-h/houghoutput.png"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R46VCPNT-XI/AAAAAAAAAEc/7FzUfLqFp28/s400/houghoutput.png" alt="" id="BLOGGER_PHOTO_ID_5156222489060440434" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The white portions in the middle of the image correspond to the lines detected in the image.&lt;br /&gt;&lt;br /&gt;The next significant problem is detecting the vertical lines.  When I instead use the horizontal gradient, here are the top 20 lines I detect (none of which are useful):&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_PakaWnbOqtc/R46WSfNT-YI/AAAAAAAAAEk/Y3kw3SCf6J4/s1600-h/houghvert.png"&gt;&lt;img style="cursor: pointer;" src="http://1.bp.blogspot.com/_PakaWnbOqtc/R46WSfNT-YI/AAAAAAAAAEk/Y3kw3SCf6J4/s400/houghvert.png" alt="" id="BLOGGER_PHOTO_ID_5156223867744942466" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Also, here is the output of the Hough Transform.  
Note how many white parts there are - meaning lots of lines were detected.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R46WmvNT-ZI/AAAAAAAAAEs/kbYRVgxOIes/s1600-h/betterhough.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R46WmvNT-ZI/AAAAAAAAAEs/kbYRVgxOIes/s400/betterhough.png" alt="" id="BLOGGER_PHOTO_ID_5156224215637293458" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Ok one last thing - here is what is given to the Hough Transform function in the vertical case:&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R46W__NT-aI/AAAAAAAAAE0/jZ8I3EuRCK8/s1600-h/inputtohoughvert.png"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R46W__NT-aI/AAAAAAAAAE0/jZ8I3EuRCK8/s400/inputtohoughvert.png" alt="" id="BLOGGER_PHOTO_ID_5156224649428990370" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-81515057565686738?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/81515057565686738/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=81515057565686738' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/81515057565686738'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/81515057565686738'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/better-angles-from-hough.html' title='Better angles from Hough'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_PakaWnbOqtc/R46O__NT-RI/AAAAAAAAADs/H04UuOoEW5U/s72-c/houghtop5.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-5361387322736649889</id><published>2008-01-15T13:51:00.000-08:00</published><updated>2008-11-12T17:38:15.081-08:00</updated><title type='text'>Hough is working better but not there yet</title><content type='html'>The change I made was to take the gradient of the image first, before running edge.m on it (and finally the Hough transform).  This seemed to get rid of a lot of the extra lines towards the top of the image without sacrificing the lines on the bottom of the image.&lt;br /&gt;&lt;br /&gt;I still have the problem that the detected lines do not seem to be oriented correctly - perhaps the angle is off.&lt;br /&gt;&lt;br /&gt;Here is what I get now just looking at the vertical gradient and thresholding at 50% of the accumulator:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_PakaWnbOqtc/R40rl_NT-PI/AAAAAAAAADc/_XogRySD6h4/s1600-h/houghbetterbutnotgood.bmp"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_PakaWnbOqtc/R40rl_NT-PI/AAAAAAAAADc/_XogRySD6h4/s320/houghbetterbutnotgood.bmp" alt="" id="BLOGGER_PHOTO_ID_5155825080031508722" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The next step is to get the detected lines on target (at the right angle) and also to detect the vertical lines (which has proven to be significantly more difficult).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-5361387322736649889?l=dafna190.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/5361387322736649889/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=5361387322736649889' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5361387322736649889'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/5361387322736649889'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/hough-is-working-better-but-not-there.html' title='Hough is working better but not there yet'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_PakaWnbOqtc/R40rl_NT-PI/AAAAAAAAADc/_XogRySD6h4/s72-c/houghbetterbutnotgood.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-8002483207001757792</id><published>2008-01-14T19:27:00.001-08:00</published><updated>2008-11-12T17:38:15.398-08:00</updated><title type='text'>Why is the Hough Transform not working?</title><content type='html'>Ugh... 
these lines are incorrect!&lt;br /&gt;&lt;br /&gt;Thresholding at 50% of the accumulator's max value:&lt;br /&gt;Finds too few lines, and the lines that it finds aren't completely correct imho:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_PakaWnbOqtc/R4woQvNT-NI/AAAAAAAAADM/VTdWibgf-Xs/s1600-h/houghnoworky.bmp"&gt;&lt;img style="cursor: pointer;" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R4woQvNT-NI/AAAAAAAAADM/VTdWibgf-Xs/s320/houghnoworky.bmp" alt="" id="BLOGGER_PHOTO_ID_5155539941447694546" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Thresholding at ~44% of the accumulator's max value:&lt;br /&gt;Finds too many lines for the name portion and not enough lines for the PID portion!&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_PakaWnbOqtc/R4wp9fNT-OI/AAAAAAAAADU/w5mXCOeeCDU/s1600-h/houghnoworky2.bmp"&gt;&lt;img style="cursor: pointer;" src="http://3.bp.blogspot.com/_PakaWnbOqtc/R4wp9fNT-OI/AAAAAAAAADU/w5mXCOeeCDU/s320/houghnoworky2.bmp" alt="" id="BLOGGER_PHOTO_ID_5155541809758468322" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;[Ignore the titles of these images!]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-8002483207001757792?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/8002483207001757792/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=8002483207001757792' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8002483207001757792'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/4978165749595234779/posts/default/8002483207001757792'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/why-is-hough-transform-not-working.html' title='Why is the Hough Transform not working?'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_PakaWnbOqtc/R4woQvNT-NI/AAAAAAAAADM/VTdWibgf-Xs/s72-c/houghnoworky.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-4334156782946006630</id><published>2008-01-11T16:02:00.000-08:00</published><updated>2008-11-12T17:38:15.519-08:00</updated><title type='text'>Sample Input</title><content type='html'>Here is an example of an input quiz or assignment. This is the subset of the quiz or assignment that contains the student's name and PID number. Part of the preprocessing will be isolating the name and PID number from the rest of the quiz.&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;One idea that is up in the air right now is using character boxes for the students to enter their name and PID. 
This simplifies the problem significantly, as I do not need to figure out myself where one letter ends and the next begins.&lt;/div&gt;&lt;div&gt; &lt;/div&gt;&lt;div&gt;&lt;a href="http://4.bp.blogspot.com/_PakaWnbOqtc/R4gEPvNT-MI/AAAAAAAAADA/T0hEjQS8mSI/s1600-h/danielsmall.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5154374441942382786" style="CURSOR: hand" alt="" src="http://4.bp.blogspot.com/_PakaWnbOqtc/R4gEPvNT-MI/AAAAAAAAADA/T0hEjQS8mSI/s320/danielsmall.JPG" border="0" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-4334156782946006630?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/4334156782946006630/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=4334156782946006630' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4334156782946006630'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/4334156782946006630'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/sample-input.html' title='Sample Input'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_PakaWnbOqtc/R4gEPvNT-MI/AAAAAAAAADA/T0hEjQS8mSI/s72-c/danielsmall.JPG' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4978165749595234779.post-1921176531613204821</id><published>2008-01-06T12:57:00.000-08:00</published><updated>2008-01-06T12:59:26.002-08:00</updated><title type='text'>Welcome to Dafna's CSE 190 blog!</title><content type='html'>I will be doing a project on the recognition of handwritten names from a limited and closed lexicon using a hidden Markov model.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4978165749595234779-1921176531613204821?l=dafna190.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dafna190.blogspot.com/feeds/1921176531613204821/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4978165749595234779&amp;postID=1921176531613204821' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/1921176531613204821'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4978165749595234779/posts/default/1921176531613204821'/><link rel='alternate' type='text/html' href='http://dafna190.blogspot.com/2008/01/welcome-to-dafnas-cse-190-blog.html' title='Welcome to Dafna&apos;s CSE 190 blog!'/><author><name>Dafna</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
