Saturday, December 28, 2013

Machine Learning RC Car Project

A two-year-old blog post by someone about a remote control car driven by a neural network caught my interest. I recently took another look at it, and this time I did something with it. I'm starting to learn Android development with Java and Android Studio, so I modified the code a bit to run in Android Studio and then forked the remote control car project on GitHub. I used JDK 7 for Android Studio and Gradle 1.8. I ran into some trouble with the alert dialog used to get the IP address, so I rewrote it as a separate activity. The first test of Wi-Fi communication between the laptop and the phone did not work, and I'm not sure why it started working after a reinstall. I will need to test it again later.

Monday, November 18, 2013

Machine Learning Basics

This video, Machine Learning: The Basics with Ron Bekkerman, looks like an interesting introduction to machine learning. I tried the Kaggle digit recognizer kNN benchmark over a month ago, and it took several hours on my laptop; he does warn in the video that the algorithm can run into scalability issues. I haven't had the desire to progress further in the Learning From Data class, so watching this video was the first step I've taken in quite a while. It didn't get into much technical detail, but I didn't really feel like sitting through that right now anyway.
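
On the scalability point: a brute-force kNN pass over the digit data compares every test image against every training image. The sketch below is just my illustration of that cost, not the actual benchmark code (the data layout and names are assumptions), but it shows why the runtime blows up:

```javascript
// Brute-force nearest neighbor (k = 1 for simplicity): for each test image,
// scan every training image, so the cost is roughly
// (test rows) x (training rows) x (pixels per image) operations.
function squaredDistance(a, b) {
  var d = 0;
  for (var i = 0; i < a.length; i++) {
    var diff = a[i] - b[i];
    d += diff * diff;
  }
  return d;
}

// train: array of {pixels: [...], label: digit}, test: array of pixel arrays.
function nearestNeighborLabels(train, test) {
  return test.map(function (pixels) {
    var bestLabel = null;
    var bestDist = Infinity;
    for (var j = 0; j < train.length; j++) {
      var dist = squaredDistance(pixels, train[j].pixels);
      if (dist < bestDist) {
        bestDist = dist;
        bestLabel = train[j].label;
      }
    }
    return bestLabel;
  });
}
```

With tens of thousands of images on each side and 784 pixels per image, the inner distance loop runs hundreds of billions of times, so a multi-hour run on a laptop is not surprising.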

Friday, October 25, 2013

Quick Note about P-value Confusion

Lately I've been working on learning about p-values and hypothesis testing. I'm still learning introductory statistics and am having trouble keeping the idea straight, so I am going to note it here for future reference. A comment at Cross Validated offers the mnemonic "p is low, H0 must go." At this point, I believe that means the probability of seeing the data, assuming the null hypothesis is true, is too low to believe: it is highly unlikely we would see this data if the original assumption were correct, and so we should reject the null hypothesis. I hope I have this in the right order, and that it is not the probability of the null hypothesis being true given the data collected. The Wikipedia page has a list of misunderstandings and criticisms of the p-value that I hope to check out later.

To interpret the p-value another way, based on a forum post: if the null hypothesis is true, then p percent of experiments should show a test statistic that is the same as or more extreme than the one observed in the current experiment. Hence by repeating the experiment one can become more confident that the data from a previous experiment was not an extreme event. Once more, based on the Wikipedia page, the p-value is the probability of observing the test statistic or something more extreme, assuming the null hypothesis is true.
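
To make the "p percent of experiments" reading concrete for myself, here is a rough coin-flip simulation sketch. The setup and the numbers are assumptions I made up purely for illustration; they don't come from any of the sources above:

```javascript
// Simulate the "repeat the experiment under H0" interpretation of the p-value.
// H0: the coin is fair. Test statistic: number of heads in 100 flips.
function runExperiment(flips) {
  var heads = 0;
  for (var i = 0; i < flips; i++) {
    if (Math.random() < 0.5) { heads++; }
  }
  return heads;
}

var flips = 100;
var observedHeads = 61;   // hypothetical result from the "real" experiment
var trials = 100000;
var asExtreme = 0;

// Repeat the experiment many times assuming H0 is true, and count how often
// the result is at least as far from the expected 50 heads as what we observed.
for (var t = 0; t < trials; t++) {
  var heads = runExperiment(flips);
  if (Math.abs(heads - 50) >= Math.abs(observedHeads - 50)) {
    asExtreme++;
  }
}

// This fraction approximates the (two-sided) p-value: the probability, given
// that H0 is true, of seeing a result this extreme or more extreme.
console.log("approximate p-value:", asExtreme / trials);
```

If that fraction is very small, it is hard to keep believing the fair-coin assumption, which is the "p is low, H0 must go" idea again.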

Sunday, September 01, 2013

JavaScript Homework

[Chart: Learning From Data Homework #1 visualization, 101 iterations, 20 data points]
A jQuery Mobile Charts post encouraged me to try out an unusual JavaScript project. It helped me answer question 7 of homework #1 from the 2012 archives of the Learning From Data course by Caltech. The homework proposes an exercise using the very basic perceptron learning algorithm presented in lecture. The problem definition restricts the input to a 2-d feature set to make visualization with scatter charts easy, and limits the number of classes to two. The target function is a randomly placed line: as can be seen in the picture, points above the orange line belong to the red class, and points below it belong to the blue class. The orange line is the function the learning algorithm is trying to approximate. The particular case shown in the image was an outlier that took 101 iterations before the learning algorithm hypothesized the green line and correctly classified all 20 points of the training data. The two lines look very similar, so in this case the classification error was very low. Some runs with a different random set of data were not so close.
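
For reference, the core of the learning loop looked roughly like the sketch below. This is a simplified reconstruction rather than the exact code in the tool, and the function and variable names are mine:

```javascript
// Perceptron learning algorithm for 2-d points labeled +1 (red) or -1 (blue).
// The hypothesis is sign(w0 + w1*x + w2*y), i.e. a line in the plane.
function sign(v) { return v >= 0 ? 1 : -1; }

function classify(weights, p) {
  return sign(weights[0] + weights[1] * p.x + weights[2] * p.y);
}

// points: array of {x, y, label}. Returns the final weights and the number
// of updates it took before every point was classified correctly.
function perceptron(points) {
  var weights = [0, 0, 0];
  var iterations = 0;
  while (true) {
    // Find every point the current line gets wrong.
    var misclassified = points.filter(function (p) {
      return classify(weights, p) !== p.label;
    });
    if (misclassified.length === 0) { break; }

    // Pick one misclassified point at random and nudge the weights toward it.
    var p = misclassified[Math.floor(Math.random() * misclassified.length)];
    weights[0] += p.label;
    weights[1] += p.label * p.x;
    weights[2] += p.label * p.y;
    iterations++;
  }
  return { weights: weights, iterations: iterations };
}
```

Because the target really is a line, the data is linearly separable and the loop is guaranteed to terminate, but the number of iterations varies a lot from one random data set to the next, which is how a run like the 101-iteration case above can happen.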

The project requirements were not tied to a programming language or platform and did not demand high performance. This was just the first homework assignment, and I am sure it gets harder. At first I started on a C# tool out of familiarity, but later I attended a jQuery Mobile class and found the related charts blog post. Since I still want to finish my Reversi project from last year in JavaScript and also wanted to try out some of what was taught in the class, I changed platforms and languages. Of course, using JavaScript for a machine learning solution seems unusual, and most existing machine learning libraries are written in other languages. But given the nature of the homework assignment, it seemed to make sense for this special case:

  1. The simple perceptron learning algorithm isn't likely to be found in an existing library because it is too basic to be of practical use.
  2. Coding it from scratch helps me learn.
  3. It is simple enough to code from scratch.
  4. I already have some familiarity with the JavaScript language.
  5. It is relatively easy to share the results across platforms.
  6. It doesn't take a lot of code as long as you have a plotting library.
  7. Performance isn't a requirement.
  8. Platform isn't a requirement.

I only needed to redo the plotting of the points and a line to get back to where I had left off in the other language. A problem with the charting library came up, but thankfully someone from Europe had found a workaround and posted it to Stack Overflow. With his help, I was able to use the jqplot library together with jQuery Mobile to finish the tool this holiday weekend.
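
For anyone curious, the plotting side boiled down to something like the sketch below. It's a rough reconstruction, not the exact tool: the page id, the made-up data, and the choice to render in the pageshow handler are my assumptions, and I won't try to restate the exact Stack Overflow workaround from memory:

```javascript
// Render the scatter plot plus the target and hypothesis lines with jqplot
// inside a jQuery Mobile page. Assumes jquery, jquery.mobile, jquery.jqplot
// and their CSS are loaded, and the page contains <div id="chart"></div>.
$(document).on('pageshow', '#homePage', function () {
  var redPoints  = [[0.2, 0.8], [0.5, 0.9]];  // class +1 (made-up sample data)
  var bluePoints = [[0.3, 0.1], [0.7, 0.4]];  // class -1 (made-up sample data)
  var targetLine = [[-1, -0.9], [1, 1.1]];    // the "orange" target function
  var hypothesis = [[-1, -0.85], [1, 1.05]];  // the "green" learned line

  $.jqplot('chart', [redPoints, bluePoints, targetLine, hypothesis], {
    series: [
      { showLine: false, markerOptions: { style: 'filledCircle' } },
      { showLine: false, markerOptions: { style: 'filledCircle' } },
      { showMarker: false },
      { showMarker: false }
    ]
  });
});
```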

Friday, March 22, 2013

Repost of coding tip


An interesting tip on the process of learning to code, posted on Tumblr by Brad Milne:
The One Tip That Will Help You Learn To Code 10x Faster

The part about falling into a trap was what struck me:
To be honest, it hurt when I heard it because I literally have spent probably close to 100 hours so far looking for answers online and getting caught up into learning something else. You know how it is… because you don’t know what you don’t know it’s easy to fall into the trap of thinking that whatever you’re learning at the time is worthwhile. IT ISN’T.
It reminds me of a search tree that has branched too widely in a breadth-first search or gone too deep in a depth-first search. It seems one needs a heuristic to bound the search. But as he states, you don't know what you don't know, so it is hard to estimate the distance from the node you are on to the goal node.
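
Just to make the analogy concrete for myself, a bounded search might look something like this toy sketch; the node shape and the heuristic are entirely made up for illustration:

```javascript
// Depth-limited search where a heuristic decides whether a branch looks
// promising enough to explore further. Purely illustrative.
function boundedSearch(node, isGoal, children, heuristic, depthLimit) {
  if (isGoal(node)) { return node; }
  if (depthLimit === 0) { return null; }  // stop going deeper
  // Only follow children that the heuristic thinks move closer to the goal.
  var promising = children(node).filter(function (child) {
    return heuristic(child) < heuristic(node);
  });
  for (var i = 0; i < promising.length; i++) {
    var found = boundedSearch(promising[i], isGoal, children, heuristic, depthLimit - 1);
    if (found) { return found; }
  }
  return null;
}
```

The trouble with learning, as the quote says, is that the heuristic itself is what you don't have.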