I just finished a semester of research under Tucker Balch developing software for the Georgia Tech entry in the DARPA Urban Challenge. The goal of the DARPA Urban Challenge is to build an autonomous automobile that can navigate an urban environment as a human would. It was very interesting and challenging work, with a lot of other highly motivated undergraduate and graduate students building out the platform that drives the Sting Racing team's automobile. The students worked in a variety of teams concentrating on specific areas, including Visual Odometry, Learning by Example, Pose Estimation, Health Monitoring, and Laser Scan Matching.
I worked on the Learning by Example team, where we sought to use instance-based learning to associate image-action pairs. Our team developed and tested a variety of approaches for learning actions from images and for efficiently matching new images against image databases. For my part, I implemented a smaller component of our team's overall architecture involving pre-processing of the images obtained from cameras mounted on the automobile.
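To give a feel for what instance-based learning of image-action pairs means, here is a minimal sketch: a memory of stored (image features, action) examples that recalls the action of the nearest stored image. The feature vectors, action names, and the simple Euclidean nearest-neighbor lookup are all my own illustrative assumptions, not the team's actual implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ImageActionMemory:
    """Toy instance-based learner: store image-action pairs, recall by nearest image."""

    def __init__(self):
        self.examples = []  # list of (feature_vector, action) pairs

    def store(self, features, action):
        self.examples.append((features, action))

    def recall(self, features):
        # Return the action paired with the closest stored feature vector.
        _, action = min(self.examples, key=lambda ex: euclidean(ex[0], features))
        return action

# Hypothetical feature vectors (e.g., coarse image statistics) and actions.
memory = ImageActionMemory()
memory.store([0.9, 0.1, 0.2], "steer_left")
memory.store([0.1, 0.8, 0.3], "steer_right")
memory.store([0.5, 0.5, 0.5], "go_straight")

print(memory.recall([0.85, 0.15, 0.25]))  # nearest stored image is the first one
```

In a real system the feature vectors would come from the camera images, and the database lookup would need to be far more efficient than a linear scan, which is exactly the matching problem the team worked on.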
I spent a lot of time researching different image classification techniques. At first I attempted a very detailed classification, labeling areas of each image as one of six classes: pavement, white lines, yellow lines, buildings, plants, and sky. Given the real-time constraints and the high error rate of this approach, I decided to modify my goal. After exploring a few other ideas and getting feedback from my teammates, I cut the number of classes down to two, effectively creating an image mask identifying the road in each image. In this way my piece served to reduce the amount of data that our other approaches needed to process.
My basic approach was inspired by a paper by Bischof, Schneider, and Pinz. The idea is to use a neural network to classify the pixels of an image based only on the data available at each pixel. I used the Weka machine learning toolkit to train and test my neural networks. The image above shows the results of six-, three-, and two-class classifications, with each class represented by a color overlaying the original image. By the end of the semester I had developed a functional component that, given a camera image, could output an accurate mask identifying the road in real time. I hope to do future work on this component to enable it to learn online from the laser sensor data, to parallelize the code for multi-processor systems, and to output a confidence measure for the image masks.
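The per-pixel idea above can be sketched in a few lines. The original work trained neural networks in Weka; in this illustrative stand-in a single logistic unit on a pixel's RGB values plays the role of the network, and the training pixels are synthetic (greyish for road, green for vegetation), invented purely to show how classifying every pixel independently yields a binary road mask.

```python
import math
import random

random.seed(0)

def predict(w, b, pixel):
    """Probability that a single (R, G, B) pixel is road."""
    z = sum(wi * xi for wi, xi in zip(w, pixel)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=500, lr=0.5):
    """Stochastic gradient descent on logistic loss over labeled pixels."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixel, label in data:
            err = predict(w, b, pixel) - label
            w = [wi - lr * err * xi for wi, xi in zip(w, pixel)]
            b -= lr * err
    return w, b

# Synthetic training pixels (normalized RGB): grey road vs. green vegetation.
road = [([random.uniform(0.4, 0.6)] * 3, 1) for _ in range(50)]
grass = [([random.uniform(0.0, 0.2), random.uniform(0.5, 0.9),
           random.uniform(0.0, 0.2)], 0) for _ in range(50)]
w, b = train(road + grass)

# Classify a tiny 2x2 "image" pixel by pixel into a binary road mask.
image = [[(0.50, 0.50, 0.50), (0.10, 0.70, 0.10)],
         [(0.45, 0.48, 0.50), (0.05, 0.80, 0.10)]]
mask = [[1 if predict(w, b, px) > 0.5 else 0 for px in row] for row in image]
print(mask)
```

Because each pixel is classified independently of its neighbors, this approach parallelizes naturally and is cheap enough for real-time use, which is what made it attractive under the Urban Challenge constraints.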