Yahoo! Open Hack All Stars

After two years of waiting, Yahoo! finally set a date for their global hackathon*, the Yahoo! Open Hack All Stars, to bring together all their Open Hack and HackU winners from around the globe to compete against each other. I was part of a team of three (with Rory Tulk and Aran Donohue) that won the Canadian HackU event which was hosted at both the University of Toronto and the University of Waterloo in 2009.

The 2009 HackU event was memorable, if only because our team surprised Yahoo!, who had invested more time and resources in Waterloo and had more teams competing from there (a school better known for computer science as the birthplace of the BlackBerry). Our team was the only Toronto team to demo, and we demoed over Skype from Aran’s cubicle. Our hack was a collaborative online document editor called Docuvine (later renamed Pagical) with a “live copy” feature that let you pull in any web content and have it automatically kept up to date. For example, you could write a report that would always include the latest data from an online spreadsheet or the latest tweet from someone’s Twitter stream. This hack also ended up getting Aran and me an interview at Y Combinator (we weren’t selected in the end), but that’s a story for another time.

Back in the present, it turned out that only Rory and I could make it to the Open Hack All Stars event in NYC. As at the Startup Weekend event, Rory and I had just 24 hours to create some new and interesting software, with the additional requirements that it use Yahoo! technologies and target the digital media publishing space (the Open Hack All Stars event ran in parallel with Yahoo!’s Global Partner Summit, and the finalists would demo to its attendees).

Given the rise of photo journalism online and the popularity of sites like Boston.com’s Big Picture, we decided to create a browser extension for Google Chrome that would automatically transform traditional newspapers’ homepages into a more photo-centric news experience. This would allow consumers to visually browse the latest news articles while still having the option of reading the full text coverage of each article. And all with no site changes or work required on the part of publishers.

We used YQL to fetch articles from sites, Readability to extract the article text and images, and jQuery with the Masonry plugin to build our photo layout in a gratuitous but impressive animated sequence. The most time-intensive part of our hack was converting the Readability library to work on arbitrary pages rather than just the current page. In homage to how Readability makes sites easier to read, and since we were attempting to make sites easier to visually browse, we called our extension Viewability.
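For the curious, the pipeline boils down to something like the sketch below. The extractArticle helper, the element names, and the example URL are hypothetical stand-ins (the real extension used our modified Readability), but the YQL query and the Masonry call show the general shape of what we did:

    // Sketch of the Viewability pipeline: fetch a page through YQL (which
    // sidesteps cross-origin restrictions), extract its main image and a
    // text snippet, then lay the results out with the Masonry plugin.
    function fetchPage(url, callback) {
      var yql = 'http://query.yahooapis.com/v1/public/yql?format=json&callback=?&q=' +
                encodeURIComponent('select * from html where url="' + url + '"');
      $.getJSON(yql, function (data) {
        callback(data.query.results); // the raw page markup returned by YQL
      });
    }

    fetchPage('http://www.nationalpost.com/news/some-article', function (page) {
      var article = extractArticle(page); // stand-in for our Readability port
      $('<div class="brick">')
        .append($('<img>').attr('src', article.imageUrl))
        .append($('<p>').text(article.snippet))
        .appendTo('#photo-wall');
      $('#photo-wall').masonry({ itemSelector: '.brick' }); // animated layout
    });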

For our demo, we used the National Post as our example publisher site. If you’re using Google Chrome, you can see it in action yourself by downloading and installing the Viewability extension and then visiting the National Post site. It currently only works on the National Post site, but it wouldn’t be hard to add support for other sites. It can also easily be converted to work as a standard bookmarklet, though in testing I found it ran faster as an extension; using web workers might close that gap.
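As a rough idea, the bookmarklet version is just a script-injection one-liner (shown expanded here for readability; the hosting URL is a made-up stand-in):

    javascript:(function () {
      // Inject the Viewability script into whatever page you're on.
      var s = document.createElement('script');
      s.src = 'http://example.com/viewability.js'; // hypothetical hosting URL
      document.body.appendChild(s);
    })();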

Below are screenshots of what the photo browsing layout and the single article view look like. The most impressive part is of course the animation (you can get some sense of how the animation worked without installing the extension by playing with this jQuery Masonry demo).

The photo browsing interface with captions and mouse-over article snippets.


The full-text article view users see when they click on a photo in the photo browsing interface.


In the end, Viewability was one of the top six finalists but didn’t come out on top. So it ended up being a nice all-expenses-paid trip to NYC (though outside of a late-night bus tour, all the contestants spent 99% of their time in a hotel writing code – and loved it).

*Hack in the sense of building software, not in the sense of illegally breaking into computers.

Comic Gopher Reborn

Comic Gopher

I’ve updated my old desktop-based webcomic viewer. The new version is related to the old one in name only. It uses completely different technologies and supports a different set of comics.

It is a purely client-side HTML and JavaScript web application. There are no cookies or server-side storage of any kind – everything is stored in your browser’s local storage. This means that your subscriptions and settings aren’t accessible on other computers or other browsers (in this first version at least), so make sure you stick to one computer and one browser when reading comics.
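If you’re curious what “no server-side storage” means in practice, it boils down to a few lines of localStorage plumbing like this sketch (the storage key is made up for illustration):

    // Persist subscriptions in the browser's local storage so no
    // server-side state is ever needed.
    function saveSubscriptions(subscriptions) {
      localStorage.setItem('comicgopher.subs', JSON.stringify(subscriptions));
    }

    function loadSubscriptions() {
      var raw = localStorage.getItem('comicgopher.subs');
      return raw ? JSON.parse(raw) : []; // first visit: nothing saved yet
    }

    saveSubscriptions(['xkcd', 'Dinosaur Comics']);
    console.log(loadSubscriptions()); // visible on this browser only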

All the comics come from the Darkgate Comic Slurper, phpGrabComics, or straight from the comic authors’ sites. If a comic’s author provides a feed, it’s relatively easy to add their comic, so feel free to send requests for new comics. Some of the comics you can subscribe to aren’t appropriate for all ages; consider yourself warned. Now go read some comics and let me know what you think.

Savant Web Client

For the theory requirement of my Master’s degree, I took Topics in Computational Biology: Analysis of High Throughput Sequencing Data with Michael Brudno.  Since my last encounter with biology was in high school, I spent most of the time for this course reading Wikipedia and other sites to fill in my knowledge gaps before I could even make sense of the assigned papers.  It was actually refreshing to delve into a field so different from Computer Science.  I was especially intrigued by the idea of DNA as the programming language of life (though I don’t think I’d want to program DNA until there is something at least as abstract as an assembly language and the hardware to automatically run it).

As my final project for the course, I built a web-based genome browser called the Savant Web Client, which is based on the recently released Savant Desktop Genome Browser built by Marc Fiume and Vanessa Williams of the Computational Biology Lab at the University of Toronto.

The goal of my final project was to show that a web-based genome browser could look just as good and perform just as fast as a desktop-based one.  I did this by replicating the main functionality of the Savant desktop genome browser in the Savant Web Client.  The Savant team also didn’t have anyone concentrating on the HCI aspects of Savant, so I took the time to tweak and extend the user interface as I rebuilt it for the web.  My project write-up for the Savant Web Client explains everything in more detail, including how to download and use the GPL’ed source code.

What I enjoyed most about creating the Savant Web Client, and the real reason I chose to do it, was the chance to play with some of the newest web technologies.  In fact, my choice of APIs restricted me to targeting only the Google Chrome web browser.  The Savant Web Client is powered by web sockets, the canvas element, ProcessingJS, jQuery, YUI, and the client/server jWebSocket libraries.  Only the use of web sockets requires the client to run on Google Chrome.  The jWebSocket libraries do seem to support methods of simulating web sockets on other browsers though I didn’t attempt to use them.
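Stripped of the jWebSocket specifics, the client side of that plumbing looks roughly like the sketch below; the server address, message shape, and drawTrack renderer are all illustrative stand-ins:

    // Minimal sketch of the client's web socket plumbing. The real client
    // goes through the jWebSocket library rather than the raw API.
    var socket = new WebSocket('ws://localhost:8787/savant'); // stand-in address

    socket.onopen = function () {
      // Ask the server for the interval track data in the visible range.
      socket.send(JSON.stringify({ type: 'fetch', start: 0, end: 10000 }));
    };

    socket.onmessage = function (event) {
      var message = JSON.parse(event.data);
      drawTrack(message.intervals); // stand-in for the canvas/ProcessingJS renderer
    };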

In my opinion, the coolest parts of the Savant Web Client are the login screen and the live collaboration.  On the login screen, I put together a wicked 3D double helix visualization by combining ProcessingJS and JS3D.  I definitely spent more time on it than I should have, given the project, but I’ve always had a tendency to over-engineer splash screens and intro pages.
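Underneath the eye candy, a double helix is just two phase-shifted sine strands; this plain-canvas sketch (a stand-in for the actual ProcessingJS/JS3D code, assuming a canvas element with id "helix") shows the basic geometry:

    // Two strands traced as phase-shifted sine waves down a 2D canvas,
    // with "rungs" between them; the real version added 3D projection
    // and animation via ProcessingJS and JS3D.
    var ctx = document.getElementById('helix').getContext('2d');
    for (var y = 0; y < 300; y += 4) {
      var x1 = 150 + 60 * Math.sin(y / 20);           // first strand
      var x2 = 150 + 60 * Math.sin(y / 20 + Math.PI); // second, half a turn out
      ctx.fillRect(x1, y, 3, 3);
      ctx.fillRect(x2, y, 3, 3);
      if (y % 20 === 0) { // draw a base-pair rung between the strands
        ctx.fillRect(Math.min(x1, x2), y, Math.abs(x2 - x1), 1);
      }
    }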

The live collaboration in the Savant Web Client is a product of the publish/subscribe model of web socket communication I used.  Currently, a Savant Web Client server only gives clients access to a single genome and interval track.  So everyone who connects is looking at the same genome and interval track, and every time one client pans or zooms, every other client sees its view pan and zoom as well.  I didn’t build in chat or other collaborative tools, but you can easily imagine the client being expanded to support a more comprehensive form of collaborative live genome analysis (something none of the other genome browsers support, as far as I know).
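In sketch form, the synchronization amounts to each client publishing its viewport changes and applying everyone else’s; the channel address, message shape, and setVisibleRange helper here are hypothetical:

    // Pub/sub viewport sync: publish this client's pans and zooms, and
    // apply the ones broadcast by other clients so all views stay in step.
    var channel = new WebSocket('ws://localhost:8787/savant'); // stand-in address

    function publishViewport(start, end) {
      channel.send(JSON.stringify({ type: 'viewport', start: start, end: end }));
    }

    channel.onmessage = function (event) {
      var msg = JSON.parse(event.data);
      if (msg.type === 'viewport') {
        setVisibleRange(msg.start, msg.end); // pan/zoom this client's view too
      }
    };

    publishViewport(5000, 15000); // e.g. called from the pan/zoom handlers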

You can read more about the Savant Web Client and download the source in the Savant Web Client project write-up.

Learning Foreign Language Vocabulary with ALOE

I completed my Master’s thesis this week.  I have to say I cut it a bit close with some of the participants in my study finishing up just last week.  In the course of writing my thesis, I’ve acquired a much deeper understanding of statistics, data visualizations and the more mundane art of Microsoft Word collaboration and document formatting.

For my Master’s, I developed a new system that teaches vocabulary in context by transforming a student’s everyday web browsing experience into a language learning environment. The prototype, dubbed ALOE, selectively translates parts of every web page into the foreign language being learned so that the student reading the page can learn vocabulary using the contextual hints provided by the untranslated words.  ALOE also provides multiple-choice questions and definition lookups on the translated web pages.  The key idea behind ALOE is that it augments students’ existing web browsing habits to provide language learning opportunities that don’t impede their web browsing tasks.
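A toy version of the core transformation looks like the sketch below, with a hard-coded three-word dictionary standing in for ALOE’s real translation backend:

    // Walk the page's text nodes and swap a handful of known words into
    // the target language, leaving the rest of each sentence as context.
    var dictionary = { house: 'casa', dog: 'perro', book: 'libro' };

    function translateTextNodes(node) {
      if (node.nodeType === 3) { // text node
        node.nodeValue = node.nodeValue.replace(/\b(house|dog|book)\b/gi,
          function (word) { return dictionary[word.toLowerCase()]; });
      } else {
        for (var i = 0; i < node.childNodes.length; i++) {
          translateTextNodes(node.childNodes[i]);
        }
      }
    }

    translateTextNodes(document.body); // "The dog ate my book" -> "The perro ate my libro"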

To summarize the research results, the two-month user evaluation of the ALOE prototype showed that the foreign vocabulary learning approach taken by ALOE works in practice.  Most of the participants enjoyed using ALOE, and they learned an average of fifty new vocabulary words.  Most of the participants also wanted to continue using the ALOE prototype as-is, but would have benefited from improvements in speed, website compatibility, learning adaptability, and the ability to customize ALOE.

To get all the nitty-gritty details and see the pretty data visualizations I created, feel free to peruse the full thesis:

Andrew Trusty – MSc Thesis – Augmenting the L1 Web for L2 Vocabulary Learning

Update: A shorter version of my thesis was accepted to the 2011 ACM CHI Conference.

The ALOE software currently isn’t available.  Releasing it would require a bit of work to remove all the study-specific hooks and cruft and to set up a new server.  But if you’re interested in using it, leave a comment or contact me directly to let me know.  If there’s enough interest, I might find the time to release it.

Readable Feeds

(Update – Readable Feeds has become a victim of its own success and the new App Engine quota limitations and is no longer running – but there are many alternatives)

Another weekend, another Google App Engine project.  This time it’s called Readable Feeds and, thankfully, I actually finished it in a weekend, unlike Cloudsafe.  Readable Feeds is an extension of the Arc90 Readability Experiment and Nirmal Patel’s Hacker News Readability script.  It is actually a very simple application: you give it a feed and it generates a new feed that hopefully has more content and less clutter than the original.

For example, the Hacker News feed consists primarily of links to interesting web pages; Readable Feeds transforms it to contain the content of the linked pages so that you don’t have to leave your feed reader to access the full content (Nirmal’s page has some good screenshots showing this).  It can also repair those crippled feeds that only show excerpts by replacing the excerpts with the full content.  I said hopefully before because this process doesn’t always work, and in fact fails spectacularly on some feeds, like those from the New York Times, which link to registration-protected pages that Readable Feeds can’t bypass.
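The real service was a Python App Engine app, but the transformation itself is simple enough to sketch in JavaScript; here extractReadableContent stands in for the Readability-based extractor:

    // For each feed item: fetch the linked page, extract the readable
    // article body, and substitute it for the item's excerpt.
    async function makeFeedReadable(feedXml) {
      var doc = new DOMParser().parseFromString(feedXml, 'application/xml');
      for (var item of doc.querySelectorAll('item')) {
        var url = item.querySelector('link').textContent;
        var page = await (await fetch(url)).text(); // fails on login-walled pages
        var article = extractReadableContent(page); // hypothetical extractor
        item.querySelector('description').textContent = article;
      }
      return new XMLSerializer().serializeToString(doc);
    }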

I’m also happy to report this is my first project (but hopefully not my last) to be featured on Hacker News.  Some of you might also notice a striking visual similarity to Pubfeed, which is of course pure coincidence.

Cloudsafe Online Backups

(Update – Cloudsafe is no longer running and I’ve sold the App Engine sub-domain to the fine folks at CloudSafe.com)

I just put the finishing touches on my first substantial Google App Engine project, called Cloudsafe. Cloudsafe is a safe and easy way to back up the data you’ve accumulated with all the web applications you use. Give it your account details for various web applications like Delicious, Google Reader, and LibraryThing, and Cloudsafe will create a single downloadable archive of all the data you have on those sites, whether it’s your bookmarks, blog feeds, or your book collection.

I started working on it a week ago as a weekend project, but it evolved into a week-long battle to get App Engine to conform to my needs. Google has built an impressive service with App Engine, but it takes some getting used to: it lacks a standard cron interface, some of the default Python libraries have been removed, and there are lots of space and time quotas you have to adhere to. Even with those hurdles, I found it to be a perfect fit for Cloudsafe because in the end I get a very responsive application with free SSL and built-in pycrypto support, which are integral features for a security-conscious application like Cloudsafe.
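Cloudsafe does its crypto server-side with pycrypto, but just to illustrate the AES idea, here is an equivalent sketch using the browser’s Web Crypto API instead:

    // Encrypt a backup archive with AES-GCM. Cloudsafe itself does this
    // server-side with pycrypto on App Engine; this only shows the shape
    // of the operation.
    async function encryptArchive(archiveBytes) {
      var key = await crypto.subtle.generateKey(
        { name: 'AES-GCM', length: 256 }, true, ['encrypt', 'decrypt']);
      var iv = crypto.getRandomValues(new Uint8Array(12)); // unique per archive
      var ciphertext = await crypto.subtle.encrypt(
        { name: 'AES-GCM', iv: iv }, key, archiveBytes);
      return { key: key, iv: iv, ciphertext: ciphertext };
    }

    encryptArchive(new TextEncoder().encode('bookmarks, feeds, books...'))
      .then(function (result) { console.log(result.ciphertext.byteLength, 'bytes'); });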

Lest I be continually berated by my peers and security aficionados, I must admit that Cloudsafe is far from optimal as a secure backup solution. Even though I’ve built the site with SSL and AES encryption, and with XSS, SQL injection, cookie hijacking, and other security concerns in mind, the fact remains that it is a web application developed by someone you likely don’t know or trust (me) and running on someone else’s computers (Google’s).

I’ve tried to add features to make the site more palatable for wary users, such as the default behavior, which runs one backup and then instantly forgets your account details. But if you’re like me (ironically), you will never trust a third-party site with your passwords. I do trust it, though, because I wrote it and because I’m not using it to back up anything I consider confidential. My online data (bookmarks, book collection, etc.) is already public (you can find some of it showing up in my lifestream on the right side of this page), but the sites storing the data require account verification to access the backup functionality.

If people find Cloudsafe useful, I’d love to add backup support for additional sites and build a desktop version that more security-conscious users could use. So give Cloudsafe a try and drop me some feedback here or on the Cloudsafe UserVoice page on how I could improve it.