Google's Street View API was used with an image recognition API (Camfind) to automatically detect features on a street in NYC. This approach lowers the barrier to using advanced machine learning techniques and can scale to other recognition tasks (e.g., identifying and counting certain businesses in an image).
Google Street View is great! I think it's one of those Google products that shows the original essence of Google, which was to "organize the world's information and make it universally accessible and useful". What they achieved with Street View ushered in a mass awareness of Big Data: it showed the public what becomes possible when data is gathered, integrated and presented at this scale. It made the world smaller and much larger at the same time.
During the Fall semester at CUSP this past year, while exploring the Maps API, I stumbled across the Street View API that allows for querying the service and getting back a picture of any street. The kicker was that the query URL could include a simple address such as 300 Cadman Plaza West, Brooklyn.
A sample query string such as http://maps.googleapis.com/maps/api/streetview?size=1024x768&pitch=30&fov=120&heading=-100&location=300+Cadman+Plaza+West+Brooklyn returns this:
The URL has a bunch of nifty parameters that can be adjusted to change the view of the street.
Here is the same URL reformatted to show the various parameters. A description of all of them can be found here: https://developers.google.com/maps/documentation/streetview/
So you can change the heading to change the perspective. The image above had the heading set to -100. Changing it to -150 gives us this:
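As a minimal sketch of how the query string fits together, the snippet below rebuilds the sample URL from its parts using Python's standard library. The function name `street_view_url` and its default values are my own choices for illustration, taken from the sample query above. Newer versions of the API may also require an API key, which is not shown here.

```python
from urllib.parse import urlencode

# Endpoint taken from the sample query in this post.
BASE_URL = "http://maps.googleapis.com/maps/api/streetview"


def street_view_url(location, heading=-100, pitch=30, fov=120, size="1024x768"):
    """Build a Street View image URL for a plain-text address.

    The helper name and defaults are illustrative; the parameter names
    (size, pitch, fov, heading, location) come from the sample query.
    """
    params = {
        "size": size,          # image dimensions in pixels, WIDTHxHEIGHT
        "pitch": pitch,        # up/down angle of the camera
        "fov": fov,            # horizontal field of view, in degrees
        "heading": heading,    # compass direction the camera points to
        "location": location,  # a simple street address works here
    }
    # urlencode turns spaces into '+', matching the sample query string.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(street_view_url("300 Cadman Plaza West Brooklyn"))
    print(street_view_url("300 Cadman Plaza West Brooklyn", heading=-150))
```

Changing only the `heading` argument is enough to rotate the camera while keeping everything else about the view the same.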
Wanting to deliberately apply some urban thinking, I wrote a script that toured my street using Street View images. NYC's grid system makes that easy. A naive example is querying all odd or all even addresses from 100 East 80th Street to 500 East 80th Street (on Manhattan's cross streets, odd addresses are on the north side and even addresses on the south). The end result covered one entire side of East 80th Street from west to east. I ended up collecting more than 10,000 images for each street (a lot of them were duplicates). It also took some finesse to work within Google's throttling policy.
Around the same time, we were learning how to use neural networks for image/face detection at CUSP, and that made me want to apply some recognition on top of what I had already done.
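The touring idea can be sketched roughly as follows. This is not the original script: the function names, the fixed delay between requests, and the hash-based duplicate check are all assumptions made for illustration. The `fetch_image` argument is a hypothetical callable that takes an address and returns the image bytes, so the sketch does not depend on any particular HTTP library.

```python
import hashlib
import time


def street_addresses(street, start, end, side="even"):
    """Yield addresses on one side of a street, e.g. '100 East 80th Street'."""
    parity = 0 if side == "even" else 1
    for number in range(start, end + 1):
        if number % 2 == parity:
            yield f"{number} {street}"


def collect_unique_images(addresses, fetch_image, delay_seconds=1.0):
    """Fetch one image per address, skipping byte-identical duplicates.

    fetch_image is a hypothetical callable: address -> image bytes.
    The pause between requests is a simple stand-in for respecting
    the service's throttling policy.
    """
    seen_digests = set()
    unique = {}
    for address in addresses:
        image_bytes = fetch_image(address)
        digest = hashlib.sha256(image_bytes).hexdigest()
        if digest not in seen_digests:
            seen_digests.add(digest)
            unique[address] = image_bytes
        time.sleep(delay_seconds)
    return unique


if __name__ == "__main__":
    # Stand-in fetcher so the example runs without network access.
    def fake_fetch(address):
        return b"same-view" if address.startswith("10") else b"other-view"

    tour = street_addresses("East 80th Street", 100, 110)
    images = collect_unique_images(tour, fake_fetch, delay_seconds=0)
    print(list(images))
```

Hashing only catches images that are exactly identical byte for byte; near-duplicates from neighbouring addresses would need a fuzzier comparison.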
I came across the Google Street View Gentrification Observations Project at Harvard (http://scholar.harvard.edu/jackelynhwang/projects/ggo), which uses Google Street View to identify gentrification patterns in a neighborhood. At first I had (naively) assumed that they were able to look at people's faces, classify skin pigmentation and come up with a gentrification metric of sorts (which would be amazing!). The research, however, uses Street View as a resource for a more qualitative study of a neighborhood by looking at changes in buildings on a particular street over time. (Street View has a feature where you can go back in time and look at the same street "view" across many years, dating back to 2007.)
I decided to apply some form of "machine learning" to this ability to tour using Google Street View. After many hours of trial and error, which included using neural networks (pybrain) to identify fire hydrants as well as using Wolfram Alpha's keypoint detection feature, I came across Camfind's API (https://www.mashape.com/imagesearcher/camfind), which is currently one of the best and easiest to use web-based image recognition services out there.
Camfind's API is a freemium service. They do offer 500 detections per month for free, but after that it gets pretty expensive.
The basic concept is that you send Camfind an image and it responds with what it thinks is in the picture. (You can try out their mobile app; it's pretty accurate!)
I then tweaked my script to take a Street View picture for each address (say 100 East 80th Street), with the parameters set so that the camera points in the general direction of where I would expect to find a fire hydrant. I then send this picture to Camfind and wait for a response. The point is to be able to do this automatically and get back "Hydrant" when a Street View image contains an actual fire hydrant. At this point I think it's better to just show the final result in action :) (See video below)
The final result is a script that loops from a start address to an end address and sends each image to Camfind, which does the recognition work and sends back a text response listing the identified objects in the picture. In this case, when a Street View image has a fire hydrant in it, Camfind returns the word "Hydrant".
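The overall loop might look something like the sketch below. It deliberately leaves out the real Camfind request and response handling, since those details are not covered in this post: `fetch_image` and `recognize` are hypothetical callables that you would back with the Street View and image recognition services respectively, and `find_feature` is a name made up for this example.

```python
def find_feature(addresses, fetch_image, recognize, keyword="hydrant"):
    """Return the addresses whose image description mentions the keyword.

    fetch_image: hypothetical callable, address -> image bytes.
    recognize:   hypothetical callable, image bytes -> text description
                 (a stand-in for the image recognition service).
    """
    matches = []
    for address in addresses:
        image_bytes = fetch_image(address)
        description = recognize(image_bytes)
        # Case-insensitive match, so "Hydrant" and "red fire hydrant" both count.
        if keyword.lower() in description.lower():
            matches.append(address)
    return matches


if __name__ == "__main__":
    # Stand-ins so the example runs without any network calls.
    fake_images = {
        "100 East 80th Street": b"image-with-hydrant",
        "102 East 80th Street": b"image-with-tree",
    }
    fake_labels = {
        b"image-with-hydrant": "Hydrant",
        b"image-with-tree": "green tree",
    }
    found = find_feature(
        fake_images.keys(),
        fetch_image=fake_images.get,
        recognize=fake_labels.get,
    )
    print(found)
```

Passing the two services in as functions keeps the loop itself trivial, and makes it easy to swap in a different recognition service or to test the logic offline.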
This experiment partially validates my intuition that machine learning can be consumed as a service to help solve complex urban problems. Although it was fun to learn about the intricacies of neural networks, their "hidden layers" and weights, I find it more useful to consume machine learning in this manner, which allows for more accessible problem solving.
At the time of writing this post, one of our classmates (thanks, Juan) pointed us to the Google Prediction API (https://cloud.google.com/prediction/docs) and how Google is also heading in the same general direction. With respect to fast image recognition, Google is lagging behind. Google Goggles appears to be the closest solution, but it does not come close to Camfind's current accuracy or ease of use.
Please do comment with feedback, suggestions or further ideas on how this could be improved. I will share the scripts once GitHub is in place.