When discussing this year’s Mobile Photo Connect “Prospering in the emerging Photos at your Fingertips” theme, several photo app developers commented along the lines of “Wouldn’t it be nice if… companies such as Google or Facebook would share their machine learning and image recognition technologies with all of us?”
Few expected this to ever happen, let alone within weeks of the conference. On November 9 Google announced its open source TensorFlow machine-learning framework, followed three weeks later by its Cloud Vision image recognition API – technologies at the heart of the much praised Google Photos service.
So what exactly are these technologies? How will they impact existing image recognition API developers? And how useful could they be for photo app developers who don’t have the resources to develop image recognition technology themselves?
Let’s start with a high-level description of TensorFlow and Cloud Vision.
TensorFlow is Google’s internal machine learning framework, which the company uses in consumer applications such as Google Photos and Google Translate. The framework is now also offered externally as open source software. (AI-based machine learning is at the heart of the tremendous improvements made in image recognition in the last few years. These systems use image training sets to automatically develop image classification algorithms, rather than developers needing to “manually” specify the infinitely complex criteria for how certain images should be classified.)
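To make the idea of learning from a training set concrete, here is a deliberately tiny sketch in plain Python. It is not TensorFlow and not deep learning: it uses a simple nearest-centroid rule, and the color features, labels and function names are all hypothetical choices made for illustration. The point is only that the classification rule is derived from labeled examples rather than written by hand.

```python
from statistics import mean

def train(examples):
    """Learn one average feature vector (centroid) per label from labeled examples."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    # Average each feature column separately for every label.
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def classify(centroids, features):
    """Return the label whose learned centroid is closest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))

# Hypothetical training set: average (red, green, blue) of a photo, plus a label.
training_set = [
    ((0.9, 0.5, 0.2), "sunset"),
    ((0.8, 0.4, 0.3), "sunset"),
    ((0.2, 0.7, 0.3), "forest"),
    ((0.1, 0.6, 0.2), "forest"),
]

model = train(training_set)
print(classify(model, (0.85, 0.45, 0.25)))
```

Nobody wrote a rule saying "mostly red means sunset"; the examples did that work. Production systems apply the same principle with far richer features and far larger training sets.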
As an open source machine-learning framework, TensorFlow is certainly not unique. For instance, Torch and Caffe are both open source frameworks well-established in the AI community. But what makes TensorFlow conceptually appealing, according to Ramzi Rizk, CTO & co-founder of EyeEm, which uses Caffe, is TensorFlow’s presumed scalability and speed, coupled with Google’s vast resources that could be allocated to further develop the framework.
Although the first public version does not enable AI systems to learn from their training sets by using multiple computers, this will change in the future when TensorFlow should be able to run on anything from a single smartphone to implementations across thousands of computers in data centers, according to Google. Such distributed systems are vital for deep learning systems that analyze vast quantities of photos in order to build image recognition algorithms.
Google Cloud Vision API
Google Photos has received universal praise ever since it was introduced at the end of May, bringing image recognition features to the masses for the first time. Google’s Cloud Vision API now makes the underlying image recognition technology available to a broad range of companies, according to product manager Ram Ramanathan and software engineer Emanuel Taropa.
“They [the potential API users] range from companies that manage large media catalogs to consumer and social technology companies.”
Cloud Vision offers the following features through a REST API:
- Label detection (object classification)
- OCR (detection and extraction of text within an image, supporting many languages)
- Face detection (detection of multiple faces within an image, along with associated key facial attributes, such as emotional state or headwear. Note that Cloud Vision stays away from privacy-sensitive face recognition functionality.)
- Logo detection
- Landmark detection
- Explicit (adult, violent) content detection
Multiple features can be applied to the same image. For instance, in a single photo the API might recognize running shoes, the Nike logo on the box, the retailer’s text on the point-of-purchase display and the happy faces of the people eyeing the display.
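As a rough sketch of what requesting several features for one image could look like, the Python below only builds a JSON request body and sends nothing over the network. The endpoint, field names and feature type strings are assumptions taken from the publicly documented v1 REST API, which may differ from the preview version discussed here; the helper name and the placeholder image bytes are hypothetical.

```python
import base64
import json

# Assumed endpoint from the public v1 REST API; shown for context, not called here.
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_annotate_request(image_bytes, feature_types, max_results=5):
    """Build one request body asking for several features on a single image."""
    return {
        "requests": [
            {
                # Image bytes travel inside the JSON body as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": feature_type, "maxResults": max_results}
                    for feature_type in feature_types
                ],
            }
        ]
    }

# Placeholder bytes stand in for a real photo of the shoe display.
body = build_annotate_request(
    b"placeholder image bytes",
    ["LABEL_DETECTION", "LOGO_DETECTION", "TEXT_DETECTION", "FACE_DETECTION"],
)
print(json.dumps(body, indent=2))
```

Under these assumptions, one call would cover the shoes (labels), the Nike logo, the retailer’s text and the faces, rather than requiring four separate requests.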
The Cloud Vision API can be called, and images submitted to it, from any mobile or cloud platform. A subset of the Cloud Vision API is also provided as part of Google’s Mobile Vision APIs for the Android platform, according to Ramanathan and Taropa, and future releases will also be integrated with Google Cloud Storage.
For now, test versions of the API are free and the company has not yet announced pricing for when the API officially launches.
Perspectives on the Google Cloud Vision API
We spoke with several image recognition developers about their opinions of the Google Cloud Vision API and about the threat the API might pose to their businesses.
In general, although the jury is still out because not all functionality of Cloud Vision and TensorFlow is exposed at this time, the developers we spoke to see these Google initiatives as serious attempts to enable a broader range of developers to implement image recognition solutions – not just token attempts to impress the world with how open and altruistic Google has become.
Here are some of the comments:
Matt Zeiler, CEO of image recognition API vendor Clarifai: “We knew from the get-go that Google at some point would do this [provide their image recognition API]. It helps us, as it validates our market. Our opportunity is to offer special versions for specific markets, for instance, medical, and offer the level of customization and support a horizontal solution such as theirs can’t offer.”
Mary Tarczynski, CMO of Ditto, which provides image recognition technologies to identify brands’ presence in social media streams, sees Google’s announcement as both a threat and an opportunity: “The differentiation is shifting. Just being able to know when and where their logos appear in social media is no longer enough for major consumer brands. For instance, we’ve started to offer ways to identify logos displayed on different products (such as on glass vs. plastic bottles) or in different settings (such as logos on a product that resides on a supermarket shelf). In the end, our big brand customers have very specific needs that require targeted image recognition solutions.”
For Guangda Li, CTO and co-founder of visual search API company ViSenze, specialized functionality is also the key for companies like his, as he articulated in an interview we published recently: “The fact that Google opened its image recognition technology through the recent launch of Cloud Vision API is a huge benefit for companies that want to build on top of it, and a huge disadvantage for other vendors that didn’t specialize into a specific vertical. […] Other vendors that offer image recognition APIs need to rethink their strategy now. If they were not focused on a vertical but on general image recognition, they will be out of the market soon.”
Ramzi Rizk, CTO & co-founder of photo community and marketplace EyeEm, echoes the sentiment expressed by the other developers quoted here: a horizontal solution like Google’s can’t replace the type of specialized image recognition technology companies have developed to address specific use cases. In EyeEm’s case, this means enabling its users to easily discover photos they might want to purchase or to enjoy viewing because of their esthetic qualities. EyeEm uses a combination of object classifications, abstract classifications (such as happiness), color detection and photography-specific classifications (such as symmetry and negative space), which goes above and beyond what the Google Cloud Vision API offers.
Could EyeEm at some point use part of the Cloud Vision API and integrate the complementary functionality with their own APIs? “Conceptually, yes. But at this point, it’s too early to tell, as we don’t yet know all the features, the terms and conditions, and have not yet been able to benchmark the API.”
As we have extensively described in our Photos at your Fingertips report, consumers are overwhelmed by the number of photos they store on their devices and in the cloud, and solving their challenges will have widespread implications for their photo taking, sharing or even printing behavior.
Consumers’ needs and use cases differ, and there is no one-size-fits-all solution for solving their photo organizing problems.
But Google’s Cloud Vision API goes a long way. While the ink is not yet dry as to the API’s eventual features, pricing, and terms and conditions, it’s not a stretch to assume that such a comprehensive API could be an attractive proposition for photo app developers who have not yet developed or are not in a position to develop their own image recognition technology.
Those photo app developers who do have image recognition technology might benefit from adding a subset of the Google Cloud Vision API functionality to complement their own solutions. For them the eventual pricing, as well as the usage terms, will be crucial.
Image recognition API vendors need to be on high alert: the days of offering general image recognition solutions might soon be over, as Guangda Li of ViSenze pointedly mentioned. The key to surviving or even thriving is to focus on areas that require specialized image recognition functions or services.
What’s left is the nagging question of why Google is sharing what many consider to be its crown jewels: its powerful machine learning framework and core image recognition API. Does the company see the API as a potential major revenue source? An indirect enabler of other revenue-generating platforms, such as cloud storage or the Android platform? Or of future products such as image recognition-enabled robots or self-driving vehicles? Or is it just a proactive move against competitors such as Microsoft, Amazon or Facebook?
When the Cloud Vision API features, pricing and usage terms are finalized in the coming months, we might get a better indication of Google’s motivation.
Author: Hans Hartman
Hans Hartman is president of Suite 48 Analytics, the leading research and analysis firm for the mobile photography market and organizer of Mobile Visual 1st, a yearly industry conference about mobile photography.