Content recognition in images has made great strides over the last few years and shows no signs of slowing down. In fact, the more graduate students pour out of prestigious schools across the world, the more the field experiences healthy competition. To add fuel, major companies like Yahoo, Google, and Apple have thrown multimillion-dollar acquisition offers at pre-revenue start-ups, greatly increasing VCs’ appetite to fund new ones. However, even as the scope of research increases, a couple of hurdles might kill these start-ups before they have a chance to mature.
The first public applications of automated image classification via deep-learning content recognition engines seem to be running into a painful setback. Flickr, which just launched its magic search with auto-tagging last week, might have to take a step back. A flurry of unhappy users have taken to forums and social media to complain that the auto-tagging feature is an encroachment on their freedom to tag, as well as painfully inaccurate.
Flickr’s auto-tagging tool stems from both selfish and unselfish motives. It wants to put users who have little time or desire to tag their images on the same level as those who spend countless hours adding keywords. It also wants the 11 billion images it stores to be fully indexed and searchable in preparation for its announced licensing tool. However, what it didn’t expect is that many users do not want help. In fact, they feel it disrespects their undeclared right to control their tags. To add insult to injury, the auto-tagging feature is carelessly inaccurate or, at least, seems to be. One of the biggest misconceptions about auto-tagging today is that it is capable of recognizing content perfectly, 100% of the time. It is not. Nor are human beings, by the way. And while it is accurate 90 to 95% of the time, the remaining 5 to 10% of mistakes (or false positives) are the only results people notice. Even if the system makes 1 mistake for every 1,000 images, the impression is that it just doesn’t work, because people notice errors and not successes. As of today, Flickr is considering making the feature opt-in only.
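To put rough numbers on that perception gap, here is a minimal sketch of the arithmetic. The 11 billion images and the accuracy figures are the ones quoted above; treating each image as a single right-or-wrong tagging decision is a simplifying assumption made purely for illustration, and the function name is hypothetical.

```python
def expected_mistakes(num_images: int, accuracy: float) -> int:
    """Expected number of wrongly tagged images.

    Assumption (for illustration only): each image gets exactly one
    tagging decision, which is either right or wrong.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(num_images * (1.0 - accuracy))


FLICKR_IMAGES = 11_000_000_000  # figure quoted in the article

# Accuracy levels mentioned in the text: 90%, 95%, and 1 mistake per 1,000.
for accuracy in (0.90, 0.95, 0.999):
    mistakes = expected_mistakes(FLICKR_IMAGES, accuracy)
    print(f"{accuracy:.1%} accurate: about {mistakes:,} mistagged images")
```

Under these assumptions, 95% accuracy would still leave roughly 550 million mistagged images, and even 1 mistake per 1,000 would leave about 11 million, which is plenty for unhappy users to find and share.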
Yahoo’s photo sharing site is not the only one. Pond5, a stock photo/video platform, recently announced a similar feature to help its contributors keyword their images, and the result was the same: a flurry of forum complaints about its inaccuracy and its invasion of “tagging rights”. Apparently most people are not ready to surrender their privilege to machines, especially if they feel the machines are not doing a better job than they do. And tagging can be a very subjective exercise, especially in the concept category (e.g., fresh, happy, young, ...). Maybe the solution for these companies is to categorize the images automatically for indexing without making the results public: a kind of ghost keywording, used only for search results.
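What ghost keywording might look like in practice can be sketched in a few lines. This is only an illustration under stated assumptions, not how Flickr or Pond5 actually store tags: the `Photo` class, its field names, and the `search` helper are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Photo:
    """Hypothetical photo record with public and hidden keywords."""

    title: str
    user_tags: set[str] = field(default_factory=set)   # chosen by the owner
    ghost_tags: set[str] = field(default_factory=set)  # machine-generated, never shown

    def visible_tags(self) -> list[str]:
        # Only the owner's own tags appear on the photo page.
        return sorted(self.user_tags)

    def matches(self, query: str) -> bool:
        # Search consults both the visible and the hidden keywords.
        term = query.strip().lower()
        return term in {tag.lower() for tag in self.user_tags | self.ghost_tags}


def search(photos: list[Photo], query: str) -> list[Photo]:
    """Return the photos whose user or ghost tags contain the query term."""
    return [photo for photo in photos if photo.matches(query)]


photos = [
    Photo("Sunrise", user_tags={"beach"}, ghost_tags={"sky", "ocean"}),
    Photo("Portrait", user_tags={"family"}, ghost_tags={"person"}),
]
print([photo.title for photo in search(photos, "ocean")])
print(photos[0].visible_tags())
```

The point of the design is that a wrong machine tag can, at worst, surface an image in an irrelevant search; it never appears next to the photographer's own keywords.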
Enter the giants
But this is not the sole hurdle the technology is facing. Microsoft’s very quiet launch of Project Oxford, which offers free vision APIs and SDKs to any developer who requests access, is certainly a blow to the foundations of independent companies in the same space. Project Oxford is a series of REST APIs giving access to advanced image AI via the Microsoft Azure cloud. Given the company’s deep involvement in image processing and content recognition, there is real potential for a strong offering. While the beta is free (with limitations), it will probably grow into a reasonably priced solution. Why? Because one of the key engines of deep-learning content recognition is data. Massive amounts of data. The more images a system can use to train its algorithm, the more accurate it becomes. By offering practically free access to its engine, Microsoft can get an extremely valuable training ground for its research and development. And it doesn’t care if it loses money in the process. For small start-ups in this space, it could be a killer. Considering that Google (a leader in “free” offerings), Amazon, and Yahoo will probably quickly follow, there will not be much left for the little guys to chew on.
Maybe not the end, but close enough. After all, Google’s free Analytics did not kill the web traffic analytics market, but it certainly made it much harder for any new company to enter the space and remain in it successfully. Start-ups in the emerging content recognition field will have to quickly adapt their strategy and deliver more industry-specific solutions, as well as an additional level of customized services, in order to succeed. Not impossible, but probably not what they had in mind when they first launched.
Between user pushback and big companies making it free, deep-learning automated image categorization start-ups might quickly go from a fast-growing segment to a desolate graveyard. However, as with other young offerings, there are plenty of added-service layers they can implement in order to succeed. As visual content continues to explode, there are certainly plenty of opportunities.
Photo by dhammza
Author: Paul Melcher
Paul Melcher is the founder of Kaptur and Managing Director of Melcher System, a consultancy for visual technology firms. He is an entrepreneur, advisor, and consultant with a rich background in visual tech, content licensing, business strategy, and technology, with more than 20 years of experience developing world-renowned photo-based companies, including two successful exits.