Let’s face it. Photos are dumb. Without context they don’t reveal much. While they can be really good at evoking emotions, they are really bad at explaining their content. In fact, without a viewer, an image explains nothing.
Let me explain. When confronted with a photograph, we are only able to recognize the items in it because of our memory. We have seen them before, thus we know what they are. Those unfamiliar to us remain unknown. A photo explains only if you already know the meaning. Not very useful.
We currently rely on titles, captions, keywords and hashtags to understand what we are looking at and to discover new things or places. Without those, we are left wondering. Without hashtags, Instagram would be a desolate landscape of lonely pics, and Flickr, without tags (the proto-hashtags) and captions, would be a massive collection of incomprehensible pictures.
The text that we associate with photos is also critical in finding them. Currently, unless an image looks similar to another one, we cannot find it, even if it shows the same object or was taken at the same location from a different angle. In other words, without the metadata, the data is useless.
Cracking the code
Object recognition is the key to unlocking photos’ content and ultimately making them smart, useful and valuable. No wonder then that companies like Google, Yahoo, Facebook and others are hard at work trying to crack the code. But it’s not easy. While it takes human babies 9 months to recognize objects in pictures, computers have an extremely hard time even recognizing the same object in two different pictures.
Yahoo’s recent release of the Flickr app incorporating object recognition is the first step in what will soon become a very common feature. For now, Flickr uses its recognition engine to automatically add tags to images, making them even more easily retrievable in a search. However, companies like Image Searcher, with its CamFind app, already push the usefulness further by not only recognizing objects and telling you what they are, but also finding places where you can buy them, or telling you how their names are pronounced in a foreign language. The company also has a variant that helps blind people hear what the camera sees.
CamFind cannot yet recognize an object in any image other than the one you just took, and Flickr cannot tell you anything more than what the object is. The real maturity of object recognition will come with the convergence of the two: any image, whenever and by whomever it was taken, will not only tell you what it is depicting but also everything about its content. Automatically and persistently.
The Holy Grail
Imagine taking any image and finding out not only what is in it but also where to find it. The ultimate consumer tool and a brand’s wildest dream. Consumers would be able to quickly shop for items they see, and brands would no longer need to rely so much on advertising, since every photograph taken could become a storefront. The potential is massive.
Furthermore, your fridge would then be able to know exactly what brand of yogurt you are about to run out of and reorder it, without you having to tell it anything. Self-driving cars could recognize any object in their path and react accordingly. Large image databases could self-classify based on content without anyone ever needing to enter one keyword, tag, hashtag, description or GPS code – in other words, no more metadata.
Look who’s peeking
We are not there yet, obviously. Without 100% accuracy, the technology is just another spamming device, frustratingly throwing you off course. Algorithms, while close, can be fooled by a simple change in lighting or an obstruction. But companies with deep pockets, like Google or Amazon, are actively working on it. Google could offer an AdWords for photos, while Amazon could drive traffic from any product seen in a picture to its store. In fact, Amazon is quietly testing this with a beta version called the Amazon Publisher Studio, which allows for the automated tagging of images with links back to Amazon. No word yet on its success.
Object recognition will also enable computer-to-computer communication via photographs, allowing for extensive reach and accurate interaction between logic boards across the world. If a picture is worth a thousand words, the possibilities offered by extracting information automatically from photographs are exponential.
Finally, object recognition will also open the doors to the possible monetization of user-generated content (UGC). Companies built around photo posting and sharing, like Instagram, Flickr, Tumblr, Pinterest, Facebook, Twitter and Imgur, would finally be able to extract value from each image they host. Needless to say, this new market would be huge.
For now, each is working on its secret sauce behind closed doors while we catch glimpses of their progress. However, we can expect a flurry of features using object recognition as they put their research to the test. Yahoo is the first; let’s see who comes next.
Author: Paul Melcher
Paul Melcher is the founder of Kaptur and Managing Director of Melcher System, a consultancy for visual technology firms. He is an entrepreneur, advisor, and consultant with a rich background in visual tech, content licensing, business strategy, and technology, with more than 20 years of experience in developing world-renowned photo-based companies and two successful exits already.