The fundamental goal of visual tech should be to reduce friction in how we interact with the world. Success will be measured by how easily we can move from one function to another with minimal active input.

A few years ago, while working on the development of a SaaS product, the everlasting issue of international compliance came up. With hundreds of possible input lines and the necessary elements for customization, the question was who would translate them and how many languages we should support. While English is understood in most countries actively using the web, it is still seen as foreign everywhere outside Canada, the USA, the UK, New Zealand, and Australia. Internal discussions raged on. This is where visuals come in: at any Olympic event, instead of translating every sign into thousands of different languages, the organizers achieve the same result with a set of logograms. To indicate directions to various parts of the Olympic Village, every post features a design understandable by every culture. If it can be done in real life, why not on the web?

90% of humans can recognize and understand these logograms. So why use words?
A security barrier.

Visual tech can carry this idea much further. We have already seen password protection schemes replaced by a series of movements on photographs, making them much harder to crack since they are not bound by the combination limits of four digits or letters. At the same time, they are much easier to remember because they are visual (and personalized). Facial-recognition doorbells offer keyless entry, literally erasing physical contact from what used to be an ageless physical relationship between user and interface. Identities, whether full facial or iris-based, have also been far better protected by visual tech than by any combination of social security numbers, PINs, or “secret questions.” And this is just the beginning.

Losing the key

Since the invention of written language, the relationship between content and its users has been text-based. Like needing a key to open a door, you need text to access the information you seek. Storing, finding, and retrieving information has always meant translating an idea, concept, or thought into text and then looking for a match. This transition step, from thought to text, has forced us to alter, if not diminish, our thoughts in order to fit them into text's highly regimented structure. That is about to change. As we analyze visual content and transform it into data, we are getting closer to being able to search and retrieve from visual to visual directly. In fact, image matching technology does this already: once it has fingerprinted an image, it can scour the internet for similar fingerprints. The next leap will come when matching is no longer pixel by pixel but content by content. For example, feed the search engine an image of a ball, a tree, and a beach, and the system retrieves all images containing a tree, a ball, and a beach, regardless of their position in the frame. Not far enough? Add semantics, and the search engine will analyze the image for its meaning and find images with a similar meaning, regardless of their content.

Image matching technology looks for similar patterns. The semantic version will search for similar meaning.
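To make the fingerprinting idea concrete, here is a toy sketch of an “average hash,” a deliberately simplified stand-in for the perceptual hashing that real image-matching systems use. The pixel grids, values, and function names below are all invented for illustration; production systems work on full downscaled images rather than 2×2 grids.

```python
# Toy average-hash fingerprint: reduce an image to bits, then compare
# fingerprints by counting differing bits (Hamming distance).

def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the image's mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming_distance(h1, h2):
    """Count of differing bits; a small distance suggests similar images."""
    return sum(a != b for a, b in zip(h1, h2))

# Two nearly identical "images" and one with an inverted layout
img_a = [[10, 200], [30, 220]]
img_b = [[12, 198], [28, 225]]   # slight brightness changes
img_c = [[200, 10], [220, 30]]   # bright/dark regions swapped

print(hamming_distance(average_hash(img_a), average_hash(img_b)))  # 0
print(hamming_distance(average_hash(img_a), average_hash(img_c)))  # 4
```

Note how the small brightness changes in `img_b` do not alter the fingerprint at all, which is exactly why such hashes can find near-duplicates across the web; matching content rather than pixels, as described above, requires far richer representations.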

Because of the complexity of human understanding and the possibility of escaping the limitations of text, we will be able to formulate, with data alone, meanings that we could never define with words. This will open up a new world of understanding, relationships, and thus communication that today we can only dream about.

A wordless world

What does this mean in practical terms? Well, e-commerce, for example, would change: instead of typing a search for a long red dress and finding a bunch of matches across numerous retailer sites, a user will upload a photo of the dress they saw someone else wearing. Not only will the engine quickly identify the color and length, but also the fabric, the flow, the light reflection, the emotion delivered, and the intended impact, and, as long as it has past data on you, it will understand why you like that dress in that picture. The result will be an exact match for what you are looking for, not what you were looking at, and could even be a different color and length. Because with fashion, it is not only how you look but how you feel and what you want to express about yourself that matters. A semantic visual search will be able to deliver exactly that. No more disappointments.
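One plausible mechanism behind such a search, sketched here as an assumption rather than a description of any existing product, is comparing “meaning” vectors (embeddings) with cosine similarity. The dimensions and numbers below are entirely made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical "meaning" vectors; dimensions might encode flow, sheen,
# mood, formality... (invented values, not from any real model)
query = [0.9, 0.2, 0.8, 0.1]       # embedding of the uploaded photo
catalog = {
    "flowing red dress": [0.85, 0.25, 0.75, 0.15],
    "stiff blue blazer": [0.10, 0.90, 0.20, 0.80],
}

best = max(catalog, key=lambda name: cosine_similarity(query, catalog[name]))
print(best)  # flowing red dress
```

The key property is that items land near each other when their overall meaning is similar, even if individual attributes such as color differ, which is what would let the engine return what you are looking for rather than what you were looking at.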

The consequences for our everyday lives will be huge. Rather than relying on textual descriptions alone, we will be able to rely on emotional responses as well. From picking travel destinations and restaurants, to dating (imagine matching people by comparing two sets of personal pictures, based on what and how they photograph their lives), to house hunting, breaking down the walls of textual description will lead to a universe of more accurate, deeper connections. And the more we photograph our world, the better it will become at providing us with perfectly accurate results.

For visual tech to deliver on its promise, there is a long way to go. First, similar to what we did with all of our textual knowledge, we will have to index every single visual taken. We will also need to create a new level of language that allows photos to be connected with other photos, without any text descriptions. We will need to map relationships between every single piece of visual content and comprehend why those relationships exist. It is a huge undertaking that not even a Google can process today. However, it is not too early to take the first steps.

Photo by willc2

Author: Paul Melcher

Paul Melcher is a highly influential and visionary leader in visual tech, with 20+ years of experience in licensing, tech innovation, and entrepreneurship. He is the Managing Director of MelcherSystem and has held executive roles at Corbis, Stipple, and more. Melcher received a Digital Media Licensing Association Award, is a board member of Plus Coalition, Clippn, and Anthology, and has been named among the “100 most influential individuals in American photography.”
