Google has recently launched its Cloud Vision API, making it easier for companies to build all sorts of applications using its image recognition. What is more, Google is offering it for free to begin with. This sets the whole image recognition industry in motion, and many other vendors might go out of business soon.
To find out more about how the image recognition industry is impacted, we interviewed Guangda Li, the Co-Founder and CTO at ViSenze, a machine intelligence company that offers visual search and image recognition APIs to businesses.
Q: In your opinion, how is this news impacting the image recognition industry?
A: The fact that Google opened its image recognition technology through the recent launch of Cloud Vision API is a huge benefit for companies that want to build on top of it, and a huge disadvantage for other vendors that didn’t specialise in a specific vertical.
Q: So who will benefit from it?
A: Companies that can make use of image recognition and don’t want to invest in developing their own have more choices now. This open API includes useful features. And it’s interesting to see that Google opened up its Optical Character Recognition (OCR) capability too. As far as I know, this OCR is one of the key technologies that power Google Translate.
Q: And who might go out of business because of this?
A: Other vendors that offer image recognition APIs need to rethink their strategy now. If they were not focused on a vertical but on general image recognition, they will be out of the market soon.
ViSenze, for example, won’t be impacted as we have set higher barriers to entry through a focus on specific verticals, as well as more emphasis on visual search rather than image recognition.
Q: How advanced would you say the image recognition industry is nowadays?
A: Nowadays, image recognition is more like an underlying technology that supports different applications across various industries.
That is because, in terms of technology and pure R&D alone, the gap between vendors is quite narrow and accuracy has reached a very high level. In other words, it is quite mature now. I would say the maturity of image recognition at the moment is like that of speech recognition 2-3 years ago. There are still some bad cases here and there, but they are becoming fewer and fewer.
Now it is more about who will quickly solve specific problems, real problems. This means combining domain knowledge with image recognition technology, hence the vertical focus is extremely important. And Google, being Google, will solve big, general problems, and most likely will not focus on any specific verticals.
Q: What advantages does Google have compared to other vendors in building its image recognition technology?
A: Google is really good at designing sophisticated large-scale systems, so they can handle huge sets of data and iterate fast. And of course, they have the resources to invest in pure technology development, while their bread and butter comes from other sources.
There are 3 main parameters that are important in developing this kind of technology:
- how you get your data source and how you tag it – and Google taps into huge data sets
- if you have large-scale data processing capacity – and Google is one of the best at this
- how well you can apply domain expertise – but Google is not really zooming in on any specific vertical
Q: You mentioned ViSenze puts more emphasis on visual search rather than image recognition. What is the difference between the two?
A: Image recognition basically tells you, in words, what is in an image. It can identify objects, logos, facial expressions, etc. Visual search helps you find visually similar images. Or, as in our case, it can show you items that are visually similar to those identified in an image.
Visual search is more challenging to develop and relies even more on domain expertise to make sure the results are really relevant, not just technically accurate.
If we refer back to those main parameters, we can say we have the advantage of getting semi-structured data from customers in specific verticals, which is extremely valuable for training our models. Also, we use our domain knowledge to deeply tune our technology according to real user scenarios.
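Editor's illustration: the distinction drawn in the answer above can be sketched in a few lines of Python. This is a toy example under stated assumptions, not ViSenze's or Google's implementation. The catalogue, the item names, the labels and the three-number feature vectors are all made up; in a real system the labels and features would come from a trained model. Cosine similarity is used here only as one common way of comparing feature vectors.

```python
import math

# Hypothetical catalogue. Labels and feature vectors are invented for
# illustration; a real system would compute them with a trained model.
CATALOGUE = {
    "red_dress_01": {"label": "dress", "features": [0.9, 0.1, 0.2]},
    "red_dress_02": {"label": "dress", "features": [0.8, 0.2, 0.1]},
    "blue_dress_01": {"label": "dress", "features": [0.1, 0.9, 0.2]},
    "red_handbag_01": {"label": "handbag", "features": [0.7, 0.1, 0.9]},
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recognise(item_id):
    """Image recognition: answer in words what the image shows."""
    return CATALOGUE[item_id]["label"]


def visual_search(query_features, top_k=2):
    """Visual search: rank catalogue items by similarity to the query."""
    scored = [
        (cosine_similarity(query_features, item["features"]), item_id)
        for item_id, item in CATALOGUE.items()
    ]
    scored.sort(reverse=True)  # most similar first
    return [item_id for _, item_id in scored[:top_k]]


if __name__ == "__main__":
    print(recognise("blue_dress_01"))
    print(visual_search([0.85, 0.15, 0.15]))
```

Recognition gives the same word, "dress", for the red and the blue items, while visual search ranks items by how close they look to the query. Deciding which notion of "close" is relevant to a shopper is where the domain expertise mentioned above comes in.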
Q: Any closing remarks?
A: Again, any image recognition startup that hasn’t focused on a vertical will be out of the market soon. Google is a great tech provider when it comes to general image recognition, so the market will most probably be drawn to it.
In the Western Hemisphere, companies like Microsoft and Google offer their internal deep learning frameworks as open source, and productise their technology as APIs – think Microsoft’s Project Oxford and Google’s Cloud Vision API – all in order to promote their cloud servers. In the Eastern Hemisphere, Tencent has also opened its face recognition API for free. And the trend will continue, as the whole industry should make progress in terms of applications built on top of visual technologies, and not just in R&D alone.
Author: Paul Melcher
Paul Melcher is the founder of Kaptur. He is an entrepreneur, advisor and consultant with a strong background in licensing, copyright, sales, marketing and technology, and more than 20 years’ experience in developing world-renowned photo-based companies, with two successful exits. He was named one of the “100 most influential people in photography” by American Photo magazine.