To start, let us play with the Google Cloud Vision API and test it out:
Brave new world of photography
If you upload your photos to the Google Cloud Vision API, you instantly get all this feedback on your photos.
Let me walk you through this.
First of all, let us start with this picture.
Here is what Google's vision algorithm will do, automatically:
1. Put little green dots around the face of your subject, marking 'significant' facial landmarks:
- Edges of the eyes
- Nose and nostrils
To better understand how this works, check out ‘Landmarker.io‘
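To make this concrete, here is a minimal sketch of the JSON body you would POST to the Cloud Vision `images:annotate` endpoint to request face detection (the landmark points come back under `faceAnnotations[].landmarks`). The field names follow Google's public REST reference; the image bytes below are just a placeholder.

```python
import base64
import json

def build_face_detection_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a Cloud Vision images:annotate call
    asking for face detection."""
    return {
        "requests": [
            {
                # Images are sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "FACE_DETECTION", "maxResults": 10}],
            }
        ]
    }

# Placeholder bytes stand in for a real JPEG file.
body = build_face_detection_request(b"placeholder image bytes")
print(json.dumps(body, indent=2))
```

In a real call you would read the JPEG from disk and send this body (with an API key) to `https://vision.googleapis.com/v1/images:annotate`.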
2. Google’s Visual API will put ‘labels’ on your photo.
For example, in this picture the labels include:
- Facial expression
3. Then Google will search the Web for related terms:
It discovered things like:
- It was shot on a Pentax 645Z (digital medium format camera)
- It is ‘street photography’
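These related terms arrive as `webEntities` inside the response's `webDetection` object. Below is a short sketch of pulling them out; the values are invented for illustration, but the field names match the Cloud Vision REST reference.

```python
import json

# Illustrative excerpt of a webDetection response (scores are made up).
raw = """
{
  "webDetection": {
    "webEntities": [
      {"description": "Street photography", "score": 0.83},
      {"description": "Pentax 645Z", "score": 0.55}
    ]
  }
}
"""

# Extract (description, score) pairs for each entity Google found.
entities = [
    (e["description"], e["score"])
    for e in json.loads(raw)["webDetection"]["webEntities"]
]
print(entities)
```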
4. Text recognition
Now this was super surprising to me … Google was able to pick up this random text in the background that said ‘your door.’
I didn’t even notice it when I shot the picture. And this is why this is interesting:
- The text ‘your door’ is actually cut-off. Yet, Google’s machine learning vision was able to accurately transcribe it.
Practical idea: Perhaps Google’s AI can help us find distracting text in the background of our photos. Lesson: Don’t have distracting text in the background.
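As a sketch of that practical idea: the first element of the API's `textAnnotations` list holds the full detected text, so a non-empty result means there is readable text somewhere in the frame. The sample response here is invented, reusing the 'your door' example above.

```python
# Illustrative response: textAnnotations[0] carries the full detected text.
response = {
    "textAnnotations": [
        {"description": "your door"}
    ]
}

def background_text(resp: dict) -> str:
    """Return the full detected text, or '' if none was found."""
    annotations = resp.get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""

text = background_text(response)
if text:
    print(f"Possible distracting text in frame: {text!r}")
```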
5. JSON Analysis
Then Google produces a text file of all the annotation data in a format called 'JSON' (JavaScript Object Notation):
To me this is interesting because it is a new way to see pictures (from the perspective of a machine).
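Since JSON is just structured text, you can explore it yourself. Here is a small sketch that loads a trimmed-down annotation response (contents invented; the top-level keys follow the REST reference) and counts what the machine "saw":

```python
import json

# A trimmed-down annotation response for illustration.
raw = """
{
  "faceAnnotations": [{}],
  "labelAnnotations": [{}, {}, {}],
  "textAnnotations": [{}],
  "safeSearchAnnotation": {"racy": "LIKELY"}
}
"""

response = json.loads(raw)

# Count how many items each annotation type returned.
summary = {
    key: (len(value) if isinstance(value, list) else 1)
    for key, value in response.items()
}
print(summary)
```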
Some practical ideas:
1. Google AI and machine learning can help us better categorize emotions (like laughter/joy):
Google’s API works hard to categorize human emotions, based on facial recognition.
Some examples on laughter/joy photos I uploaded to Google’s Cloud Vision:
Or you can even show “sorrow” in your pictures:
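Each detected face comes back with ratings like `joyLikelihood` and `sorrowLikelihood`, expressed as enum strings (`VERY_UNLIKELY` through `VERY_LIKELY`). A sketch of picking the "dominant" emotion from one face annotation (the sample face is invented):

```python
# Likelihood enum from the Vision API, ordered least to most likely.
LIKELIHOOD = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
              "POSSIBLE", "LIKELY", "VERY_LIKELY"]

def dominant_emotion(face: dict) -> str:
    """Pick the emotion with the highest likelihood rating."""
    scores = {
        key.replace("Likelihood", ""): LIKELIHOOD.index(face[key])
        for key in ("joyLikelihood", "sorrowLikelihood",
                    "angerLikelihood", "surpriseLikelihood")
    }
    return max(scores, key=scores.get)

# Invented example face annotation.
face = {"joyLikelihood": "VERY_LIKELY", "sorrowLikelihood": "VERY_UNLIKELY",
        "angerLikelihood": "UNLIKELY", "surpriseLikelihood": "POSSIBLE"}
print(dominant_emotion(face))  # joy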
Some interesting ideas:
- Can AI/machine learning help us better identify emotions in other humans? What are the limits of this? Obviously, some emotions are much more subtle in human beings.
- There are some emotions which do not fit into these small categories; human emotions are more complex than we often assume.
For example look at this ’emotion recognition’ chart. I bet you 100 years from now, we will create thousands of new sub-categories of human emotions:
There are lots of muscles in the human face, but there are some philosophical issues:
- Not everyone expresses emotions the same way via muscles in their face.
- Some of us might feel disgusted, angry, or frustrated, yet have enough muscle control to *NOT* show these expressions on our face.
- Sometimes we “mis-read” facial expressions/emotions.
For example as a human being, I can tell these facial expressions are “fake”:
But these are some "real" facial expressions I experienced when I saw the AlphaGo documentary:
Lesson: There will always be a limit to how much machines, AI, and machine-learning will be able to categorize human emotions.
2. Palette Generation
Google's Cloud Vision is very good at generating color palettes from photos:
This is useful for a few reasons:
- We can better understand the color-psychology of our pictures. For example, certain colors evoke certain emotions (red-warm photos evoke passion, cold-blue colors evoke peace and calmness). [Color Theory]
- Perhaps in the future, we can have an AI camera app, that (in real-time) shows you the dominant colors in your photos *while* you’re shooting!
- We can use machine learning to better understand the colors and palettes of famous artists in the past. For example, see the ‘artistic style transfer‘ tutorial to see how you can easily transfer the aesthetic style of famous artists to any picture.
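The palette arrives under `imagePropertiesAnnotation.dominantColors.colors`, each entry carrying an RGB color and a `pixelFraction` (how much of the frame it covers). A sketch of turning that into CSS-style hex swatches, sorted by coverage (the color values are invented; field names follow the REST reference):

```python
def to_hex(color: dict) -> str:
    """Convert a Vision API color object to a hex string.
    Missing channels default to 0, per the API's RGB encoding."""
    return "#{:02x}{:02x}{:02x}".format(
        color.get("red", 0), color.get("green", 0), color.get("blue", 0))

# Invented dominantColors entries for illustration.
colors = [
    {"color": {"red": 30, "green": 90, "blue": 160}, "pixelFraction": 0.18},
    {"color": {"red": 200, "green": 60, "blue": 40}, "pixelFraction": 0.31},
]

# Order the palette by how much of the frame each color covers.
palette = [to_hex(c["color"])
           for c in sorted(colors, key=lambda c: -c["pixelFraction"])]
print(palette)  # ['#c83c28', '#1e5aa0']
```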
3. Ethical questions
Google has a category of images which they call 'racy'. The word means 'risqué', or:
The potential risk that the photograph might upset/offend someone (generally sexual).
If we look at the picture above, it is rated 'likely' to be 'racy'. But why? Perhaps because the woman's leggings are tight-fitting?
Or consider a picture of a woman in pink tights. Google's Cloud Vision knows that it is 'thigh', 'tights', 'girl', 'human leg':
And it has a ‘likely’ rating for racy. This is pretty obvious, because seeing a clear separation/edges of a woman’s posterior is seen as “sexual”:
How does the Vision API "know" it might be a racy/sexual photo? Below is the JSON data from Google for 'visually similar images':
These are some of the images that Google identified:
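In code, the verdict comes back as a `safeSearchAnnotation` object with likelihood strings for categories like `adult`, `violence`, and `racy`. A sketch of a simple flagging rule over that object (the sample annotation is invented; the enum values follow the REST reference):

```python
# Likelihood ranks, per the Vision API enum.
RANK = {"UNKNOWN": 0, "VERY_UNLIKELY": 1, "UNLIKELY": 2,
        "POSSIBLE": 3, "LIKELY": 4, "VERY_LIKELY": 5}

def is_flagged(safe_search: dict, threshold: str = "LIKELY") -> bool:
    """True if any SafeSearch category meets or exceeds the threshold."""
    return any(RANK[v] >= RANK[threshold] for v in safe_search.values())

# Invented example: only 'racy' reaches LIKELY.
annotation = {"adult": "VERY_UNLIKELY", "violence": "UNLIKELY",
              "racy": "LIKELY"}
print(is_flagged(annotation))  # True
```

Note that the threshold is exactly the kind of policy choice the questions below are about: move it one notch and the same photo flips between "fine" and "flagged".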
Anyways the difficult questions:
- Google obviously needs to attempt to ‘police’ the web with their ‘SafeSearch’ algorithm. However, at what point is something ‘risqué’, sexual, ‘inappropriate’, or “pornography?” How much nudity/skin showing is “too much”?
- There is an obvious risk of Google (or any big technology company) censoring certain images, as already happens on Facebook, Instagram, etc.
We’re still in a brave new world of photography. To learn more about all this on your own, here are some links I recommend you to check out:
- Google Cloud Vision
- Landmarker.io (to put yellow dots on your photos of faces)
- Object Localization and Detection: AI free open class
- Human Mesh Recovery: Learning Human Poses and Shapes
- Convolutional Pose YouTube Videos Playlist >
- Oxford: Free image annotation tool // also see their free Wikimedia demo
- Shutter Stock: Image composition Tool
- UC Berkeley: Everybody Dance Now Research / [YouTube] [PDF]
- DeepLabCut (tracking joints) / [Github]
- NVIDIA: AI De-Noiser / [YouTube]