I’m a nerd and a woman. Thanks to societal pressures, fashion is a common topic in female company, and I got intrigued by this idea of having colors that complement your beauty.
The moment I bumped into the rules of color matching, it was evident to me what was behind them: an illusion created by the imperfection of our own eyes. We tend to perceive the same color differently depending on the background.
In this image, the two orange circles are the same color, but we see them as different. The same principle is behind the idea of making you look beautiful with colors: certain colors work with your skin tone, making you look rested and minimizing apparent skin imperfections.
Of course, I wanted to develop a machine learning algorithm to do this. Seems easy, doesn’t it? All I need is a simple binary classification of the test colors, based on the already established theory behind the whole shebang.
Yeah, it is not so easy.
There are 17 test colors used to determine which color group is suitable for you. This means that if I wish to match the RGB values of skin to the RGB values of 17 different colors, I have a multinomial logistic regression problem. In itself, this is not a big issue.
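To make the setup concrete, here is a minimal sketch of what that multinomial formulation would look like with scikit-learn. The data here is entirely hypothetical (random skin-tone RGB rows and random class labels, since I am not reproducing the real dataset); only the shape of the problem matters: three color-channel features in, one of 17 test-color classes out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one RGB row per skin sample, and the index (0-16)
# of the best-matching test color for each sample.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(100, 3)).astype(float)  # skin RGB
y = rng.integers(0, 17, size=100)                      # test-color class

# Multinomial logistic regression over the 17 test-color classes.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# Predicted test-color index for one new face.
print(clf.predict(X[:1])[0])
```

With real labeled data, `clf.predict_proba` would also give a per-test-color probability, which is closer to what a stylist actually reports.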
If you have enough data.
At this point in time, I do not. So I had to work around it and analyze all the color groups to see whether I could develop a continuous variable to represent the testing colors.
I used a dataset containing 135 individual colors and tried to see whether a clustering algorithm, K-means, would match them to their 12 predetermined groups and place the testing colors close to the cluster centers.
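This check can be sketched as follows. The colors and group labels below are random stand-ins for the real 135-color dataset; the point is the method: cluster into 12 groups, then measure agreement with the predetermined grouping using the adjusted Rand index (1.0 means perfect agreement, around 0 means no better than chance).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Hypothetical stand-in for the 135 colors and their 12 stylist-assigned groups.
rng = np.random.default_rng(1)
colors = rng.integers(0, 256, size=(135, 3)).astype(float)  # RGB rows
groups = rng.integers(0, 12, size=135)                      # predetermined group

# Cluster into the same number of groups the manual method uses.
km = KMeans(n_clusters=12, n_init=10, random_state=0).fit(colors)

# Agreement between K-means clusters and the predetermined groups.
print(round(adjusted_rand_score(groups, km.labels_), 3))
```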
Well, that did not happen. And I have to say, I was not surprised. The original method, developed for manual testing, was capturing the imperfection of the human eye, not the nature of the colors. So, of course, the colors did not cluster around the testing colors.
A different approach was needed.
So instead, I clustered the colors using the K-means algorithm, ending up with 28 distinct clusters: essentially, 28 data-driven testing colors that might be used to develop a continuous variable for my machine learning algorithm.
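One natural way to turn those 28 cluster centers into a continuous variable is the distance from a color to each center, which K-means provides directly via `transform`. This is my reading of the idea, not the exact feature used in the notebook, and the colors here are again random placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the 135-color dataset.
rng = np.random.default_rng(2)
colors = rng.integers(0, 256, size=(135, 3)).astype(float)

# 28 clusters stand in for 28 data-driven "testing colors".
km = KMeans(n_clusters=28, n_init=10, random_state=0).fit(colors)

# Continuous representation of a new color: its distance to each of the
# 28 cluster centers (smaller distance = closer to that testing color).
new_color = np.array([[180.0, 120.0, 90.0]])
distances = km.transform(new_color)[0]   # vector of 28 distances
print(distances.argmin())                # index of the nearest testing color
```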
Machine Learning Algorithm
I managed to build an algorithm, based on the Random Forest algorithm, that is capable of predicting appropriate fabric/makeup colors for a skin tone regardless of race.
Existing color-testing methods, which rely on trained stylists, have drawn complaints that the color determination is race dependent. To mitigate that complaint, I formed my dataset from human faces of a broad ethnic/racial background, encompassing a wide range of skin tones.
Here you can see an obvious linear dependence between the red and blue color channels of my dataset.
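A dependence like that can be quantified with a simple least-squares fit of blue against red. The channel values below are made up for illustration (roughly mimicking a linear trend like the one in the figure), not taken from my dataset.

```python
import numpy as np

# Hypothetical per-face mean channel values: red and blue.
red  = np.array([220, 205, 190, 170, 150, 130], dtype=float)
blue = np.array([180, 168, 155, 140, 122, 105], dtype=float)

# Least-squares line blue ≈ slope * red + intercept.
slope, intercept = np.polyfit(red, blue, 1)

# Pearson correlation between the two channels.
r = np.corrcoef(red, blue)[0, 1]
print(round(slope, 2), round(r, 3))
```

A correlation close to 1 with a positive slope is what a linear dependence between the channels would look like numerically.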
There may be a medical explanation behind this. I bumped into it a few years ago when I found a curious article about the Blue People of the Appalachian region. At the beginning of the last century, there was an Appalachian family whose skin was literally blue, blue like Smurfs.
Medically, the condition was caused by the lack of a certain blood enzyme responsible for helping hemoglobin bind with oxygen. So when I started thinking about what might be behind the ‘undertones’ of human skin, I remembered that article and that condition, and asked myself whether the so-called ‘cold,’ ‘neutral,’ and ‘warm’ undertones could be caused by the same enzyme. My idea was that, depending on how much of the enzyme a person has, their skin will fall into a cold or a warm category, ultimately causing the red and blue channels of the skin colors to show some kind of linear dependency.
That is the same linear dependency you can see in the graph above. Of course, I did not perform a medical analysis. It was bad enough asking people to send me selfies without makeup, taken in diffuse natural light; asking for a blood sample would simply be too much, and no one would volunteer.
Anyway, my next step was to see what I could get from my dataset. An initial look did not show any linear dependency between the skin colors and the test colors.
Of course, it would not. It is never that easy.
So, I decided to try Random Forest. And that worked. The predictive algorithm showed a score of 0.7.
An impressive result considering that I had only 12 faces to work with. I used 10 for the training set, one for the testing set, and one for making sure the results were not overfitted.
The two tests scored 0.7 and 0.667, respectively. I’m happy. This demonstrates the potential of my algorithm, mainly because I expect the accuracy to increase with a larger dataset.
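The fit-and-score step itself is standard. Here is a minimal sketch of it; the rows below are random placeholders (skin RGB plus a candidate color RGB, with a 0/1 “suits the face” label), since the real feature layout lives in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical rows: skin RGB concatenated with a candidate color RGB,
# labeled 1 if the color suits the face, 0 otherwise.
rng = np.random.default_rng(3)
X = rng.integers(0, 256, size=(300, 6)).astype(float)
y = rng.integers(0, 2, size=300)

# Hold out the last rows for testing (a stand-in for held-out faces).
X_train, y_train = X[:250], y[:250]
X_test,  y_test  = X[250:], y[250:]

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# score() reports mean accuracy on the held-out rows.
print(round(rf.score(X_test, y_test), 3))
```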
I would like to add a small explanation of how I formed training and testing sets for my algorithm.
I have 30 different data points for each face, totaling 360 rows in the table. However, I could not randomly split those rows into training and test sets. Although there are 30 data points for each face, I had to treat my dataset as if there were only one data point per face, meaning that instead of 360, I had only 12 data points.
This rationale comes from population biology, where various data collected about one animal still represent just one animal. And since my test subjects are humans, I had to apply the same principle. The algorithm is supposed to determine the agreement of all the testing colors with a human face, not just the agreement of one color with a human face, meaning I have only one data point per face, not 30. That’s why I picked the last two faces in my dataset to form the two test sets, while the first ten faces were used as the training set.
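This kind of face-level split is exactly what scikit-learn’s group-aware splitters are for. Here is a sketch using `GroupShuffleSplit` with the same layout as my table (12 faces, 30 rows each); the feature values are random placeholders. I split the faces by position instead, but the group splitter generalizes to any number of faces.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical layout: 12 faces x 30 rows each = 360 rows.
face_id = np.repeat(np.arange(12), 30)            # group label per row
X = np.random.default_rng(4).normal(size=(360, 6))

# Split by face, never by row: all 30 rows of a face stay together.
gss = GroupShuffleSplit(n_splits=1, test_size=2, random_state=0)
train_idx, test_idx = next(gss.split(X, groups=face_id))

train_faces = set(face_id[train_idx])
test_faces  = set(face_id[test_idx])
print(sorted(test_faces))                          # the two held-out faces
```

A naive row-level split would leak 29 near-identical rows of each test face into training, inflating the score; grouping by face prevents that.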
So yeah, by today’s big-data standards, my dataset looks pathetic. But my goal was to test the principle and play a bit with various aspects of image analysis, and 12 data points served that purpose. I can say it is possible to develop a predictive algorithm that matches fabric/makeup colors to skin color regardless of race/skin tone. (Actually, the only off point, (128, 140), that you can see in fig. 1 belongs to a blushing UK citizen.)
You can see the Python notebook explaining the analysis on GitHub.