Hi marvelous people,
I’m back and fully functioning. I had to take a leave to deal with a health scare that ended in the best possible way. So now, I’m back, and I’m Bad, within pre-set parameters. (Yeah, I do love the Red Dwarf TV show.)
In my previous blog post, I mentioned that I had decided to work on a machine learning algorithm capable of determining a ‘color season’ for a person.
And I managed to build one. It is based on the Random Forest algorithm, and it can predict appropriate fabric/makeup colors for a person's skin regardless of race.
Existing color testing methods, which rely on trained stylists, have drawn complaints that the color determination is race dependent. To address that complaint, I built my dataset from human faces of diverse ethnic/racial backgrounds, covering a broad range of skin tones.
Here you can see an obvious linear dependence between the red and blue color channels of my dataset.
I think there is a medical explanation behind this. I bumped into it a few years ago when I found a curious article about the Blue People of the Appalachian region. At the beginning of the last century, a family living there had literally blue skin, blue like Smurfs.
Medically, the condition was caused by the lack of a certain blood enzyme that helps hemoglobin bind oxygen. So when I started thinking about what might be behind the ‘undertones’ of human skin, I remembered that article and that condition, and asked myself whether the so-called ‘cold’, ‘neutral’, and ‘warm’ undertones could be caused by the same enzyme. My idea was that, depending on how much of the enzyme a person has, his/her skin falls into a cold or a warm category, ultimately causing the red and blue channels of the skin color to show some kind of linear dependency.
That is the same linear dependency you can see in the graph above. Of course, I did not perform a medical analysis, so this remains a guess. It was hard enough asking people to give me selfies without makeup, taken in diffuse natural light. Asking for a blood sample would be simply too much; no one would volunteer.
Anyway, my next step was to see what I could get from my dataset. An initial look at the dataset did not show any linear dependency between the skin colors and the test colors.
Of course, it would not. It is never that easy.
So, I decided to try Random Forest. And that worked. The predictive algorithm showed a score of 0.7.
An impressive result, considering that I had only 12 faces to work with. I used 10 for the training set, 1 for the testing set, and 1 for making sure the results were not overfitted.
The two tests scored 0.7 and 0.667, respectively. I’m happy. This actually demonstrates the potential of my algorithm, mainly because I expect accuracy to increase with a larger dataset.
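To make the train-on-10, test-on-2 procedure concrete, here is a rough sketch with scikit-learn, assuming NumPy and scikit-learn are installed. Everything about the data is an assumption: the features (skin RGB plus test color RGB), the binary "suits / does not suit" label, and the random values are placeholders, not my real table. With random labels the printed scores are meaningless; the point is only the mechanics.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
N_FACES, N_COLORS = 12, 30  # 12 faces x 30 test colors = 360 rows

# Placeholder data (assumption): one RGB skin color per face, 30 RGB test colors.
skin = rng.integers(60, 256, size=(N_FACES, 3))
colors = rng.integers(0, 256, size=(N_COLORS, 3))

# One row per (face, test color) pair: [skin R, G, B, color R, G, B]
X = np.hstack([np.repeat(skin, N_COLORS, axis=0), np.tile(colors, (N_FACES, 1))])
face_id = np.repeat(np.arange(N_FACES), N_COLORS)

# Made-up labels (assumption): 1 = the color suits the face, 0 = it does not.
y = rng.integers(0, 2, size=len(X))

# Train on the first 10 faces only.
train = face_id < 10
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[train], y[train])

# Score each of the two held-out faces separately (mean accuracy over its 30 rows).
scores = {}
for held_out in (10, 11):
    mask = face_id == held_out
    scores[held_out] = model.score(X[mask], y[mask])
    print(f"face {held_out}: accuracy = {scores[held_out]:.3f}")
```

If the score is plain accuracy over the 30 rows of one held-out face, then 0.7 and 0.667 would correspond to 21 and 20 correct answers out of 30.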
I would like to add a small explanation of how I formed the training and testing sets for my algorithm.
I have 30 different data points for each face, totaling 360 rows in the table. However, I could not randomly split those rows into training and test datasets. Although there are 30 different data points for each face, I had to treat my dataset as if there were only one data point per face, meaning that instead of 360, I had only 12 data points.
This rationale comes from population biology, where various data collected about one animal still represent just one animal. And since my test subjects are humans, I had to apply the same principle. The algorithm is supposed to determine the agreement of all the test colors with a human face, not just the agreement of one color with a human face. Meaning I have only 1 data point per face, not 30. That’s why I picked the last two faces in my dataset to form 2 test datasets, while the first 10 faces were used as the training dataset.
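The difference between a random row split and a face-wise split can be shown with a few lines of plain Python. This is only an illustration of the idea: the rows are bare `(face, color)` index pairs, and the 300/60 size of the naive split is my choice for comparison, not something taken from the notebook.

```python
import random

N_FACES, COLORS_PER_FACE = 12, 30

# One row per (face, test color) pair: 12 * 30 = 360 rows.
rows = [(face, color) for face in range(N_FACES) for color in range(COLORS_PER_FACE)]

def faces_in(subset):
    """Return the set of face indices that appear in a list of rows."""
    return {face for face, _ in subset}

# Naive split: shuffle all rows and hold out 60 of them.
shuffled = rows[:]
random.Random(0).shuffle(shuffled)
naive_train, naive_test = shuffled[:300], shuffled[300:]
leaked = faces_in(naive_train) & faces_in(naive_test)

# Face-wise split: the last two faces are held out completely.
test_faces = {10, 11}
train = [row for row in rows if row[0] not in test_faces]
test = [row for row in rows if row[0] in test_faces]
shared = faces_in(train) & faces_in(test)

print("rows:", len(rows), "train:", len(train), "test:", len(test))
print("faces in both sets, naive split:", len(leaked))
print("faces in both sets, face-wise split:", len(shared))
```

With the naive split, the same faces show up on both sides, so the model gets tested on people it has already seen. With the face-wise split, the held-out faces are completely new to it.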
So yeah, by today’s big data standards, my dataset looks pathetic. But my goal was to test the principle and play a bit with various aspects of image analysis. The 12 data points served their purpose. And I can say it is possible to develop a predictive algorithm that can match fabric/makeup colors to the skin color regardless of race/skin tone. (Actually, the only off point (128,140) you can see in fig. 1 belongs to a blushing UK citizen.)
You can see the Python notebook explaining the analysis on GitHub.