My little scientific project is currently paused. The next step in the analysis is wavelet analysis, and I'm now reading the existing Python manuals so that I can adjust that tool to my needs. This, together with a cold, meant that last week's research was mostly just learning.
So I decided to write about something else today. Yesterday I was listening to a TED talk about bias in algorithms. The speaker, Joy Buolamwini, talked about how widely used facial-analysis algorithms have a built-in bias.
It cannot see her face.
And the reason is annoyingly simple. The people who trained the algorithms used training data that did not correctly represent the population.
When I developed my machine learning algorithm for color detection based on a person's skin undertone, I bumped into a similar problem. But at the time I thought there was some mysterious error in my portion of the code. The faces the software could not prepare for me, I processed manually. There were not many of them, but every face that could not be processed was notably not white. Despite that, I still thought the error was mine, not that the facial-recognition algorithm was improperly trained.
See, my whole project started as a sort of rebellion against the existing bias in a widespread method for determining suitable, fashionable colors. Over and over I would hear that the method works best for white people. For others, it is more miss than hit.
I did a bit of research, mostly because I was intrigued by the blue people of the Appalachian Mountains, and learned that this undertone (warm/cold), which the cosmetics industry uses to sell you the proper makeup, has a very simple biological cause. The amount of a particular enzyme in your blood, one that helps hemoglobin carry oxygen, also determines the undertone of your skin. The catch is, every human has this enzyme, and its variation depends on genes not connected to race. Thus, a properly trained algorithm should work for everyone.
And I did make an algorithm that was able to determine colors for anyone, regardless of race. It worked because the values I used to evaluate skin undertone fell into a lovely linear pattern. The only spread came from extremely pale people who sent me photos of their very flushed skin. The rest of the people formed a beautiful, positively sloped, linear correlation between the warm and cold components of an individual's undertone. Race did not have any effect. Blushing did.
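To make the idea concrete, here is a minimal sketch of fitting that kind of positively sloped linear relationship between warm and cold undertone components. The data below is synthetic and purely illustrative (not my measurements), and the large-residual check stands in for spotting outliers like those flushed, very pale faces:

```python
# Sketch: linear fit of cold vs. warm undertone components.
# All data here is synthetic and illustrative only.
import numpy as np

rng = np.random.default_rng(0)
warm = rng.uniform(0.2, 0.8, size=50)                 # warm component per person
cold = 0.9 * warm + 0.05 + rng.normal(0, 0.02, 50)    # positively sloped relation

# Degree-1 least-squares fit: cold ≈ slope * warm + intercept
slope, intercept = np.polyfit(warm, cold, 1)
residuals = cold - (slope * warm + intercept)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")

# Points far off the line (large residuals) would flag cases
# such as heavily flushed skin in the photos.
outliers = np.abs(residuals) > 3 * residuals.std()
print("outliers flagged:", int(outliers.sum()))
```

The fit itself carries no information about race; any face whose warm and cold components land on the same line gets the same treatment.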
So I support Joy's work. If you are developing a machine learning algorithm that targets the whole human population, then use examples from the entire human population, in the correct proportions, to get an accurate algorithm. Otherwise, you're just wasting your time. (BTW, Khan Academy has a lovely explanation of how to sample populations. Check it out if you need it.)
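The "correct proportions" part can be sketched as proportional (stratified) sampling: draw the training sample so that each group's share matches its share of the population. The group names and counts below are made up for illustration:

```python
# Sketch of proportional (stratified) sampling.
# Groups and counts are hypothetical, for illustration only.
import random
from collections import Counter

random.seed(42)

# A population of 1000 labeled individuals: 60% / 30% / 10%.
population = ["group_a"] * 600 + ["group_b"] * 300 + ["group_c"] * 100
sample_size = 100

counts = Counter(population)
sample = []
for group, n in counts.items():
    # Take the same fraction of the sample as the group's share
    # of the population: 60, 30, and 10 individuals respectively.
    k = round(sample_size * n / len(population))
    members = [p for p in population if p == group]
    sample.extend(random.sample(members, k))

print(Counter(sample))  # counts mirror the population: 60 / 30 / 10
```

A training set drawn this way at least cannot silently over-represent one group the way a convenience sample can.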
If you wish to help the lovely lady in her fight for more correct algorithms, go here.