Data analysis and reality

Recently I was watching a TED talk about how machine intelligence makes human morals more important.
And the speaker, Zeynep Tufekci, has a point. The way we use data science today is more or less a black box. People input the data, let the machine learning algorithm spit out the results, and then just go with them, without even bothering to double-check.

Zeynep presented an example in which a black-box algorithm predicted the likelihood of criminal re-offending. It assigned a higher probability to a woman who did nothing wrong afterwards, and a lower one to a criminal who was released and went on to re-offend with a quite violent crime. Basically, it failed in its primary task: to make society safer.
To me, as someone who did scientific research for a decade, it was painfully obvious that whoever made that particular black box did not bother to base it on facts, but relied on personal biases instead.

A machine learning algorithm is only as good as the premises you feed it. If you feed it false assumptions at the start, it will produce false results. That is the simple truth.

This quality leaves any such algorithm wide open to all kinds of covert manipulation. Whoever supplies the initial training data set basically makes the algorithm find just the cases that match the premises behind that data set.
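
To make this concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two groups, the 20% re-offending rate, and the tiny "model" that just learns the share of positive labels per group are assumptions made for illustration, not the algorithm from the talk. It only shows that a model trained on prejudiced labels reproduces the prejudice, even when the underlying reality is identical for both groups.

```python
from collections import defaultdict


def train_rate_model(records):
    """Learn the share of positive labels per group (a deliberately tiny 'model')."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for group, label in records:
        counts[group][0] += label
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def biased_labels(records, group, extra):
    """Flip `extra` negative labels of `group` to positive,
    imitating a labeler who is prejudiced against that group."""
    out, flipped = [], 0
    for g, label in records:
        if g == group and label == 0 and flipped < extra:
            out.append((g, 1))
            flipped += 1
        else:
            out.append((g, label))
    return out


# Hypothetical ground truth: both groups re-offend at the same rate (20%).
truth = ([("A", 1)] * 20 + [("A", 0)] * 80
         + [("B", 1)] * 20 + [("B", 0)] * 80)

fair_model = train_rate_model(truth)
biased_model = train_rate_model(biased_labels(truth, "B", 30))

print(fair_model)    # {'A': 0.2, 'B': 0.2}
print(biased_model)  # {'A': 0.2, 'B': 0.5}
```

The second model rates group B as far riskier, although nothing about group B changed except the labels someone chose to give it.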

The problem arises when your algorithm clashes with reality. It is hard to make an algorithm that does not reflect a bias (cognitive or otherwise) of the people who made it. That’s why scientists receive so much training on how to avoid that particular pothole. And it is hard work to go against your own beliefs, to open your mind to something that runs counter to everything you think is correct. But when you make a machine learning algorithm that is supposed to predict reality, you have to do that. Otherwise, the algorithm you make is worthless.

Sadly, in today’s society we are usually closed in our bubbles, in little spheres of belief that confirm what we already think. They make us believe we know reality, and they push us to make algorithms that reflect our bubble and not reality itself.
So I would add: besides the importance of our morals, we have to question our own biases. We have to recognize that we have them, and we have to fight to minimize their influence. Because, in the end, reality will not conform to what we believe but will slam us with a surprise. In the talk, the surprise was the real criminal being released and committing a violent crime, which crushed all the prejudices and biases the creators of that particular algorithm had used and ultimately made society less safe, so the algorithm failed in its fundamental purpose.

So, if you are searching for a new data scientist who is supposed to help your company or organization predict reality and give you the edge, make sure the person you hire is aware that biases exist and is willing to question herself or himself every step of the way. Because only then will you end up with a data science product that actually matches reality and gives you the edge you’re seeking.
