An unbiased data set is critical for training AI algorithms


Approximate Reading Time: 2 minutes

On 1 April 2018, scientists at the Massachusetts Institute of Technology (MIT) presented Norman, the world's first psychopath AI.


Norman is not MIT's first unsettling AI project. In 2016, MIT presented the Nightmare Machine, an AI that generated scary imagery. It then developed Shelley, the world's first collaborative AI horror writer, and later the AI-powered Deep Empathy.

The name Norman, we assume, is taken from Norman Bates, the disturbed motel manager in Alfred Hitchcock's celebrated psychological horror film Psycho.

MIT says that Norman was developed to demonstrate that the data set used to train a machine learning algorithm can significantly influence its behaviour and decision-making capabilities.

With Norman now fully functional, there is strong evidence that it is often not the AI algorithm itself but the training data set that is the culprit for introducing bias.

According to the project website, Norman was given extensive exposure to some of the darkest and goriest content on Reddit, and was then trained to generate captions for images.

“We trained Norman on image captions from an infamous subreddit (the name is redacted due to its graphic content) that is dedicated to document and observe the disturbing reality of death,” say Norman's creators at MIT.

To address ethical concerns, bias was introduced only through the image captions from the subreddit; these captions were then matched with randomly generated inkblots rather than with images of real people dying.

The team then compared Norman's responses with those of a standard image captioning neural network trained on MS COCO, a large-scale object detection, segmentation and captioning dataset, and the results were quite interesting.
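To make this kind of comparison concrete, here is a minimal sketch of running the same inkblot image through two captioning models and contrasting their outputs. It uses the publicly available BLIP captioning model from Hugging Face as the "standard" network; the "norman-style" checkpoint path and the inkblot file are hypothetical stand-ins, since this is an illustration of the idea, not MIT's actual pipeline.

```python
# Sketch: caption the same inkblot with a standard model and a
# (hypothetical) model fine-tuned on disturbing captions, then compare.
# Requires the `transformers` and `Pillow` packages.
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

def caption(checkpoint: str, image: Image.Image) -> str:
    """Generate a caption for `image` using the given model checkpoint."""
    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint)
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    return processor.decode(output_ids[0], skip_special_tokens=True)

inkblot = Image.open("inkblot.png").convert("RGB")  # hypothetical test image

# Real public checkpoint, standing in for the "standard" network.
standard = caption("Salesforce/blip-image-captioning-base", inkblot)
# Hypothetical path -- MIT's Norman model is not publicly available.
biased = caption("path/to/norman-style-checkpoint", inkblot)

print("standard network:", standard)
print("biased network:  ", biased)
```

The point the sketch illustrates is that both models share the same architecture and inference code; only the training captions differ, so any divergence in output traces back to the data.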

More comparisons can be found on the MIT project page.


As expected, the biased dataset heavily influenced Norman's interpretation of the random inkblots. Where the standard network perceived the images much as a normal human would, Norman came up with rather scary responses to the same images. This gives strong evidence that it is the data set, not the algorithm, that introduces bias and unfairness, and it indicates that those working with AI need to be very careful and selective about the datasets used to train their models.

