IBM, a company usually at the forefront of AI research and development, is now in the news for the wrong reasons. A report has found that the American IT giant has been using around one million images from Flickr to train its facial recognition AI without the permission of the people in the photos.
In a bid to train AI algorithms to better recognise people of colour, IBM in January revealed its new “Diversity in Faces” dataset. To create it, IBM drew on a huge collection of around 100 million Creative Commons-licensed images released by Flickr’s former owner, Yahoo, for research purposes.
While Creative Commons image databases are often used for research, many people might not want their faces used for this particular kind of training, given the potential uses of facial recognition technology. That is especially true if it involves people being marked out by gender or race.
According to a report by NBC News, which was able to view the collection, all the images were annotated with estimates such as age and gender, as well as physical details like skin tone, the size and shape of facial features, and pose.
The publication also created a handy little tool that lets you check whether you’re included simply by entering your Flickr username.
But while IBM was using Creative Commons-licensed images, the company never actually informed the people whose faces appear in the nearly one million photos what their faces, and not just their images, were being used for.
IBM could argue that the image subjects gave permission for their photos to be used under the Creative Commons licence, but it does not have their explicit consent to use those faces to train AI facial recognition algorithms.
Responding to the allegations of wrongful usage, IBM said in a statement to The Verge, “We take the privacy of individuals very seriously and have taken great care to comply with privacy principles.” The company noted that the dataset could only be accessed by verified researchers and only included publicly available images. It added, “Individuals can opt-out of this dataset.”