At its ongoing F8 developer conference, Facebook has detailed how it has been using billions of photos posted by users on Instagram, its photo-sharing platform, to train its own image recognition models.
According to a report by TechCrunch, Facebook has been sifting through billions of photos along with their hashtags and is now using that data to train its deep learning models.
Hashtags might seem to make the task easier, but sorting through those tags to make sense of billions of images turned out to be a challenge for Facebook, largely because people follow no consistent logic when they use hashtags on the service.
The largest tests used about 3.5 billion Instagram images, which came with about 17,000 hashtags.
With so many hashtags, Facebook had to come up with a system to clean up what users had submitted, and to do so at scale.
So instead of starting with the images, Facebook first had to work on the hashtags: its “pre-training” research focused on developing systems to identify and select the commonly used ones.
The next step was to favor the more specific hashtags over the more commonly used ones, an approach the group called the “large-scale hashtag prediction model”.
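To make the idea concrete, here is a minimal Python sketch of one way such a hashtag clean-up could work. It is not Facebook's actual pipeline, which the report does not describe in detail. The function names, the `min_count` threshold and the rule of treating rarer hashtags as "more specific" are all assumptions made for illustration.

```python
from collections import Counter


def normalize(tag):
    """Lowercase a hashtag and drop the leading '#' and surrounding whitespace."""
    return tag.strip().lstrip("#").lower()


def count_hashtags(tagged_images):
    """Count how many images each normalized hashtag appears on."""
    counts = Counter()
    for tags in tagged_images:
        # Use a set so a tag repeated on one image is counted only once.
        counts.update({normalize(t) for t in tags} - {""})
    return counts


def clean_labels(tagged_images, min_count=2):
    """Keep only hashtags seen on at least `min_count` images (hypothetical
    threshold), and list the rarer ones first as a crude stand-in for
    'more specific'."""
    counts = count_hashtags(tagged_images)
    vocabulary = {tag for tag, n in counts.items() if n >= min_count}
    cleaned = []
    for tags in tagged_images:
        kept = {normalize(t) for t in tags} & vocabulary
        cleaned.append(sorted(kept, key=lambda t: (counts[t], t)))
    return cleaned


# Toy data: each inner list holds the hashtags attached to one image.
images = [
    ["#Dog", "#corgi", "#love"],
    ["#dog", "#love", "#tbt"],
    ["#dog", "#Corgi", "#xyzzy"],
    ["#food", "#love"],
]
labels = clean_labels(images)
```

On this toy data, one-off tags such as "#tbt" and "#xyzzy" are dropped, and the first image ends up labeled `['corgi', 'dog', 'love']`, with the rarer "corgi" ahead of the more common "dog".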
Privacy is an obvious concern, so Facebook used only what amounts to public data and steered clear of private accounts.
Indeed, Facebook was looking not at personal photos but at objects for image recognition: the goal was to tell apart things like dog breeds, plants and food rather than to tell apart two human beings.
That said, the accuracy of this data was not an important factor. What is impressive is how the pre-training processes were used to clear out the noise and make the billions of images more useful as training data. The same data can be used in more ways than one, but according to the report, the most effective use of it for Facebook is to combat abuse on the platform.