March 12, 2021
A major challenge in training machine learning models for image recognition is obtaining large amounts of visual data that comply with data privacy regulations. Our Deep Natural Anonymization Technology (DNAT) automatically anonymizes personally identifiable information in image and video data while preserving the relevant visual information and context. This analysis shows that Deep Natural Anonymization has no significant impact on model accuracy compared with training on the original images. It is therefore a valuable tool for protecting identities when image data is used to train machine learning models.
Sample image from the Cityscapes dataset after processing with Deep Natural Anonymization.
How do you evaluate the impact of DNAT on machine learning?
Our aim is to train one model on the original data and another on the anonymized data, and then compare the accuracy of the two training paths. Because the hyperparameters are identical for both paths, we can attribute any difference in accuracy to the differences between the original and the anonymized data.
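A minimal sketch of how such a controlled comparison can be set up, assuming a simple configuration object per training run. The field names and hyperparameter values (`learning_rate`, `batch_size`, `max_iterations`, `seed`) are hypothetical placeholders, not the settings used in this analysis. The anonymized run is derived from the baseline, so the two configurations can only differ in the dataset location.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RunConfig:
    """One training run. Hyperparameter values here are hypothetical placeholders."""
    data_root: str
    learning_rate: float = 0.01
    batch_size: int = 8
    max_iterations: int = 24000
    seed: int = 0  # a fixed seed reduces run-to-run noise unrelated to the data


def make_training_paths(original_root: str, anonymized_root: str) -> dict[str, RunConfig]:
    # Build the baseline once and derive the anonymized run from it, so every
    # hyperparameter is shared and only the dataset location changes.
    baseline = RunConfig(data_root=original_root)
    return {
        "original": baseline,
        "anonymized": replace(baseline, data_root=anonymized_root),
    }


def differing_fields(a: RunConfig, b: RunConfig) -> set[str]:
    # Sanity check: names of all fields whose values differ between two runs.
    return {f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)}


paths = make_training_paths("cityscapes", "cityscapes_anonymized")
print(differing_fields(paths["original"], paths["anonymized"]))  # {'data_root'}
```

Checking that the only differing field is `data_root` before launching training is a cheap guard against accidentally changing a second variable between the two paths.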
We chose Cityscapes, a standardized and publicly available dataset. It contains images of street scenes recorded in a wide range of locations, under different weather conditions, and at different dates and times. We used brighter AI's DNAT to create an anonymized copy of the entire Cityscapes dataset.
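The sketch below shows one way to build such an anonymized copy, assuming the standard Cityscapes layout in which camera images live under `leftImg8bit/` and annotations under folders such as `gtFine/`. The `anonymize` callable is a hypothetical stand-in for DNAT, whose interface is not described here. Only the images pass through it; the annotation files are copied unchanged, so both training paths share exactly the same labels.

```python
import shutil
from pathlib import Path
from typing import Callable


def build_anonymized_copy(
    src_root: Path,
    dst_root: Path,
    anonymize: Callable[[bytes], bytes],  # hypothetical stand-in for DNAT
    image_dir: str = "leftImg8bit",
) -> int:
    """Mirror src_root into dst_root, anonymizing images and copying labels as-is.

    Returns the number of images passed through `anonymize`.
    """
    processed = 0
    # sorted() collects the file list up front, before anything is written.
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        if rel.parts[0] == image_dir and src.suffix == ".png":
            # Camera images: replace identifying content, keep the same file name.
            dst.write_bytes(anonymize(src.read_bytes()))
            processed += 1
        else:
            # Annotations and other files are copied byte-for-byte.
            shutil.copy2(src, dst)
    return processed
```

Keeping file names and relative paths identical means the same training code can be pointed at either dataset root without any other change.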
For our experiment we selected Mask R-CNN, an object detection and instance segmentation approach, chiefly because it is well suited to our dataset and achieves state-of-the-art performance across multiple public benchmarks.
What are the results of your analysis?
Our experiments demonstrate that brighter AI's DNAT has no significant impact on the accuracy of Mask R-CNN, a state-of-the-art machine learning model, trained on the public Cityscapes dataset. The difference in mean average precision (mAP) between training on the original data and training on the anonymized data is negligible.
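To make the compared metric concrete, here is a simplified sketch of mAP and of the difference between the two training paths. It assumes detections have already been matched to ground truth at a single IoU threshold and uses all-point interpolated AP; the official Cityscapes and COCO evaluation tools add further details, such as averaging over several IoU thresholds. The class names and match lists below are toy inputs for illustration only, not results from this analysis.

```python
def average_precision(matches: list[bool], num_ground_truth: int) -> float:
    """All-point interpolated AP for one class.

    `matches` lists detections sorted by descending confidence; True means the
    detection matched a previously unmatched ground-truth instance.
    """
    if num_ground_truth == 0:
        return 0.0
    precisions, recalls = [], []
    tp = 0
    for rank, is_tp in enumerate(matches, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Interpolate: precision at each rank becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each recall increment (area under the PR curve).
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: dict[str, tuple[list[bool], int]]) -> float:
    # mAP is the unweighted mean of the per-class AP values.
    return sum(average_precision(m, n) for m, n in per_class.values()) / len(per_class)


def map_difference(original, anonymized) -> float:
    # Positive values mean the model trained on original data scored higher.
    return mean_average_precision(original) - mean_average_precision(anonymized)


# Toy inputs: (match list, number of ground-truth instances) per class.
original = {"car": ([True, True], 2), "person": ([True, False, True], 2)}
anonymized = {"car": ([True, True], 2), "person": ([False, True, True], 2)}
print(round(map_difference(original, anonymized), 4))  # 0.0833
```

A "negligible" difference in this sense means the value returned by `map_difference` is close to zero relative to the run-to-run variation of the training itself.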