12 March 2021
A major obstacle when training machine learning models for image recognition is the limited availability of large amounts of visual data that comply with data privacy regulations. Our Deep Natural Anonymization (DNAT) automatically anonymizes personally identifiable information in image and video data while keeping relevant visual information and context. This analysis shows that training a machine learning model on DNAT-anonymized images has no significant impact on its accuracy compared to training on the original images. That makes DNAT a valuable tool for protecting identities when working with image data to train machine learning models.
Sample image from the Cityscapes dataset after being processed by Deep Natural Anonymization.
How do you evaluate the impact of DNAT on machine learning?
We trained the same model on both the unmodified data and the anonymized data to compare the resulting accuracy. Keeping the hyperparameters identical for both training paths lets us attribute any difference in accuracy to the difference between the unmodified and anonymized data.
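To make the idea of a controlled comparison concrete, here is a minimal sketch of how two training configurations could be checked so that they differ only in the dataset. The `TrainConfig` class, its field names, the default values, and the dataset paths are all hypothetical and are not taken from our actual setup.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical training configuration; fields and values are illustrative only."""
    dataset_root: str
    learning_rate: float = 0.01
    batch_size: int = 8
    max_iterations: int = 24000
    seed: int = 0


def differing_fields(a: TrainConfig, b: TrainConfig) -> set:
    """Return the names of the fields whose values differ between two configs."""
    fields_a, fields_b = asdict(a), asdict(b)
    return {name for name in fields_a if fields_a[name] != fields_b[name]}


# Assumed paths: one copy of the dataset as published, one anonymized copy.
original = TrainConfig(dataset_root="data/cityscapes_original")
anonymized = replace(original, dataset_root="data/cityscapes_anonymized")

# The comparison is only controlled if the dataset is the sole difference.
assert differing_fields(original, anonymized) == {"dataset_root"}
```

Deriving the second configuration from the first with `replace` means a hyperparameter cannot drift between the two training paths by accident.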
We chose a standardized, publicly available dataset called Cityscapes. It contains images of street scenes recorded in various locations, under different weather conditions, and at different dates and times. We used brighter AI’s DNAT to create an anonymized copy of the entire Cityscapes dataset.
We selected a detection and instance segmentation approach called Mask R-CNN for our experiment, most notably due to its applicability to our dataset and its state-of-the-art performance across multiple public benchmarks.
What are the results of your analysis?
From our experiments, we conclude that brighter AI’s DNAT does not have any significant impact on the accuracy of Mask R-CNN, a state-of-the-art machine learning model, when it is trained on the public Cityscapes dataset. We show that the difference in mean average precision (mAP) between a model trained on original data and one trained on anonymized data is negligible.
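As a rough illustration of how such a comparison can be expressed, the sketch below averages per-class average precision (AP) into mAP and reports the absolute and relative difference between two models. The function names are hypothetical, the simple per-class mean is a simplification of the benchmark's evaluation protocol, and the AP values are placeholders chosen for illustration; they are not the results of this analysis.

```python
def mean_average_precision(per_class_ap: dict) -> float:
    """Average the per-class AP values into a single mAP score."""
    if not per_class_ap:
        raise ValueError("at least one class is required")
    return sum(per_class_ap.values()) / len(per_class_ap)


def map_difference(ap_original: dict, ap_anonymized: dict) -> tuple:
    """Return (absolute, relative) mAP difference, anonymized minus original."""
    if set(ap_original) != set(ap_anonymized):
        raise ValueError("both models must be evaluated on the same classes")
    map_original = mean_average_precision(ap_original)
    map_anonymized = mean_average_precision(ap_anonymized)
    absolute = map_anonymized - map_original
    return absolute, absolute / map_original


# Placeholder AP values for illustration only, NOT measured results.
ap_original = {"person": 0.30, "car": 0.50, "bicycle": 0.20}
ap_anonymized = {"person": 0.29, "car": 0.50, "bicycle": 0.20}

absolute, relative = map_difference(ap_original, ap_anonymized)
print(f"absolute mAP difference: {absolute:+.4f}")
print(f"relative mAP difference: {relative:+.2%}")
```

Whether a given difference counts as negligible depends on the run-to-run variance of the training itself, so in practice it helps to compare it against the spread observed across repeated runs with different seeds.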