10. August 2022
A recent lawsuit again raised the public’s concern about privacy risks related to autonomous driving.
In July 2022, the Federation of German Consumer Organizations (VZBV) filed a lawsuit against Tesla. One of the accusations Tesla faces is particularly targeted at its sentry mode. Under the sentry mode, the attached cameras on parked Tesla vehicles constantly record their surroundings to protect the vehicle from theft and vandalism. The recordings contain personally identifiable information (PII) of pedestrians and other vehicles passing by without their consent.
Under the GDPR, operating a camera in public is only allowed in very limited and highly regulated cases — for example, police surveillance of crime hotspots, authorized by legislation such as the Police Tasks Act in Bavaria (Art. 32 Polizeiaufgabengesetz). For companies in industries where publicly collected data drives product development, this raises the question: can data be legally collected in real, uncontrolled environments at all? This question is particularly pressing for autonomous vehicle development, where such data is essential.
Autonomous driving has always been associated with data privacy issues. Manufacturers are continuously asked to address rising privacy concerns regarding their advanced driver assistance systems (ADAS) and autonomous vehicles (AV). The lawsuit against Tesla has drawn attention to a relatively neglected privacy concern: how to protect the personal data of passing pedestrians and vehicles during ADAS and AV development and testing, especially given that such data is essential to both.
The Importance of Video Data for ADAS and AV Development
Both ADAS and AV are designed to reduce the number of traffic accidents by minimizing human errors during driving. The operation and decision-making of both systems are based on the information they gather from the surrounding environment. To ensure driving safety, both ADAS and AV include several essential safety-critical applications, such as Automatic Emergency Braking (AEB), vulnerable road user detection, intention prediction, etc. The research and development of these models require an enormous amount of data collected from real-world traffic by on-board sensors, radar, cameras, GPS, and lidar. This data is used to train the models to opt for the safest option and prevent accidents.
Compared to data collected from simulated or controlled environments, real-life traffic data offers far more diverse situations, including uncontrollable and unpredictable ones — for example, a child running across the street chasing a ball, or a dog darting in front of the car out of nowhere. These unexpected but possible scenarios are referred to as edge cases. Each one is individually rare, but together they accumulate into a long tail of scenarios that a system must handle. Some experts in the field argue that the unpredictability and uncontrollability of edge cases are holding AV development back, because it is precisely these cases that determine whether AVs are safer than human drivers. Currently, the general approach to edge cases is to expose machine learning algorithms to more data. Recordings from real traffic may not cover every scenario that can occur, but they provide a basis for building a simulation database for training purposes.
Privacy Risk in Data Collection during Autonomous Driving Development
Automotive companies have recognized the benefits of real-life data and are conducting research drives. This is a welcome step forward, but many overlook the privacy issues such research drives create.
During these test drives, data of uninvolved passers-by is inevitably recorded and stored. The GDPR is designed as an opt-in regulation: consent is the default basis for lawful data processing. For commercial use cases like Tesla's sentry mode, simply claiming legitimate interest is not enough, as this neither complies with the core idea of the GDPR nor fits any of the lawful bases for data processing set out in Article 6 of the GDPR. The problem is that when recording real-time traffic data in uncontrolled environments, it is practically impossible to obtain the consent of every passer-by.
Solution to the Dilemma
So does this mean that the benefits of collecting data for product development outweigh the risks of data misuse and breaches? Or must innovation be sacrificed to protect the public's data?
Data anonymization can resolve this dilemma. As stated in Recital 26 of the GDPR, "[the GDPR] does not therefore concern the processing of such anonymous information [1], including for statistical or research purposes." Similar provisions can be found in the data protection regulations of other countries.
At brighter AI, we believe in both data protection and technological innovation. We refuse to accept a world in which we must trade data privacy for tech innovation. That is why we developed our state-of-the-art anonymization technology, Deep Natural Anonymization (DNAT). DNAT is an advanced solution for protecting PII captured in image and video data. It automatically detects personal information such as faces and license plates and generates synthetic replacements that reflect the original attributes. Meanwhile, attributes critical for machine learning and AI innovation, such as age, gaze, and emotion, are preserved. The solution thus protects the identities of individuals and vehicles while keeping the information necessary for analytics or machine learning.
Conventional video redaction techniques, such as blurring PII, lead to a loss of information and image context. DNAT, on the other hand, replaces the original PII with an artificial counterpart that looks natural and preserves the content of the image. It preserves the semantic segmentation of the scene, and this segmentation consistency is measured to validate the result. If you are interested in our solution, feel free to contact us for a demo or a free trial.
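To make the idea of "semantic segmentation consistency" concrete, one common way to quantify it is the mean per-class intersection-over-union (IoU) between the segmentation masks of an original frame and its anonymized counterpart. The sketch below is an illustration of that general metric, not DNAT's internal implementation; the function name and toy masks are our own assumptions.

```python
import numpy as np

def segmentation_consistency(mask_orig, mask_anon, num_classes):
    """Mean per-class IoU between the segmentation masks of an original
    frame and its anonymized counterpart. A value close to 1.0 means the
    replacement preserved the semantic structure of the scene.
    Illustrative metric only -- not DNAT's internal implementation."""
    ious = []
    for c in range(num_classes):
        a = mask_orig == c
        b = mask_anon == c
        union = np.logical_or(a, b).sum()
        if union == 0:
            continue  # class absent in both masks; skip it
        inter = np.logical_and(a, b).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 1.0

# Toy 2x2 "scenes" with class labels 0..2 (hypothetical data):
orig = np.array([[0, 1], [1, 2]])
anon_same = np.array([[0, 1], [1, 2]])      # anonymization kept all labels
anon_diff = np.array([[0, 1], [2, 2]])      # one pixel changed class

print(segmentation_consistency(orig, anon_same, num_classes=3))  # 1.0
print(segmentation_consistency(orig, anon_diff, num_classes=3))  # ~0.667
```

An identical mask pair scores 1.0, while each pixel whose class label changes lowers the score, so the metric directly penalizes anonymization that alters what the scene semantically contains.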
[1] Anonymous information: information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.