Like most AIs, Google’s Assistant isn’t Skynet, and it doesn’t learn entirely by itself. Just as with Siri and Alexa, Assistant is improved – at least in part – by humans who transcribe interactions and help it better understand what it’s being asked to do.
In Google’s case, recent news highlighted that contractors in Europe were engaged to listen to audio recordings of interactions that resulted in a failed action (i.e. Assistant didn’t understand what to do), transcribe them, and work out what was actually being asked.
Android Community reports that some of these recordings were made when Assistant hadn’t been triggered (so there shouldn’t have been a recording in the first place), and that others contained personal information (some of it sensitive) which people might reasonably assume Google wouldn’t be handing out willy-nilly.
Well, it turns out that a Belgian news outlet got its hands on more than a thousand private conversations sent for transcription by Google, and this has led the search giant to suspend human language reviews of Assistant interactions. Apparently, only around 0.2% of voice clips are used for these purposes, but when you consider how many Assistant interactions there are in even a single day, that’s still a huge number of recordings being sent to people and places unknown for transcription and analysis.
As 9to5Google reports, Google suspended the transcription service after it emerged yesterday that the company was being investigated by Hamburg’s data protection authority in Germany for potential breaches of the GDPR:
Germany’s Hamburg Commissioner for Data Protection and Freedom of Information has released a press statement indicating that they are investigating Google’s handling of Assistant voice recordings. Additionally, Google must temporarily cease their manual review of Assistant queries in the EU, as a proactive step to protect citizens in the event that the three-month investigation finds Google to be in violation of the GDPR.
Commissioner Johannes Caspar further explained his decision to have Google immediately stop listening to Assistant recordings and laid out an initial plan for the investigation.
The use of speech assistance systems in the EU must comply with the data protection requirements of the GDPR. In the case of the Google Assistant, there are currently considerable doubts about this. […] As a first step, further questions about the functioning of the speech analysis system need to be answered. The data protection authorities will then have to decide on the final measures that are necessary for their data protection-compliant operation.
In a statement to The Verge, Google indicated that they’ve already ceased what they call “language reviews” and are working with German authorities on the best course of action to help customers better understand how their data may be used.
We are in touch with the Hamburg data protection authority and are assessing how we conduct audio reviews and help our users understand how data is used. These reviews help make voice recognition systems more inclusive of different accents and dialects across languages. We don’t associate audio clips with user accounts during the review process, and only perform reviews for around 0.2% of all clips.
This move will almost certainly slow Assistant’s improvement in the EU, as the transcription and review process was intended to help Assistant better understand the huge range of accents and dialects in use in the EU (and elsewhere). On the flip side, though, it seems that a large amount of personal information was disclosed to contractors to carry out this improvement, and that’s a genuine concern for the privacy-conscious everywhere.