
In its current version, Article 50 of the European Union AI Act [1] is devoted to transparency obligations and, among other provisions, explicitly states that deployers of AI-generated content, including audio, shall disclose that the content has been artificially generated or manipulated.
The Portuguese public administration has already started using speech as a privileged means of interacting with the public. One example is the virtual assistant that provides information about the digital mobile key [2].
As synthetic speech, also referred to as audio deepfakes when used for pernicious purposes, becomes increasingly prevalent, distinguishing between human and synthetic speech becomes a significant challenge. A 2023 study revealed that “Humans cannot reliably detect speech deepfakes” [3]. Public administration services relying on speech, such as the emergency service number 112, should ensure the authenticity of their communications in order to prevent impersonation and the spread of misinformation, ensure the rational use of resources, and comply with regulatory requirements.
To verify the authenticity of communications, the project proposes to develop, as a proof of concept, an API (Application Programming Interface) and the corresponding Portuguese synthetic speech detector for Portuguese public administration services. Several synthetic speech detectors are currently available, most of them for the English language. Because prosodic and phonetic features differ in nuance from one language to another, these detectors exhibit limited generalization across languages.
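As an illustration only, the following minimal Python sketch shows one shape that a response from such an API could take. Everything in it is an assumption made for the example and not part of the project: the names `DetectionResult`, `make_result` and `to_json`, the score range [0, 1], and the default threshold of 0.5. The detector itself is not modelled; the sketch only turns a score into a labelled, serializable result.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DetectionResult:
    """Hypothetical response payload for a synthetic speech detection API."""
    label: str              # "synthetic" or "human"
    synthetic_score: float  # assumed to lie in [0, 1]; higher means more likely synthetic
    threshold: float        # decision threshold applied to the score


def make_result(synthetic_score: float, threshold: float = 0.5) -> DetectionResult:
    """Turn a detector score into a labelled result (threshold is an assumed default)."""
    if not 0.0 <= synthetic_score <= 1.0:
        raise ValueError("synthetic_score must lie in [0, 1]")
    label = "synthetic" if synthetic_score >= threshold else "human"
    return DetectionResult(label, synthetic_score, threshold)


def to_json(result: DetectionResult) -> str:
    """Serialize the result as the JSON body an API endpoint could return."""
    return json.dumps(asdict(result))
```

A calling service could, for example, run `to_json(make_result(0.9))` and act on the `label` field, while logging the raw score so that the threshold can be revisited later.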
The main innovation of this project lies in its focus on Portuguese. According to Wikipedia [4], Portuguese has approximately 300 million total speakers, is one of the most spoken languages in the world, and is present on all continents. Despite this, the language is severely underrepresented in current synthetic speech detection technologies. To date, a thorough literature search has found a single work on deepfake detection tailored to the Portuguese language, and that work focused primarily on video-extracted features [5]. This project therefore aims to address a critical gap in the existing solutions.
Given that Portuguese society is currently composed of Portuguese speakers from different origins, the goal is to develop a Portuguese synthetic speech detector able to deal not only with PT-PT (European Portuguese) but also with the different accents, dialects, and variants of the Portuguese language: from the North to the South of the country, from the Azores to Madeira, and among Portuguese speakers from countries and territories such as Angola, Brazil, Cabo Verde, Guiné, Macau, Mozambique, São Tomé e Príncipe, and Timor-Leste.
Our approach builds on state-of-the-art deep learning models, including those for transfer learning and domain adaptation [29, 30], to create an innovative, robust, and accurate detector capable of identifying synthetic Portuguese speech. In particular, the project starts by evaluating how models such as Res-TSSDNet [6] or TE-ResNet [7] perform on Portuguese. These are the state-of-the-art detectors for English, as they are the winners of the ASVspoof challenge, a biennial competition geared towards encouraging this type of research [8].
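One metric commonly used to report detector performance, including in the ASVspoof challenge, is the equal error rate (EER): the operating point at which the rate of human speech flagged as synthetic equals the rate of synthetic speech accepted as human. The sketch below is a minimal pure-Python approximation, assuming that higher scores mean "more likely synthetic"; the function name `equal_error_rate` and the toy scores are illustrative and not taken from the project.

```python
def equal_error_rate(human_scores, synthetic_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumption: a higher score means the detector considers the audio
    more likely to be synthetic.
    """
    if not human_scores or not synthetic_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap, eer = None, None
    for t in sorted(set(human_scores) | set(synthetic_scores)):
        # False alarm: human speech flagged as synthetic (score >= t).
        false_alarm = sum(s >= t for s in human_scores) / len(human_scores)
        # Miss: synthetic speech accepted as human (score < t).
        miss = sum(s < t for s in synthetic_scores) / len(synthetic_scores)
        gap = abs(false_alarm - miss)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (false_alarm + miss) / 2
    return eer


# Toy scores, made up for illustration only.
human = [0.1, 0.2, 0.3, 0.6]
synthetic = [0.4, 0.7, 0.8, 0.9]
print(equal_error_rate(human, synthetic))
```

Computing this value separately for each accent or variant of Portuguese would show where a detector trained on English data degrades most.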
The project will include a continuous literature search and review of methods for synthetic speech detection, the acquisition and annotation of the required data set with both real and synthetic Portuguese audio, the use of generative data augmentation techniques, and the development of the synthetic speech detector model and its API.
Keywords - Deepfake; Generative AI; Deep Learning; Synthetic speech detection
Universidade do Algarve
