Over the last decade, voice assistants have become increasingly popular. Artificial intelligence-driven voice assistants first became common on smartphones. With the launch of Amazon’s Alexa home devices, followed by Google Home, a new category of smart speakers entered our homes, always ready to answer their owners' voice commands. In the last two years the use of smart devices has almost doubled, showing that this is a trend to watch, and the Covid-19 pandemic further increased their use.
These new interactive devices created new opportunities and challenges for content creators. On the one hand, this is one of the most natural interfaces available – the user just has to speak – but on the other, working out what kind of answer the user expects from the device is not straightforward.
One of the first voice recognition tools was created by IBM in 1961. In the following decades other companies made progress in word recognition. The big leap, however, only came in the second decade of the 21st century, with Apple's launch of Siri in 2011. A decade later, the technology is still not very reliable and is available in only a limited number of languages: as of September 15, 2020, Alexa was available in 8 languages, Google Nest in 13, and Siri in 21.
Across devices and brands, voice assistants operate in a similar way. A keyword or question activates the system, which turns voice into text and then into data, and then returns an answer to the user's request in voice, relying on artificial intelligence, machine learning and algorithms to meet that request accurately.
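The sequence described above (activation, voice to text, text to data, answer) can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the implementation of any commercial assistant: the wake word, the intent names, the function names and the canned replies are all hypothetical, and the speech recognition and speech synthesis steps are left out, so the input is assumed to be an already transcribed utterance.

```python
from dataclasses import dataclass
from typing import Optional

WAKE_WORD = "assistant"  # hypothetical wake word, chosen for illustration


@dataclass
class Request:
    """Structured data extracted from the transcribed command."""
    intent: str
    query: str


def detect_wake_word(transcript: str) -> Optional[str]:
    """Return the command that follows the wake word, or None if it is absent."""
    words = transcript.lower().strip().split()
    if not words or words[0].strip(",.!?") != WAKE_WORD:
        return None
    return " ".join(words[1:])


def parse_request(command: str) -> Request:
    """Turn the text command into data (a toy keyword match, not real NLU)."""
    if "news" in command:
        return Request(intent="play_news", query=command)
    return Request(intent="unknown", query=command)


def respond(request: Request) -> str:
    """Build the reply text that a speech synthesiser would read aloud."""
    if request.intent == "play_news":
        return "Here are the latest headlines."
    return "Sorry, I did not understand that."


def handle(transcript: str) -> Optional[str]:
    """Run the whole chain; return None when the wake word was not spoken."""
    command = detect_wake_word(transcript)
    if command is None:
        return None  # the assistant stays idle
    return respond(parse_request(command))
```

Calling `handle("Assistant, play the news")` walks through all three stages, while an utterance without the wake word is ignored. The keyword match in `parse_request` stands in for the language-understanding step, which is where interpretation errors in languages other than English would arise.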
Media outlets are no exception in the discovery and exploration of these voice interfaces, which pose new questions and rules and confront content creators with new challenges: first, working out how to interact with users; then, discovering what content should be tailored to each user; and, finally, finding the news story format best suited to voice.
Since the first experiments in delivering news through voice assistants in 2017, several attempts have been made, but mostly in English, which experts agree poses no comprehension problems. In other languages, flaws and errors of interpretation are still common, adding a layer of complexity for all non-English-speaking users and producers.
To investigate these challenges, we conducted an extensive review of the available academic research and international industry reports relating to voice assistants, smart speakers, artificial intelligence, and algorithms, to establish a basis for the empirical research. We conducted semi-structured interviews with journalists and content producers working with voice-activated smart speakers in several non-English-speaking European countries, and also in Brazil, in order to understand the challenges and constraints on the production side. We also ran an online survey of smart speaker users in the Portuguese-speaking countries, to help us understand: 1) the users' profile; 2) their needs, expectations, experience, and satisfaction; 3) interaction quality.
The choice of the Portuguese language as a case study is due to convenience, but also to its representativeness at a global level: it is the 6th most spoken language in the world, the 3rd of European origin, the most spoken in the Southern Hemisphere, and an official language in 9 countries across Europe, South America, Africa, Asia and Oceania.
Through the analysis of the Portuguese-speaking countries – with a focus on Portugal and Brazil – we will try to identify the main challenges and problems that non-English media can face in creating and delivering news through voice-activated smart assistants.