Iscte, through its research unit ISTAR_Iscte, has recently secured the ORAL (kriOl(u) laRge lAnguage modeLs) project, a consortium initiative with the University of Cape Verde (UniCV). The project is co-funded by the OEI – Organisation of Ibero-American States (https://oei.int/pt/) under its funding programme for “Development, implementation and/or training in platforms, applications or technological resources geared towards multilingualism”, within the scope of the 2025 Support Fund - OEI-Portugal. It has a budget of USD 72,000 over a 15-month period.
The main objective of the initiative is to enable speakers of Cape Verdean Creole (ISO 639-3: kea), a Portuguese-based creole that emerged in the late 15th century, to benefit from the digital transformation in their native language, both in Cape Verde and in the diaspora.
The project also aims to support public policies that promote the recognition, standardisation and officialisation of Cape Verdean Creole. It will do so in close cooperation with public institutions responsible for language development in Cape Verde, such as the Ministry of Education, the Ministry of Culture and Creative Industries and Instituto Camões (Institute for Cooperation and Language), as well as with civil society stakeholders, including the Cape Verdean Mother Language Association (ALMA-CV). The initiative seeks to bring the language into the global digital transformation landscape, where languages such as Portuguese and Spanish are already well established.
The project will create and make available the following language resources and natural language processing tools, none of which currently exist for Cape Verdean Creole, with open access and open-source code for the speaker community:
- kea text corpora (in the Santiago and São Vicente dialectal varieties);
- Parallel kea:pt-pt text corpora;
- The first large language model (LLM) for generating written text and dialogue in kea;
- The first LLM for bidirectional kea ↔ pt-pt translation;
- A kea phonetic glossary and demonstration web application;
- kea speech corpora in the Santiago and São Vicente dialects;
- The first speech recognition system for kea;
- A chatbot capable of written interaction in kea;
- A voicebot capable of spoken interaction in kea (with speech recognition);
- Public APIs providing programmatic access to the resources, models and corpora listed above, published on Hugging Face (https://huggingface.co/).
In addition to technological development, the initiative includes training actions for civil servants, teachers, researchers, students, entrepreneurs and citizens with special needs, enabling them to integrate these tools into their organisational and business processes. The project will also assess the social and institutional impact of using these technologies.
Project Information:
- Iscte Budget: USD 46,368.00
- Iscte Co-PIs: Prof. Miguel Sales Dias and Prof. António Raimundo
- UniCV Budget: USD 25,632.00
- UniCV Co-PIs: Prof. Dominika Swolkien and Prof. Ana Karina Moreira
- Duration: 15 months, starting in October 2025.