Google and AI Singapore Collaborate on a Multilingual Dataset for ASEAN Languages, Including Thai, to Advance LLM Development

Google is collaborating with AI Singapore to launch Project SEALD (Southeast Asian Languages in One Network Data), which will create language datasets for large language models (LLMs) focused on the ASEAN region. The initial set of languages comprises Bahasa Indonesia, Thai, Tamil, Filipino, and Burmese.

The project is not limited to datasets. It also covers developing translation models, establishing best practices for dataset creation, building language conversion (translocalization) tools, and publishing guidelines for building models in Southeast Asian languages. The data produced by the project will be released as open source so that other organizations can use it to develop LLMs.

The datasets are currently being compiled. Once complete, they will be made available for public download.
Source – AI Singapore

TLDR: Google and AI Singapore are collaborating on Project SEALD to create datasets for large language models focused on Southeast Asian languages, with the data to be released as open source for future LLM development.
