Collaboration between Google and AI Singapore to Develop Multilingual Dataset for ASEAN Nations Including Thai to Enhance LLM Progression

Google is collaborating with AI Singapore to launch the SEALD project (Southeast Asian Languages in One Network Data), which will create language datasets for training large language models (LLMs) focused on the ASEAN region. The initial set of languages includes Bahasa Indonesia, Thai, Tamil, Filipino, and Burmese.

The project is not limited to datasets. It also involves developing language translation models, establishing best practices for dataset creation, creating language conversion tools (translocalization), and publishing guidelines for building models in Southeast Asian languages. The data produced by the project will be released as open source so that other organizations can use it to develop LLMs in the future.

The datasets are currently being compiled. Once complete, they will be made available for the general public to download.
Source – AI Singapore

TLDR: Google and AI Singapore are collaborating on the SEALD project to create datasets for large language models focused on Southeast Asian languages, with the data to be released as open source for future LLM development.
