Unveiling SEALD: A Quest to Foster LLM Open Source Initiatives in the Context of ASEAN Region

Most advanced LLMs today are built primarily around English or other major world languages, with datasets and fine-tuning shaped by Western perspectives. This poses a challenge for countries and regions whose languages and cultural contexts are distinct and that may lack LLMs suited to them. Project SEALD, a collaboration between AI Singapore and Google Research, aims to address this by developing a foundation model that is proficient in the diverse languages and social contexts of Southeast Asia.

Project SEALD works with local partners in multiple countries, including VISTEC and KBTG in Thailand, to create high-quality open datasets and train LLMs tailored to Southeast Asia’s linguistic and cultural nuances. Within the initiative, Google Research plays a dual role: aggregating data across the region and applying new research methods to the project.

One of these techniques is Composition to Augment Language Models (CALM), which combines specialized models with an anchor model to extend its capabilities without extensive retraining. Another is MatFormer, which addresses the need for models of different sizes by training nested sub-models of several sizes at the same time, so that users can mix and match model components to fit their needs without redundant training.
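The mix-and-match idea behind MatFormer can be illustrated with a small sketch. This is a toy example in plain Python and not the implementation used in Project SEALD: it assumes that each smaller sub-model simply reuses the first few hidden units of a shared feed-forward block, and it only shows how sub-models are picked at inference time, not the joint training. The names `NestedFFN` and `run_stack` are hypothetical and chosen for illustration only.

```python
import random


def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def matvec(rows, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]


class NestedFFN:
    """Toy feed-forward block whose smaller variants reuse a prefix of hidden units."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.d_hidden = d_hidden
        # One row per hidden unit: input weights and output weights.
        self.w_in = [[rng.uniform(-1, 1) for _ in range(d_model)]
                     for _ in range(d_hidden)]
        self.w_out = [[rng.uniform(-1, 1) for _ in range(d_model)]
                      for _ in range(d_hidden)]

    def forward(self, x, width):
        """Run the block using only the first `width` hidden units."""
        if not 1 <= width <= self.d_hidden:
            raise ValueError("width must be between 1 and d_hidden")
        hidden = relu(matvec(self.w_in[:width], x))
        out = [0.0] * self.d_model
        for h, row in zip(hidden, self.w_out[:width]):
            for j, w in enumerate(row):
                out[j] += h * w
        return out


def run_stack(layers, widths, x):
    """Apply each layer at its chosen width, with a residual connection."""
    for layer, width in zip(layers, widths):
        update = layer.forward(x, width)
        x = [a + b for a, b in zip(x, update)]
    return x


# Three layers share one set of weights; each call picks a different sub-model.
layers = [NestedFFN(d_model=4, d_hidden=8, seed=i) for i in range(3)]
x = [0.5, -1.0, 0.25, 2.0]

small = run_stack(layers, [2, 2, 2], x)   # smallest sub-model in every layer
mixed = run_stack(layers, [8, 4, 2], x)   # a different size per layer
full = run_stack(layers, [8, 8, 8], x)    # the full model
```

Because every width reads from the same weight matrices, the three configurations above need no extra parameters and no separate training runs in this sketch; only the slice taken from each layer changes.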

The SEA-LION model for Southeast Asia, first built on the MPT architecture, has gone through three versions, with later versions building on Llama 3 and Gemma 2 to improve capacity and efficiency. None of these versions uses CALM or MatFormer in training yet; experiments are ongoing with the aim of applying them in future iterations.

AI Singapore has also introduced SEA HELM, a benchmark that, like the SEA-LION model, is built specifically for Southeast Asia and is used to evaluate language models on the region's languages. At the time of writing, the gemma-2-9b-cpt-sea-lionv3-instruct model leads the overall SEA ranking and also ranks first for Thai.

TLDR: Leading LLMs struggle to adapt to diverse linguistic and cultural contexts, which has prompted collaborations like Project SEALD to develop region-specific models using techniques such as CALM and MatFormer. AI Singapore’s SEA HELM benchmark complements this work by evaluating language models on Southeast Asian languages.
