Google has introduced DataGemma, an LLM designed to address the hallucination problem common in large language models, where a model confidently produces incorrect figures. Google tackles this by connecting the model to a current, trusted data source that it can reference when answering.
That source is Data Commons, a Knowledge Graph containing more than 240 million data points drawn from reliable organizations such as the United Nations (UN), the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and national census bureaus, giving the model a verified base of statistics to draw on.
DataGemma connects Gemma to Data Commons through two methods. The first is RIG (Retrieval-Interleaved Generation), in which the model queries Data Commons while generating its answer so that the statistics it states are checked against, and cited from, Data Commons. The second is RAG (Retrieval-Augmented Generation), in which relevant data is first retrieved from Data Commons and added to the prompt as context before the model generates its answer. A simplified sketch of both flows follows below.
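To make the distinction concrete, here is a minimal sketch contrasting the two flows. It is illustrative only: `query_data_commons()` and `generate()` are hypothetical placeholders for the real Data Commons lookup and the DataGemma model call, and the RIG step is simplified to a single post-generation check rather than true interleaving during decoding.

```python
def query_data_commons(natural_language_query: str) -> str:
    """Hypothetical lookup: return a statistic from Data Commons."""
    # In a real pipeline this would call the Data Commons API;
    # the canned value below is only for illustration.
    canned = {"population of California": "about 39 million (US Census Bureau)"}
    return canned.get(natural_language_query, "no data found")


def generate(prompt: str) -> str:
    """Hypothetical call to the DataGemma model."""
    return f"<model answer for: {prompt!r}>"


def answer_with_rag(question: str) -> str:
    """RAG: retrieve Data Commons context first, then generate."""
    context = query_data_commons(question)
    prompt = f"Context from Data Commons: {context}\n\nQuestion: {question}"
    return generate(prompt)


def answer_with_rig(question: str) -> str:
    """RIG: generate a draft, then verify its statistics against Data Commons."""
    draft = generate(question)  # draft may contain unverified numbers
    verified_stat = query_data_commons(question)
    # Attach the retrieved figure so the final answer cites Data Commons
    # rather than relying on the model's own estimate.
    return f"{draft}\n[Data Commons check: {verified_stat}]"


if __name__ == "__main__":
    print(answer_with_rag("population of California"))
    print(answer_with_rig("population of California"))
```

The key design difference: RAG front-loads trusted data into the prompt, while RIG lets the model answer first and grounds the numbers afterwards.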
In Google's testing, models using the RIG and RAG methods answered questions with data accuracy improved by 5-17% compared to the unmodified models. The DataGemma models are available on Hugging Face for academic and research use; a loading example follows below.
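For readers who want to try the models, here is a hedged example of loading one of the released checkpoints with the Hugging Face transformers library. The model id is assumed from the release naming and may differ, and access typically requires accepting the model's license on Hugging Face first.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; check the Hugging Face hub for the exact id.
model_id = "google/datagemma-rag-27b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # spread the 27B model across available GPUs
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
)

prompt = "What is the population of California?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```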
TLDR: Google introduces DataGemma, an LLM that tackles hallucination by grounding its answers in the Data Commons Knowledge Graph, improving the accuracy of data-related answers through the RIG and RAG methods.