Ai2 Research Institute Unveils olmOCR: A High-Quality Image-to-Text Conversion Model Supporting Thai Language

Ai2, the research institute founded by Microsoft co-founder Paul Allen, has introduced olmOCR, an AI model that converts images into high-quality text. In evaluations judged by humans, olmOCR performed better than other AI models in the same category.

olmOCR is built on Qwen2-VL-7B-Instruct, a small model from Alibaba Cloud, and has been fine-tuned on 250,000 documents to convert images into text. One technique olmOCR uses is to extract text directly from the PDF, known as anchor text, and give it to the LLM so the model can recognize the text within the image and convert it accordingly. Even when a file has no embedded text, as with scanned documents, olmOCR still provides excellent results.
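The anchor-text idea described above can be illustrated with a small sketch. This is not olmOCR's actual code: the `TextSpan` structure, the function names, the prompt wording, and the character budget are all assumptions made for illustration. It only shows the general shape of the technique, where text pulled from a PDF's embedded text layer is serialized and added to the prompt as a hint, and where a scanned page with no embedded text falls back to an image-only prompt.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """A piece of text from a PDF's embedded text layer (hypothetical structure)."""
    x: float
    y: float
    text: str


BASE_INSTRUCTION = "Transcribe the text in the attached page image in reading order."


def build_anchor_text(spans, max_chars=200):
    """Serialize spans top-to-bottom, left-to-right, within a character budget."""
    # Assumption: PDF-style coordinates, where a larger y is higher on the page.
    ordered = sorted(spans, key=lambda s: (-s.y, s.x))
    lines = []
    used = 0
    for span in ordered:
        line = f"[{span.x:.0f}x{span.y:.0f}] {span.text}"
        if used + len(line) > max_chars:
            break  # stop once the hint would exceed the budget
        lines.append(line)
        used += len(line) + 1  # +1 for the newline separator
    return "\n".join(lines)


def build_prompt(spans, max_chars=200):
    """Return the text prompt that would accompany the page image."""
    anchor = build_anchor_text(spans, max_chars)
    if not anchor:
        # Scanned page with no embedded text: the model sees only the image.
        return BASE_INSTRUCTION
    return (
        BASE_INSTRUCTION
        + "\nText extracted from the PDF, with positions, is given as a hint:\n"
        + anchor
    )


if __name__ == "__main__":
    page = [TextSpan(50, 650, "Body paragraph"), TextSpan(50, 700, "Title")]
    print(build_prompt(page))
    print(build_prompt([]))
```

In this sketch the hint is capped by a character budget so that a dense page does not crowd out the rest of the prompt. The real system may serialize and truncate the extracted text differently.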

The text extracted by olmOCR is automatically arranged in reading order. It supports the conversion of equations, tables, and handwriting. However, the model does not currently convert pictures inside a document into text, even though it indicates them in the output.

The model is open-source under the Apache 2.0 license, allowing for free usage. It is recommended for educational and research purposes. In addition to releasing the model, Ai2 has also made available the training code, the datasets, and the software for running the entire system. Personally, I have tested it and found that it works relatively well with the Thai language.

TLDR: Ai2 introduces olmOCR, an AI model that converts images into high-quality text and outperforms other AI models in the same category when assessed by humans. The model is fine-tuned from Qwen2-VL-7B-Instruct, uses text extracted from PDFs as anchor text, and is available for free usage under the Apache 2.0 license.
