Ai2, the research institute founded by Microsoft co-founder Paul Allen, has introduced olmOCR, an artificial intelligence model that converts images into high-quality text. In tests judged by humans, olmOCR performed better than other AI models in the same category.
olmOCR is built on Qwen2-VL-7B-Instruct, a small model from Alibaba Cloud, and was further trained on 250,000 documents to convert images into text. One technique olmOCR uses is to extract the text already embedded in a PDF, known as anchor text, and give it to the LLM together with the page image, so the model can recognize the text in the image and convert it accurately. Even for pages with no embedded text, such as scanned documents, olmOCR still provides excellent results.
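To make the anchor text idea concrete, here is a minimal sketch of how embedded PDF text might be turned into a hint for a vision-language model. It is not olmOCR's actual code: the `TextBlock` type, the `build_anchor_prompt` helper, the prompt wording, and the `max_chars` budget are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """A piece of text already embedded in a PDF page, with its position."""
    x: float
    y: float
    text: str


def build_anchor_prompt(blocks, max_chars=4000):
    """Turn embedded PDF text into a hint string for a vision-language model.

    Hypothetical helper: the format below is illustrative, not olmOCR's own.
    """
    lines = []
    used = 0
    for block in blocks:
        line = f"[{block.x:.0f}x{block.y:.0f}] {block.text.strip()}"
        # Stop once the hint would exceed the character budget.
        if used + len(line) > max_chars:
            break
        lines.append(line)
        used += len(line) + 1  # +1 for the newline that joins the lines
    if not lines:
        # Scanned page: no embedded text, so the model sees only the image.
        return "No embedded text was found; transcribe the page from the image alone."
    return "Embedded text and positions from the PDF:\n" + "\n".join(lines)
```

In a setup like this, the returned string would be sent to the model alongside the rendered page image. A scanned page simply yields an empty list of blocks, and the model then has to rely on the image alone.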
The text extracted by olmOCR is automatically arranged in reading order. It supports converting equations, tables, and handwriting. However, the model does not yet convert pictures embedded in a document into text, although it does mark them in the output.
The model is open source under the Apache 2.0 license and is free to use. It is recommended for educational and research purposes. In addition to the model, Ai2 has released the training code, the datasets, and the software for running the entire system. I have tested it myself and found that it works relatively well with Thai.
TLDR: Ai2 introduces olmOCR, an AI model that excels at converting images into high-quality text. It outperforms other AI models in the same category when assessed by humans. The model is a customized version of Qwen2-VL-7B-Instruct, uses text extracted from PDFs as anchor text, and is free to use under the Apache 2.0 license.