At LanguageTool, Bartmoss St Clair (Head of AI) is pioneering the use of Large Language Models (LLMs) for grammatical error correction (GEC), moving away from the tool's initial non-AI approach to create a system capable of catching and correcting errors across multiple languages.
LanguageTool supports over 30 languages, has several million users, and over 4 million installations of its browser add-on, benefiting from a diverse team of employees from around the world.
Episode Summary -
- LanguageTool decided against using existing LLMs like GPT-3 or GPT-4 due to cost, speed, and accuracy benefits of developing their own models, focusing on creating a balance between performance, speed, and cost.
- The tool is designed to work with low latency for real-time applications, catering to a wide range of users including academics and businesses, with the aim to balance accurate grammar correction without being intrusive.
- Bartmoss discussed the nuanced approach to grammar correction, acknowledging that language evolves and user preferences may vary, necessitating a balance between strict grammatical rules and user acceptability.
- The company employs a mix of decoder and encoder-decoder models depending on the task, with a focus on contextual understanding and the challenges of maintaining the original meaning of text while correcting grammar.
- A hybrid system that combines rule-based algorithms with machine learning is used to provide nuanced grammar corrections and explanations for the corrections, enhancing user understanding and trust.
- LanguageTool is developing a generalized GEC system, incorporating legacy rules and machine learning for comprehensive error correction across various types of text.
- Training models involve a mix of user data, expert-annotated data, and synthetic data, aiming to reflect real user error patterns for effective correction.
- The company has built tools to benchmark GEC tasks, focusing on precision, recall, and user feedback to guide quality improvements.
- Introduction of LLMs has expanded LanguageTool's capabilities, including rewriting and rephrasing, and improved error detection beyond simple grammatical rules.
- Despite the higher costs associated with LLMs and hosting infrastructure, the investment is seen as worthwhile for improving user experience and conversion rates for premium products.
- Bartmoss speculates on the future impact of LLMs on language evolution, noting their current influence and the importance of adapting to changes in language use over time.
- LanguageTool prioritizes privacy and data security, avoiding external APIs for grammatical error correction and developing their systems in-house with open-source models.