Making AI more accurate for automated code refactoring

We as a development community have made significant progress in using AI to speed up coding work. So far, the greatest success has been in the IDE, where autocomplete helps developers with the code they are currently working on and chat assistants explain and generate code snippets. The developer must still review and accept each suggestion, which matters most when the AI hallucinates and provides wrong information.

Autocomplete works as well as it does in large part because of the context supplied to the large language model (LLM): the text surrounding the cursor and even the other tabs the IDE has open. Developers who learn how AI-based autocompletion works can game the system to get better accuracy by opening relevant files in the IDE to supply more context.

This approach doesn’t scale if we want to analyze and refactor large codebases across multiple repositories at once. Every AI edit is based on immediate context and is only a suggestion that requires developer review. If you want to refactor at scale, you need accuracy and trust in your system.

To that end, we are announcing AI-assisted auto-refactoring in the Moderne Platform. It’s the best of both worlds: you get the accuracy and efficiency needed for transforming code across multiple repositories, plus the flexibility to integrate LLMs to augment the work where they are useful.

In this article, we will take you through integrating AI LLMs in the Moderne Platform to support auto-refactoring work, and a specific use case that was très bien for one of our customers.

Improving accuracy of LLMs for code with Lossless Semantic Trees

Before we get into Moderne’s specific integration of AI, let’s walk through some of the technical thinking and specifics about leveraging AI with rules-based auto-refactoring.

Most LLMs are trained on natural language and source code as plain text, without distinguishing between the two. Code, however, has structure, a strict grammar, and a compiler that can deterministically resolve it, all of which could be leveraged.

There has been some research into fine-tuning models on structured code representations such as the abstract syntax tree (AST), the concrete syntax tree (CST), data flow, control flow, and even program execution. However, fine-tuning on those representations takes time (and compute power!). Additionally, these trees lack semantic data about code, such as type attribution, which is crucial for accurate codebase searches.

Today, business applications such as chatbots augment LLMs with a technique called retrieval-augmented generation (RAG). RAG improves the accuracy and reliability of generative AI models by using embeddings to fetch data from relevant external sources and supplying it to the model as context at query time. This lets us give the model data it wasn’t trained on, plus more text than a real-time IDE context window could supply.

When working with large codebases, we similarly need a way to retrieve relevant data from the user's codebase as context and feed it to an LLM alongside the query. Fortunately, through the OpenRewrite auto-refactoring open-source project, we have a new code representation called the lossless semantic tree (LST) that includes type attribution and other metadata. This full-fidelity representation of code is an essential foundation for searching and transforming code accurately—and can do the same job that embeddings do for natural language text. Moreover, the Moderne Platform serializes LSTs to disk where they are horizontally scalable, which takes refactoring work from single-repo mode to multi-repo mode.
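To make the value of type attribution concrete, here is a minimal sketch that contrasts a plain-text search with a type-aware one. The `MethodCall` record, the file names, and the `com.example.Report` type are hypothetical stand-ins for illustration; they are not the LST data model or an OpenRewrite API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodCall:
    """One call site, as a type-attributed tree could expose it (hypothetical shape)."""
    file: str
    source: str          # the call as it appears in the source text
    declaring_type: str  # fully qualified type that declares the method
    method: str


# Hypothetical call sites spread over several files.
CALLS = [
    MethodCall("billing/Invoice.java", "log.info(total)", "org.slf4j.Logger", "info"),
    MethodCall("reports/Summary.java", "report.info(total)", "com.example.Report", "info"),
    MethodCall("billing/Refund.java", "LOGGER.info(msg)", "org.slf4j.Logger", "info"),
]


def text_search(calls, needle):
    """Plain-text search: matches on characters only."""
    return [c.file for c in calls if needle in c.source]


def typed_search(calls, declaring_type, method):
    """Type-aware search: matches on the resolved declaring type and method name."""
    return [c.file for c in calls
            if c.declaring_type == declaring_type and c.method == method]


print(text_search(CALLS, ".info("))                     # also finds Report.info
print(typed_search(CALLS, "org.slf4j.Logger", "info"))  # only the logger calls
```

The text search cannot tell the logger call from an unrelated method with the same name, while the typed search can. That difference is what makes the retrieved context trustworthy enough to hand to a model.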

The Moderne Platform, with aggregated LSTs, allows us to query large codebases and provide the results to the LLM for its query or generation. We can then validate the LLM suggestion with our 100% accurate refactoring system.

In summary, for working with code, Moderne can provide highly relevant context to LLMs using LSTs, which are extremely rich data representations of your code. We use open-source LLMs to securely deploy this solution on our platform, ensuring that no code leaves this environment. Additionally, with our platform, we can effectively test and select the best LLM for a given task.

Integrating AI LLMs in OpenRewrite rules-based recipes

The Moderne Platform runs OpenRewrite recipes (or programs) against pre-built and cached LSTs to search and transform code. This allows us to execute recipes across multiple repositories and return near-real-time results for code analytics.

When creating recipes, many declarative formats are available for ease of use, but users can also develop custom programmatic recipes for maximum flexibility. Custom OpenRewrite recipes can implement many types of integrations as well. One such example is a LaunchDarkly feature flag removal recipe that calls LaunchDarkly to identify unused feature flags, removes the functionality behind each flag from the codebase, and then calls LaunchDarkly again to remove the unused flags. A recipe can also wrap and execute other tools, such as in our JavaScript codemods integration.

LLMs are another important tool we can integrate into recipes to support new and interesting use cases. We’ve focused on open-source, specialized LLMs that can run on CPUs for maximum operational security and efficiency within the Moderne Platform. This enables us to provision the models on the same CPU-based worker nodes where the recipes are manipulating the LST code representations. By testing and measuring various models, which are downloadable from Hugging Face, we can identify the best ones and iterate on the selection as new models arrive every day. Model selection can make the difference between getting something done accurately and quickly or not at all.

On the Moderne Platform, models run as a Python-based sidecar using the Gradio Python library. Each Python sidecar hosts a different model, allowing for a variety of tasks to be performed. A recipe then has several tools in its toolbox, and different recipes can also use the same sidecar. For example, a recipe that computes the distribution of languages in codebase comments and a recipe that fixes misencoded comments in French can both use the sidecar Python process that hosts a model that can predict the language based on a text input.

When a recipe is running on a worker, it can search LSTs for the necessary data, pass it to the appropriate LLM, and receive a response. The recipe sends the model only the parts of the LST that need to be evaluated or transformed, then inserts the LLM response back into the LST. The Moderne Platform produces diffs for developers to review and commit back to source code management (SCM), ensuring models are doing their job with precision. See the process in Figure 1.

Figure 1. AI LLMs as sidecars for recipes to leverage on CPU-based workers
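The flow in Figure 1 can be sketched in a few lines. This is a toy illustration under stated assumptions: `Node` is a hypothetical two-kind stand-in for LST elements, and `toy_model` is a placeholder for a sidecar call, not a real model. It shows that only comment text is sent out, that the response is spliced back in place, and that the result surfaces as a reviewable diff.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Node:
    kind: str  # "code" or "comment" (a toy stand-in for LST element types)
    text: str


def run_recipe(nodes, model):
    """Send only comment text to the model and splice each response back in place."""
    sent = []
    result = []
    for node in nodes:
        if node.kind == "comment":
            sent.append(node.text)
            result.append(Node(node.kind, model(node.text)))
        else:
            result.append(node)  # code is never shown to the model
    return result, sent


def render(nodes):
    """Flatten the nodes back into source lines."""
    return [n.text for n in nodes]


def toy_model(comment):
    # Hypothetical stand-in for a sidecar call: always guesses "é".
    return comment.replace("\ufffd", "é")


before = [
    Node("comment", "// Cr\ufffde la facture"),
    Node("code", "Invoice invoice = new Invoice();"),
]
after, sent = run_recipe(before, toy_model)

# The change is presented as a diff for a developer to review.
diff = list(difflib.unified_diff(render(before), render(after),
                                 fromfile="before", tofile="after", lineterm=""))
print("\n".join(diff))
```

Keeping the traversal deterministic and the model call narrow means the diff contains only changes to the elements the recipe chose to send.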

We deploy our microservices with immutable infrastructure, so both the main microservice and the Python sidecar with LLMs can be part of the base image and start together. The Python process is efficient, starting only the first time the recipe is called on that specific worker.
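The lazy start and the sharing of one sidecar between recipes can be illustrated with a small sketch. Everything here is hypothetical: the `Sidecar` class, the loader, and the two recipe functions are illustrative names, and the "model" is a trivial placeholder rather than a real language classifier or the Gradio-based process described above.

```python
class Sidecar:
    """Toy stand-in for a model process that is expensive to start."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader
        self._model = None
        self.starts = 0  # how many times the model was actually loaded

    def predict(self, text):
        if self._model is None:  # start on first use only
            self._model = self._loader()
            self.starts += 1
        return self._model(text)


def load_language_model():
    # Hypothetical loader; a real one would load model weights here.
    return lambda text: "fr" if "é" in text else "en"


language_sidecar = Sidecar("language-id", load_language_model)


def language_distribution_recipe(comments):
    """Count comments per predicted language."""
    counts = {}
    for comment in comments:
        lang = language_sidecar.predict(comment)
        counts[lang] = counts.get(lang, 0) + 1
    return counts


def needs_french_fix_recipe(comment):
    """Second recipe reusing the same sidecar."""
    return language_sidecar.predict(comment) == "fr"


starts_before = language_sidecar.starts
distribution = language_distribution_recipe(["Crée la facture", "Create the invoice"])
is_french = needs_french_fix_recipe("Méthode utilitaire")
print(starts_before, language_sidecar.starts, distribution, is_french)
```

Both recipes go through the same object, so the expensive load happens once per worker no matter how many recipes use it.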

LLM-based recipes can also perform the same functions as other recipes, such as emitting data tables and visualizations or integrating with SCM for pull requests and commits.

A case study with artificial intelligence: Finding and fixing misencoded French

One of our customers came to us with a problem ripe for an automated fix. Over the years, their older code had gone through multiple stages of character encoding transformation, leaving misencoded French characters that could no longer be rendered. Furthermore, misencoded French characters in Javadoc comments were causing the Javadoc compiler itself to fail, which meant consumers of that code did not have ready access to documentation on the APIs they were using.

French characters can carry accents, such as é or è, or might even be ç or œ. These special characters could be found in comments, Javadocs, and basically anywhere there was textual data in the customer’s codebase. ASCII is a 7-bit character encoding standard designed primarily for the English alphabet, supporting a total of 128 characters (0-127). This set includes letters, digits, punctuation marks, and control characters, but no accented characters like "é" or other non-English characters. When a character cannot be decoded or represented, it is typically replaced by ? or �.

Perhaps the problem started with a source file that was originally created with a character encoding like ISO-8859. Because there was no marker within a text file of what its character encoding was, subsequent tools might guess wrong and think it was encoded with Windows-1252. Maybe there was an attempt to standardize on UTF-8. Most Latin characters like the standard alphabet are represented with the same bytes in all of these encodings, so the file doesn't get totally mangled, but characters not common in English or characters with diacritics are not the same in all encodings. So over the lifetime of the file, characters with diacritics were entered under a variety of different encodings until there was no longer any one encoding with which the file could be interpreted to correctly display everything.
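The failure modes described above are easy to reproduce with Python’s built-in codecs. The sample string is our own; the sketch only shows how the same bytes read under the wrong encoding yield the kinds of damage discussed here, and is not a reconstruction of what happened to the customer’s files.

```python
original = "créé reçu"

# UTF-8 bytes read as Windows-1252: each accented letter becomes two wrong characters.
as_cp1252 = original.encode("utf-8").decode("cp1252")

# ISO-8859-1 bytes read as UTF-8: each invalid byte becomes U+FFFD (the � character).
as_utf8 = original.encode("iso-8859-1").decode("utf-8", errors="replace")

# Forcing the text into 7-bit ASCII: each accented letter becomes "?".
as_ascii = original.encode("ascii", errors="replace").decode("ascii")

print(as_cp1252)  # crÃ©Ã© reÃ§u
print(as_utf8)    # cr�� re�u
print(as_ascii)   # cr?? re?u
```

Note that the unaccented letters survive every round trip, which is why such files stay mostly readable and the damage can go unnoticed for years. Once a character has been replaced by ? or �, the original byte is gone, so the fix has to be inferred from the surrounding word.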

With the Moderne Platform and a little help from AI, we were able to solve this problem quickly. We decided to use AI to figure out what the words are supposed to be and to fill in the appropriate modern UTF-8 characters.


How the fixMisencoded recipe works with AI models

Using OpenRewrite’s framework, we wrote the recipe to use AI to fix the misencoded comments and Javadocs. The recipe walks through the codebase until it finds either a comment or a Javadoc. It then sends that text to a sidecar Python process, which generates a predicted fix for the misencoded text.

Figure 2. Sending specific comments to the AI LLM for processing

Before fixing the misencoded text, we first check whether the text is in French, to avoid altering text that isn’t. We do so by using an XLM-RoBERTa transformer model with a classification head on top that is fine-tuned for language identification.

Figure 3. Identifying the natural language to focus the fixes
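The gating step can be sketched as a small function that takes any classifier returning a label and a score. The `toy_classifier` below is a hypothetical word-list heuristic standing in for the fine-tuned model, and the `min_confidence` threshold is our assumption; the article does not say whether or how a threshold is applied.

```python
FRENCH_WORDS = {"le", "la", "les", "de", "des", "un", "une", "et", "est", "pour"}
ENGLISH_WORDS = {"the", "of", "a", "an", "and", "is", "for", "to", "in"}


def toy_classifier(text):
    """Hypothetical stand-in for a language-identification model: (label, score)."""
    words = text.lower().split()
    french = sum(w in FRENCH_WORDS for w in words)
    english = sum(w in ENGLISH_WORDS for w in words)
    if french + english == 0:
        return ("unknown", 0.0)
    if french >= english:
        return ("fr", french / (french + english))
    return ("en", english / (french + english))


def should_fix(text, classify, min_confidence=0.8):
    """Return True only when the classifier is confident the text is French."""
    label, score = classify(text)
    return label == "fr" and score >= min_confidence


print(should_fix("Retourne le nom de la classe", toy_classifier))   # True
print(should_fix("Returns the name of the class", toy_classifier))  # False
print(should_fix("TODO", toy_classifier))                           # False
```

Passing the classifier in as a parameter keeps the gate independent of the model behind it, so a different model can be swapped in without touching the recipe logic.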

The sidecar Python process fixes misencoded text by first loading a French frequency dictionary from the Leipzig Corpora Collection. The Leipzig corpora come in multiple languages, sizes, and sources. The corpus we chose is built from 1 million French sentences collected by randomly exploring various websites, and it lists French words together with how often each appears in those sentences. You can also select a corpus for a specific country if localization is a concern.

We load the corpus using SymSpell, a fast spelling-correction library with a Python implementation, and use it to predict the fixed words. We iterate through the words in the input text, and if a word contains a misencoded character, we use SymSpell to find the most likely word from the corpus. We use the number of misencoded characters in the word to set the maximum edit distance parameter.
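The word-level logic can be sketched without any dependencies. This is a simplified stand-in, not SymSpell and not the recipe’s actual code: it uses a plain Levenshtein scan over a tiny dictionary, and the words and frequencies in `TOY_FREQUENCIES` are made up for illustration rather than taken from the Leipzig corpus. What it does preserve is the rule described above: only damaged words are touched, and the number of misencoded characters sets the maximum edit distance.

```python
REPLACEMENT = "\ufffd"  # the � character left behind by a failed decode

# Illustrative frequencies only; a real dictionary comes from a corpus.
TOY_FREQUENCIES = {
    "méthode": 120, "paramètre": 95, "créé": 60,
    "classe": 200, "reçu": 40, "retourne": 150,
}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fix_word(word, frequencies):
    """Replace a damaged word with the closest, most frequent dictionary word."""
    damaged = word.count(REPLACEMENT)
    if damaged == 0:
        return word  # undamaged words are left alone
    best, best_key = word, None
    for candidate, frequency in frequencies.items():
        distance = levenshtein(word, candidate)
        if distance <= damaged:  # max edit distance = number of damaged characters
            key = (distance, -frequency)
            if best_key is None or key < best_key:
                best, best_key = candidate, key
    return best


def fix_text(text, frequencies):
    return " ".join(fix_word(w, frequencies) for w in text.split(" "))


print(fix_text("Retourne le param\ufffdtre de la m\ufffdthode this", TOY_FREQUENCIES))
```

Because undamaged words are never sent through the dictionary, identifiers and keywords such as `this` pass through unchanged. A word with no candidate inside the distance bound is also left as-is rather than guessed.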

The recipe that uses the fixMisencoded sidecar Python process can then use the fixed text and change the Javadoc or comment. This produces a diff file that can be used to do a commit on the Moderne Platform as shown in Figure 4.

Figure 4. Showing diffs in the Moderne Platform of fixes for misencoded French


The recipe can now fix the encoding issues, which leads not only to more readable code but also the ability to compile the Javadoc. This in turn leads to better usability downstream. This recipe is a key example of how Moderne can assist in updating your code, enhancing its readability, and boosting developer satisfaction and productivity.

Combining strengths of rules-based refactoring and AI in the Moderne Platform

OpenRewrite’s recipes are precise, deterministic, and fast. Large language models are creative, versatile, and powerful. At Moderne, we combine the strengths of both.

In our case study on misencoded French text, fixing these issues is challenging for both purely rules-based systems and LLMs alone. Rules-based systems can't identify the natural language in comments, and LLMs struggle to identify comments or other code syntax/semantic data. By using recipes to guide and focus the LLM, we achieve more predictable and reliable results, which our customers can trust for their large, complex codebases.

Using an LLM alone would not solve the problem for our customer. In the examples shown in Figure 5, ChatGPT, an LLM, fails to fix misencoded comments. In the first example, it fails to understand that “this” is the keyword rather than the determiner. It also changes “class” to “classe,” which could be frustrating for developers. In the second example, it doesn’t recognize that the comment isn’t a question, leading to incorrect fixes.

Figure 5. ChatGPT LLM alone fails to fix misencoded French

The Moderne Platform provides the framework for a recipe to walk your codebase (LSTs) in a deterministic way, calling the AI model only when needed. This not only confines the model to precise places in your code, but also makes models more efficient because they run only where they are needed. The transformation possibilities for your code are truly endless.

To learn more about the flexibility of the Moderne Platform and how we’re leveraging AI to improve large codebases, please contact us.

