Hello Reader 👋
I hope you are doing well today!
These days, I'm experimenting with AI assistants. I want to see what they can or can't do when it comes to legacy codebases. There are shiny promises that go like "AI will refactor the code for us". But how reliable are these? Let's find out…
This article is probably the first of a series. I'm gonna try out the Rubberduck VS Code extension. Interestingly, the source code is open! So I've started contributing.
But in this article, I'll be using Rubberduck from my end-user point of view. And what the duck can do is… fascinating 🦆
If you prefer to read that online, I wrote a blog post for you.
AI has become a hot topic in the past few months. Tools like ChatGPT have been released for the world to play with, and this has raised a lot of interesting questions about how it will change the way we work.
My interest is piqued when it comes to legacy code.
A lot of us are working on existing, poorly documented, untested software that we have to change. Evolving this code without breaking things is a daily challenge. Would AI do that job better? I would actually be excited if tools could do some of the grunt work, so I could spend more time understanding the Problem that is being solved instead of spending so long on the implementation details of the Solution that I need to change.
However, I’m suspicious.
I’ve already written about this topic, and my experience wasn’t great back then. The generated tests weren’t really helpful. I was mostly wasting time.
AI makes mistakes. Yet, it sounds confident. If it doesn’t know, it may just make things up. If I don’t know what the code is doing, how can I hope to detect the lies?
That being said, I’ve also played with ChatGPT. I gave it slices of code and asked it to simplify them, refactor them with patterns, etc.
It works… but not always. I’ve found it creates friction in my development process.
And then, I saw Lars Grammel’s tweet about his new project: Rubberduck, an AI-powered coding assistant for VS Code.
It says it can help you:
Right from VS Code.
Well, let’s try it out and find out what it can really do.
The kata I'll be using is a coding exercise that captures real-life software complexity (HTTP requests, database calls, external data…) without adding too much noise. I really like using it to practice tools and techniques for refactoring legacy code.
For testing Rubberduck, I will tackle the TypeScript version.
The very first step is to run a tool like Prettier to standardize the shape of the code. No more noise in follow-up changes: rbd-1-prettier
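If you have never done this, it's usually a one-liner, something along the lines of npx prettier --write . (the exact command depends on how the project is configured).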
The end goal of the exercise is to change the API so the endpoint that returns lift pass prices can return the price for several passes instead of just one.
But the code is hard to re-use, so we would like to refactor it a little bit first.
But how do we know the code is still working as expected after we refactored it? In general, we want to write tests first.
So let’s see if the duck can help us write the missing tests faster… or not.
First of all, we need to write some tests before we touch the code. For this kata, I usually go with:
- supertest-as-promised to run HTTP queries against the top-level application
- testcontainers to set up the database and get it to run in the tests (it’s based on Docker)

But let’s ask Rubberduck!
To do so I need to:
This started a new chat and I could type in my question:
OK, this is not very helpful. It’s just generic guidance…
I followed up with more details about what’s blocking me and got some advice.
That’s more interesting!
✅ It’s good that it suggested SuperTest for testing the controller. If I wasn’t aware of this, it would have put me on the right track!
🚫 I think going for Sinon to mock the database calls is a mistake at this point. I will refactor the implementation details and I have no control over the connection API. Mocking it will make refactorings harder. I think a better approach here is to take a step back, consider this a black box, and set up an actual database.
I tried to ask follow-up questions. Rubberduck gave me an interesting option where we could use Sequelize in tests to set up the database:
```js
const Sequelize = require("sequelize")

const sequelize = new Sequelize("database", "username", "password", {
  host: "localhost",
  dialect: "postgres",
})

const User = sequelize.define("user", {
  username: Sequelize.STRING,
  birthday: Sequelize.DATE,
})

sequelize
  .sync()
  .then(() => console.log("Database is synced"))
  .catch(err => console.error("Error creating database"))

User.create({
  username: "testuser",
  birthday: new Date(2000, 0, 1),
})
  .then(user => console.log(user.username + " created"))
  .catch(err => console.error("Error creating user"))
```
I think this would work, indeed.
Note however that I couldn’t get the AI to guide me towards testcontainers, despite mentioning “Docker in tests”. Without AI, I would usually search npmjs with the relevant keywords, which would expose me to it.
To me, this means you should not just blindly listen to AI—it would have been naive to think otherwise. But combining its suggestions with your own experience and other sources can get you further, faster.
If I didn’t know how to get started here, Rubberduck’s suggestion to use Sequelize would certainly have helped.
But let’s go with testcontainers anyway.
There is an existing test. To make it run, we need to:
- yarn start to get the server running
- yarn test to run the tests against the server

It may work to get started, but that won’t go far. If anything, this test can’t easily run on CI. The idea would be to start a Docker container where the whole thing can run reliably for each test run. That’s what testcontainers can do!
Let’s ask Rubberduck to update the tests for us, and see if it can save us some time writing the boilerplate code:
Then, I type in my request, trying to give some specificities:
After ~10s it generates a diff next to my code. I can inspect it and decide to “Apply” by clicking on a button at the bottom. I must admit that’s a way better flow than having to copy-paste code on ChatGPT. It feels integrated within my development process 😄
Looking at the suggested diff, it’s not bad. There are some good things:
- it declared the container variable along with the others and initialized it

There are also some errors:
- the correct call is GenericContainer("mariadb:10.4"), not passing two arguments. Hopefully, TypeScript will catch that easily.

At this point, I have 2 options:
1. ask Rubberduck to refine its suggestion
2. apply the diff and fix the remaining issues myself
On real code, I would go for option #2 since it would be faster: the AI still takes a few seconds to generate the code, and I feel I can make the change more quickly myself rather than wait. It also fits the idea of using AI to assist me while still owning the changes and finalizing what needs to be done.
For the exercise though, I will refine the suggestion to see what it can do:
And that worked!
It’s very interesting how you can follow up on the chat to get a refined suggestion. It means you don’t have to get the first input right on the first try. The main limiting factor today is the ~10s it takes to generate the code again, which makes the feedback loop too long for just developing like that—I can just make the changes myself.
Happy with these changes, I clicked “Apply” to get the code in. I made a few manual tweaks:
- use GenericContainer("mariadb:10.4") with a single argument
- type the container variable when it’s declared, to get type-safety wherever it’s used

I guess I could have refined these points with Rubberduck, but I mostly realized them after I got the code in. That’s fine. Rubberduck got me the headstart I was looking for… TypeScript is getting me through the finish line 👍
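For reference, here is roughly where the test setup lands after those tweaks. This is a sketch rather than the exact diff: the mocha-style hooks and the timeout value are my assumptions, and the database credentials and schema setup the kata needs are left out.

```ts
import { GenericContainer, StartedTestContainer } from "testcontainers"

// The container is declared along with the other test variables,
// and typed so TypeScript can help wherever it is used
let container: StartedTestContainer

before(async function () {
  // Pulling the MariaDB image can take a while on the first run
  this.timeout(60_000)

  container = await new GenericContainer("mariadb:10.4")
    .withExposedPorts(3306) // spoiler: this will turn out to be insufficient
    .start()
})

after(async () => {
  await container.stop()
})
```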
Now it’s time to verify if all of that is working!
I don’t have a MariaDB up and running. Nor is the app running. I only have Docker started on my machine. I hit yarn test and wait…
Failure!
Apparently, something is wrong with the exposed ports. And indeed, in this specific case, AI’s suggested syntax won’t be enough: withExposedPorts(3306). The problem is that the port may not be mapped to the 3306 port on the host.
A quick search in the testcontainers docs tells me that there is another syntax to bind the ports: withExposedPorts({ container: 3306, host: 3306 }). Another way would be to get the mapped port and pass it to the source code, but I don’t want to change the source code now. Let’s go with the first option.
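Concretely, that’s a one-line change in the container setup (continuing the sketch from above):

```ts
container = await new GenericContainer("mariadb:10.4")
  // Explicitly bind the container's 3306 port to port 3306 on the host,
  // which is where the application expects to find the database
  .withExposedPorts({ container: 3306, host: 3306 })
  .start()
```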
I change the code and run yarn test again:
Success!
I think that illustrates something important: the code suggested by AI has a lot of unknowns and hypotheses. I need to combine it with other tools (static types, reading the API docs, my own knowledge, some sort of tests…) to validate them.
I think it’s important to find a way to verify the suggested code soon after it was merged in.
That being said, the suggested syntax was fine and it helped me look up the relevant docs. If I wasn’t familiar with the testcontainers API, that would have saved me quite some time!
Commit, push, and we are here: rbd-2-testcontainers
This is often a difficult part.
What do you test? How do you get started? There is so much going on… I generally recommend using test coverage to identify the parts of the code that are not tested yet. I’ve already detailed this process; the main idea is to vary the inputs to capture the existing behavior.
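In practice, that can be as simple as running the existing test suite through a coverage tool (nyc, for instance, though the kata doesn’t necessarily ship with one) and looking for the lines and branches the report flags as never executed.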
For this exercise, we have 3 parameters: type, age, and date. Spoiler, there are at least 11 scenarios to cover:
Let’s see if Rubberduck can speed that up!
The first test is already given, let’s just rename it.
- it("does something", async () => {
+ it("returns day cost when type is '1jour'", async () => {
Now, let’s ask Rubberduck to generate some tests for us, and see if it can suggest useful scenarios quickly from the source code.
After waiting ~1min it generates a new unsaved file with the test code:
The style doesn’t match the rest of the tests. It tried to write unit tests from scratch. But I can copy-paste the generated scenarios and adapt them. I feel this will be faster than re-generating new tests.
After copying the test cases, I select them and trigger another “Rubberduck: Edit Code 💬” action.
After ~10s I get a diff that looks good. I click “Apply” and get code that looks like this:
```ts
// Original test
it("returns day cost when type is '1jour'", async () => {
  const response = await request(app).get("/prices?type=1jour")
  expect(response.body).deep.equal({ cost: 35 })
})

// Tests generated from Rubberduck
it("should return cost 0 when age is less than 6", async () => {
  const req = { query: { type: "day", age: 5 } }
  const response = await request(app)
    .get("/prices")
    .query(req)
  expect(response.body).to.have.property("cost", 0)
})

it("should return cost 0 when age is undefined", async () => {
  const req = { query: { type: "day" } }
  const response = await request(app)
    .get("/prices")
    .query(req)
  expect(response.body).to.have.property("cost", 10)
})

// …
```
But running the tests will fail. Looking closer, I realize there is indeed an error: the req object should actually be a query and not have a nested query attribute! I could use Rubberduck to fix all the tests… but I decide to use VS Code multi-cursor instead since I can do it in a few seconds:
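For reference, here is the shape each test converges to after that pass: the parameters go straight into .query() instead of being wrapped in a fake req object (and the type is still "day" at this point):

```ts
it("should return cost 0 when age is less than 6", async () => {
  const response = await request(app)
    .get("/prices")
    .query({ type: "day", age: 5 })

  expect(response.body).to.have.property("cost", 0)
})
```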
I decided to rewrite the original test to look similar to the others. This is when I notice something else is off: the type value is “day” but it should really be “1jour”.
That’s a legit mistake. The source code refers to the type “night” as a special one. AI figured another variant would be “day”. Except that the valid values are in the database, and AI isn’t aware of this yet.
I replace all “day” occurrences with “1jour” and run the tests again:
Well, well, well… When looking closer, there are some obvious mistakes in the generated tests.
One test isn’t testing what it says it is testing:
In fact, this test is even wrong. I can see the body of the test is the same as the original one. The expected cost should be the base cost, and I can confirm that with the test failure. Let’s scrap this one!
As for the other tests, they fail because they consider the base cost to be 10:
But here again: the base cost is set in the database, and depends on the ticket type.
Therefore, I need to correct these expectations manually. The test’s failure helps me figure out the proper behavior. Sometimes the label is wrong, sometimes it’s the expected output that’s incorrect. However, the variation of inputs is the interesting part! Let’s see the scenarios that were properly covered:
That’s 8 scenarios out of 11!
Sure, the generated tests didn’t work. I also had to manually fix the syntax, most labels, and expected values. But I saved quite some time by not having to manually figure out the different parameters that have to change. Rubberduck found them all for me!
With this base, I would then use test coverage to see parts of the code that aren’t tested yet and figure out the missing variations to get them all.
From my experience, Rubberduck generates better tests when the source code doesn’t depend on external sources, like a database or a 3rd-party service. And yet, this was helpful to give me ideas and get me started. It may come in handy when dealing with unfamiliar code that I want to write tests for 👍
The final code is here: rbd-3-tests
While writing this article, I pushed all the code to a forked repository, so you can follow along.
At some point, I couldn’t remember the git syntax to push the tags to that specific remote… So I just opened a new Rubberduck chat and asked. It gave me the command I was looking for. All of that without leaving VS Code ❤️
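For the record, the answer boils down to something like git push <name-of-the-remote> --tags.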
Well, thank you Rubberduck 👍 🦆
That would be my conclusion to this experiment. I was curious about what an AI assistant such as Rubberduck could do:
As a bonus, it gave me easy input to query OpenAI, right from my editor. I was able to prompt random questions while coding without having to switch to my browser 🏆
I would conclude this article with 2 thoughts:
Until next time, take care!
Piles of Tech Debt, no tests, no docs, short deadlines… But you are not alone! Join me and get regular tips, tricks, and experiments to turn unfriendly codebases into insightful ones 💡