Since AI is obviously all the rage these days and there are a lot of companies peddling AI coding assistants, I thought there must be something to that hype. And, working for a company that makes its money on an hourly basis, I'm also interested in how the rise of coding assistants may or may not affect our business. So, over the past couple of weeks I tried some of the freely available assistants to get a feel for how they perform and what they are good at.
Performance
I used the assistants in different scenarios; however, I did not do a structured evaluation. My first attempt was – obviously – paired with a small Rust code base. The impression there was not all that good. The assistant neither seemed to actually understand what I was trying to do, nor did it produce great output in general. Fine, I thought, maybe there was too little Rust code in that particular assistant's training data. The next try was writing a small Python script from scratch for a very narrow use case. Here the assistant performed impressively: it was able to explain code, point out my bugs, and suggest what to change to fix them. That left me quite hyped for the tech. The last experiment was again writing a Python script, this time for an upcoming blog post (part 3 of the Zephyr series). The assistant's performance there was mixed at best. The task mostly involved light text processing, and the assistant would insist on using regular expressions, which is fine in a lot of cases, but it consistently failed to produce a regex that actually solved the problem at hand. Even providing it with several example inputs and outputs wouldn't help. At some point I told it not to use regexes altogether, which got it on the right track and finally got me working code. Mind you, we're talking about ~50 lines, which took me north of 90 minutes to complete using the AI.
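To give a feel for the kind of task, here is a hypothetical sketch – not the actual script – of the sort of light text processing involved: pulling a key and value out of a Kconfig-style line. The regex route the assistant kept insisting on does work for a simple case like this, but plain string methods are usually just as short and harder to get subtly wrong:

```python
import re

# Hypothetical example line; the real script and its input are not shown here.
line = "CONFIG_MAIN_STACK_SIZE = 4096  # bytes"

# Regex variant, of the kind an assistant tends to propose first:
match = re.match(r"^\s*(\w+)\s*=\s*(\S+)", line)
if match:
    key, value = match.group(1), match.group(2)

# Plain-string variant, the kind of route that finally produced working code:
key, _, rest = line.partition("=")
value = rest.split("#", 1)[0].strip()  # drop a trailing comment, if any
key = key.strip()

print(key, value)  # CONFIG_MAIN_STACK_SIZE 4096
```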
Interference
One thing that became obvious fairly early on is that the assistant plugins for VS Code will try to make IntelliSense-like suggestions that are – quite often – either useless or outright wrong. Worse, in my experiments they seemed to override the suggestions from the respective language servers, which made for a worse experience.
Impact
The most immediate effect of using an assistant I noticed was that I more or less immediately stopped looking at the code that was generated for me. I basically took it like a Stack Overflow snippet and just used it. Smart? Not so much. But also not a conscious decision on my part. This led to me needing 90 minutes to write 50 lines, which I probably could have written myself in half the time, had I bothered to do so. Instead I kept going back and forth with the assistant, giving it input and trying out what it spat out. This might just be me being a bit lazy and might not be generalizable; however, there is ample evidence of programmers having operated very much the same way in the past, basically just pasting code from Stack Overflow without ever trying to understand it. So, I assume there's some danger here if people are not disciplined in their use of an assistant. Given the performance I've seen, I wouldn't be surprised if the AI generated code with less obvious failure modes. If that is coupled with programmers blindly relying on the AI, we have a recipe for huge quality problems. Given the current state, I'd say that professional use of an AI assistant means:
- Look at the code it generates and understand it.
- Stay in the loop, i.e. make sure you write at least part of the code to improve your own understanding.
- Treat the code as you would treat your own code, i.e. make sure it has decent test coverage. If tests are generated by the AI, the above rules apply to the tests as well.
- If static code analysis is a thing for your project, don’t exempt the AI’s code from that.
If you cannot adhere to the above rules, keep your fingers off coding assistants – at least for now, until the tech is more mature.
Business Impact
The experiments left me a bit sceptical. On the one hand, these assistants are not going to go away, as they – at least superficially – seem to somewhat improve productivity. On the other hand, the fact that there is a strong temptation to just take the AI's code as given, without bothering to understand it, is very troublesome. Especially for junior programmers, who – in my mind – would stand to gain the most from having an automated sparring partner, this might stifle growth if not actively worked against. From a purely business point of view, there are two sides:
- A business using AI can get junior programmers productive faster, meaning they will be able to contribute to the bottom line sooner.
- However, the business will also not be able to sell as many “hours” in the future, as customers will expect increased efficiency through AI use.
In the best case these two points will balance each other out. Today the cost of the assistants is low enough that, even in countries with comparably low wages, if we can bill just a couple of extra hours for a junior programmer per year (and we’re literally talking “a couple of hours” here!), the cost of the assistant is already amortized.
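For a rough sense of scale, here is the back-of-envelope calculation; all figures below are made-up assumptions for illustration, not actual prices or billing rates:

```python
# All numbers are illustrative assumptions, not real prices or rates.
assistant_cost_per_year = 120.0  # e.g. a ~10-per-month subscription
junior_hourly_rate = 60.0        # assumed billable rate for a junior

break_even = assistant_cost_per_year / junior_hourly_rate
print(f"Break-even: {break_even:.1f} extra billable hours per year")
# -> Break-even: 2.0 extra billable hours per year, i.e. "a couple of hours"
```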