Can we surrender expertise to AI?

Accidentally a v4l2loopback contributor

In early 2025 I somewhat accidentally stumbled into updating a widely used open-source Linux module called v4l2loopback.

“Wait a second, isn’t stephematishun just a data scientist?” I hear you say.

Well yes, but I am also part programmer, and am quite chuffed to have made a contribution to a wider audience than my usual statistical work ☺️.

After a long grind my updates were accepted, closing several open issues. Since then I’ve continued my involvement as a reviewer.

Two recent submissions in particular have stood out for how clearly they leant on AI. The difference between my submission and these is stark, and it leads me to ask:

Are we ready to surrender expertise to an algorithm, a machine that cannot think, feel, or reason?

For a similar perspective on this, check out Dominik Grabiec’s blog post What AI Really Is. Anyway, back to my story.

What is v4l2loopback and how did I get involved?

What does v4l2loopback do? It allows a user to create a ‘virtual’ camera (loopback) that can be opened in Firefox, Teams, etc. It’s often used when adding effects to a webcam with OBS Studio, gstreamer, or ffmpeg, something like:

    📹️      → OBS Studio    → loopback     → Zoom
/dev/video0   ∟ Add effects   /dev/video10
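
As a rough illustration of that pipeline, here is a minimal Python sketch that builds an ffmpeg command to mirror a webcam into a loopback device. It assumes ffmpeg is installed and that the module was loaded so that /dev/video10 exists (for example with `modprobe v4l2loopback video_nr=10`). The horizontal flip is only a stand-in ‘effect’, and `loopback_command` is a hypothetical helper name, not part of any library.

```python
import shlex


def loopback_command(source="/dev/video0", sink="/dev/video10", effect="hflip"):
    """Build (but do not run) an ffmpeg command: webcam -> effect -> loopback.

    The device paths and the 'hflip' filter are illustrative assumptions.
    """
    return [
        "ffmpeg",
        "-f", "v4l2", "-i", source,   # read frames from the real camera
        "-vf", effect,                # apply a video filter (the 'effect')
        "-pix_fmt", "yuv420p",        # a pixel format most consumers accept
        "-f", "v4l2", sink,           # write frames to the loopback device
    ]


if __name__ == "__main__":
    # Print the command for inspection rather than executing it.
    print(shlex.join(loopback_command()))
```

To actually run it, the returned list could be passed to `subprocess.run`; an application such as Zoom would then open /dev/video10 as if it were an ordinary camera.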

I came across this module while making the linuxgreenscreen demo. I had trouble accessing the loopback device via linuxpy - a nice little Python package to access Linux subsystems. You probably aren’t interested in the details, but here are the main issues: linuxpy #52, #55, v4l2loopback #191 and #598.

I started out with some naive and bad fixes in November and December 2024; see PR #599.

It’s not that they didn’t, at least partly, fix the immediate issue.

They did.

But I’d look at parts of the code and wonder:

Why do we do this, and why here?

It felt broken. I didn’t understand the work.

Becoming a serious contributor to the module

On and off over the next two months, I investigated v4l2loopback further. Note that:

I had no background in the Linux Userspace API - Video for Linux (V4L2), which outlines how to provide a camera device.
I had no experience with kernel development.
I knew some C and had experience with concurrency.

Still, this was a big learning effort. For example:

There are about 21,000 words of relevant documentation (see e.g. §1.1, §1.25, §3, §4.1, and some §7) - or 30,000 tokens if we’re talking in today’s favourite terms.
There’s also code in the Linux kernel that I referred to; in particular some interesting comments about polling (torvalds/linux 726daf6).
Not to mention that v4l2loopback.c itself weighs in at 3200 lines - or another 29,000 tokens 😉.

There were many fun deadlocks 🧱 along the way. If you learn nothing else from this, take my advice: use a virtual machine when testing kernel code, e.g. with virt-manager/QEMU.

By January 2025 I had a draft for what I considered a solid ‘refresh’ of the module. I wasn’t finished, however. This effort was serious, so:

I tested against the v4l2-compliance tool;
I ran tests and examples from v4l2loopback of common use cases and features (e.g. timeout image);
I performed ‘in the wild’ usage, i.e. with browsers, OBS Studio, and more.

The process was about more than simply generating code. It was a careful effort to learn and develop myself as well as the module. I do not see such learning and development with LLM-assisted submissions.

Reviewing LLM-assisted submissions

Truly unhinged output

The first LLM-assisted submission, around 02/2025, made some bold claims:

This pull request implements dynamic buffer management for v4l2loopback, directly addressing the TODO item “improve buffering (salsaman)” in the project roadmap.

It was immediately clear that the submission was unhinged and full of issues and lies. Most of the benefits claimed were not real; in fact:

It used more memory because it required an extra ring buffer of images.
It made performance worse by increasing memory overhead.
It claimed to cover important (new) use cases, but all the cases presented were catered for by the module as is.
It claimed to have tests and CI/CD, but none were submitted.
It regressed on recently-gained V4L2 UAPI compliance.

Nothing at all like the testing, learning, and development that I did happened here.

Worse, the author claimed copyright over unrelated code, and I would argue they hadn’t written any code at all, seeing as it was all generated by an LLM.

Botched responses to feedback

The second LLM-assisted submission, circa 01/2026, was much better. This might be in part because the LLM was newer, but the author also seemed to have much stronger credentials than the first author.

However, the submission changed behaviour needlessly and did not test for regressions.

More annoyingly, the code would have been reasonably easy to fix by hand given my feedback. Instead, the author sent my feedback as input to an LLM (or agent). We then got repeatedly botched submissions, i.e. ‘context rot’. I saw:

scope expanded without justification;
new errors;
submission text that was low-information and slop-like.

It would have been much faster for everyone if the developer had simply read my feedback and made the changes I asked for.

Can we trust AI expertise?

In these two examples, we see that even an excellent developer can struggle to provide useful code with an LLM. The LLM doesn’t think like an expert.

I’d go further and say, actually, an LLM cannot think at all.

An LLM simply predicts words.

Of course some would say:

“In order for the LLM to predict the next word, it needs to know what it’s talking about.” - Geoffrey Hinton

That statement is, of course, false. An LLM has no ability to know because there is no thought.

Dominik Grabiec’s blog post What AI Really Is suggests:

LLMs can generate a lot of plausible looking text … … LLMs are just text generation programs, with no actual knowledge inside of them.

We likely fool ourselves that an LLM ‘thinks’ to generate text because that’s what we do to generate text. In other words, we project human attributes onto a machine.

Dominik also says (and I agree):

Given that the (LLM-generated) text is also unreliable and with questionable accuracy, it means that the costs of reading through and fact checking it has been externalised from the writer of the text to the people reading it.

LLMs are only useful if people do the work to attain the expertise to know when output is shit 💩.

Of course, that’s simply my opinion, and based on my own limited experience. Perhaps that experience is an outlier. Perhaps projects with stronger test suites can be more safely maintained with LLM output.

Still, I’m seeing more LLM-generated submissions that add tests. At some point, then, we have to ask:

“Quis custodiet ipsos custodes?” (Who watches the watchmen?)

Are we ready to let the next generation of programmers simply write prompts for LLMs, become dependent on the hardware, software, and energy needed to run them, and surrender human expertise in code?

I say, no.

Appendix

Get stuffed with your ‘good catch’, ‘thanks’.

In general, I find that LLM output is needlessly personified. LLM output cannot ‘thank’ because there is no feeling of gratitude behind it. The text cannot flatter with a ‘good catch’ because there is no admiration. I see output that lays claim to habit and experience and uses the first person (‘I normally use …’) and, to me anyway, that is total nonsense.

And annoying.

A dickhead (personal opinion) like Richard Dawkins might contemplate the consciousness of AI but I do not.

Output that indicates perspective, feeling, and agency is a massive giveaway that LLM output is mere mimicry of the text that a human produces from thought, emotion, and, for lack of a better term, spirit.

Energy Usage of LLMs

I wanted to add a little appendix to discuss energy usage.

I’ve made a back-of-envelope estimate that, had I used Claude Code to develop PR #611, each query would take about 51Wh of energy. I base this on the following rough values:

Input tokens: ~80K;
- ~30K for v4l2loopback.c;
- ~50K for carefully selected V4L2 UAPI documentation + other documents.
Output tokens: ~10K.
Energy usage per token (https://simonpcouch.com/blog/2026-01-20-cc-impact/):
- 0.390Wh/1000 input tokens;
- 1.950Wh/1000 output tokens.
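
The arithmetic behind that estimate can be sketched in a few lines of Python. The token counts and per-token energy figures are the rough values listed above, not measurements, and the function name `query_energy_wh` is hypothetical.

```python
def query_energy_wh(input_tokens, output_tokens,
                    wh_per_1k_input=0.390, wh_per_1k_output=1.950):
    """Estimate the energy (in Wh) of one query from its token counts."""
    input_wh = input_tokens / 1000 * wh_per_1k_input
    output_wh = output_tokens / 1000 * wh_per_1k_output
    return input_wh + output_wh


if __name__ == "__main__":
    # ~30K tokens of source plus ~50K of documentation in, ~10K tokens out.
    total = query_energy_wh(input_tokens=30_000 + 50_000, output_tokens=10_000)
    # 31.2 Wh for input + 19.5 Wh for output = 50.7 Wh, i.e. about 51 Wh.
    print(f"{total:.1f} Wh per query")
```

Under these assumptions the input side dominates (about 31 Wh against about 20 Wh), so trimming the documentation fed to each query would matter more than shortening the output.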

51Wh is equivalent to 4-5 hours of running my beefy laptop. Not an extreme amount of energy, especially if each query (and there may be many for a single PR) can save me some hours on my laptop.

However, how much time can LLM output truly save? If I didn’t also spend time learning about the V4L2 UAPI - the bulk of my time spent - how could I have tested or trusted the results? What would have happened to the later LLM-assisted submissions if I was not capable of reviewing them?

It seems to me that the true cost is difficult to estimate. In any case, I feel this is secondary to the problem of what we are sacrificing with heavy reliance on LLM output.