R&D_01 · MAIN PROJECT

Kern OS

A local-first control plane for autonomous software-building agents.

Kern OS is the environment I use to build software with AI agents, including parts of this site. You hand it a goal, such as adding a feature or fixing a bug, and it runs a coding model through the work the way a part moves down an assembly line: every step is planned, every change is tested before it is kept, and the whole run is recorded so you can see exactly what happened.

I built it because agents in a chat window fall apart on anything real. An hour in, you cannot say what actually changed, whether the tests genuinely passed, or how to reach the same result again. Better prompts do not fix that; the right machinery around the model does.

So every task runs through the same five stages, writes evidence at each one, and has to pass verification before a single line reaches the main branch. Any run can be replayed from its records later. It all runs on my own machines: the models can be remote, but the state and the control stay here.

  1. plan
  2. implement
  3. verify
  4. replay
  5. promote

Every mission moves through the same five stages, in order. A background service owns the loop and remembers everything; the apps on top only watch and give commands.
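A minimal sketch of that stage loop, under stated assumptions: the names `Stage`, `Evidence`, `StageHandler` and `runMission` are hypothetical and are not taken from the Kern OS codebase. It only illustrates the rule described above, that stages run in a fixed order, each one leaves a record, and a failure stops the run before promotion.

```typescript
// Hypothetical names throughout; this is an illustration, not the Kern OS API.
type Stage = "plan" | "implement" | "verify" | "replay" | "promote";

const STAGES: readonly Stage[] = ["plan", "implement", "verify", "replay", "promote"];

interface Evidence {
  stage: Stage;
  ok: boolean;
  note: string;
}

// A stage handler reports whether it succeeded plus a short note to record.
type StageHandler = () => { ok: boolean; note: string };

interface MissionResult {
  promoted: boolean;
  evidence: Evidence[];
}

function runMission(handlers: Record<Stage, StageHandler>): MissionResult {
  const evidence: Evidence[] = [];
  for (const stage of STAGES) {
    const { ok, note } = handlers[stage]();
    // Evidence is recorded for every stage that ran, including the failing one.
    evidence.push({ stage, ok, note });
    if (!ok) {
      // Later stages never run, so nothing is promoted.
      return { promoted: false, evidence };
    }
  }
  return { promoted: true, evidence };
}
```

If the `verify` handler reports a failure, the result carries three evidence entries and `promoted` is false; `replay` and `promote` are never called.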

267 kernel modules
33 daemon subsystems
340+ integration tests
4 operator surfaces

ARCHITECTURE

Three layers

The base layer is a kernel of 267 modules. Each one is a pure function package: state in, normalized immutable JSON out, no side effects. That property is what lets the command line, the terminal UI, the desktop app and the test suite share one implementation of the business logic instead of four drifting copies.
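To make "state in, normalized immutable JSON out" concrete, here is a sketch of what one such module could look like. The state shape, `summarizePlan` and `deepFreeze` are assumptions made for illustration; the actual kernel modules may be structured differently.

```typescript
// Hypothetical module; names and the state shape are assumptions.
interface MissionState {
  id: string;
  steps: { title: string; done: boolean }[];
}

interface PlanSummary {
  readonly id: string;
  readonly total: number;
  readonly completed: number;
  readonly remaining: readonly string[];
}

// Recursively freezes a plain JSON-like value so callers cannot mutate it.
function deepFreeze<T>(value: T): T {
  if (typeof value === "object" && value !== null) {
    const obj = value as unknown as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      deepFreeze(obj[key]);
    }
    Object.freeze(value);
  }
  return value;
}

// Pure: reads only its argument, performs no I/O, never mutates the input.
function summarizePlan(state: MissionState): PlanSummary {
  const remaining = state.steps.filter((s) => !s.done).map((s) => s.title);
  return deepFreeze({
    id: state.id,
    total: state.steps.length,
    completed: state.steps.length - remaining.length,
    remaining,
  });
}
```

Because the function depends on nothing but its input, a CLI, a GUI and a test can all call it and get the same frozen result for the same state.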

On top of the kernel runs a Node.js daemon with 33 subsystems. It owns mission state, the autonomy loops, evidence capture, verification and promotion. If I close a window, the mission keeps running. The interfaces hold no state of their own: everything authoritative sits in the daemon and on disk.

The top layer is four interchangeable operator surfaces: a CLI, a terminal workbench, a React desktop app packaged with Tauri, and an experimental Rust frontend built with Leptos. All four render mission state through the same shared formatting modules, so a plan prints identically in a terminal and in the GUI.

ARENA

Multiple candidates, verified selection

Hard tasks rarely come out right on the first attempt, so Kern OS does not bet on one. It can hand the same task to several agents at once, each working in its own private copy of the code (a git worktree) so they never collide. Each one comes back with a finished change and a record of how it got there.

Then each candidate is scored: did it run cleanly, is its trace intact, how large is the change, does it pass the static checks? The best one goes on to review and, if needed, repair rounds, and it still has to clear critic, security and benchmark checks before it is allowed to merge.
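The scoring step could be sketched roughly as follows. The four criteria come from the description above; the weights, the 1000-line scale and the names `Candidate`, `scoreCandidate` and `selectBest` are assumptions chosen only to make the example concrete.

```typescript
// Hypothetical shape; the criteria are from the text, the fields are assumed.
interface Candidate {
  id: string;
  ranCleanly: boolean;
  traceIntact: boolean;
  changedLines: number;
  staticChecksPassed: boolean;
}

// Assumed weights: the source names the criteria but not how they are weighted.
function scoreCandidate(c: Candidate): number {
  let score = 0;
  if (c.ranCleanly) score += 40;
  if (c.traceIntact) score += 20;
  if (c.staticChecksPassed) score += 30;
  // Smaller changes score higher: up to 10 points, fading to 0 at 1000 lines.
  score += 10 * Math.max(0, 1 - c.changedLines / 1000);
  return score;
}

// Returns the highest-scoring candidate, or undefined for an empty list.
// On a tie, the earlier candidate wins.
function selectBest(candidates: Candidate[]): Candidate | undefined {
  let best: Candidate | undefined;
  let bestScore = -Infinity;
  for (const c of candidates) {
    const s = scoreCandidate(c);
    if (s > bestScore) {
      best = c;
      bestScore = s;
    }
  }
  return best;
}
```

In this sketch selection only picks a winner; the review, repair and merge checks described above would still run afterwards.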

What gets stored is the scores and fingerprints, not the full code or transcripts, so a result stays reproducible without the records ballooning over time.

POLICY

Hard limits on every change

Agents love to commit things they should not: build output, downloaded libraries, huge generated files. A shared rulebook recognizes more than eighty kinds of those throwaway paths and checks every change against hard limits (at most eighty files and one megabyte of code) before the change is allowed to count.

It runs at every point a change can enter: candidate checks, merges, benchmark reports. The idea is to make the junk-commit failure impossible by rule, instead of hoping a review catches it.
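A sketch of such a check, under assumptions: the two limits come from the text, but `checkChange`, the `ChangedFile` shape, the reading of "a megabyte" as 1 MiB and the three path patterns are illustrative stand-ins for the much larger real rulebook.

```typescript
// Hypothetical shape for one file in a proposed change.
interface ChangedFile {
  path: string;
  bytes: number;
}

const MAX_FILES = 80;
const MAX_BYTES = 1024 * 1024; // assumes "a megabyte" means 1 MiB

// Three illustrative patterns; the real rulebook is said to cover over eighty.
const THROWAWAY: readonly RegExp[] = [
  /(^|\/)node_modules\//, // downloaded libraries
  /(^|\/)dist\//, // build output
  /\.log$/, // log files
];

// Returns a list of violations; an empty list means the change may count.
function checkChange(files: ChangedFile[]): string[] {
  const violations: string[] = [];
  if (files.length > MAX_FILES) {
    violations.push(`too many files: ${files.length} > ${MAX_FILES}`);
  }
  const totalBytes = files.reduce((sum, f) => sum + f.bytes, 0);
  if (totalBytes > MAX_BYTES) {
    violations.push(`change too large: ${totalBytes} > ${MAX_BYTES} bytes`);
  }
  for (const f of files) {
    if (THROWAWAY.some((re) => re.test(f.path))) {
      violations.push(`throwaway path: ${f.path}`);
    }
  }
  return violations;
}
```

Keeping the check as one shared function is what would let candidate checks, merges and benchmark reports all apply identical limits.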

MEMORY

Continuity between sessions

When a session ends, the knowledge should not die with it. Kern OS keeps a structured memory of facts, decisions and lessons, with links that record which piece of knowledge replaced which, so it knows not just what it learned but what it stopped believing.

A new session reads that memory instead of re-reading old chat logs. It stays sharp on the project without dragging the entire history along behind it.
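One way to picture the replacement links is a flat list of entries where each entry may point at the one it supersedes. This is a sketch under assumptions: `MemoryEntry`, `currentBeliefs` and `retiredBeliefs` are hypothetical names, and the real memory store may be organized differently.

```typescript
// Hypothetical entry shape; the three kinds are taken from the text.
interface MemoryEntry {
  id: string;
  kind: "fact" | "decision" | "lesson";
  text: string;
  supersedes?: string; // id of the entry this one replaced, if any
}

// Collects the ids of every entry that some other entry has replaced.
function replacedIds(entries: MemoryEntry[]): Set<string> {
  const replaced = new Set<string>();
  for (const e of entries) {
    if (e.supersedes !== undefined) replaced.add(e.supersedes);
  }
  return replaced;
}

// What the system currently believes: entries nothing has superseded.
function currentBeliefs(entries: MemoryEntry[]): MemoryEntry[] {
  const replaced = replacedIds(entries);
  return entries.filter((e) => !replaced.has(e.id));
}

// What the system stopped believing: entries that were superseded.
function retiredBeliefs(entries: MemoryEntry[]): MemoryEntry[] {
  const replaced = replacedIds(entries);
  return entries.filter((e) => replaced.has(e.id));
}
```

A new session would load only the current beliefs, while the retired ones stay available as a record of what changed.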

BENCHMARKS

Disclosed measurement

The benchmark pipeline runs Terminal-Bench, SWE-Bench and custom suites under a disclosure contract: every result records exactly which configuration produced it, a single model or the full Kern OS loop. Submissions store compact metadata only.

This matters because agent numbers get misreported all the time. A harness that retries, repairs and verifies will beat a bare model on almost any benchmark, and the disclosure keeps the two from getting mixed up, including by me.
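A sketch of what a disclosed result record might carry, with the configuration as a required field so that bare-model and full-loop numbers can never land in the same bucket. The field names, the `Configuration` values and `groupByDisclosure` are assumptions for illustration, not the actual contract.

```typescript
// Hypothetical labels for the two configurations the text distinguishes.
type Configuration = "single-model" | "full-loop";

// Compact metadata only; no transcripts or code are stored in this sketch.
interface BenchmarkResult {
  suite: string;
  configuration: Configuration;
  model: string;
  passRate: number;
}

// Groups results by suite and configuration, so any comparison or average
// is computed within one disclosed configuration and never across two.
function groupByDisclosure(results: BenchmarkResult[]): Map<string, BenchmarkResult[]> {
  const groups = new Map<string, BenchmarkResult[]>();
  for (const r of results) {
    const key = `${r.suite}/${r.configuration}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(r);
    groups.set(key, bucket);
  }
  return groups;
}
```

Making `configuration` non-optional in the type means a result without a disclosure does not compile, which is the property the contract is after.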

SKILLS

Gated self-improvement

Once a way of doing something proves itself across enough missions, Kern OS can save it as a reusable skill. But a skill has to earn a track record and be switched on deliberately before any mission is allowed to use it, and the compiled result stays open to inspection.

Letting a system rewrite its own abilities is the easiest way to cause quiet, lasting damage. That is why this path has more gates than anything else in the codebase.
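The two gates described above, a track record and a deliberate switch, could be sketched like this. The `Skill` shape, `canUseSkill` and the threshold of five missions are assumptions; the source does not state how much track record is required.

```typescript
// Hypothetical skill record.
interface Skill {
  name: string;
  successfulMissions: number;
  enabledByOperator: boolean;
}

// Assumed threshold; the source only says a skill must earn a track record.
const MIN_TRACK_RECORD = 5;

// Both gates must hold: proven across enough missions AND switched on by hand.
function canUseSkill(skill: Skill): boolean {
  return skill.successfulMissions >= MIN_TRACK_RECORD && skill.enabledByOperator;
}
```

A skill with a long track record but no operator switch stays unavailable, and so does one that was switched on before it proved itself.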

PRACTICE

Engineering discipline

The project carries over 340 integration tests on the native Node test runner. A release gate chains typecheck, build, lint, security scan and the full suite. Missions maintain durable contract files for architecture, plan, progress, verification and risk. A complete example application, built end to end inside the system, serves as a permanent regression surface.

The same infrastructure carries my commercial work: the model dispatch, the verification discipline and the pipeline orchestration in Kern OS come directly out of producing final images and films with AI pipelines under production deadlines.