Architecture

Why a controller? #

thurkube replaces the Argo Workflows-based agent orchestration in thurspace with a single, purpose-built operator. Argo Workflows is general-purpose; agent runs are not. By collapsing the abstraction down to one CRD per agent run, the surface a user has to learn shrinks from "templates, parameters, artifacts, retry strategies" to a flat spec with named references.

The controller side handles everything that used to be repeated YAML: ConfigMap rendering, ServiceAccount + RBAC creation, env-var wiring from Secrets, optional PVC, schedule translation. Spec changes redeploy automatically via a configHash stored in .status.

API group #

All eight CRDs live in thurkube.thurbeen.eu/v1alpha1 and are namespaced. References between them are by name within the same namespace — there is no cross-namespace addressing.

api group
AgentJob       # orchestration unit, references the others
AgentRuntime   # container image + mount conventions
AgentAuth      # Secret-key reference for the auth token
AgentRole      # allowedTools list
AgentSkill     # reusable skill from a GitHub repository
McpServer      # local command or remote URL
Repository     # GitHub repo + token
ClusterAccess  # RBAC rules → SA + ClusterRole + Binding
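Reference resolution keyed by (namespace, name) makes the no-cross-namespace rule structural rather than a validation check. A minimal sketch — the store, the AgentRuntime type, and the image value are illustrative, not the controller's actual resolver:

```rust
use std::collections::HashMap;

// Illustrative stand-in for a resolved CRD; the real type lives in src/crd/.
#[derive(Debug, PartialEq)]
struct AgentRuntime {
    image: String,
}

// Every lookup is keyed by (namespace, name), so a reference can never
// reach outside the AgentJob's own namespace.
fn resolve<'a>(
    store: &'a HashMap<(String, String), AgentRuntime>,
    ns: &str,
    name: &str,
) -> Option<&'a AgentRuntime> {
    store.get(&(ns.to_string(), name.to_string()))
}

fn main() {
    let mut store = HashMap::new();
    store.insert(
        ("agents".to_string(), "claude".to_string()),
        AgentRuntime { image: "ghcr.io/example/agent:latest".to_string() },
    );
    // Same namespace resolves; another namespace does not.
    println!("{}", resolve(&store, "agents", "claude").is_some()); // true
    println!("{}", resolve(&store, "other", "claude").is_some());  // false
}
```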

Reconciler loop #

The controller is built on kube-rs's Controller primitive with watchers on AgentJob and on every owned child kind. Reconcile is idempotent and runs on every change to the parent or a watched child.

reconcile flow
watch AgentJob → resolve refs
   render ConfigMap (compute configHash)
   apply ServiceAccount + RBAC (if ClusterAccess set)
   apply PVC (if persist=true)
   apply Job / CronJob (server-side, fieldManager=thurkube)
   patch .status: phase, lastRunTime, configHash, owned[]
   requeue 5 min (steady) or 30 s (transient error)
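The requeue choice at the bottom of the loop is simple enough to state as code. A minimal sketch, assuming a placeholder error type (`TransientError`) standing in for the controller's real reconcile error:

```rust
use std::time::Duration;

// Hypothetical stand-in for the controller's reconcile error type.
#[derive(Debug)]
struct TransientError;

// Map the reconcile outcome to the next requeue interval: steady state
// re-checks every 5 minutes, a transient failure retries in 30 seconds.
fn requeue_after(result: &Result<(), TransientError>) -> Duration {
    match result {
        Ok(()) => Duration::from_secs(5 * 60),
        Err(_) => Duration::from_secs(30),
    }
}

fn main() {
    let ok: Result<(), TransientError> = Ok(());
    let err: Result<(), TransientError> = Err(TransientError);
    println!("{}", requeue_after(&ok).as_secs());  // 300
    println!("{}", requeue_after(&err).as_secs()); // 30
}
```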

schedule present ⇒ emit a CronJob with the supplied timezone; schedule absent ⇒ emit a one-shot Job. The shape of the underlying pod template is the same.
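The decision reduces to a match on the optional schedule field. A sketch under that assumption (the enum and function names are illustrative):

```rust
// Which child workload the controller emits for an AgentJob.
#[derive(Debug, PartialEq)]
enum ChildKind {
    Job,
    CronJob,
}

// Pick the child kind from the presence of `schedule` in the spec;
// the pod template underneath is the same either way.
fn child_kind(schedule: Option<&str>) -> ChildKind {
    match schedule {
        Some(_) => ChildKind::CronJob, // recurring run on the supplied cron expression
        None => ChildKind::Job,        // one-shot run
    }
}

fn main() {
    println!("{:?}", child_kind(Some("0 6 * * *"))); // CronJob
    println!("{:?}", child_kind(None));              // Job
}
```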

Owned resources #

The controller watches and owns five child kinds. Each gets the standard label set so you can list everything for a job in one query.

Child kind              Created when       Purpose
Job                     schedule is unset  One-shot agent run.
CronJob                 schedule is set    Recurring agent runs on the supplied cron expression.
ConfigMap               Always             Rendered runtime configuration mounted at the runtime's configPath.
ServiceAccount          Always             Pod identity. Bound to a ClusterRole when ClusterAccess is set.
PersistentVolumeClaim   persist: true      Mounted at the runtime's persistPath for cross-run state.

Standard labels applied to every child:

labels
app.kubernetes.io/managed-by:            thurkube
thurkube.thurbeen.eu/agentjob:           <name>
thurkube.thurbeen.eu/agentjob-namespace: <ns>
thurkube.thurbeen.eu/owner-uid:          <uid>
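The label set above can be built as a plain sorted map. The keys come from this page; the builder function and its arguments are illustrative:

```rust
use std::collections::BTreeMap;

// Standard label set applied to every owned child of an AgentJob.
// BTreeMap keeps the keys in a stable, sorted order.
fn child_labels(name: &str, ns: &str, uid: &str) -> BTreeMap<String, String> {
    BTreeMap::from([
        ("app.kubernetes.io/managed-by".into(), "thurkube".into()),
        ("thurkube.thurbeen.eu/agentjob".into(), name.into()),
        ("thurkube.thurbeen.eu/agentjob-namespace".into(), ns.into()),
        ("thurkube.thurbeen.eu/owner-uid".into(), uid.into()),
    ])
}

fn main() {
    for (k, v) in child_labels("nightly-report", "agents", "abc-123") {
        println!("{k}={v}");
    }
}
```

With these labels in place, a selector such as `-l thurkube.thurbeen.eu/agentjob=<name>` lists everything the controller created for one job in a single query.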

Drift detection #

Every reconcile hashes the rendered ConfigMap data and stores the digest in .status.configHash. When the hash changes, the controller redeploys the underlying Job/CronJob so a new run picks up the new config. Cluster operators get a clean signal in kubectl describe without having to diff Pod specs.
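The mechanism can be sketched with a stdlib hasher over the rendered data, iterated in key order so the digest is independent of insertion order. This is a sketch only — the controller's actual hash function may differ (e.g. a SHA-256 digest):

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash the rendered ConfigMap data deterministically: BTreeMap iterates
// in sorted key order, so equal data always yields the same digest.
fn config_hash(data: &BTreeMap<String, String>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (k, v) in data {
        k.hash(&mut hasher);
        v.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    let mut data =
        BTreeMap::from([("config.yaml".to_string(), "model: a".to_string())]);
    let before = config_hash(&data);
    data.insert("config.yaml".to_string(), "model: b".to_string());
    let after = config_hash(&data);
    // A changed digest is the signal to redeploy the Job/CronJob.
    println!("changed: {}", before != after);
}
```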

RBAC materialization #

When an AgentJob references a ClusterAccess, the controller materializes a per-job ServiceAccount, ClusterRole, and ClusterRoleBinding from the supplied rules. That avoids leaking permissions across agent jobs and gives each job a least-privilege identity scoped to exactly what its prompt needs.

ClusterAccess is a thin schema over Kubernetes' PolicyRule type — anything you can write in a ClusterRole, you can write here.
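A PolicyRule-shaped sketch of the scoping idea, assuming a simplified rule type that mirrors the apiGroups/resources/verbs fields of rbac.authorization.k8s.io/v1 (the CRD's actual Rust types may differ; wildcard "*" matching is omitted for brevity):

```rust
// Simplified mirror of Kubernetes' PolicyRule: a grant is the cross product
// of api_groups x resources x verbs.
#[derive(Debug)]
struct PolicyRule {
    api_groups: Vec<&'static str>,
    resources: Vec<&'static str>,
    verbs: Vec<&'static str>,
}

// A request is allowed only if some rule grants that exact combination —
// this is the least-privilege boundary the per-job ClusterRole enforces.
fn allows(rules: &[PolicyRule], group: &str, resource: &str, verb: &str) -> bool {
    rules.iter().any(|r| {
        r.api_groups.iter().any(|g| *g == group)
            && r.resources.iter().any(|res| *res == resource)
            && r.verbs.iter().any(|v| *v == verb)
    })
}

fn main() {
    // Example rules: read-only access to pods in the core ("") API group.
    let rules = vec![PolicyRule {
        api_groups: vec![""],
        resources: vec!["pods"],
        verbs: vec!["get", "list"],
    }];
    println!("{}", allows(&rules, "", "pods", "get"));    // true
    println!("{}", allows(&rules, "", "pods", "delete")); // false
}
```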

Health & readiness #

The controller process exposes /healthz (always 200 once started) and /readyz (200 only after the first successful watcher init) on :8080. The Helm chart wires these into Kubernetes livenessProbe and readinessProbe via configurable port and intervals.

Logging uses tracing with a JSON formatter to stdout — pick it up with whatever logging stack the cluster already runs. Set RUST_LOG (or logLevel in values.yaml) to tune verbosity.
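The probe semantics can be sketched std-only (the real controller serves these with hyper on :8080; the demo below binds an ephemeral port and simulates watcher init with an atomic flag):

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// /healthz is healthy as soon as the process is up; /readyz only
// after the first successful watcher init.
fn route(path: &str, ready: bool) -> &'static str {
    match path {
        "/healthz" => "200 OK",
        "/readyz" if ready => "200 OK",
        "/readyz" => "503 Service Unavailable",
        _ => "404 Not Found",
    }
}

fn handle(mut stream: TcpStream, ready: &AtomicBool) {
    // Read only the request line, e.g. "GET /readyz HTTP/1.1".
    let mut line = String::new();
    BufReader::new(stream.try_clone().unwrap()).read_line(&mut line).unwrap();
    let path = line.split_whitespace().nth(1).unwrap_or("/");
    let status = route(path, ready.load(Ordering::SeqCst));
    let _ = write!(stream, "HTTP/1.1 {status}\r\nContent-Length: 0\r\n\r\n");
}

fn main() {
    let ready = Arc::new(AtomicBool::new(false));
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    let flag = ready.clone();
    thread::spawn(move || {
        for stream in listener.incoming().flatten() {
            handle(stream, &flag);
        }
    });

    let probe = |path: &str| {
        let mut s = TcpStream::connect(addr).unwrap();
        write!(s, "GET {path} HTTP/1.1\r\nHost: x\r\n\r\n").unwrap();
        let mut status = String::new();
        BufReader::new(s).read_line(&mut status).unwrap();
        status.trim_end().to_string()
    };

    println!("{}", probe("/readyz"));    // 503 until watcher init
    ready.store(true, Ordering::SeqCst); // simulate first successful init
    println!("{}", probe("/readyz"));
    println!("{}", probe("/healthz"));
}
```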

Module layout #

Module                       Responsibility
src/main.rs                  Entry point, CLI (--crd prints all CRDs as YAML), tokio runtime, signal handling.
src/crd/                     One file per CRD. Defines *Spec / *Status with schemars-derived JSON Schema.
src/controller/              Reconciler entry (mod.rs), the AgentJob reconcile (agentjob.rs), build.rs for child resources, resolve.rs for ref lookup, finalizer.rs, status.rs, context.rs.
src/health.rs                Hyper server for /healthz and /readyz.
tests/architecture_rules.rs  Module isolation enforcement at test time.
tests/crd_schema.rs          CRD YAML validity and schema invariants.
tests/e2e.rs                 Real-cluster lifecycle tests (#[ignore], run with --ignored against k3d in CI).