# Architecture

## Why a controller?
thurkube replaces the Argo Workflows-based agent orchestration in thurspace with a single, purpose-built operator. Argo Workflows is general-purpose; agent runs are not. By collapsing the abstraction down to one CRD per agent run, the surface a user has to learn shrinks from "templates, parameters, artifacts, retry strategies" to a flat spec with named references.
The controller side handles everything that used to be repeated YAML: ConfigMap rendering, ServiceAccount + RBAC creation, env-var wiring from Secrets, optional PVC, schedule translation. Spec changes redeploy automatically via a `configHash` stored in `.status`.
## API group
All eight CRDs live in `thurkube.thurbeen.eu/v1alpha1` and are namespaced. References between them are by name within the same namespace; there is no cross-namespace addressing.
| CRD | Purpose |
|---|---|
| `AgentJob` | Orchestration unit; references the others. |
| `AgentRuntime` | Container image + mount conventions. |
| `AgentAuth` | Secret-key reference for the auth token. |
| `AgentRole` | `allowedTools` list. |
| `AgentSkill` | Reusable skill from a GitHub repository. |
| `McpServer` | Local command or remote URL. |
| `Repository` | GitHub repo + token. |
| `ClusterAccess` | RBAC rules → ServiceAccount + ClusterRole + Binding. |
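To make the reference model concrete, here is a sketch of an `AgentJob` tying the other kinds together. The field names (`runtimeRef`, `authRef`, and so on) are illustrative assumptions, not the real schema; the authoritative definitions live in `src/crd/`.

```yaml
# Hypothetical field names -- for illustration only.
apiVersion: thurkube.thurbeen.eu/v1alpha1
kind: AgentJob
metadata:
  name: nightly-triage
  namespace: agents
spec:
  runtimeRef: claude-runtime    # AgentRuntime in the same namespace
  authRef: claude-token         # AgentAuth
  roleRef: triage               # AgentRole
  repositoryRef: backlog        # Repository
  clusterAccessRef: read-only   # optional ClusterAccess
  persist: true                 # adds a PVC for cross-run state
  schedule: "0 3 * * *"         # optional; absent means a one-shot Job
```

Every reference resolves within the `AgentJob`'s own namespace, matching the no-cross-namespace rule above.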
## Reconciler loop
The controller is built on `kube-rs`'s `Controller` primitive with watchers on `AgentJob` and on every owned child kind. Reconcile is idempotent and runs on every change to the parent or a watched child.
```
watch AgentJob → resolve refs
  → render ConfigMap (compute configHash)
  → apply ServiceAccount + RBAC (if ClusterAccess set)
  → apply PVC (if persist=true)
  → apply Job / CronJob (server-side, fieldManager=thurkube)
  → patch .status: phase, lastRunTime, configHash, owned[]
  → requeue 5 min (steady) or 30 s (transient error)
```
`schedule` present ⇒ emit a `CronJob` with the supplied `timezone`; `schedule` absent ⇒ emit a one-shot `Job`. The shape of the underlying pod template is the same.
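As a sketch of the schedule translation (the `AgentJob` field names are assumptions; the `CronJob` side uses the standard `batch/v1` schema):

```yaml
# AgentJob spec fragment (field names assumed for illustration)
spec:
  schedule: "30 6 * * 1-5"
  timezone: Europe/Amsterdam
---
# Child the controller would emit for it (sketch)
apiVersion: batch/v1
kind: CronJob
spec:
  schedule: "30 6 * * 1-5"
  timeZone: Europe/Amsterdam   # CronJob's native spec.timeZone field
  jobTemplate:
    spec:
      template: {}             # same pod template as the one-shot Job
```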
## Owned resources
The controller watches and owns five child kinds. Each gets the standard label set, so you can list everything for a job in one query (e.g. `kubectl get all -l thurkube.thurbeen.eu/agentjob=<name>`).
| Child kind | Created when | Purpose |
|---|---|---|
| `Job` | `schedule` is unset | One-shot agent run. |
| `CronJob` | `schedule` is set | Recurring agent runs on the supplied cron expression. |
| `ConfigMap` | Always | Rendered runtime configuration mounted at the runtime's `configPath`. |
| `ServiceAccount` | Always | Pod identity. Bound to a ClusterRole when `ClusterAccess` is set. |
| `PersistentVolumeClaim` | `persist: true` | Mounted at the runtime's `persistPath` for cross-run state. |
Standard labels applied to every child:
- `app.kubernetes.io/managed-by: thurkube`
- `thurkube.thurbeen.eu/agentjob: <name>`
- `thurkube.thurbeen.eu/agentjob-namespace: <ns>`
- `thurkube.thurbeen.eu/owner-uid: <uid>`
## Drift detection
Every reconcile hashes the rendered ConfigMap data and stores the digest in `.status.configHash`. When the hash changes, the controller redeploys the underlying Job/CronJob so a new run picks up the new config. Cluster operators get a clean signal in `kubectl describe` without having to diff Pod specs.
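The status fields named in the reconcile flow above might render like this (field names come from that flow; the exact value formats, including the digest encoding, are assumptions):

```yaml
# Sketch of an AgentJob's .status after a successful reconcile
status:
  phase: Running
  lastRunTime: "2025-01-15T03:00:00Z"
  configHash: "9f2c4a7b"        # digest of the rendered ConfigMap data
  owned:
    - kind: ConfigMap
      name: nightly-triage-config
    - kind: ServiceAccount
      name: nightly-triage
```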
## RBAC materialization
When an `AgentJob` references a `ClusterAccess`, the controller materializes a per-job ServiceAccount, ClusterRole, and ClusterRoleBinding from the supplied `rules`. That avoids leaking permissions across agent jobs and gives each job a least-privilege identity scoped to exactly what its prompt needs. `ClusterAccess` is a thin schema over Kubernetes' `PolicyRule` type: anything you can write in a ClusterRole, you can write here.
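Since the `rules` field mirrors `PolicyRule`, a read-only grant could look like the following sketch (only `rules` and its entries are grounded in the text; the surrounding spec shape is an assumption):

```yaml
apiVersion: thurkube.thurbeen.eu/v1alpha1
kind: ClusterAccess
metadata:
  name: read-only
spec:
  rules:                         # standard Kubernetes PolicyRule entries
    - apiGroups: [""]
      resources: ["pods", "services"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["apps"]
      resources: ["deployments"]
      verbs: ["get", "list"]
```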
## Health & readiness
The controller process exposes `/healthz` (always 200 once started) and `/readyz` (200 only after the first successful watcher init) on `:8080`. The Helm chart wires these into Kubernetes `livenessProbe` and `readinessProbe` via configurable port and intervals.
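The probe wiring in the controller's pod spec follows the standard Kubernetes shape; a sketch, where the intervals and initial delay are illustrative defaults rather than the chart's confirmed values:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```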
Logging uses `tracing` with a JSON formatter to stdout; pick it up with whatever logging stack the cluster already runs. Set `RUST_LOG` (or `logLevel` in `values.yaml`) to tune verbosity.
## Module layout
| Module | Responsibility |
|---|---|
| `src/main.rs` | Entry point, CLI (`--crd` prints all CRDs as YAML), tokio runtime, signal handling. |
| `src/crd/` | One file per CRD. Defines `*Spec`/`*Status` with `schemars`-derived JSON Schema. |
| `src/controller/` | Reconciler entry (`mod.rs`), the `AgentJob` reconcile (`agentjob.rs`), `build.rs` for child resources, `resolve.rs` for ref lookup, `finalizer.rs`, `status.rs`, `context.rs`. |
| `src/health.rs` | Hyper server for `/healthz` and `/readyz`. |
| `tests/architecture_rules.rs` | Module isolation enforcement at test time. |
| `tests/crd_schema.rs` | CRD YAML validity and schema invariants. |
| `tests/e2e.rs` | Real-cluster lifecycle tests (`#[ignore]`, run with `--ignored` against k3d in CI). |