
Operator Configuration

BubuStack controllers read operator configuration from a ConfigMap. The default installation wires the ConfigMap referenced by the --config-namespace and --config-name flags, and the sample values live in config/manager/operator-config.yaml.

Who this is for

  • Platform engineers operating the BubuStack controller.
  • Anyone tuning scheduling, limits, or defaults cluster-wide.

What you'll get

  • Where configuration lives and how it is loaded.
  • The precedence rules for overrides.
  • A complete list of supported config keys and defaults.

Only the keys listed here are consumed by the controller. Unknown keys are ignored.


Precedence

Manager flags and environment variables are process-level settings. They are not sourced from the ConfigMap and are not overridden by it. For settings that support run-level overrides, precedence is as follows (highest first):

  1. StepRun overrides (when supported for a setting).
  2. Story policy overrides (spec.policy).
  3. Operator ConfigMap defaults.
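This ordering amounts to a "first value set wins" lookup. The sketch below is illustrative only. resolveInt and its one-pointer-per-layer signature are hypothetical. Whether a given setting accepts StepRun or Story overrides is decided per setting, not by a helper like this.

```go
package main

import "fmt"

// resolveInt returns the first non-nil value in precedence order:
// StepRun override, then Story policy override (spec.policy), then the
// operator ConfigMap default. Hypothetical helper for illustration only.
func resolveInt(stepRun, storyPolicy *int, configMapDefault int) int {
	if stepRun != nil {
		return *stepRun
	}
	if storyPolicy != nil {
		return *storyPolicy
	}
	return configMapDefault
}

func main() {
	storyValue := 5
	// No StepRun override, so the Story policy value beats the ConfigMap default.
	fmt.Println(resolveInt(nil, &storyValue, 3))
}
```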

Defaults When The ConfigMap Is Missing

If the operator ConfigMap does not exist at startup, the controller continues with built-in defaults from DefaultOperatorConfig (which uses DefaultControllerConfig).


Manager Flags And Environment

These settings are read at process start from flags and environment variables. They are not part of the operator ConfigMap.

| Flag / Env | Default | Purpose |
| --- | --- | --- |
| --config-namespace | bobrapet-system | Namespace containing the operator ConfigMap. |
| --config-name | bobrapet-operator-config | Name of the operator ConfigMap. |
| --metrics-bind-address | 0 | Metrics endpoint bind address. 0 disables the metrics service. |
| --metrics-secure | true | Serve metrics over HTTPS with authn/authz when enabled. |
| --metrics-cert-path | empty | Directory containing the metrics server TLS certificates. |
| --metrics-cert-name | tls.crt | Metrics server TLS certificate filename. |
| --metrics-cert-key | tls.key | Metrics server TLS key filename. |
| --webhook-cert-path | empty | Directory containing the webhook TLS certificates. |
| --webhook-cert-name | tls.crt | Webhook TLS certificate filename. |
| --webhook-cert-key | tls.key | Webhook TLS key filename. |
| --health-probe-bind-address | :8081 | Liveness/readiness probe bind address. |
| --leader-elect | false | Enable leader election for the controller manager. |
| --leader-election-id | d3a8b358.bubustack.io | Leader election ID, used to prevent clashes. |
| --leader-election-namespace | empty | Namespace for leader election resources. Defaults to the in-cluster namespace; must be set when running outside a cluster. |
| --enable-http2 | false | Enable HTTP/2 for the metrics and webhook servers. |
| --tracing-init-timeout | 10s | Timeout for OTLP tracer initialization. |
| --tracing-shutdown-timeout | 5s | Timeout for OTLP tracer shutdown. |
| ENABLE_WEBHOOKS | true | Set to false to disable admission webhooks. |
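For orientation, here is a minimal sketch of how a subset of these flags and the ENABLE_WEBHOOKS variable could be declared with Go's standard flag package, using the documented names and defaults. The managerOptions type and parseFlags function are hypothetical and do not reproduce bobrapet/cmd/main.go. The sketch also assumes that only the literal value false disables webhooks.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

// managerOptions mirrors a subset of the manager flags (hypothetical type).
type managerOptions struct {
	ConfigNamespace    string
	ConfigName         string
	MetricsBindAddress string
	MetricsSecure      bool
	ProbeBindAddress   string
	LeaderElect        bool
	TracingInitTimeout time.Duration
	EnableWebhooks     bool
}

// parseFlags declares the flags with their documented defaults and reads
// ENABLE_WEBHOOKS through getenv so the behavior is easy to test.
func parseFlags(args []string, getenv func(string) string) (managerOptions, error) {
	var o managerOptions
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	fs.StringVar(&o.ConfigNamespace, "config-namespace", "bobrapet-system", "Namespace containing the operator ConfigMap.")
	fs.StringVar(&o.ConfigName, "config-name", "bobrapet-operator-config", "Name of the operator ConfigMap.")
	fs.StringVar(&o.MetricsBindAddress, "metrics-bind-address", "0", "Metrics bind address; 0 disables the metrics service.")
	fs.BoolVar(&o.MetricsSecure, "metrics-secure", true, "Serve metrics over HTTPS with authn/authz.")
	fs.StringVar(&o.ProbeBindAddress, "health-probe-bind-address", ":8081", "Liveness/readiness probe bind address.")
	fs.BoolVar(&o.LeaderElect, "leader-elect", false, "Enable leader election.")
	fs.DurationVar(&o.TracingInitTimeout, "tracing-init-timeout", 10*time.Second, "OTLP tracer init timeout.")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Webhooks stay enabled unless ENABLE_WEBHOOKS is exactly "false" (assumption).
	o.EnableWebhooks = getenv("ENABLE_WEBHOOKS") != "false"
	return o, nil
}

func main() {
	opts, err := parseFlags(os.Args[1:], os.Getenv)
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```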

Reload behavior

The operator watches its ConfigMap and applies updates at runtime, with one exception. Some settings are consumed only when the controllers are wired (for example, worker counts and rate-limiter options) or when the templating evaluator is constructed. These are captured at startup and take effect only after a restart. Values resolved during each reconcile may update live.


Scheduling Controls

spec.policy.queue assigns StoryRuns to a scheduling queue. Queue names must be DNS-1123 labels and are lowercased at runtime.

spec.policy.priority defines ordering within the queue, and higher values run first. Ordering within a queue is strict but non-preemptive: a higher-priority StoryRun waits for a free slot instead of displacing one that is already running. Priority aging can be enabled per queue to prevent starvation.
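To illustrate the queue-name rule, the sketch below lowercases a name and then checks it against the standard DNS-1123 label pattern. That pattern allows lowercase alphanumerics and '-', must start and end with an alphanumeric, and is at most 63 characters. normalizeQueue is a hypothetical helper. Two behaviors are assumptions: lowercasing before validation, and rejecting an empty name rather than mapping it to a default queue.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dns1123Label matches a DNS-1123 label: lowercase alphanumerics or '-',
// starting and ending with an alphanumeric character.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// normalizeQueue lowercases a queue name and verifies that the result is a
// valid DNS-1123 label of at most 63 characters. Hypothetical helper.
func normalizeQueue(name string) (string, error) {
	q := strings.ToLower(name)
	if len(q) > 63 || !dns1123Label.MatchString(q) {
		return "", fmt.Errorf("invalid queue name %q: must be a DNS-1123 label", name)
	}
	return q, nil
}

func main() {
	for _, name := range []string{"Batch-High", "gpu_jobs"} {
		q, err := normalizeQueue(name)
		fmt.Println(q, err)
	}
}
```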

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| storyrun.global-concurrency | 0 | Caps total running StepRuns across all queues. | Provides a safety valve for cluster load. |
| storyrun.queue.<name>.concurrency | 0 | Caps running StepRuns within a queue. | Prevents one workload class from starving others. |
| storyrun.queue.<name>.default-priority | 0 | Priority used when spec.policy.priority is unset. | Keeps ordering deterministic without forcing per-Story config. |
| storyrun.queue.<name>.priority-aging-seconds | 60 | Adds effective priority based on time spent queued. | Prevents starvation under strict priority ordering. |

Global Controller Behavior

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| controller.max-concurrent-reconciles | 10 | Global reconcile worker cap (fallback for per-controller values of 0). | Prevents runaway reconcile fan-out. |
| controller.requeue-base-delay | 0 (uses per-controller defaults) | Base delay for exponential requeue backoff. | Avoids hot loops during transient failures. |
| controller.requeue-max-delay | 0 (uses per-controller defaults) | Maximum requeue backoff delay. | Keeps retries bounded in time. |
| controller.cleanup-interval | 1h | Interval for background cleanup loops. | Prevents GC from running too often. |
| controller.reconcile-timeout | 30s | Deadline for a single reconcile loop (0 disables the deadline). | Guards against stuck reconciles. |
| controller.max-story-with-block-size-bytes | 65536 | Maximum size, in bytes, of a Story's with block. | Protects etcd and API server memory. |

Images And Resources

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| images.default-engram | empty | Default Engram image when not specified elsewhere. | Provides a cluster-wide baseline image. |
| images.default-impulse | empty | Default Impulse image when not specified elsewhere. | Provides a cluster-wide baseline image. |
| images.pull-policy | IfNotPresent | Image pull policy for workloads. | Balances image caching against freshness. |
| resources.default.cpu-request | 100m | Default CPU request for workloads. | Ensures fair scheduling. |
| resources.default.cpu-limit | 500m | Default CPU limit for workloads. | Prevents noisy neighbors. |
| resources.default.memory-request | 128Mi | Default memory request. | Ensures fair scheduling. |
| resources.default.memory-limit | 512Mi | Default memory limit. | Prevents memory exhaustion. |
| resources.engram.cpu-request | empty (inherits resources.default.cpu-request) | Engram-specific CPU request override. | Tailors runtime costs for Engrams. |
| resources.engram.cpu-limit | empty (inherits resources.default.cpu-limit) | Engram-specific CPU limit override. | Prevents a single Engram from saturating nodes. |
| resources.engram.memory-request | empty (inherits resources.default.memory-request) | Engram-specific memory request override. | Reserves memory for Engrams. |
| resources.engram.memory-limit | empty (inherits resources.default.memory-limit) | Engram-specific memory limit override. | Prevents OOM cascades. |
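The inheritance rule for the resources.engram.* keys is a per-field fallback. This is a hypothetical sketch: resourceSet, orDefault, and engramResources are illustrative names. For simplicity it keeps quantities as plain strings rather than parsed Kubernetes resource quantities.

```go
package main

import "fmt"

// resourceSet mirrors the four resource keys; an empty string means "not set".
type resourceSet struct {
	CPURequest, CPULimit, MemoryRequest, MemoryLimit string
}

// orDefault returns v when set, otherwise the fallback.
func orDefault(v, fallback string) string {
	if v != "" {
		return v
	}
	return fallback
}

// engramResources applies the inheritance rule: each empty resources.engram.*
// value falls back to the matching resources.default.* value.
func engramResources(def, engram resourceSet) resourceSet {
	return resourceSet{
		CPURequest:    orDefault(engram.CPURequest, def.CPURequest),
		CPULimit:      orDefault(engram.CPULimit, def.CPULimit),
		MemoryRequest: orDefault(engram.MemoryRequest, def.MemoryRequest),
		MemoryLimit:   orDefault(engram.MemoryLimit, def.MemoryLimit),
	}
}

func main() {
	def := resourceSet{"100m", "500m", "128Mi", "512Mi"}
	engram := resourceSet{MemoryLimit: "1Gi"} // only the memory limit is overridden
	fmt.Printf("%+v\n", engramResources(def, engram))
}
```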

Retry And Timeouts

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| retry.max-retries | 3 | Default retry limit for StepRuns. | Balances resilience vs load. |
| timeout.default-step | 30m | Default step timeout when not specified. | Prevents infinite execution. |
| timeout.approval-default | inherits timeout.default-step | Default approval timeout for gate steps. | Avoids stale approval waits. |
| timeout.external-data-default | inherits timeout.default-step | Timeout for external data access. | Bounds waiting on external systems. |
| timeout.conditional-default | inherits timeout.default-step | Timeout for conditional evaluation. | Prevents stuck conditional loops. |

Security Defaults

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| security.run-as-non-root | false | Forces workloads to run as non-root. | Reduces privilege risk. |
| security.read-only-root-filesystem | false | Mounts the root filesystem as read-only. | Limits write surface in containers. |
| security.allow-privilege-escalation | false | Disables privilege escalation. | Blocks common container escapes. |
| security.drop-capabilities | ALL | Linux capabilities to drop. | Minimizes kernel attack surface. |
| security.run-as-user | 0 | Default UID for workloads. | Allows a cluster-wide UID baseline (0 means root). |
| security.automount-service-account-token | false | Default ServiceAccount token mount toggle. | Reduces token exposure. |
| security.service-account-name | default | Default ServiceAccount name. | Ensures predictable identity. |
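Two of these defaults interact. Assume, as this sketch does, that security.run-as-user is applied directly as the container's runAsUser. Then enabling security.run-as-non-root while leaving the UID at 0 produces Pods that the kubelet refuses to start, because it rejects containers with runAsNonRoot set and runAsUser 0. securityDefaults and validate are hypothetical names for a pre-flight check.

```go
package main

import (
	"errors"
	"fmt"
)

// securityDefaults mirrors two of the security.* keys (hypothetical type).
type securityDefaults struct {
	RunAsNonRoot bool  // security.run-as-non-root
	RunAsUser    int64 // security.run-as-user (0 is root)
}

// validate reports the combination the kubelet would reject at container
// start: runAsNonRoot=true together with runAsUser=0.
func validate(s securityDefaults) error {
	if s.RunAsNonRoot && s.RunAsUser == 0 {
		return errors.New("security.run-as-non-root=true conflicts with security.run-as-user=0 (root)")
	}
	return nil
}

func main() {
	fmt.Println(validate(securityDefaults{RunAsNonRoot: true}))
	fmt.Println(validate(securityDefaults{RunAsNonRoot: true, RunAsUser: 65532}))
}
```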

Job And Retention

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| job.backoff-limit | 3 | Job retry limit for failed Pods. | Avoids infinite Job retries. |
| job.ttl-seconds-after-finished | 3600 | TTL for completed Jobs. | Cleans up finished Pods. |
| streaming.ttl-seconds-after-finished | 0 | TTL for streaming workloads. | Keeps long-lived runs unless configured. |
| job.restart-policy | Never | Restart policy for Job Pods. | Matches batch semantics. |
| storyrun.retention-seconds | 86400 | Retention for StoryRun objects. | Controls history retention vs etcd load. |

Loop Primitives

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| loop.max-iterations | 10000 | Maximum iterations per loop step. | Prevents unbounded loops. |
| loop.default-batch-size | 100 | Default batch size for loop iterations. | Controls processing granularity. |
| loop.max-batch-size | 1000 | Maximum batch size per iteration. | Prevents memory issues. |
| loop.max-concurrency | 10 | Concurrent loop iterations. | Controls parallelism within loops. |
| loop.max-concurrency-limit | 50 | Hard cap on loop concurrency. | Safety valve for cluster load. |
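The loop limits can be read as defaults plus hard caps. This sketch rests on three assumptions: a value of 0 means "unset", each item counts as one iteration against loop.max-iterations, and oversized batch-size or concurrency values are clamped rather than rejected. loopLimits, clamp, and effectiveLoop are hypothetical names.

```go
package main

import "fmt"

// loopLimits mirrors the loop.* keys.
type loopLimits struct {
	MaxIterations    int // loop.max-iterations
	DefaultBatchSize int // loop.default-batch-size
	MaxBatchSize     int // loop.max-batch-size
	MaxConcurrency   int // loop.max-concurrency (used when unset)
	ConcurrencyLimit int // loop.max-concurrency-limit (hard cap)
}

// defaults holds the documented default values.
var defaults = loopLimits{10000, 100, 1000, 10, 50}

// clamp caps v at limit.
func clamp(v, limit int) int {
	if v > limit {
		return limit
	}
	return v
}

// effectiveLoop resolves batch size and concurrency for a loop step,
// treating 0 as unset and clamping to the hard caps.
func effectiveLoop(l loopLimits, batch, concurrency, items int) (int, int, error) {
	if items > l.MaxIterations {
		return 0, 0, fmt.Errorf("%d items exceeds loop.max-iterations=%d", items, l.MaxIterations)
	}
	if batch <= 0 {
		batch = l.DefaultBatchSize
	}
	if concurrency <= 0 {
		concurrency = l.MaxConcurrency
	}
	return clamp(batch, l.MaxBatchSize), clamp(concurrency, l.ConcurrencyLimit), nil
}

func main() {
	b, c, err := effectiveLoop(defaults, 0, 200, 5000)
	fmt.Println(b, c, err)
}
```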

Templating

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| templating.evaluation-timeout | 30s | Timeout for template evaluation. | Prevents slow templates from blocking reconciliation. |
| templating.max-output-bytes | 65536 | Maximum evaluated output size. | Avoids large payload explosions. |
| templating.deterministic | false | Restricts non-deterministic helpers. | Improves replay safety. |
| templating.offloaded-data-policy | inject | How to handle templates that reference offloaded data. error: fail evaluation. inject: run a materialize Pod to hydrate the data, then re-evaluate (requires templating.materialize-engram). controller: hydrate from S3 in-process, with no extra Pod. Per-Story override: annotation bubustack.io/controller-resolve: "true". | Controls correctness vs convenience. |
| templating.materialize-engram | materialize | Engram used to materialize offloaded refs. | Centralizes hydration behavior. |
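The offloaded-data policy works like a three-way switch with a per-Story escape hatch. In this sketch, resolveOffloadPolicy and action are hypothetical. One behavior is not confirmed by this page: here the bubustack.io/controller-resolve annotation takes precedence over any operator-wide policy, including error.

```go
package main

import "fmt"

// Policy values documented for templating.offloaded-data-policy.
const (
	policyError      = "error"
	policyInject     = "inject"
	policyController = "controller"
)

// resolveOffloadPolicy returns the effective policy for a Story; the
// annotation set to "true" selects in-process (controller) hydration.
func resolveOffloadPolicy(operatorPolicy string, storyAnnotations map[string]string) string {
	if storyAnnotations["bubustack.io/controller-resolve"] == "true" {
		return policyController
	}
	return operatorPolicy
}

// action describes what evaluation does when a template references offloaded data.
func action(policy, materializeEngram string) (string, error) {
	switch policy {
	case policyError:
		return "", fmt.Errorf("template references offloaded data and policy is %q", policyError)
	case policyInject:
		if materializeEngram == "" {
			return "", fmt.Errorf("policy %q requires templating.materialize-engram", policyInject)
		}
		return "run " + materializeEngram + " pod to hydrate, then re-evaluate", nil
	case policyController:
		return "hydrate from S3 in-process, then evaluate", nil
	default:
		return "", fmt.Errorf("unknown offloaded-data-policy %q", policy)
	}
}

func main() {
	p := resolveOffloadPolicy(policyInject, nil)
	fmt.Println(action(p, "materialize"))
}
```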

Reference, Telemetry, Debug

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| references.cross-namespace-policy | deny | Cross-namespace reference policy. | Enforces tenant boundaries. |
| telemetry.enabled | false | Tracing toggle. | Allows controlled overhead. |
| telemetry.trace-propagation | false | Controls trace propagation. | Keeps distributed traces consistent. |
| debug.enable-verbose-logging | false | Increases log verbosity. | Useful for diagnostics. |
| debug.enable-step-output-logging | false | Logs step outputs. | Aids debugging; use with caution because outputs may contain sensitive data. |
| debug.enable-metrics | false | Metrics collection toggle. | Allows runtime visibility. |

Note: debug.enable-metrics controls metric emission in the operator. The metrics endpoint is controlled by the manager flags (--metrics-bind-address, --metrics-secure) in bobrapet/cmd/main.go.

OpenTelemetry export

The operator exports traces using the OTLP gRPC exporter. Configure the exporter via standard OTEL environment variables (set on the controller manager pod):

  • OTEL_EXPORTER_OTLP_ENDPOINT (e.g., otel-collector:4317)
  • OTEL_TRACES_EXPORTER (otlp or none)
  • OTEL_SERVICE_NAME (overrides default service name)
  • OTEL_RESOURCE_ATTRIBUTES (additional resource tags)

Example:

```shell
export OTEL_EXPORTER_OTLP_ENDPOINT=otel-collector.observability:4317
export OTEL_TRACES_EXPORTER=otlp
export OTEL_SERVICE_NAME=bobrapet-operator
```

StoryRun Controller

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| storyrun.max-concurrent-reconciles | 8 | Worker count for StoryRun reconciles (0 uses the controller default). | Prevents controller overload. |
| storyrun.rate-limiter.base-delay | 50ms | Base delay for StoryRun backoff. | Avoids hot loops. |
| storyrun.rate-limiter.max-delay | 5m | Maximum delay for StoryRun backoff. | Keeps retries bounded. |
| storyrun.max-inline-inputs-size | 1024 | Maximum inline spec.inputs size, in bytes. | Prevents etcd bloat. |
| storyrun.binding.max-mutations-per-reconcile | 8 | Maximum TransportBinding mutations per reconcile. | Limits per-reconcile work. |
| storyrun.binding.throttle-requeue-delay | 2s | Requeue delay when the mutation budget is exhausted. | Applies backpressure. |

StepRun Controller

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| steprun.max-concurrent-reconciles | 15 | Worker count for StepRun reconciles (0 uses the controller default). | Prevents overload under high fan-out. |
| steprun.rate-limiter.base-delay | 100ms | Base delay for StepRun backoff. | Avoids hot loops. |
| steprun.rate-limiter.max-delay | 2m | Maximum delay for StepRun backoff. | Keeps retries bounded. |

Story Controller

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| story.max-concurrent-reconciles | 5 | Worker count for Story reconciles (0 uses the controller default). | Protects the API server from bursty updates. |
| story.rate-limiter.base-delay | 200ms | Base delay for Story backoff. | Avoids hot loops. |
| story.rate-limiter.max-delay | 1m | Maximum delay for Story backoff. | Keeps retries bounded. |
| story.binding.max-mutations-per-reconcile | 4 | Maximum TransportBinding mutations per reconcile. | Limits per-reconcile work. |
| story.binding.throttle-requeue-delay | 3s | Requeue delay when the mutation budget is exhausted. | Applies backpressure. |

Engram Controller

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| engram.max-concurrent-reconciles | 5 | Worker count for Engram reconciles (0 uses the controller default). | Protects the API server from bursty updates. |
| engram.rate-limiter.base-delay | 200ms | Base delay for Engram backoff. | Avoids hot loops. |
| engram.rate-limiter.max-delay | 1m | Maximum delay for Engram backoff. | Keeps retries bounded. |
| engram.default-max-inline-size | 4096 | Default maximum inline size for Engram I/O. | Triggers offloading when exceeded. |
| engram.default-grpc-port | 50051 | Default gRPC port. | Ensures consistent connectivity. |
| engram.default-grpc-heartbeat-interval-seconds | 10 | Default gRPC heartbeat interval. | Detects disconnected clients. |
| engram.default-storage-timeout-seconds | 300 | Default storage timeout. | Bounds remote storage calls. |
| engram.default-graceful-shutdown-timeout-seconds | 20 | Default graceful shutdown timeout. | Allows orderly shutdown. |
| engram.default-termination-grace-period-seconds | 30 | Pod termination grace period. | Allows cleanup on shutdown. |
| engram.default-max-recv-msg-bytes | 10485760 (10 MiB) | Maximum gRPC receive message size. | Prevents oversized messages. |
| engram.default-max-send-msg-bytes | 10485760 (10 MiB) | Maximum gRPC send message size. | Prevents oversized messages. |
| engram.default-dial-timeout-seconds | 10 | Dial timeout for gRPC. | Avoids hanging connects. |
| engram.default-channel-buffer-size | 16 | Channel buffer size. | Bounds memory usage. |
| engram.default-reconnect-max-retries | 10 | Maximum gRPC reconnect retries. | Prevents infinite reconnect loops. |
| engram.default-reconnect-base-backoff-millis | 500 | Base backoff for reconnects. | Spreads reconnection attempts. |
| engram.default-reconnect-max-backoff-seconds | 30 | Maximum backoff for reconnects. | Bounds wait time. |
| engram.default-hang-timeout-seconds | 0 | Hang detection timeout. | Surfaces stalled connections. |
| engram.default-message-timeout-seconds | 30 | Message timeout for gRPC calls. | Prevents stuck calls. |

Storage Defaults

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| controller.storage.provider | empty | Default storage backend. | Ensures a known storage target. |
| controller.storage.s3.bucket | empty | Default S3 bucket name. | Centralizes storage location. |
| controller.storage.s3.region | empty | Default S3 region. | Required for AWS-compatible SDKs. |
| controller.storage.s3.endpoint | empty | S3 endpoint override. | Supports non-AWS S3 backends. |
| controller.storage.s3.use-path-style | false | Path-style addressing toggle. | Required by some S3-compatible stores. |
| controller.storage.s3.auth-secret-name | empty | Secret with S3 credentials. | Centralizes credential lookup. |
| controller.storage.file.path | empty | Default file storage path inside the workload. | Required for file-backed storage. |
| controller.storage.file.volume-claim-name | empty | Default RWX PVC name for file storage. | Enables shared file storage across workloads. |
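A configuration check along these lines can catch incomplete storage settings early. The provider names "s3" and "file" are inferred from the key prefixes and are assumptions. The names storageConfig and validateStorage are hypothetical. The sketch checks only a minimal set of fields: the bucket for S3, and the path that the table marks as required for file storage.

```go
package main

import (
	"errors"
	"fmt"
)

// storageConfig mirrors a few controller.storage.* keys (hypothetical type).
type storageConfig struct {
	Provider string // controller.storage.provider
	S3Bucket string // controller.storage.s3.bucket
	FilePath string // controller.storage.file.path
}

// validateStorage checks the minimal fields each assumed provider needs.
func validateStorage(c storageConfig) error {
	switch c.Provider {
	case "":
		return nil // no default storage backend configured
	case "s3":
		if c.S3Bucket == "" {
			return errors.New("controller.storage.s3.bucket is required for the s3 provider")
		}
		return nil
	case "file":
		if c.FilePath == "" {
			return errors.New("controller.storage.file.path is required for the file provider")
		}
		return nil
	default:
		return fmt.Errorf("unsupported storage provider %q", c.Provider)
	}
}

func main() {
	fmt.Println(validateStorage(storageConfig{Provider: "s3", S3Bucket: "example-bucket"}))
	fmt.Println(validateStorage(storageConfig{Provider: "file"}))
}
```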

Transport Controller

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| controller.transport.grpc.enable-downstream-targets | true | Inject downstream targets in batch mode. | Keeps streaming topology consistent. |
| controller.transport.grpc.default-tls-secret | empty | Default TLS Secret for gRPC. | Allows centralized TLS config. |
| controller.transport.heartbeat-interval | 30s | Transport heartbeat interval. | Detects missing data plane agents. |
| controller.transport.heartbeat-timeout | 2m | Transport heartbeat timeout. | Marks bindings stale when missed. |

Impulse Controller

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| impulse.max-concurrent-reconciles | 5 | Worker count for Impulse reconciles (0 uses the controller default). | Protects the API server from bursty triggers. |
| impulse.rate-limiter.base-delay | 200ms | Base delay for Impulse backoff. | Avoids hot loops. |
| impulse.rate-limiter.max-delay | 1m | Maximum delay for Impulse backoff. | Keeps retries bounded. |

Template Controller

| Key | Default | Purpose | Why it exists |
| --- | --- | --- | --- |
| template.max-concurrent-reconciles | 2 | Worker count for Template reconciles (0 uses the controller default). | Templates change rarely. |
| template.rate-limiter.base-delay | 500ms | Base delay for Template backoff. | Avoids hot loops. |
| template.rate-limiter.max-delay | 10m | Maximum delay for Template backoff. | Keeps retries bounded. |