Roadmap

BubuStack is a working platform. You can deploy it today. But we're honest about the gaps — and transparent about what comes next.

Want these features sooner? Join the team. Early contributors shape the project. We use an open contributor ladder: Contributor → Reviewer → Maintainer. Ship code, get recognized, grow your role. See Get Involved.

What works today

  • Batch workflows — DAG-based Stories with steps, conditions, parallel execution, gates, and waits.
  • Streaming workflows — Long-lived topologies with Bobravoz gRPC transport, backpressure, flow control.
  • Reusable components — Growing catalog of Engrams and Impulses with versioned templates.
  • GitOps-native — Every resource is a CRD. Deploy with Flux, Argo CD, or plain kubectl.
  • Observability — OpenTelemetry tracing, structured errors, step-level metrics.
  • Go SDK — Build Engrams and Impulses with testkit, conformance suites, secrets management, and the latest-only StoryTrigger / EffectClaim contract. Mounted runtime bundles, durability tightening, and streaming ABI consolidation are published follow-on tracks, not hidden backlog.
  • Component registry and CLI — bubu-registry already provides a Git-backed registry plus the bubu CLI for scaffolding, search, show, pull, install, and PR-based publishing. Public catalog breadth, SemVer-aware resolution, and supply-chain metadata are still active roadmap work.
  • Bubuilder console — bubuilder already provides a web console and API for StoryRuns, StepRuns, jobs, trigger/effect resources, logs, offloaded artifacts, trace links, and YAML inspection/editing. Story authoring and production deployment hardening are still active tracks.
  • Storage offloading — S3-compatible backends for large payloads.
  • Security — RBAC, pod security defaults, guarded cross-namespace policies, and webhook validation. One manager-role reduction is still on the roadmap.

Known gaps

These are real limitations. They affect what you can build today.

Gap | Impact | Workaround
No feedback loops / cycles | Can't do "retry with LLM feedback until quality > 0.8". DAG-only by design. | Chain executeStory with conditions
No durable execution | Pod dies mid-workflow, no automatic replay from checkpoint. | External state store + step retries
No mutable shared state | Steps communicate only through immutable DAG edges. No shared KV store. | Pass state through step outputs
No mid-execution event injection | Can't inject events into running workflows (except gate/wait primitives). | Gate primitives for simple cases
No mixed batch+streaming | A Story is either batch OR streaming. Not both. | Separate Stories with Impulse chaining
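
The first workaround in the table (chaining executeStory with conditions) amounts to bounded iteration with an exit condition. The sketch below shows only that control flow in plain Go. `Attempt`, `RunFunc`, and `RefineUntil` are hypothetical names invented for illustration, and `RunFunc` stands in for "launch one child Story and read its outputs"; none of this is the BubuStack API.

```go
package main

import (
	"errors"
	"fmt"
)

// Attempt is the immutable output of one run (hypothetical shape).
type Attempt struct {
	Draft   string
	Quality float64
}

// RunFunc is a hypothetical stand-in for one chained child execution.
// It receives the previous attempt as feedback and returns a new one.
type RunFunc func(prev Attempt) (Attempt, error)

// ErrBudgetExhausted is returned when the exit condition is never met.
var ErrBudgetExhausted = errors.New("iteration budget exhausted")

// RefineUntil recurses until quality exceeds threshold or the budget hits zero.
// State travels only through arguments and return values, never shared memory.
func RefineUntil(run RunFunc, prev Attempt, threshold float64, budget int) (Attempt, error) {
	if budget <= 0 {
		return prev, ErrBudgetExhausted
	}
	next, err := run(prev)
	if err != nil {
		return prev, err
	}
	if next.Quality > threshold {
		return next, nil
	}
	return RefineUntil(run, next, threshold, budget-1)
}

func main() {
	// Fake child run: every pass raises quality by 0.25.
	improve := func(prev Attempt) (Attempt, error) {
		return Attempt{Draft: prev.Draft + "+", Quality: prev.Quality + 0.25}, nil
	}
	got, err := RefineUntil(improve, Attempt{Draft: "v0"}, 0.8, 5)
	fmt.Println(got.Draft, err) // prints "v0++++ <nil>"
}
```

The explicit budget is what keeps the chain finite: each level of recursion is a fresh, acyclic run, so the overall shape stays compatible with a DAG-only model.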

Addressable gaps (contributors welcome)

Gap | Impact | Workaround today
Dynamic step spawning at runtime | High for AI agents | map-reduce-adapter Engram
Custom template functions | Medium | Modify core/templating/funcs.go
Plugin/extension hooks on controller | Medium | Extended Story Hooks (planned near-term)
State recovery across StoryRun retries | Medium | External state store
Pub/sub between workflows | Medium | Extended Story Hooks + CloudEvents (planned near-term)

Operator maintenance backlog

Item | Why it matters | Status
Transport streaming type consolidation | Transport streaming settings are duplicated across api/v1alpha1 and api/transport/v1alpha1 (~530 LOC). Consolidating to a single source reduces maintenance burden and eliminates the adapter layer in pkg/transport/settings.go. | Backlogged — CRD schema change required
Deprecated API cleanup | Remove no-op legacy controllers (internal/controller/transport/), migrate GetEventRecorderFor to GetEventRecorder (6 controllers), clean stale nolint suppressions. | Backlogged — low effort, safe

Operator hardening backlog (current tree)

One verified architectural item remains in the operator:

Item | Why it matters | Current safety posture | What remains
Manager RBAC redesign for Secrets and runner identities | The manager still needs cluster-wide create/get/patch on Secret and ServiceAccount objects to reconcile managed runner identities, trigger-data Secrets, and authorized cross-namespace S3 auth copies. | Runtime collision guards already refuse unmanaged or mismatched objects, so the current path no longer blindly adopts or overwrites existing names. | Redesign ownership and namespace boundaries so secret propagation and managed runner identity keep working without a broad cluster-scoped mutation grant.

Help us improve

We welcome anyone who wants to improve ecosystem quality, docs, the codebase, test coverage, architecture, or API design, and anyone who simply wants to share ideas and be an active community member.

Concrete areas where we need help:

  • Test the platform — Deploy it, break it, and open issues in the owning repo. Runtime/platform bugs belong in bobrapet; docs/site bugs belong in website. Bug reports are contributions. Every issue helps us improve.
  • Improve documentation — Fix gaps, add examples, clarify confusing sections. Good docs lower the barrier for everyone.
  • Registry and CLI — Expand bubu-registry package coverage, tighten version resolution and validation, and improve bubu publishing/discovery workflows.
  • Bubuilder UX and authoring — Extend the inspector-first console, improve workflow authoring, and harden shared-cluster deployment defaults.
  • Operator security model hardening — Redesign manager ownership/RBAC so secret propagation and managed runner identities keep working without broad cluster-wide Secret / ServiceAccount mutation grants.
  • New testkit development — Improve harnesses, add assertion helpers, expand conformance suites.
  • New storage backends — GCS, Azure Blob, and beyond. The S3 interface is the current boundary.
  • New SDKs — Python SDK, TypeScript SDK. Same ABI contract as the Go SDK.
  • New transport operators — Community-contributed transport adapters beyond Bobravoz gRPC. The Agentic Ingress Layer will provide a framework for pluggable protocol drivers.
  • New Engrams and Impulses — Expand the catalog. Every new component makes the platform more useful.
  • Ingress protocol drivers — SIP, WebRTC, MQTT, and other inbound protocol adapters for the Agentic Ingress Layer.

SDK contributor backlog (latest contract only)

The Go SDK now follows the latest-only trigger and effect contracts, but the pre-release hardening backlog is still real. The items below are already published and should be treated as the active SDK work queue, not as vague future ideas.

Current pre-release rules:

  • Optimize for the latest contract and the simplest correct implementation.
  • Do not add backward-compatibility layers unless there is an explicit upgrade promise to protect.
  • Keep StoryTrigger as the trigger-admission path and EffectClaim as the effect-reservation path until there is a deliberate contract change.
  • Remove deprecated or transitional behavior instead of preserving it for hypothetical users.

Active SDK repo work

Item | Why it matters | Current track
ExecuteEffectOnce completion ordering | A completed EffectClaim must not outrun the StepRun mirror or replay semantics become ambiguous. | bubu-sdk-go#68
Trigger admission and status-write pressure | High-rate impulses still create avoidable API churn through polling and hot status patches. | bubu-sdk-go#66, bubu-sdk-go#67, bubu-sdk-go#69, RFC #78
Durable replay state for effects and signals | Bounded status rings are not the right durable authority for replay and dedupe decisions. | bubu-sdk-go#70, bubu-sdk-go#71, RFC #76
Streaming packet ABI cleanup | The latest contract must settle raw-binary versus structured-envelope behavior before other SDKs are added. | bubu-sdk-go#72, bubu-sdk-go#73, bubu-sdk-go#74, RFC #77
Runtime-bundle adoption | The SDK needs a deliberate mounted-bundle loading path for the artifact-backed runtime plan. | bobrapet#39
Docs, Godoc, and low-risk helper cleanup | Small clarity fixes are still welcome as long as they do not widen contracts or reintroduce legacy behavior. | Start in docs/sdk/go-sdk.md, exported APIs, and targeted tests.
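
To illustrate why a bounded ring is a weak authority for dedupe decisions, here is a minimal, generic sketch. `SeenRing` is a hypothetical type written for this page, not the SDK's status structure; it only shows the failure mode where an old ID falls out of the window and a replay of it is no longer recognized.

```go
package main

import "fmt"

// SeenRing remembers only the most recent IDs, like a bounded status list.
// Hypothetical illustration type; capacity must be greater than zero.
type SeenRing struct {
	ids  []string
	next int
	full bool
}

// NewSeenRing allocates a ring that holds at most capacity IDs.
func NewSeenRing(capacity int) *SeenRing {
	return &SeenRing{ids: make([]string, capacity)}
}

// Record stores id, overwriting the oldest entry once the ring is full.
func (r *SeenRing) Record(id string) {
	r.ids[r.next] = id
	r.next = (r.next + 1) % len(r.ids)
	if r.next == 0 {
		r.full = true
	}
}

// Contains reports whether id is still inside the retained window.
func (r *SeenRing) Contains(id string) bool {
	n := r.next
	if r.full {
		n = len(r.ids)
	}
	for _, v := range r.ids[:n] {
		if v == id {
			return true
		}
	}
	return false
}

func main() {
	ring := NewSeenRing(3)
	for _, id := range []string{"effect-1", "effect-2", "effect-3", "effect-4"} {
		ring.Record(id)
	}
	// effect-1 was evicted, so a late replay of it would look new.
	fmt.Println(ring.Contains("effect-1"), ring.Contains("effect-4")) // prints "false true"
}
```

Any replay that arrives after more than `capacity` newer records slips past this check, which is why the backlog item treats durable replay state as a separate concern from status summaries.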

Bobravoz transport hardening backlog

Bobravoz gRPC is a first-class transport surface in the published roadmap. These are the active latest-only hardening tracks for the hub and connector runtime.

Item | Why it matters | Current track
Hub identity binding and connector supersession fencing | The hub must trust workload identity, not caller metadata, and it must stop superseded connectors from continuing to influence live streams. | bobravoz-grpc#44, bobravoz-grpc#45
Buffer sizing and watch-driven completion tracking | Default hub limits and polling-based completion loops still create unnecessary memory and API pressure. | bobravoz-grpc#46, bobravoz-grpc#47, bobravoz-grpc#52
Telemetry and endpoint derivation cleanup | High-cardinality metrics and hardcoded endpoint derivation make production rollouts harder to reason about and scale. | bobravoz-grpc#48, bobravoz-grpc#49, RFC #78
Admission availability posture | The current fail-closed webhook stance needs either high availability or an explicitly different operational posture. | bobravoz-grpc#50
Test harness isolation | Bobravoz e2e work must stop depending on ambient kube state and broad cleanup outside owned resources. | bobravoz-grpc#51
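
Supersession fencing is commonly implemented with a monotonically increasing generation (a fencing token) per identity. The sketch below shows that generic pattern only. `Fence`, `Register`, and `Admit` are hypothetical names, and the `identity` string is assumed to come from an authenticated workload identity rather than caller-supplied metadata; this is not the Bobravoz hub implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Fence tracks the newest connector generation seen per workload identity.
// Hypothetical illustration type, safe for concurrent use.
type Fence struct {
	mu     sync.Mutex
	latest map[string]uint64
}

// NewFence returns an empty fence.
func NewFence() *Fence {
	return &Fence{latest: make(map[string]uint64)}
}

// Register records a connector generation. It refuses a generation older
// than the one already known, so a stale connector cannot take over again.
func (f *Fence) Register(identity string, gen uint64) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if gen < f.latest[identity] {
		return false
	}
	f.latest[identity] = gen
	return true
}

// Admit reports whether traffic stamped with gen may still affect live
// streams: only the current generation of a registered identity passes.
func (f *Fence) Admit(identity string, gen uint64) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	cur, ok := f.latest[identity]
	return ok && gen == cur
}

func main() {
	fence := NewFence()
	fence.Register("step-a", 1)
	fence.Register("step-a", 2) // a newer connector supersedes generation 1
	fmt.Println(fence.Admit("step-a", 1), fence.Admit("step-a", 2)) // prints "false true"
}
```

Checking the generation on every message, not only at connect time, is what stops a superseded connector that is still running from influencing a stream.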

Registry and Bubuilder backlog

bubu-registry and bubuilder already exist. The work here is about hardening and expanding them, not inventing them from scratch.

Item | Why it matters | Current track
Registry package guarantees | The registry is intentionally minimal today: latest resolution is lexicographic, and there is no signing, provenance, digest pinning, or OCI packaging yet. | bubu-registry
Registry catalog growth and publishing docs | The Git-backed workflow works, but official package coverage, examples, and publishing guidance still need to grow so component authors can use it without reverse-engineering the repo layout. | bubu-registry, examples, website
Bubuilder authoring workflow | The console is already useful for inspection and YAML operations, but the Story Builder is still a placeholder and authoring flows need to catch up with the runtime inspector. | bubuilder
Bubuilder production posture | Token-based auth, redaction, and storage integration exist today, but shared-cluster deployments still need clearer production guidance and safer default posture. | bubuilder, website
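
The practical difference between lexicographic and SemVer-aware "latest" resolution shows up as soon as a version component reaches two digits. The sketch below is a standalone illustration using only the Go standard library. The function names are hypothetical, it handles plain MAJOR.MINOR.PATCH strings only (no pre-release or build metadata), and it is not how bubu-registry resolves versions.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parse splits "MAJOR.MINOR.PATCH" (optional leading "v") into integers.
func parse(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("not MAJOR.MINOR.PATCH: %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid component %q in %q", p, v)
		}
		out[i] = n
	}
	return out, nil
}

// LatestLexicographic picks the last version in plain string order.
// It returns "" for an empty input.
func LatestLexicographic(versions []string) string {
	if len(versions) == 0 {
		return ""
	}
	sorted := append([]string(nil), versions...)
	sort.Strings(sorted)
	return sorted[len(sorted)-1]
}

// LatestSemVer picks the highest version by numeric component comparison.
func LatestSemVer(versions []string) (string, error) {
	if len(versions) == 0 {
		return "", fmt.Errorf("no versions given")
	}
	best, bestParts := "", [3]int{-1, -1, -1}
	for _, v := range versions {
		p, err := parse(v)
		if err != nil {
			return "", err
		}
		for i := 0; i < 3; i++ {
			if p[i] != bestParts[i] {
				if p[i] > bestParts[i] {
					best, bestParts = v, p
				}
				break
			}
		}
	}
	return best, nil
}

func main() {
	versions := []string{"1.2.0", "1.10.0", "1.9.3"}
	semver, _ := LatestSemVer(versions)
	fmt.Println(LatestLexicographic(versions), semver) // prints "1.9.3 1.10.0"
}
```

String order compares "1.10.0" and "1.9.3" character by character, so "1" sorts before "9" and the older release wins.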

Contribution guardrails for SDK work

  • Prefer additive tests first, then small code changes.
  • Prefer removing pre-release complexity over adding compatibility shims.
  • Do not reintroduce removed paths such as direct client-created StoryRun admission, legacy binding env payloads, or Secrets.Raw() unless the roadmap explicitly calls for a supported migration path.
  • If a path only exists to preserve an unreleased older shape, delete it instead of adding another shim.
  • Do not widen schemas or transport behavior “for compatibility” unless there is a real user-facing upgrade contract to preserve.
  • If a change touches streaming or trigger/effect semantics, include exact failure mode and verification steps in the PR.
  • If a change needs cross-repo coordination (core, bobravoz-grpc, bobrapet), open a design discussion before implementation.

Release & activity signals

Track project activity from these source-of-truth pages:

Vision

Where we want to take BubuStack, roughly in priority order.

Near-term

  • Manager RBAC redesign (#38) — Preserve secret propagation and managed runner identities without broad cluster-wide Secret / ServiceAccount mutation grants.
  • Artifact-backed runtime payload delivery (#39) — Replace env-heavy runtime payload injection with mounted runtime bundles backed by Secret, ConfigMap, or shared storage depending on sensitivity and size. This should reduce pod-template churn, simplify SDK/runtime loading, and make large evaluated configs easier to inspect. It also requires companion SDK support so components can load runtime context from mounted files instead of only env vars.
  • Unified runtime durability contract (RFC #76) — Settle where signals, effects, and Bobravoz replay store durable truth so StepRun.status stays the summary mirror and replay decisions remain correct across restarts.
  • Canonical streaming ABI (RFC #77) — Finalize raw-binary versus structured-envelope rules, authoritative MessageID / TimestampMs fields, and packet-shape constraints across tractatus, Bobravoz, and the Go SDK before additional language SDKs ship.
  • Runtime telemetry offload path (RFC #78) — Move high-rate signal, effect, impulse, and streaming-runtime activity off hot CRD status writes while keeping useful operator observability.
  • Extended Story Hooks (#40) — Inject events into running workflows from impulses, controllers, or external systems. Extends the existing lifecycle hook mechanism (steprun.ready, storyrun.ready) with external hook injection via the transport hub. Solves mid-execution event injection and enables feedback loops, cross-workflow pub/sub, and plugin hooks — without forking the operator.
  • Loop primitive (#41) — Bounded iteration with exit conditions. Implemented as recursive executeStory under the hood, so it does not violate the DAG model.
  • Workflow checkpointing (#42) — Durable execution semantics with automatic recovery from last checkpoint.
  • CloudEvents adoption (#43) — Standardize impulse and hook events as CloudEvents for interop with the CNCF ecosystem (Knative Eventing, Argo Events, Tekton Triggers). Low effort, high strategic value. This is also the boundary future SDK helpers and external runtimes should target instead of repo-local event wrappers.
  • Registry hardening and catalog growth — Keep the Git-backed registry simple while adding SemVer-aware resolution, stronger validation, provenance/signing metadata, and broader official/community package coverage.
  • Bubuilder authoring and deployment UX — Move beyond the current inspector-first console with a real Story builder, better catalog/edit flows, and clearer production auth/deployment guidance.
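
For readers unfamiliar with CloudEvents, the sketch below shows the JSON envelope defined by the CloudEvents 1.0 specification, which requires the `id`, `source`, `specversion`, and `type` attributes. The Go type, the helper names, and the example `source` and `type` values are assumptions made for illustration; BubuStack's eventual event names and helpers may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// Event mirrors the CloudEvents 1.0 JSON format. The first four fields are
// required by the spec; the rest are optional attributes.
type Event struct {
	SpecVersion     string          `json:"specversion"`
	ID              string          `json:"id"`
	Source          string          `json:"source"`
	Type            string          `json:"type"`
	Time            string          `json:"time,omitempty"`
	DataContentType string          `json:"datacontenttype,omitempty"`
	Data            json.RawMessage `json:"data,omitempty"`
}

// Validate checks the required context attributes.
func (e Event) Validate() error {
	if e.SpecVersion != "1.0" {
		return errors.New(`specversion must be "1.0"`)
	}
	if e.ID == "" || e.Source == "" || e.Type == "" {
		return errors.New("id, source, and type are required")
	}
	return nil
}

// NewEvent wraps a JSON-serializable payload in a CloudEvents envelope.
func NewEvent(id, source, eventType string, at time.Time, payload any) (Event, error) {
	data, err := json.Marshal(payload)
	if err != nil {
		return Event{}, err
	}
	e := Event{
		SpecVersion:     "1.0",
		ID:              id,
		Source:          source,
		Type:            eventType,
		Time:            at.UTC().Format(time.RFC3339),
		DataContentType: "application/json",
		Data:            data,
	}
	return e, e.Validate()
}

func main() {
	// The source and type values below are hypothetical examples.
	e, err := NewEvent("run-42/step-3", "/example/namespaces/demo/stories/summarize",
		"io.example.steprun.ready", time.Now(), map[string]string{"step": "summarize"})
	if err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(e, "", "  ")
	fmt.Println(string(out))
}
```

Because the envelope is plain JSON with a small fixed attribute set, consumers in any language can route on `type` and `source` without sharing Go types.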

Medium-term

  • External integration architecture (RFC) — Agentic Ingress Layer (SIP, Twilio, WebRTC, MQTT, HTTP SSE), production MCP Gateway (session routing, circuit breakers, health monitoring), and Native A2A protocol support (AgentCard CRD for agent discovery). See the RFC discussion for the full architecture proposal. Part of this work is clarifying whether Bobravoz-style external runtimes are samples or first-class ingress surfaces, plus how they attach correlation/session metadata to the transport layer.
  • Transport type consolidation (#44) — Merge duplicated streaming settings across api/v1alpha1 and api/transport/v1alpha1 (~530 LOC).
  • Storage backend expansion (#45) — Add GCS and Azure Blob drivers alongside existing S3-compatible backends.
  • Python SDK (RFC) — Same ABI contract as Go SDK. CloudEvents adoption makes the event contract language-agnostic, reducing SDK-specific code, but the transport/config ABI in tractatus and core also needs to converge on more typed contracts than today's flexible map-shaped transport config. It also depends on the canonical packet contract in RFC #77.
  • TypeScript SDK (RFC) — Same ABI contract as Go SDK. Like the Python SDK, this depends on tightening the shared transport/config ABI instead of carrying map-shaped transport descriptors into every new SDK. It also depends on the canonical packet contract in RFC #77.

Long-term

  • Multi-cluster federation (#46) — Global workflows across regions. CloudEvents and Agentic Ingress provide the cross-cluster event transport layer.
  • Compliance primitives (#47) — Audit trail CRDs, cost attribution, EU AI Act traceability. CloudEvents envelope provides standardized provenance fields.
  • Mixed batch+streaming Stories (#48) — Single Story with both patterns. Extended Hooks provides the data bridge between batch and streaming steps within a StoryRun.

How to pick up work

  1. Browse operator/runtime issues, Bobravoz transport issues, Go SDK issues, registry/CLI issues, Bubuilder issues, or website/docs issues, especially anything tagged good first issue or help wanted.
  2. Use GitHub Discussions to coordinate cross-repo work, especially registry policy, Bubuilder UX/API direction, and runtime contract changes.
  3. For large features, open a design discussion before writing code.
  4. See Get Involved for contribution guidelines and the contributor ladder.