[WIP] Add standalone control-aware bounded channel for future engine integration #2466

Draft
lquerel wants to merge 13 commits into open-telemetry:main from lquerel:feature/control-channel-redesign

@lquerel lquerel commented Mar 30, 2026

**Not ready for review yet**

Change Summary

Add a new standalone control-channel crate that defines a bounded control-aware channel for OTAP node-control traffic.

This change is intended as a first step toward the broader engine redesign tracked in #2465. It introduces the control-channel design, tests, benchmarks, and documentation in a standalone form so the control-plane semantics can be reviewed and validated before integration into the engine runtime.

The new channel is designed for the control-path requirements of the OTAP engine:

  • reserved lifecycle delivery for DrainIngress and Shutdown
  • bounded and batched Ack / Nack handling
  • coalesced best-effort control for TimerTick / CollectTelemetry
  • latest-wins Config
  • bounded fairness and deadline-bounded terminal shutdown progress
  • role-specific APIs for receiver vs non-receiver nodes
  • a single-owner implementation aligned with the engine's thread-per-core model
  • FIFO blocked-sender wakeups for completion-capacity backpressure
  • explicit completion metadata support via AckMsg<PData, Meta> /
    NackMsg<PData, Meta> for future engine unwind-state integration
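
The slot classes listed above can be illustrated with a minimal single-owner sketch. This is not the crate's implementation: the type names `ControlMsg`, `ControlQueue`, and `SendError`, the payload types, and the delivery order in `try_recv` are all assumptions made for illustration, and bounded fairness, shutdown deadlines, and blocked-sender wakeups are left out entirely.

```rust
use std::collections::VecDeque;
use std::mem::take;

/// Hypothetical control messages; variant names follow the PR description,
/// payloads are placeholders.
#[derive(Debug, Clone, PartialEq)]
enum ControlMsg {
    DrainIngress,
    Shutdown,
    Ack(u64),
    Nack(u64),
    TimerTick,
    CollectTelemetry,
    Config(String),
}

#[derive(Debug, PartialEq)]
enum SendError {
    /// The bounded completion queue is at capacity.
    Full,
}

/// Single-owner queue (no locks, not thread-safe) with one slot class per kind.
struct ControlQueue {
    drain_ingress: bool,               // reserved lifecycle slot
    shutdown: bool,                    // reserved lifecycle slot
    completions: VecDeque<ControlMsg>, // bounded Ack/Nack FIFO
    completion_capacity: usize,
    timer_tick: bool,                  // coalesced: many sends, one delivery
    collect_telemetry: bool,           // coalesced
    config: Option<String>,            // latest wins
}

impl ControlQueue {
    fn new(completion_capacity: usize) -> Self {
        Self {
            drain_ingress: false,
            shutdown: false,
            completions: VecDeque::with_capacity(completion_capacity),
            completion_capacity,
            timer_tick: false,
            collect_telemetry: false,
            config: None,
        }
    }

    /// Only Ack/Nack can be rejected; every other kind has a dedicated slot.
    fn try_send(&mut self, msg: ControlMsg) -> Result<(), SendError> {
        match msg {
            ControlMsg::DrainIngress => self.drain_ingress = true,
            ControlMsg::Shutdown => self.shutdown = true,
            m @ (ControlMsg::Ack(_) | ControlMsg::Nack(_)) => {
                if self.completions.len() >= self.completion_capacity {
                    return Err(SendError::Full);
                }
                self.completions.push_back(m);
            }
            ControlMsg::TimerTick => self.timer_tick = true,
            ControlMsg::CollectTelemetry => self.collect_telemetry = true,
            ControlMsg::Config(c) => self.config = Some(c),
        }
        Ok(())
    }

    /// Delivery order is an arbitrary choice for this sketch.
    fn try_recv(&mut self) -> Option<ControlMsg> {
        if take(&mut self.drain_ingress) {
            return Some(ControlMsg::DrainIngress);
        }
        if let Some(m) = self.completions.pop_front() {
            return Some(m);
        }
        if let Some(c) = self.config.take() {
            return Some(ControlMsg::Config(c));
        }
        if take(&mut self.timer_tick) {
            return Some(ControlMsg::TimerTick);
        }
        if take(&mut self.collect_telemetry) {
            return Some(ControlMsg::CollectTelemetry);
        }
        if take(&mut self.shutdown) {
            return Some(ControlMsg::Shutdown);
        }
        None
    }
}

fn main() {
    let mut q = ControlQueue::new(2);
    for msg in [ControlMsg::TimerTick, ControlMsg::TimerTick, ControlMsg::Shutdown] {
        q.try_send(msg).unwrap();
    }
    while let Some(msg) = q.try_recv() {
        println!("{msg:?}");
    }
}
```

Because lifecycle, coalesced, and config messages never occupy completion capacity, a full Ack / Nack queue cannot prevent `Shutdown` from being accepted in this sketch, which is the property the "reserved lifecycle delivery" bullet describes.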

This PR does not yet wire the new control channel into receivers, processors, exporters, or pipeline_ctrl. That integration is planned in the later steps of #2465.

What issue does this PR close?

How are these changes tested?

  • Added unit tests for the standalone control-channel crate covering:
    • lifecycle acceptance and ordering
    • completion batching
    • bounded fairness
    • shutdown deadline behavior
    • blocked-sender wake behavior and cancellation safety
    • completion metadata preservation
    • invalid config rejection
  • Added a benchmark comparing the standalone control-aware channel against the
    current engine control-channel paths under heavy Ack / Nack traffic and
    mixed control noise
  • Verified with: cargo xtask check
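
As a rough illustration of what the completion batching and metadata preservation checks might look like, here is a self-contained sketch. The struct layouts for `AckMsg<PData, Meta>` and `NackMsg<PData, Meta>`, the `Completion` enum, and the `drain_batch` helper are hypothetical; only the two generic type names come from the PR description.

```rust
use std::collections::VecDeque;

/// Hypothetical layout; the real `AckMsg` fields may differ.
#[derive(Debug, PartialEq)]
struct AckMsg<PData, Meta> {
    pdata: PData,
    meta: Meta,
}

/// Hypothetical layout; the real `NackMsg` fields may differ.
#[derive(Debug, PartialEq)]
struct NackMsg<PData, Meta> {
    pdata: PData,
    meta: Meta,
    reason: String,
}

/// Hypothetical wrapper so Ack and Nack can share one FIFO.
#[derive(Debug, PartialEq)]
enum Completion<PData, Meta> {
    Ack(AckMsg<PData, Meta>),
    Nack(NackMsg<PData, Meta>),
}

/// Removes up to `max` completions from the front of the queue,
/// keeping their order and leaving each message's metadata untouched.
fn drain_batch<P, M>(
    queue: &mut VecDeque<Completion<P, M>>,
    max: usize,
) -> Vec<Completion<P, M>> {
    let n = max.min(queue.len());
    queue.drain(..n).collect()
}

fn main() {
    let mut queue: VecDeque<Completion<&str, u32>> = VecDeque::new();
    queue.push_back(Completion::Ack(AckMsg { pdata: "batch-a", meta: 7 }));
    queue.push_back(Completion::Nack(NackMsg {
        pdata: "batch-b",
        meta: 8,
        reason: "downstream unavailable".to_string(),
    }));
    queue.push_back(Completion::Ack(AckMsg { pdata: "batch-c", meta: 9 }));

    let batch = drain_batch(&mut queue, 2);
    println!("delivered {} completions, {} still queued", batch.len(), queue.len());
}
```

Here `Meta` is a plain `u32` standing in for whatever unwind state the engine would attach later; the point is only that the value sent is the value received.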

Benchmark results

Ack/Nack Only

| Mode | Implementation | Time (ms) | Throughput (Melem/s) | Delta vs current |
|---|---|---|---|---|
| Local | current_local | 7.22 | 13.85 | baseline |
| Local | control_aware | 6.39 | 15.64 | ~+12.9% thrpt / -11.4% time |
| Shared baseline | current_shared | 16.68 | 5.99 | reference only |

Ack/Nack + Control Noise

| Mode | Implementation | Time (ms) | Throughput (Melem/s) | Delta vs current |
|---|---|---|---|---|
| Local | current_local | 7.51 | 13.31 | baseline |
| Local | control_aware | 6.49 | 15.42 | ~+15.8% thrpt / -13.6% time |
| Shared baseline | current_shared | 17.01 | 5.88 | reference only |

Notes:

  • control_aware is the standalone single-owner channel introduced in this PR.
  • current_shared is included as a reference against the current engine shared path.
  • The benchmark still runs on a current-thread runtime and is intended to compare the current engine paths with the standalone single-owner control-channel design before engine integration.
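
For readers who want to check the "Delta vs current" column, the relative change can be recomputed from the table values as sketched below. The helper name `percent_delta` is made up for this example. Because the table values are already rounded, the recomputed figures may differ from the reported deltas in the last digit.

```rust
/// Relative change of `candidate` against `baseline`, in percent.
/// Positive means the candidate value is larger than the baseline.
fn percent_delta(baseline: f64, candidate: f64) -> f64 {
    (candidate / baseline - 1.0) * 100.0
}

fn main() {
    // (label, current_local time ms, control_aware time ms,
    //  current_local Melem/s, control_aware Melem/s), taken from the tables.
    let rows = [
        ("Ack/Nack Only", 7.22, 6.39, 13.85, 15.64),
        ("Ack/Nack + Control Noise", 7.51, 6.49, 13.31, 15.42),
    ];

    for (label, base_ms, new_ms, base_thrpt, new_thrpt) in rows {
        let thrpt = percent_delta(base_thrpt, new_thrpt);
        let time = percent_delta(base_ms, new_ms);
        println!("{label}: {thrpt:+.1}% thrpt / {time:+.1}% time");
    }
}
```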

Are there any user-facing changes?

No user-facing behavior changes yet.

This PR adds the standalone control channel and its validation/benchmarking surface, but does not yet integrate it into the engine runtime.

@github-actions bot added the rust label (Pull requests that update Rust code) Mar 30, 2026