Handle OS signals (SIGTERM/SIGINT) for graceful pipeline shutdown by cijothomas · Pull Request #2325 · open-telemetry/otel-arrow

cijothomas · 2026-03-14T00:38:40Z

The main executable has no signal handling today: when K8s sends SIGTERM (or a local user hits Ctrl+C), the process is killed immediately without draining in-flight data.

This PR adds OS signal handling that follows the same double-signal convention as the Go OTel Collector:

First SIGINT/SIGTERM → sends graceful shutdown messages to all pipelines with a 60s drain deadline
Second signal → forces immediate exit via process::exit(1)
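
The two-step convention above can be sketched as a small state machine. This is an illustration only, not the code in this PR: `SignalPolicy`, `SignalAction`, and `on_signal` are hypothetical names, and the real change wires the signals through the controller's async runtime rather than a plain struct.

```rust
use std::time::{Duration, Instant};

/// What the process should do in response to a termination signal.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, PartialEq)]
enum SignalAction {
    /// First signal: ask every pipeline to drain until `deadline`.
    Drain { deadline: Instant },
    /// Any later signal: exit immediately with this status code.
    ForceExit { code: i32 },
}

/// Tracks how many termination signals have been seen so far.
struct SignalPolicy {
    drain_timeout: Duration,
    signals_seen: u32,
}

impl SignalPolicy {
    fn new(drain_timeout: Duration) -> Self {
        Self { drain_timeout, signals_seen: 0 }
    }

    /// Decides the action for a signal observed at `now`.
    fn on_signal(&mut self, now: Instant) -> SignalAction {
        self.signals_seen = self.signals_seen.saturating_add(1);
        if self.signals_seen == 1 {
            SignalAction::Drain { deadline: now + self.drain_timeout }
        } else {
            SignalAction::ForceExit { code: 1 }
        }
    }
}
```

A caller would construct `SignalPolicy::new(Duration::from_secs(60))`, call `on_signal` each time SIGINT or SIGTERM arrives, and map `ForceExit { code }` to `std::process::exit(code)`.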

codecov · 2026-03-14T00:41:37Z

Codecov Report

❌ Patch coverage is 14.28571% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.55%. Comparing base (bde436e) to head (573aa7d).
⚠️ Report is 10 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2325      +/-   ##
==========================================
- Coverage   87.58%   87.55%   -0.03%     
==========================================
  Files         571      571              
  Lines      194095   194550     +455     
==========================================
+ Hits       169996   170339     +343     
- Misses      23573    23685     +112     
  Partials      526      526

Components	Coverage Δ
otap-dataflow	`89.57% <14.28%> (-0.05%)`	⬇️
query_abstraction	`80.61% <ø> (ø)`
query_engine	`90.61% <ø> (ø)`
syslog_cef_receivers	`∅ <ø> (∅)`
otel-arrow-go	`52.44% <ø> (ø)`
quiver	`91.91% <ø> (ø)`

gouslu · 2026-03-14T18:27:30Z

rust/otap-dataflow/crates/controller/src/lib.rs

+                        // Give pipelines a generous deadline to drain (60 s by default —
+                        // matches the default Kubernetes terminationGracePeriodSeconds).
+                        let deadline =
+                            std::time::Instant::now() + std::time::Duration::from_secs(60);


nit: named constant for 60 secs?
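
A minimal sketch of what the named constant could look like. `DEFAULT_DRAIN_TIMEOUT` and `drain_deadline` are hypothetical names chosen for illustration; the 60 s value is the one hard-coded in the snippet above.

```rust
use std::time::{Duration, Instant};

/// Default time pipelines get to drain after the first termination signal.
/// (Hypothetical name; the value mirrors the literal in the diff.)
const DEFAULT_DRAIN_TIMEOUT: Duration = Duration::from_secs(60);

/// Computes the instant by which pipelines must have finished draining.
fn drain_deadline(now: Instant) -> Instant {
    now + DEFAULT_DRAIN_TIMEOUT
}
```

Taking `now` as a parameter instead of calling `Instant::now()` inside keeps the helper deterministic in tests.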

gouslu · 2026-03-14T18:29:21Z

rust/otap-dataflow/crates/controller/src/lib.rs

+
+                        // ── Second signal: force exit ───────────────────────
+                        let signal_name = Self::recv_termination_signal().await;
+                        otel_error!(


Currently this seems to be synchronous, but if it ever uses a buffered writer (which I doubt it will), this message may not get printed before exit. Maybe an eprintln! right under this for good measure?

cijothomas · Mar 16, 2026

Agree. Will swap to eprintln.

jmacd · Mar 19, 2026

This is the purpose of raw_error!
I would prefer to use it for consistency.
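
The concern behind this thread is that `std::process::exit` terminates the process without running destructors, so anything still held in a buffered writer can be lost. A minimal sketch of writing and explicitly flushing the final message follows. `log_forced_exit` is a hypothetical helper for illustration only; it does not show how `otel_error!` or `raw_error!` are implemented.

```rust
use std::io::{self, Write};

/// Writes the "forced exit" message and flushes it, so the text reaches
/// its destination even if `out` buffers internally.
/// (Hypothetical helper, not part of the PR.)
fn log_forced_exit<W: Write>(out: &mut W, signal_name: &str) -> io::Result<()> {
    writeln!(
        out,
        "received second signal ({signal_name}); forcing immediate exit"
    )?;
    // Without this flush, a buffered writer could still hold the line
    // when the process exits.
    out.flush()
}
```

Before the forced exit, a caller could run `let _ = log_forced_exit(&mut io::stderr(), "SIGTERM");` and then `std::process::exit(1)`.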

lalitb · 2026-03-20T07:23:31Z

rust/otap-dataflow/crates/controller/src/lib.rs

+                        let mut errors = Vec::new();
+                        for sender in &senders {
+                            if let Err(e) = sender.try_send_shutdown(
+                                deadline,


One correctness issue here: try_send_shutdown() can drop the shutdown request if the pipeline control channel is full at signal time. That makes graceful shutdown best-effort under backpressure.

For this PR, I think the smallest fix could be a bounded retry inside the existing rt.block_on(async { ... }) block, e.g. retry try_send_shutdown() a few times with a short tokio::time::sleep(...).await between attempts before giving up and logging the error. That avoids trait changes and closes the immediate gap.
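
A rough sketch of the bounded-retry idea, under loudly stated assumptions: it uses a `std::sync::mpsc::sync_channel` and `std::thread::sleep` as stand-ins so it compiles with the standard library alone, whereas the real code would call the pipeline sender's `try_send_shutdown()` and await `tokio::time::sleep(...)` inside `rt.block_on`. `Shutdown` and `send_shutdown_with_retry` are hypothetical names.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

/// Stand-in for the pipeline control message (hypothetical type).
#[derive(Debug, PartialEq)]
struct Shutdown {
    deadline: Instant,
}

/// Tries to enqueue a shutdown request, retrying while the bounded
/// channel is full, at most `max_attempts` times in total.
fn send_shutdown_with_retry(
    sender: &SyncSender<Shutdown>,
    deadline: Instant,
    max_attempts: u32,
    backoff: Duration,
) -> Result<(), TrySendError<Shutdown>> {
    let mut msg = Shutdown { deadline };
    for attempt in 1..=max_attempts {
        match sender.try_send(msg) {
            Ok(()) => return Ok(()),
            // Channel full and attempts remain: take the message back,
            // wait briefly, and try again.
            Err(TrySendError::Full(returned)) if attempt < max_attempts => {
                msg = returned;
                thread::sleep(backoff);
            }
            // Last attempt failed, or the receiver is gone: give up.
            Err(e) => return Err(e),
        }
    }
    // Only reachable when `max_attempts` is 0.
    Err(TrySendError::Full(msg))
}
```

The caller still logs the returned error after the final attempt, so shutdown stays best-effort but no longer fails on a momentarily full channel.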

As a follow-up, we can either add a proper async shutdown send on the trait or move shutdown onto a dedicated out-of-band signal such as a watch channel.

Longer term, a dedicated out-of-band shutdown signal (e.g. watch channel per pipeline) also gives us a clean reusable ShutdownHandle that supervisor and OpAMP can call directly - same shutdown path regardless of trigger source.
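
To illustrate the out-of-band idea, here is a minimal std-only sketch of a shared shutdown slot that cannot be "full". It is an assumption-laden stand-in: the comment above proposes a watch channel per pipeline, while this sketch uses `Arc<Mutex<Option<Instant>>>` and polling instead, and the `ShutdownHandle` API shown (`request`, `requested`) is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::time::Instant;

/// Cloneable handle that any trigger source (signal handler, supervisor,
/// OpAMP) could use to request shutdown. Hypothetical API for this sketch.
#[derive(Clone, Default)]
struct ShutdownHandle {
    // `None` until shutdown is requested; then holds the drain deadline.
    deadline: Arc<Mutex<Option<Instant>>>,
}

impl ShutdownHandle {
    fn new() -> Self {
        Self::default()
    }

    /// Records the shutdown request. Returns `true` if this call was the
    /// first request, `false` if shutdown had already been requested.
    fn request(&self, deadline: Instant) -> bool {
        let mut slot = self.deadline.lock().unwrap();
        if slot.is_none() {
            *slot = Some(deadline);
            true
        } else {
            false
        }
    }

    /// Returns the drain deadline if shutdown has been requested.
    fn requested(&self) -> Option<Instant> {
        *self.deadline.lock().unwrap()
    }
}
```

Because the slot stores a single value rather than queueing messages, a request is never dropped under backpressure; a pipeline would check `requested()` between work items (or, with a real watch channel, await a change notification).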

sapatrjv · 2026-03-26T21:03:58Z

rust/otap-dataflow/crates/controller/src/lib.rs

+        }
+
+        #[cfg(not(unix))]
+        {


Here is the Windows equivalent that handles both ctrl_c and ctrl_break:

#[cfg(windows)]
{
    use tokio::signal::windows::{ctrl_break, ctrl_c};

    let mut sigint = ctrl_c().expect("failed to register Ctrl-C handler");
    let mut sigterm = ctrl_break().expect("failed to register Ctrl-Break handler");
    tokio::select! {
        _ = sigterm.recv() => "CTRL_BREAK (SIGTERM-equivalent)",
        _ = sigint.recv() => "CTRL_C (SIGINT-equivalent)",
    }
}

FYI: on Windows, Ctrl+C can't be reliably sent to a process that has no console handle; it will be ignored. The safest option is to use CTRL_BREAK.

Handle OS signals (SIGTERM/SIGINT) for graceful pipeline shutdown

573aa7d

github-project-automation bot added this to OTel-Arrow Mar 14, 2026

github-actions bot added the rust Pull requests that update Rust code label Mar 14, 2026

gouslu reviewed Mar 14, 2026

lalitb reviewed Mar 20, 2026

jmacd mentioned this pull request Mar 25, 2026

Enable Memory & CPU Profiling for df_engine #2420

Open

sapatrjv reviewed Mar 26, 2026

cijothomas wants to merge 1 commit into open-telemetry:main from cijothomas:cijothomas/shutdown

5 participants
