Perf test - try in windows runners#2309

Draft
cijothomas wants to merge 49 commits intoopen-telemetry:mainfrom
cijothomas:cijothomas/perfwin-1

Conversation

@cijothomas (Member)

Attempting to see if we can replicate a subset of the perf runs on Windows. Given there are no dedicated perf machines, we have to try the GitHub-hosted runners.
Starting with just the idle state to see whether this is even feasible in CI.

@github-actions github-actions bot added rust Pull requests that update Rust code ci-repo Repository maintenance, build, GH workflows, repo cleanup, or other chores labels Mar 13, 2026
@codecov

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.71%. Comparing base (fccab3d) to head (ac2b239).
⚠️ Report is 47 commits behind head on main.

Additional details and impacted files
```txt
@@            Coverage Diff             @@
##             main    #2309      +/-   ##
==========================================
- Coverage   87.73%   87.71%   -0.02%
==========================================
  Files         578      578
  Lines      198325   198325
==========================================
- Hits       173992   173970      -22
- Misses      23807    23829      +22
  Partials      526      526
```
| Component | Coverage | Δ |
|---|---|---|
| otap-dataflow | 89.74% <ø> | -0.02% ⬇️ |
| query_abstraction | 80.61% <ø> | ø |
| query_engine | 90.61% <ø> | ø |
| syslog_cef_receivers | ∅ <ø> | ∅ |
| otel-arrow-go | 52.44% <ø> | ø |
| quiver | 91.91% <ø> | ø |

# 100kLRPS Performance Test (Windows) - OTLP in, OTLP out
Contributor

A hybrid test that starts at 100K EPS and then returns to the idle state in the same run would be useful.
It would show:

  1. Whether the CPU/memory characteristics return to the expected idle-state levels
  2. How long it takes to return to those idle-state characteristics

Member Author

Excellent suggestion. I think we should have it for the existing ones as well, not just Windows.

@sjmsft (Contributor)

sjmsft commented Mar 16, 2026

Since some environments will want to cap the maximum resource utilization (CPU and/or memory) of the OTel receiver/df engine, it would be good to have perf tests that run with these knobs enabled.
For example,

  • If the df engine is run under a Windows OS job object that constrains memory and CPU, what effect would that have on operations?
  • If otel-arrow exposes any backpressure knobs, enabling those during the perf tests would also be good.

cijothomas and others added 6 commits March 17, 2026 12:46
Blocked on open-telemetry#2194

Trying to introduce the batch processor to the perf tests, so as to catch ^
issues earlier, and also to actually measure the perf impact of
batching!

---------

Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
Co-authored-by: Laurent Quérel <l.querel@f5.com>
…2265)

### Summary
  
The traffic generator's static data source produces hardcoded resource
attributes with no way to customize them. This makes it impossible to
load-test content-based routing pipelines (e.g. `content_router` routing
by `tenant.id`) using the built-in generator.

This adds an optional `resource_attributes` field to the traffic
generator config. It accepts three forms:

**Single map** (all batches carry the same attributes):

```yaml
resource_attributes:
  tenant.id: prod
  service.namespace: frontend
```

**List of maps** (equal round-robin rotation per batch):

```yaml
resource_attributes:
  - {tenant.id: prod, service.namespace: frontend}
  - {tenant.id: ppe, service.namespace: backend}
```

**Weighted list** (proportional batch split):

```yaml
resource_attributes:
  - attrs: {tenant.id: prod, service.namespace: frontend}
    weight: 3
  - attrs: {tenant.id: ppe, service.namespace: backend}
    weight: 1
```

The weighted form produces a 75%/25% batch split, simulating realistic skewed multi-tenant traffic on a single connection. All three forms are backward-compatible: existing configs are unaffected (the field defaults to empty).

 ### Implementation

Rotation uses a precomputed index table built once at startup: `[0, 0, 0, 1]` for the 3:1 example above. The hot path is a single modulo lookup:

```
slot  = rotation[batch_rotation_index % rotation.len()]
attrs = &entries[slot].attrs
```

`batch_rotation_index` is a dedicated counter incremented once per emitted batch, fully independent of `signal_count`. Rotation advances exactly once per OTLP message regardless of batch size.
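The weighted-rotation scheme above can be sketched in Rust. This is an illustrative sketch, not the generator's actual code: the `Entry` struct and `build_rotation` name are hypothetical stand-ins for whatever the traffic generator uses internally.

```rust
// Hypothetical stand-in for the generator's weighted attribute entry.
struct Entry {
    attrs: Vec<(String, String)>,
    weight: usize,
}

// Built once at startup: entry index `i` repeated `weight` times,
// e.g. weights 3:1 -> [0, 0, 0, 1].
fn build_rotation(entries: &[Entry]) -> Vec<usize> {
    let mut rotation = Vec::new();
    for (i, e) in entries.iter().enumerate() {
        rotation.extend(std::iter::repeat(i).take(e.weight));
    }
    rotation
}

fn main() {
    let entries = vec![
        Entry { attrs: vec![("tenant.id".into(), "prod".into())], weight: 3 },
        Entry { attrs: vec![("tenant.id".into(), "ppe".into())], weight: 1 },
    ];
    let rotation = build_rotation(&entries);
    assert_eq!(rotation, vec![0, 0, 0, 1]);

    // Hot path: one modulo lookup per emitted batch.
    let mut counts = [0usize; 2];
    for batch_rotation_index in 0..8usize {
        let slot = rotation[batch_rotation_index % rotation.len()];
        let _attrs = &entries[slot].attrs;
        counts[slot] += 1;
    }
    assert_eq!(counts, [6, 2]); // 75% / 25% split over 8 batches
    println!("{:?}", counts);
}
```

Building the table once moves all weight handling out of the hot path; the per-batch cost is one modulo and two array indexes.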

 
 ### Limitations

  - `resource_attributes` only applies to `data_source: static`
  - With `generation_strategy: pre_generated`, only the first attribute set is used; rotation requires fresh generation (or templates)
  - Rotation order is naive (`[0, 0, 0, 1]` for 3:1), not smoothly interleaved; smooth weighted round-robin is left as a follow-up

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Laurent Quérel <laurent.querel@gmail.com>
Co-authored-by: albertlockett <a.lockett@f5.com>
…hutdown on shared pdata channels (open-telemetry#2310)

# Change Summary

fix: SharedReceiver::try_recv maps Empty to Closed causing spurious
shutdown on shared pdata channels.

## What issue does this PR close?

spontaneous bug find.

## How are these changes tested?

Unit tests. Temporarily reverted the bug and validated that
`test_recv_when_false_shared_empty_alive_no_shutdown` test fails.

## Are there any user-facing changes?

No

Co-authored-by: Laurent Quérel <l.querel@f5.com>
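The bug class is easiest to see against a standard-library channel. A minimal sketch, using `std::sync::mpsc` as a stand-in for the shared pdata channel (the real `SharedReceiver` type differs); the point is that "empty but alive" and "closed" must map to different outcomes, or receivers shut down spuriously:

```rust
use std::sync::mpsc;

enum RecvOutcome<T> {
    Message(T),
    Empty,  // channel alive, nothing queued: must NOT trigger shutdown
    Closed, // all senders dropped: safe to shut down
}

fn try_recv_shared<T>(rx: &mpsc::Receiver<T>) -> RecvOutcome<T> {
    match rx.try_recv() {
        Ok(msg) => RecvOutcome::Message(msg),
        // Mapping Empty to Closed here was the bug being fixed.
        Err(mpsc::TryRecvError::Empty) => RecvOutcome::Empty,
        Err(mpsc::TryRecvError::Disconnected) => RecvOutcome::Closed,
    }
}

fn main() {
    let (tx, rx) = mpsc::channel::<u32>();
    // Alive but empty: not a shutdown signal.
    assert!(matches!(try_recv_shared(&rx), RecvOutcome::Empty));
    tx.send(7).unwrap();
    assert!(matches!(try_recv_shared(&rx), RecvOutcome::Message(7)));
    // Only after every sender drops does the receiver see Closed.
    drop(tx);
    assert!(matches!(try_recv_shared(&rx), RecvOutcome::Closed));
    println!("ok");
}
```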
…er. (open-telemetry#2262)

# Change Summary

The Geneva exporter was missing the `start_periodic_telemetry()` call,
which meant `CollectTelemetry` messages were never delivered - all
exporter metrics showed zero despite successful uploads.

This PR fixes that and adds metrics to match what Azure Monitor exporter
already tracks:
   
- `batches_uploaded` / `batches_failed` — batch-level success/failure
- `records_uploaded` / `records_failed` — individual record counts
- `bytes_uploaded` — compressed payload throughput
- `upload_duration` — upload latency (Mmsc)
- `encode_duration` — encode + compress latency (Mmsc)
- `conversion_errors`, `empty_payloads_skipped`, `unsupported_signals` — error-path counters

Tests updated to handle the new telemetry timer message.

## What issue does this PR close?

* Closes #NNN

## How are these changes tested?

Through unit tests.

## Are there any user-facing changes?

N/A
…2292)

# Change Summary

Next part of open-telemetry#1847 and open-telemetry#2086

Moves:
* fanout_processor
* filter_processor
* signal_type_router
* batch_processor

## How are these changes tested?

* Unit tests / CI
* Compiled and ran `df_engine` and confirmed all nodes are still
available

## Are there any user-facing changes?

No
gouslu and others added 23 commits March 17, 2026 12:46
# Change Summary

Fixes the values pushed to the heartbeat table.

## How are these changes tested?

Local, manual testing and unit tests.

## Are there any user-facing changes?

No
…ry#2314)

# Change Summary

Next part of open-telemetry#1847 and open-telemetry#2086

Moves:
* attributes_processor
* content_router
* durable_buffer_processor
* retry_processor
* transform_processor

## How are these changes tested?

* Unit tests / CI
* Compiled and ran `df_engine` and confirmed all nodes are still
available

## Are there any user-facing changes?

No
Removed redundant exponential backoff
The Azure SDK already performs exponential backoff internally (e.g., 6
retries over 72s for IMDS via ManagedIdentityCredential). Our additional
exponential backoff (5s → 30s with jitter) on top of that added
negligible value (4–30% extra wait) and unnecessary complexity. Replaced
with a fixed 1-second pause to prevent tight-spinning between SDK retry
cycles.


Improved `get_token_failed` WARN message
Added a message field that tells operators:

- Token acquisition failed
- The exporter will keep retrying (counteracting the SDK's inner error text, which says "the request will no longer be retried")
- The "retries exhausted" language in the error refers to an internal retry layer, not the exporter's outer loop

Full error details remain available at DEBUG level via `get_token_failed.details`.


Before (two noisy WARN lines per failure, misleading retry timing):

```txt
WARN get_token_failed     [attempt=1, error=Auth error: ManagedIdentityCredential authentication failed. retry policy expired and the request will no longer be retried]
WARN retry_scheduled      [delay_secs=5.23]
```

After (single clear WARN per failure, self-explanatory):

```txt
WARN get_token_failed     [message=Token acquisition failed. Will keep retrying. The error may mention retries being exhausted; that refers to an internal retry layer, not this outer loop., attempt=1, error=Auth error (token acquisition): ManagedIdentityCredential authentication failed. retry policy expired and the request will no longer be retried]
```
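The outer retry loop described above can be sketched as follows. This is an illustrative sketch, not the exporter's actual code: `acquire_token_with_retry` is a hypothetical name, and the pause is a parameter here for testability (the change described uses a fixed 1 second).

```rust
use std::thread;
use std::time::Duration;

// Outer loop: retry forever with a single clear WARN per failure,
// relying on the SDK's internal exponential backoff between attempts.
fn acquire_token_with_retry(
    mut get_token: impl FnMut() -> Result<String, String>,
    pause: Duration, // fixed pause (1s in the exporter) to avoid tight-spinning
) -> String {
    let mut attempt = 0u32;
    loop {
        attempt += 1;
        match get_token() {
            Ok(token) => return token,
            Err(err) => {
                // One self-explanatory WARN line per failure.
                eprintln!(
                    "WARN get_token_failed [message=Token acquisition failed. \
                     Will keep retrying., attempt={attempt}, error={err}]"
                );
                thread::sleep(pause);
            }
        }
    }
}

fn main() {
    let mut attempts = 0;
    let token = acquire_token_with_retry(
        || {
            attempts += 1;
            if attempts < 3 {
                Err("ManagedIdentityCredential authentication failed".to_string())
            } else {
                Ok("token-abc".to_string())
            }
        },
        Duration::from_millis(10),
    );
    assert_eq!(token, "token-abc");
    assert_eq!(attempts, 3);
    println!("attempts={attempts}");
}
```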
This PR contains the following updates:

| Package | Change |
[Age](https://docs.renovatebot.com/merge-confidence/) |
[Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [duckdb](https://redirect.github.com/duckdb/duckdb-python)
([changelog](https://redirect.github.com/duckdb/duckdb-python/releases))
| `==1.4.4` → `==1.5.0` |
![age](https://developer.mend.io/api/mc/badges/age/pypi/duckdb/1.5.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/duckdb/1.4.4/1.5.0?slim=true)
|

---

### Release Notes

<details>
<summary>duckdb/duckdb-python (duckdb)</summary>

###
[`v1.5.0`](https://redirect.github.com/duckdb/duckdb-python/releases/tag/v1.5.0):
DuckDB Python 1.5.0 &quot;Variegata&quot;

[Compare
Source](https://redirect.github.com/duckdb/duckdb-python/compare/v1.4.4...v1.5.0)

This is the 1.5.0 release of DuckDB's Python bindings. For a list of
changes in DuckDB core, have a look at the [DuckDB release
notes](https://redirect.github.com/duckdb/duckdb/releases/tag/v1.5.0)
and [the
blogpost](https://duckdb.org/2026/03/09/announcing-duckdb-150.html).

##### Breaking Changes

- **Dropped Python 3.9 support.** The minimum supported version is now
Python 3.10.
- **Removed deprecated `duckdb.typing` and `duckdb.functional`
modules.** These were deprecated in 1.4.0. Use `duckdb.sqltypes` and
`duckdb.func` instead.
- **Renamed `column` parameter to `expression`** in relational API
functions (e.g., `min`, `max`, `sum`, `mean`, etc.) to better reflect
that these accept expressions, not just column names.
- **Deprecated `fetch_arrow_table()` and `fetch_record_batch()`** on
connections and relations. Use the new `to_arrow_table()` and
`to_arrow_reader()` methods instead.

##### New Features

- **Polars LazyFrame projection and filter pushdown.** DuckDB can now
push down projections and filters when scanning Polars LazyFrames,
including support for cast nodes and unstrict casts.
- **Polars Int128 / UInt128 support.**
- **VARIANT type support** — Python conversion, NumPy array wrapping,
and type stubs.
- **TIME\_NS type support** — nanosecond-precision time values across
Python, NumPy, and Spark type systems.
- **Profiling API** — new `get_profiling_info()` and
`get_profiling_json()` methods on connections, plus a refactored
`query_graph` module with improved HTML visualization (dark mode,
expandable phases, depth).
- **`to_arrow_table()` and `to_arrow_reader()`** — new methods on
connections and relations as the preferred Arrow export API.

##### Performance

- **`__arrow_c_stream__` on relations** — relations now export via the
Arrow PyCapsule interface using `PhysicalArrowCollector` for zero-copy
streaming.
- **Unified Arrow stream scanning** via `__arrow_c_stream__`, with
filter pushdown only when pyarrow is present.
- **Arrow schema caching** to avoid repeated lookups during scanning.
- **Arrow object type caching** to avoid repeated detection.
- **Empty params treated as None for `.sql()`** — avoids unnecessary
parameter binding overhead.
- **Simplified GIL management** for `FetchRow`.

##### Bug Fixes

- **Fixed Python object leak in scalar UDFs** — `PyObject_CallObject`
return values are now properly stolen to avoid reference count leaks.
- **Fixed reference cycle** between connections and relations that could
prevent garbage collection.
- **Relations now hold a reference to their connection**, preventing
premature connection closure.
- **Fixed fsspec race condition** in the Python filesystem
implementation.
- **Fixed numeric conversion logic** — improved handling of large
integers (fallback to VARCHAR) and UNION types.
- **`pyarrow.dataset` import is now optional** — no longer fails if
pyarrow is installed without the dataset module.
- **Thrown a reasonable error** when an Arrow array stream has already
been consumed.

##### Build & Packaging

- **jemalloc enabled on Linux x86\_64 only** (aligned with DuckDB core),
removed as a separately bundled extension.
- **MSVC runtime linked statically** on Windows — eliminates the VS2019
workaround from
[duckdb/duckdb#17991](https://redirect.github.com/duckdb/duckdb/issues/17991).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 8am on Monday" (UTC),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/open-telemetry/otel-arrow).


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…emetry#2331)

This PR contains the following updates:

| Package | Change |
[Age](https://docs.renovatebot.com/merge-confidence/) |
[Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
|
[charset-normalizer](https://redirect.github.com/jawah/charset_normalizer)
([changelog](https://redirect.github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md))
| `==3.4.5` → `==3.4.6` |
![age](https://developer.mend.io/api/mc/badges/age/pypi/charset-normalizer/3.4.6?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/charset-normalizer/3.4.5/3.4.6?slim=true)
|

---

### Release Notes

<details>
<summary>jawah/charset_normalizer (charset-normalizer)</summary>

###
[`v3.4.6`](https://redirect.github.com/jawah/charset_normalizer/blob/HEAD/CHANGELOG.md#346-2026-03-15)

[Compare
Source](https://redirect.github.com/jawah/charset_normalizer/compare/3.4.5...3.4.6)

##### Changed

- Flattened the logic in `charset_normalizer.md` for higher performance.
Removed `eligible(..)` and `feed(...)`
  in favor of `feed_info(...)`.
- Raised upper bound for mypy\[c] to 1.20, for our optimized version.
- Updated `UNICODE_RANGES_COMBINED` using Unicode blocks v17.

##### Fixed

- Edge case where noise difference between two candidates can be almost
insignificant.
([#&#8203;672](https://redirect.github.com/jawah/charset_normalizer/issues/672))
- CLI `--normalize` writing to wrong path when passing multiple files
in.
([#&#8203;702](https://redirect.github.com/jawah/charset_normalizer/issues/702))

##### Misc

- Freethreaded pre-built wheels now shipped in PyPI starting with 3.14t.
([#&#8203;616](https://redirect.github.com/jawah/charset_normalizer/issues/616))

</details>

---


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…etry#2332)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [azure_core](https://redirect.github.com/azure/azure-sdk-for-rust) |
workspace.dependencies | minor | `0.32.0` → `0.33.0` |
| [azure_identity](https://redirect.github.com/azure/azure-sdk-for-rust)
| workspace.dependencies | minor | `0.32.0` → `0.33.0` |

---

### Release Notes

<details>
<summary>azure/azure-sdk-for-rust (azure_core)</summary>

###
[`v0.33.0`](https://redirect.github.com/Azure/azure-sdk-for-rust/releases/tag/azure_identity%400.33.0)

[Compare
Source](https://redirect.github.com/azure/azure-sdk-for-rust/compare/azure_core@0.32.0...azure_core@0.33.0)

#### 0.33.0 (2026-03-09)

##### Breaking Changes

- Support for `wasm32-unknown-unknown` has been removed
([#&#8203;3377](https://redirect.github.com/Azure/azure-sdk-for-rust/issues/3377))
- `ClientCertificateCredential::new()` now takes `SecretBytes` instead
of `Secret` for the `certificate` parameter. Pass the raw PKCS12 bytes
wrapped in `SecretBytes` instead of a base64-encoded string wrapped in
`Secret`.

</details>

---


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Network errors (connect failures, timeouts) were **invisible** in the
exporter's metrics dashboard: all HTTP status counters showed 0 even
during total export failure. Added a `laclient_network_errors` counter
that increments on each failed HTTP attempt before a response is
received, making connectivity issues immediately diagnosable.

Tested by turning WiFi off and running the exporter. The new counter helps
troubleshoot quickly.
…lemetry#2337)

This PR contains the following updates:

| Package | Change |
[Age](https://docs.renovatebot.com/merge-confidence/) |
[Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
|
[opentelemetry-api](https://redirect.github.com/open-telemetry/opentelemetry-python)
| `==1.39.1` → `==1.40.0` |
![age](https://developer.mend.io/api/mc/badges/age/pypi/opentelemetry-api/1.40.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/opentelemetry-api/1.39.1/1.40.0?slim=true)
|
|
[opentelemetry-exporter-otlp](https://redirect.github.com/open-telemetry/opentelemetry-python)
| `==1.39.1` → `==1.40.0` |
![age](https://developer.mend.io/api/mc/badges/age/pypi/opentelemetry-exporter-otlp/1.40.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/opentelemetry-exporter-otlp/1.39.1/1.40.0?slim=true)
|
|
[opentelemetry-exporter-otlp-proto-common](https://redirect.github.com/open-telemetry/opentelemetry-python)
| `==1.39.1` → `==1.40.0` |
![age](https://developer.mend.io/api/mc/badges/age/pypi/opentelemetry-exporter-otlp-proto-common/1.40.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/opentelemetry-exporter-otlp-proto-common/1.39.1/1.40.0?slim=true)
|
|
[opentelemetry-exporter-otlp-proto-grpc](https://redirect.github.com/open-telemetry/opentelemetry-python)
| `==1.39.1` → `==1.40.0` |
![age](https://developer.mend.io/api/mc/badges/age/pypi/opentelemetry-exporter-otlp-proto-grpc/1.40.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/opentelemetry-exporter-otlp-proto-grpc/1.39.1/1.40.0?slim=true)
|
|
[opentelemetry-exporter-otlp-proto-http](https://redirect.github.com/open-telemetry/opentelemetry-python)
| `==1.39.1` → `==1.40.0` |
![age](https://developer.mend.io/api/mc/badges/age/pypi/opentelemetry-exporter-otlp-proto-http/1.40.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/opentelemetry-exporter-otlp-proto-http/1.39.1/1.40.0?slim=true)
|
|
[opentelemetry-proto](https://redirect.github.com/open-telemetry/opentelemetry-python)
| `==1.39.1` → `==1.40.0` |
![age](https://developer.mend.io/api/mc/badges/age/pypi/opentelemetry-proto/1.40.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/opentelemetry-proto/1.39.1/1.40.0?slim=true)
|
|
[opentelemetry-sdk](https://redirect.github.com/open-telemetry/opentelemetry-python)
| `==1.39.1` → `==1.40.0` |
![age](https://developer.mend.io/api/mc/badges/age/pypi/opentelemetry-sdk/1.40.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/opentelemetry-sdk/1.39.1/1.40.0?slim=true)
|

---

### Release Notes

<details>
<summary>open-telemetry/opentelemetry-python
(opentelemetry-api)</summary>

###
[`v1.40.0`](https://redirect.github.com/open-telemetry/opentelemetry-python/blob/HEAD/CHANGELOG.md#Version-1400061b0-2026-03-04)

[Compare
Source](https://redirect.github.com/open-telemetry/opentelemetry-python/compare/v1.39.1...v1.40.0)

- `opentelemetry-sdk`: deprecate `LoggingHandler` in favor of
`opentelemetry-instrumentation-logging`, see
`opentelemetry-instrumentation-logging` documentation

([#&#8203;4919](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4919))
- `opentelemetry-sdk`: Clarify log processor error handling expectations
in documentation

([#&#8203;4915](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4915))
- bump semantic-conventions to v1.40.0

([#&#8203;4941](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4941))
- Add stale PR GitHub Action

([#&#8203;4926](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4926))
- `opentelemetry-sdk`: Drop unused Jaeger exporter environment variables
(exporter removed in 1.22.0)

([#&#8203;4918](https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4918))
- `opentelemetry-sdk`: Clarify timeout units in environment variable
documentation

([#&#8203;4906](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4906))
- `opentelemetry-exporter-otlp-proto-grpc`: Fix re-initialization of
gRPC channel on UNAVAILABLE error

([#&#8203;4825](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4825))
- `opentelemetry-exporter-prometheus`: Fix duplicate HELP/TYPE
declarations for metrics with different label sets

([#&#8203;4868](https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4868))
- Allow loading all resource detectors by setting
`OTEL_EXPERIMENTAL_RESOURCE_DETECTORS` to `*`

([#&#8203;4819](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4819))
- `opentelemetry-sdk`: Fix the type hint of the `_metrics_data` property
to allow `None`

([#&#8203;4837](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4837)).
- Regenerate opentelemetry-proto code with v1.9.0 release

([#&#8203;4840](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4840))
- Add python 3.14 support

([#&#8203;4798](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4798))
- Silence events API warnings for internal users

([#&#8203;4847](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4847))
- opentelemetry-sdk: make it possible to override the default processors
in the SDK configurator

([#&#8203;4806](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4806))
- Prevent possible endless recursion from happening in
`SimpleLogRecordProcessor.on_emit`,

([#&#8203;4799](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4799))
and
([#&#8203;4867](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4867)).
- Implement span start/end metrics

([#&#8203;4880](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4880))
- Add environment variable carriers to API

([#&#8203;4609](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4609))
- Add experimental composable rule based sampler

([#&#8203;4882](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4882))
- Make ConcurrentMultiSpanProcessor fork safe

([#&#8203;4862](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4862))
- `opentelemetry-exporter-otlp-proto-http`: fix retry logic and error
handling for connection failures in trace, metric, and log exporters

([#&#8203;4709](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4709))
- `opentelemetry-sdk`: avoid RuntimeError during iteration of view
instrument match dictionary in MetricReaderStorage.collect()

([#&#8203;4891](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4891))
- Implement experimental TracerConfigurator

([#&#8203;4861](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4861))
- `opentelemetry-sdk`: Fix instrument creation race condition

([#&#8203;4913](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4913))
- bump semantic-conventions to v1.39.0

([#&#8203;4914](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4914))
- `opentelemetry-sdk`: automatically generate configuration models using
OTel config JSON schema

([#&#8203;4879](https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4879))

</details>

---


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…zation (open-telemetry#2318)

# Change Summary

- Pre-serialize resource+scope fields once per ScopeLogs as a JSON byte
prefix
- Write per-record fields directly to byte buffer using itoa/ryu,
bypassing serde_json::Map entirely
- Add write_json_string, write_json_hex, write_field_value_json for
zero-allocation JSON output
- Make config and metrics modules public for benchmark access
- Add criterion benchmark under
contrib-nodes/benches/exporters/azure_monitor_exporter/
- Added contrib bench to rust-bench workflow.

Benchmark results (1000 records):

| Variant | Time | Throughput | Change |
|---|---|---|---|
| Original | 1.60 ms | ~625K records/s | baseline |
| Hoisted | 1.36 ms | ~735K records/s | +17% |
| Hoisted + direct serialization | 425 µs | ~2.35M records/s | +275% |
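The direct-serialization idea can be illustrated with a minimal string writer. This is a hedged sketch, not the PR's actual `write_json_string` helper (which pairs with `write_json_hex` and typed-value writers); it shows the core technique of escaping straight into a byte buffer instead of building a `serde_json::Map`:

```rust
// Write `s` as a JSON string literal directly into `out`, escaping as it
// goes. No intermediate Value/Map allocation on the common path.
fn write_json_string(out: &mut Vec<u8>, s: &str) {
    out.push(b'"');
    for &b in s.as_bytes() {
        match b {
            b'"' => out.extend_from_slice(b"\\\""),
            b'\\' => out.extend_from_slice(b"\\\\"),
            b'\n' => out.extend_from_slice(b"\\n"),
            b'\r' => out.extend_from_slice(b"\\r"),
            b'\t' => out.extend_from_slice(b"\\t"),
            // Remaining control characters become \u00XX escapes
            // (this arm allocates; control chars are rare in practice).
            0x00..=0x1f => out.extend_from_slice(format!("\\u{:04x}", b).as_bytes()),
            // Non-ASCII UTF-8 bytes pass through unchanged: JSON allows raw UTF-8.
            _ => out.push(b),
        }
    }
    out.push(b'"');
}

fn main() {
    let mut buf = Vec::new();
    write_json_string(&mut buf, "a\"b\nc");
    assert_eq!(buf, br#""a\"b\nc""#.to_vec());

    buf.clear();
    write_json_string(&mut buf, "héllo"); // non-ASCII passes through
    assert_eq!(buf, "\"héllo\"".as_bytes().to_vec());
    println!("ok");
}
```

Per-record numeric fields get the same treatment via itoa/ryu-style formatters, so the hot path never touches `serde_json::Map`.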

## What issue does this PR close?

## How are these changes tested?

- Existing unit tests already cover mapping uniqueness; also added tests to
make sure that overlapping fields are rejected by config validation across
the resource -> scope -> log hierarchy.
- Added tests validating against encoding issues for non-ASCII characters.

## Are there any user-facing changes?

None.
…ry#2336)

This PR contains the following updates:

| Package | Change |
[Age](https://docs.renovatebot.com/merge-confidence/) |
[Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|
| [pydantic-core](https://redirect.github.com/pydantic/pydantic-core) |
`==2.41.5` → `==2.42.0` |
![age](https://developer.mend.io/api/mc/badges/age/pypi/pydantic-core/2.42.0?slim=true)
|
![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/pydantic-core/2.41.5/2.42.0?slim=true)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the [Dependency
Dashboard](..open-telemetry/issues/417) for more information.

---


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
open-telemetry#2323)

… to core-nodes crate

# Change Summary

Next part of open-telemetry#1847 and open-telemetry#2086

Moves:

* perf_exporter
* topic_exporter
* internal_telemetry_receiver
* topic_receiver

## How are these changes tested?

* Unit tests / CI
* Compiled and ran `df_engine` and confirmed all nodes are still
available

## Are there any user-facing changes?

No
# Change Summary

Last Wednesday's SIG meeting discussion suggested that our OTAP query
engine probably had some performance to be gained in its filtering
implementation. I ran the profiler on the benchmarks and, lo and behold,
we were spending a lot of time creating `RoaringBitmap`s and checking
whether IDs were contained therein.

Note - when filtering, we were using these `RoaringBitmap`s in two
places:
a) when filtering on an attribute value in a query like `where
attributes["x"] = "y"`, we filter the attribute record batch where `key
== "x" and str == "y"`, then create a bitmap of the `parent_id` column,
then scan the parent record batch, determining which values from the
`id` column are in this bitmap, and keep only those rows. In these
situations, we may also and/or/not the bitmaps together if the query is
something like `attributes["x1"] == "y1" and attributes["x2"] == "y2"`.
b) when we have finished the entire filter execution, we will have
determined which rows to keep for the parent batch, and then need to
recursively filter the child record batches. We did this by creating a
bitmap of the `id` column on the parent batch, then scanning the child
`parent_id` column, checking which rows were in this bitmap and keeping
only those rows.
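The two-phase procedure in (a) can be sketched with plain Rust collections standing in for Arrow record batches and bitmaps (a rough illustration only; the function, tuple layouts, and use of `HashSet` are assumptions, not the engine's actual code):

```rust
use std::collections::HashSet;

// attrs rows: (parent_id, key, value); parent rows: (id, body).
fn filter_parents<'a>(
    attrs: &[(u32, &str, &str)],
    parents: &'a [(u32, &'a str)],
    key: &str,
    value: &str,
) -> Vec<&'a (u32, &'a str)> {
    // Filter the attribute batch, collecting matching parent_ids into a
    // set (the role the RoaringBitmap plays in the engine).
    let matching: HashSet<u32> = attrs
        .iter()
        .filter(|(_, k, v)| *k == key && *v == value)
        .map(|(pid, _, _)| *pid)
        .collect();
    // Scan the parent batch, keeping only rows whose id is in the set.
    parents.iter().filter(|(id, _)| matching.contains(id)).collect()
}

fn main() {
    let attrs = [(0, "x", "y"), (1, "x", "z"), (2, "x", "y")];
    let parents = [(0, "a"), (1, "b"), (2, "c")];
    let kept = filter_parents(&attrs, &parents, "x", "y");
    assert_eq!(kept.len(), 2);
    assert_eq!(kept[0].0, 0);
    assert_eq!(kept[1].0, 2);
}
```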

This PR replaces the usage of `RoaringBitmap` with a simpler type
called `IdBitmap`, which contains pages, each holding a flat bitmap
backed by an array of u64s. This is much faster because, unlike
`RoaringBitmap`, we never have to search for the container during
insert/contains. While `RoaringBitmap` is designed for potentially
sparse data, our data is dense: in the vast majority of cases the ID
columns will be 0..max with few or no gaps, so a more traditional
bitmap makes sense here.

This PR also reuses the heap allocations of the bitmap between batches.
Note that because we potentially create many of these bitmaps during a
filter execution (see note above about logically combining attribute
filters), we implement reuse via a pool.
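A minimal sketch of such a paged flat bitmap, including allocation-preserving reuse (the type layout, page size, and method names here are assumptions for illustration, not the actual `IdBitmap` API):

```rust
const PAGE_BITS: usize = 4096; // bits per page (illustrative choice)
const WORDS_PER_PAGE: usize = PAGE_BITS / 64;

#[derive(Default)]
struct IdBitmap {
    // Lazily allocated pages; each page is a flat array of u64 words.
    pages: Vec<Option<Box<[u64; WORDS_PER_PAGE]>>>,
}

impl IdBitmap {
    fn insert(&mut self, id: u32) {
        // Page index is a simple division: no container search needed.
        let page = id as usize / PAGE_BITS;
        if page >= self.pages.len() {
            self.pages.resize_with(page + 1, || None);
        }
        let words =
            self.pages[page].get_or_insert_with(|| Box::new([0u64; WORDS_PER_PAGE]));
        let bit = id as usize % PAGE_BITS;
        words[bit / 64] |= 1 << (bit % 64);
    }

    fn contains(&self, id: u32) -> bool {
        let page = id as usize / PAGE_BITS;
        let bit = id as usize % PAGE_BITS;
        match self.pages.get(page).and_then(|p| p.as_ref()) {
            Some(words) => (words[bit / 64] & (1 << (bit % 64))) != 0,
            None => false,
        }
    }

    /// Clear contents while keeping page allocations, enabling
    /// pool-style reuse between batches.
    fn clear(&mut self) {
        for page in self.pages.iter_mut().flatten() {
            page.fill(0);
        }
    }
}

fn main() {
    let mut bm = IdBitmap::default();
    for id in 0..1000u32 {
        bm.insert(id);
    }
    assert!(bm.contains(999));
    assert!(!bm.contains(1000));
    bm.clear(); // allocations retained for the next batch
    assert!(!bm.contains(999));
}
```

Since the ID columns are nearly always 0..max with few gaps, insert and contains reduce to a shift and a mask, with no per-operation container lookup.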

Benchmark results:
| Benchmark | Batch Size | Before | After | Change |
|---|---|---|---|---|
| `simple_field_filter` | 32 | 7.58 µs | 7.61 µs | ~0% |
| `simple_field_filter` | 1024 | 22.32 µs | 14.39 µs | **-35.5%** |
| `simple_field_filter` | 8192 | 140.86 µs | 65.67 µs | **-53.4%** |
| `simple_attr_filter` | 32 | 10.17 µs | 9.93 µs | **-2.4%** |
| `simple_attr_filter` | 1024 | 41.43 µs | 22.75 µs | **-45.1%** |
| `simple_attr_filter` | 8192 | 308.44 µs | 120.45 µs | **-60.9%** |
| `attr_or_attr_filter` | 32 | 12.92 µs | 12.57 µs | **-2.7%** |
| `attr_or_attr_filter` | 1024 | 56.77 µs | 31.04 µs | **-45.3%** |
| `attr_or_attr_filter` | 8192 | 427.42 µs | 167.81 µs | **-60.7%** |
| `attr_and_prop_filter` | 32 | 10.84 µs | 10.92 µs | ~0% |
| `attr_and_prop_filter` | 1024 | 35.98 µs | 21.54 µs | **-40.1%** |
| `attr_and_prop_filter` | 8192 | 263.80 µs | 100.78 µs | **-61.8%** |
| `attr_and_attr_filter` | 32 | 4.32 µs | 4.38 µs | ~0% |
| `attr_and_attr_filter` | 1024 | 13.40 µs | 9.41 µs | **-29.8%** |
| `attr_and_attr_filter` | 8192 | 92.65 µs | 49.49 µs | **-46.6%** |
| `attr_and_or_together_filter` | 32 | 16.31 µs | 16.37 µs | ~0% |
| `attr_and_or_together_filter` | 1024 | 52.78 µs | 34.01 µs | **-35.6%** |
| `attr_and_or_together_filter` | 8192 | 387.77 µs | 164.41 µs | **-57.6%** |
| `or_short_circuit` | 32 | 3.38 µs | 3.25 µs | **-3.8%** |
| `or_short_circuit` | 1024 | 24.19 µs | 8.49 µs | **-64.9%** |
| `or_short_circuit` | 8192 | 127.50 µs | 48.67 µs | **-61.8%** |

(Note that filter benchmarks `attr_and_or_together_filter` &
`and_attrs_short_circuit` are omitted from the table because they filter
out all the rows before any bitmap operations, so there is no
performance change).


## What issue does this PR close?


* Closes open-telemetry#2330

## How are these changes tested?

Unit tests

## Are there any user-facing changes?

No


## Future work/followups

We should probably look at other places we use `RoaringBitmap` for this
same purpose to see if we can get some performance benefit from using
this new IdBitmap type. For example, I think this same type of procedure
is also done in filter processor.
# Change Summary

Allow users to define test containers that will start up and run
alongside the pipeline engine to test pipeline nodes that need an
external system to talk to.

Allow users to describe connections to test containers in the Scenario.
For example, users can configure their Generator/Capture to
send/receive from a test container, and can specify which Pipeline
nodes should connect to a test container. All test container wiring is
handled by the framework; the user just needs to specify what internal
port they want to talk to.

Updated README.md with new functions and examples of the test container
feature

## What issue does this PR close?

* Closes open-telemetry#2258 

## How are these changes tested?

Added unit tests

## Are there any user-facing changes?

no

---------

Co-authored-by: Drew Relmas <drewrelmas@gmail.com>
Old runner:

- name: `oracle-bare-metal-64cpu-512gb-x86-64`
- 512gb memory
- Oracle Linux 8

New runner:

- name: `oracle-bare-metal-64cpu-1024gb-x86-64-ubuntu-24`
- 1024gb memory
- Ubuntu 24

I realize this could have some impact on benchmark baselines, so please
post on open-telemetry/community#3333 once you
have migrated and are comfortable with the old one being removed.
…pen-telemetry#2306)

# Implement zero-copy view for OTAP Traces

Fixes open-telemetry#2053 (Traces portion).

This PR introduces `.traces` sub-module inside
`pdata/src/views/otap.rs`, implementing the `OtapTracesView`.

## Changes made
- Created `OtapTracesView` in `crates/pdata/src/views/otap/traces.rs`.
- Added zero-copy traversal elements mirroring the OTLP traces model:
  - `OtapResourceSpansView`
  - `OtapScopeSpansView`
  - `OtapSpanView`
  - `OtapEventView`, `OtapLinkView`, `OtapStatusView`
- Exposed the `traces` module in `otap.rs`.
- Adapted array access patterns to use standard traits like
`ByteArrayAccessor` and `StringArrayAccessor`.
- Modified `OtapAttributeView` in `logs.rs` to expose its `key` and
`value` fields as `pub(crate)` so it can be re-used by `traces.rs`.

## Validation results
- `cargo test -p otap-df-pdata` passes.
- No memory leaks introduced; logic is completely zero-copy across all
RecordBatch abstractions for traces.
- Unit tests (`test_create_otap_traces_view`, `test_span_fields`,
`test_span_status`, `test_missing_optional_columns`,
`test_events_iteration`) were run and passed without lifetime or
compilation errors.

Co-authored-by: albertlockett <a.lockett@f5.com>
…elemetry#2351)

# Change Summary

As noted in [this workflow
run](https://github.com/open-telemetry/otel-arrow/actions/runs/23181837916/job/67356323365?pr=2306)
- recent Renovate updates to some python dependencies for perf testing
didn't work properly.

open-telemetry#2336 updated `pydantic-core` from `2.41.5` to `2.42.0`. However, this
is an indirect dependency of `pydantic`, which is a direct dependency in
the requirements file:


https://github.com/open-telemetry/otel-arrow/blob/59ef72fdbe2003a1425bb5c700d3de0579ffb050/tools/pipeline_perf_test/orchestrator/requirements.txt#L6

`pydantic` `2.12.5` requires EXACTLY `pydantic-core` `2.41.5`, so it was
a bad Renovate update.

Based on Renovate docs, we should be able to disable indirect dependency
updates like this by matching on `matchDepTypes: ["indirect"]` and
disabling.
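A hedged sketch of what such a rule could look like in the Renovate config (based on the Renovate `packageRules` docs; the exact rule shape for this repo is an assumption, not a verified config):

```json
{
  "packageRules": [
    {
      "description": "Skip updates to indirect/transitive dependencies",
      "matchDepTypes": ["indirect"],
      "enabled": false
    }
  ]
}
```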

In addition, it seems we have some mismatch between python versions used
in the repo: `3.11` and `3.14`. This can also lead to bad side effects
if lock files are generated with a version different from the one being
used at runtime during workflow runs.
…rtup stalls (open-telemetry#2335)

# Change Summary

Fixes a startup race where pipeline cores could get stuck in `Pending`
on high-core machines.
 
Engine lifecycle events (`Admitted`, `Ready`) shared the same fixed-size
bounded observed-state channel as lossy async log events. Under startup
burst, `Admitted` could be dropped when `send_timeout(1ms)` expired.
When `Ready` arrived later, the state machine rejected the `Pending ->
Ready` transition as invalid, leaving the core stuck.

This change separates reliability classes: engine lifecycle events now
go through a dedicated unbounded channel, while async log events stay on
the existing bounded lossy path.

The unbounded channel is intentional: engine events are low-volume,
correctness-critical, and naturally bounded by pipeline/core lifecycle
activity.
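The split between reliability classes can be illustrated with std channels (a sketch only, not the engine's actual channel types; the engine's `send_timeout(1ms)` drop behavior is approximated here by `try_send` on a full bounded buffer):

```rust
use std::sync::mpsc;

#[derive(Debug, PartialEq)]
enum Lifecycle {
    Admitted,
    Ready,
}

fn main() {
    // Bounded, lossy path for async log events: on a full buffer the
    // send fails and the event is dropped.
    let (log_tx, log_rx) = mpsc::sync_channel::<String>(1);
    log_tx.try_send("log 1".into()).unwrap(); // fits in the buffer
    let dropped = log_tx.try_send("log 2".into()).is_err(); // buffer full
    assert!(dropped);

    // Unbounded, reliable path for lifecycle events: sends always
    // enqueue, so Admitted can never be lost before Ready.
    let (life_tx, life_rx) = mpsc::channel::<Lifecycle>();
    life_tx.send(Lifecycle::Admitted).unwrap();
    life_tx.send(Lifecycle::Ready).unwrap();
    assert_eq!(life_rx.recv().unwrap(), Lifecycle::Admitted);
    assert_eq!(life_rx.recv().unwrap(), Lifecycle::Ready);
    drop(log_rx);
}
```

Because lifecycle events are low-volume and bounded by pipeline/core activity, the unbounded queue cannot grow without limit in practice.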

**Alternate approaches considered:** 

- Increasing the bounded channel size would only reduce the probability
of failure under burst; it would not guarantee delivery of lifecycle
events.
- Making the state machine accept `Pending` -> `Ready` would mask
dropped lifecycle events instead of fixing delivery.

## What issue does this PR close?

* Closes open-telemetry#2328 

## How are these changes tested?

```bash
$ cargo test -p otap-df-state -- --nocapture
$ cargo check -p otap-df-controller -p otap-df-state -p otap-df-telemetry -p otap-df-config
```

## Are there any user-facing changes?

No
… crate (open-telemetry#2339)

# Change Summary

Next part of open-telemetry#1847
and open-telemetry#2086

## How are these changes tested?

* Unit tests / CI
* Compiled and ran `df_engine` and confirmed all nodes are still
available

## Are there any user-facing changes?

No
…2346)

# Change Summary

I previously added payload definitions in open-telemetry#2240 which were mostly
focused on solving the dictionary key size problem. This PR builds on
that work, but with a more refined implementation that also includes
things like required vs optional columns as a pre-req for open-telemetry#2289.

The major changes:

- Added an otap `Schema` type and redefined the payloads according to
that. This is a better construct that is somewhat symmetrical to arrow
Schemas, allows for defining recursive Struct and List types, and
subsequently removes the requirement to have nested lookup tables.
- Added deep equality checks between record batches and `Schema` types
- Updated some small bits of the `transform` module for the changes 
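A hypothetical sketch of what a recursive schema construct could look like (all type names, variants, and fields here are assumptions for illustration, not the actual `Schema` type in the PR):

```rust
// Recursive data types, symmetrical in spirit to arrow DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    UInt16Dictionary, // e.g. a dictionary-encoded ID column
    Utf8,
    Struct(Vec<Field>),      // recursive: structs contain fields
    List(Box<DataType>),     // recursive: lists contain an element type
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: &'static str,
    data_type: DataType,
    required: bool, // required vs optional columns
}

fn main() {
    // A span-like payload: a required id column plus an optional
    // nested list of event structs, with no separate lookup tables.
    let schema = vec![
        Field { name: "id", data_type: DataType::UInt16Dictionary, required: true },
        Field {
            name: "events",
            data_type: DataType::List(Box::new(DataType::Struct(vec![Field {
                name: "name",
                data_type: DataType::Utf8,
                required: true,
            }]))),
            required: false,
        },
    ];
    assert!(schema[0].required);
    assert!(!schema[1].required);
    assert!(matches!(schema[1].data_type, DataType::List(_)));
}
```

Defining the nesting directly in the type is what removes the need for nested lookup tables: a child payload's shape is reachable by walking the parent's fields.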



## What issue does this PR close?

* Part of open-telemetry#2289

## How are these changes tested?

Unit

## Are there any user-facing changes?

No

---------

Co-authored-by: albertlockett <a.lockett@f5.com>
…erator (open-telemetry#2347)

Need to see the pipeline performance when using large payloads etc. It's
repeating the same string, so it compresses well; addressing that (with
truly random values) would be a future addition.
…elemetry#2211)

Add a `node.processor` metric set with `process.success.duration` and
`process.failed.duration` Mmsc instruments for measuring the wall-clock
duration of the work done in a process() call. A closure is used to
prevent inclusion of async-await points in the measurement.
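A minimal sketch of the closure-based measurement pattern (illustrative only; the `timed` helper is an assumption, and the actual instrument-recording API is not shown):

```rust
use std::time::{Duration, Instant};

/// Run `f` synchronously and return its result plus wall-clock
/// duration. Because `f` is a plain closure, no async await point can
/// fall inside the measured region.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

fn main() {
    let (sum, elapsed) = timed(|| (0..1_000u64).sum::<u64>());
    assert_eq!(sum, 499_500);
    // `elapsed` would then be recorded on the success or failure
    // duration instrument depending on the process() outcome.
    assert!(elapsed >= Duration::ZERO);
}
```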

The metric is registered via the node telemetry context.

This is intended to be gated by MetricLevel >= Normal

Fixes open-telemetry#2210.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Utkarsh Umesan Pillai <66651184+utpilla@users.noreply.github.com>
@github-actions github-actions bot added query-engine Query Engine / Transform related tasks query-engine-columnar Columnar query engine which uses DataFusion to process OTAP Batches labels Mar 17, 2026
@linux-foundation-easycla

linux-foundation-easycla bot commented Mar 17, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: cijothomas / name: Cijo Thomas (ac2b239)

@cijothomas
Member Author

https://github.com/open-telemetry/community/blob/main/assets.md#large-windows-runners Looks like we have access to Windows runners, so we can try those instead of the GitHub-hosted runners, which are not very powerful and give flaky behavior.
