Fix duplicate attribute keys in transform_attributes by gyanranjanpanda · Pull Request #2423 · open-telemetry/otel-arrow

gyanranjanpanda · 2026-03-24T20:42:51Z

Fix Duplicate Attribute Keys in `transform_attributes`

Changes Made

This PR resolves issue #1650 by ensuring that dictionary keys are deduplicated when transformations such as rename are applied, as required by the OpenTelemetry specification ("Exported maps MUST contain only unique keys by default").

To accomplish this while maintaining strict performance requirements, we replaced the previous RowConverter deduplication strategy with a new high-performance, proactive pre-filter:

We injected filter_rename_collisions into transform_attributes_impl inside otap-dataflow/crates/pdata/src/otap/transform.rs.
Before a rename is processed, this function reads the parent_ids and target keys. It uses the IdBitmap type to find any existing target keys whose parent_id maps back to an old key that will be renamed.
It proactively strips those collision rows from the batch via arrow::compute::filter_record_batch before the actual transform happens.

Testing

Extended the AttributesProcessor unit tests (test_rename_removes_duplicate_keys) to explicitly verify that renaming an attribute resulting in a collision automatically discards duplicate keys.
Extended the AttributesTransformPipelineStage in query-engine tests with a parallel case ensuring OPL/KQL query pipelines (project-rename) properly drop duplicates when resolving duplicates.
Refactored otap_df_pdata transform.rs tests to properly expect deduplicated keys using this plan-based method.
Validated logic with cargo test --workspace --all-features.

Validation Results

All tests pass. OTel semantic rules surrounding unique mapped keys map cleanly through down/upstream processors. The IdBitmap intersection approach completely resolves the multi-thousand percent RowConverter performance regressions, dropping collision resolution overhead to essentially zero through efficient bitmap operations.

codecov · 2026-03-24T20:46:32Z

Codecov Report

❌ Patch coverage is 76.28866% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.37%. Comparing base (cffb437) to head (2d813af).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2423      +/-   ##
==========================================
- Coverage   88.39%   88.37%   -0.03%     
==========================================
  Files         603      603              
  Lines      213326   213567     +241     
==========================================
+ Hits       188578   188745     +167     
- Misses      24222    24296      +74     
  Partials      526      526

Components	Coverage Δ
otap-dataflow	`90.39% <76.28%> (-0.04%)`	⬇️
query_abstraction	`80.61% <ø> (ø)`
query_engine	`90.74% <ø> (ø)`
syslog_cef_receivers	`∅ <ø> (∅)`
otel-arrow-go	`52.44% <ø> (ø)`
quiver	`91.94% <ø> (+0.03%)`	⬆️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

gyanranjanpanda · 2026-03-24T22:51:03Z

@albertlockett and @ThomsonTan waiting for your feed backl

albertlockett

Hey @gyanranjanpanda . I appreciate you taking the time to look at this, but I don't think we can accept this PR as is.

Unfortunately, the benchmarks we have for this code on main are currently broken. But when I apply the fix from #2426 and run the benchmark we see that this change introduces significant performance regression:

transform_attributes_dict_keys/single_replace_no_deletes/keys=32,rows=128,rows_per_key=4
                        time:   [5.1300 µs 5.1348 µs 5.1394 µs]
                        change: [+1027.4% +1031.5% +1035.2%] (p = 0.00 < 0.05)
                        Performance has regressed.


transform_attributes_dict_keys/single_replace_single_delete/keys=32,rows=128,rows_per_key=4
                        time:   [5.5027 µs 5.5091 µs 5.5155 µs]
                        change: [+495.01% +497.37% +499.48%] (p = 0.00 < 0.05)
                        Performance has regressed.

transform_attributes_dict_keys/no_replace_single_delete/keys=32,rows=128,rows_per_key=4
                        time:   [5.3440 µs 5.3584 µs 5.3746 µs]
                        change: [+577.41% +580.27% +583.40%] (p = 0.00 < 0.05)
                        Performance has regressed.

transform_attributes_dict_keys/single_replace_no_deletes/keys=32,rows=1536,rows_per_key=48
                        time:   [34.015 µs 34.050 µs 34.086 µs]
                        change: [+4000.2% +4016.4% +4031.3%] (p = 0.00 < 0.05)
                        Performance has regressed.

transform_attributes_dict_keys/single_replace_single_delete/keys=32,rows=1536,rows_per_key=48
                        time:   [34.390 µs 34.472 µs 34.562 µs]
                        change: [+1421.9% +1433.5% +1443.9%] (p = 0.00 < 0.05)
                        Performance has regressed.

transform_attributes_dict_keys/no_replace_single_delete/keys=32,rows=1536,rows_per_key=48
                        time:   [34.302 µs 34.340 µs 34.379 µs]
                        change: [+1562.1% +1568.0% +1573.6%] (p = 0.00 < 0.05)
                        Performance has regressed.

transform_attributes_dict_keys/single_replace_no_deletes/keys=32,rows=8192,rows_per_key=256
                        time:   [171.62 µs 171.78 µs 171.96 µs]
                        change: [+6262.2% +6290.6% +6316.2%] (p = 0.00 < 0.05)
                        Performance has regressed.

transform_attributes_dict_keys/single_replace_single_delete/keys=32,rows=8192,rows_per_key=256
                        time:   [171.79 µs 171.92 µs 172.06 µs]
                        change: [+1771.2% +1835.7% +1893.0%] (p = 0.00 < 0.05)
                        Performance has regressed.

transform_attributes_dict_keys/no_replace_single_delete/keys=32,rows=8192,rows_per_key=256
                        time:   [171.20 µs 171.35 µs 171.49 µs]
                        change: [+1962.8% +1981.5% +1998.1%] (p = 0.00 < 0.05)
                        Performance has regressed.

transform_attributes_dict_keys/single_replace_no_deletes/keys=128,rows=128,rows_per_key=1
                        time:   [4.9566 µs 4.9693 µs 4.9819 µs]
                        change: [+587.52% +592.02% +597.47%] (p = 0.00 < 0.05)
                        Performance has regressed.

transform_attributes_dict_keys/single_replace_single_delete/keys=128,rows=128,rows_per_key=1
                        time:   [5.6185 µs 5.6284 µs 5.6377 µs]
                        change: [+292.54% +294.19% +296.01%] (p = 0.00 < 0.05)
                        Performance has regressed.

transform_attributes_dict_keys/no_replace_single_delete/keys=128,rows=128,rows_per_key=1
                        time:   [5.2733 µs 5.2831 µs 5.2938 µs]
                        change: [+385.50% +387.73% +389.92%] (p = 0.00 < 0.05)
                        Performance has regressed.

While I expect to see some performance regression because we're doing extra work, I feel that such a serious regression in performance warrants some additional investigation into if/how we can do this in a more efficient way.

Please see my comment here which prescribes an approach that I believe will be more performant than what is currently in this PR: #1650 (comment)

gyanranjanpanda · 2026-03-24T23:04:15Z

thanks for your wonderful guidance i will make sure i could match your expectation

albertlockett · 2026-03-25T18:10:41Z

Hey @gyanranjanpanda I wanted to give you a heads up that I am going to be working on #2014 and there may be some significant changes to the transform_attributes code. I will be touching code in transform_keys as well as transform_attributes_impl. I wanted to give you a heads up in case you want to hold off advancing your work until you can better understand the conflicts

gyanranjanpanda · 2026-03-25T19:51:41Z

Thanks for the heads up! I’ll keep an eye on your changes to #2014 and try to align my work accordingly. If possible, could you share which parts might be most affected so I can avoid overlap? or should i wait after u finished your work i should continue this work

albertlockett · 2026-03-25T22:33:06Z

Thanks for the heads up! I’ll keep an eye on your changes to #2014 and try to align my work accordingly. If possible, could you share which parts might be most affected so I can avoid overlap? or should i wait after u finished your work i should continue this work

It's probably easiest to hold off until I finish to avoid conflicts, but I'll leave it up to you. I think I should have the changes I need to make for #2014 done by early next week, if not sooner.

For now, I'll show you the in-progress changes:
https://github.com/open-telemetry/otel-arrow/compare/main...albertlockett:otel-arrow:albert/2014?expand=1

I was imagining that for #1650 you'd need to make changes to plan_key_replacements or plan_key_deletes (which actually haven't been modified) to produce ranges to be deleted in transform_keys.

albertlockett · 2026-03-27T11:20:26Z

@gyanranjanpanda the changes I mentioned that could cause conflicts have now been merged (see #2442)

gyanranjanpanda · 2026-03-27T11:50:03Z

i will fix this code as soon as possible while looking your merged pr

…metry#1650) When renaming attribute key 'x' to 'y', any existing row with key 'y' sharing a parent_id with a row having key 'x' would produce a duplicate. This commit fixes that by: - Adding find_rename_collisions_to_delete_ranges() which uses IdBitmap to efficiently detect these collisions in O(N) time - Generating KeyTransformRange::Delete entries that are merged into the existing transform pipeline in transform_keys() and transform_dictionary_keys() - Fixing an early-return in transform_dictionary_keys() that skipped row-level collision deletes when dictionary values had no deletions - Adding read_parent_ids_as_u32() helper for parent_id column access - Adding test_rename_removes_duplicate_keys integration test Only runs collision detection when parent_ids are plain-encoded (not transport-optimized) to avoid incorrect results from quasi-delta encoded values. Closes open-telemetry#1650

gyanranjanpanda requested a review from a team as a code owner March 24, 2026 20:42

github-project-automation bot added this to OTel-Arrow Mar 24, 2026

github-actions bot added rust Pull requests that update Rust code query-engine Query Engine / Transform related tasks query-engine-columnar Columnar query engine which uses DataFusion to process OTAP Batches labels Mar 24, 2026

gyanranjanpanda force-pushed the fix-duplicate-attributes-1650 branch 2 times, most recently from a210873 to 361e6bd Compare March 24, 2026 22:43

albertlockett requested changes Mar 24, 2026

View reviewed changes

gyanranjanpanda force-pushed the fix-duplicate-attributes-1650 branch from 361e6bd to 06392eb Compare March 24, 2026 22:57

gyanranjanpanda force-pushed the fix-duplicate-attributes-1650 branch 2 times, most recently from c9e8263 to e2862b4 Compare March 27, 2026 20:07

gyanranjanpanda force-pushed the fix-duplicate-attributes-1650 branch from e2862b4 to 2d813af Compare March 27, 2026 20:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix duplicate attribute keys in transform_attributes#2423

Fix duplicate attribute keys in transform_attributes#2423
gyanranjanpanda wants to merge 1 commit intoopen-telemetry:mainfrom
gyanranjanpanda:fix-duplicate-attributes-1650

gyanranjanpanda commented Mar 24, 2026 •

edited

Loading

Uh oh!

codecov bot commented Mar 24, 2026 •

edited

Loading

Uh oh!

gyanranjanpanda commented Mar 24, 2026

Uh oh!

albertlockett left a comment

Uh oh!

gyanranjanpanda commented Mar 24, 2026

Uh oh!

albertlockett commented Mar 25, 2026

Uh oh!

gyanranjanpanda commented Mar 25, 2026

Uh oh!

albertlockett commented Mar 25, 2026

Uh oh!

albertlockett commented Mar 27, 2026

Uh oh!

gyanranjanpanda commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gyanranjanpanda commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fix Duplicate Attribute Keys in transform_attributes

Changes Made

Testing

Validation Results

Uh oh!

codecov bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

gyanranjanpanda commented Mar 24, 2026

Uh oh!

albertlockett left a comment

Choose a reason for hiding this comment

Uh oh!

gyanranjanpanda commented Mar 24, 2026

Uh oh!

albertlockett commented Mar 25, 2026

Uh oh!

gyanranjanpanda commented Mar 25, 2026

Uh oh!

albertlockett commented Mar 25, 2026

Uh oh!

albertlockett commented Mar 27, 2026

Uh oh!

gyanranjanpanda commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gyanranjanpanda commented Mar 24, 2026 •

edited

Loading

Fix Duplicate Attribute Keys in `transform_attributes`

codecov bot commented Mar 24, 2026 •

edited

Loading