
feat(temporal_reaggregation_processor): Basic aggregations for OTAP payloads#2444

Open
JakeDern wants to merge 24 commits into open-telemetry:main from JakeDern:metrics-trag-processor-2

Conversation

@JakeDern (Contributor)

Change Summary

This is Part 2 of the temporal reaggregation processor which adds some basic aggregation ability and a whole lot of plumbing. There are still many things to fix and implement, but this seemed like a good checkpoint.

This PR lets us:

  • Compute an identity of a metric stream from a view without copying
  • Incrementally build an output record batch while giving random write access to the data point tables so that we can update them with newer data as it comes in.

What issue does this PR close?

How are these changes tested?

Unit tests.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the rust Pull requests that update Rust code label Mar 27, 2026
@codecov

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 97.03518% with 59 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.27%. Comparing base (5e0c424) to head (4598c2b).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2444      +/-   ##
==========================================
- Coverage   88.32%   88.27%   -0.06%     
==========================================
  Files         603      607       +4     
  Lines      212222   216568    +4346     
==========================================
+ Hits       187444   191165    +3721     
- Misses      24252    24877     +625     
  Partials      526      526              
Components            Coverage Δ
otap-dataflow         90.21% <97.03%> (-0.16%) ⬇️
query_abstraction     80.61% <ø> (ø)
query_engine          90.74% <ø> (+0.11%) ⬆️
syslog_cef_receivers  ∅ <ø> (∅)
otel-arrow-go         52.44% <ø> (ø)
quiver                91.94% <ø> (+0.03%) ⬆️
@JakeDern JakeDern changed the title [WIP] feat(temporal_reaggregation_processor): Basic aggregations for OTAP payloads feat(temporal_reaggregation_processor): Basic aggregations for OTAP payloads Mar 27, 2026
@JakeDern JakeDern marked this pull request as ready for review March 27, 2026 18:18
@JakeDern JakeDern requested a review from a team as a code owner March 27, 2026 18:18
weaver_resolved_schema = { git = "https://github.com/open-telemetry/weaver.git", tag = "v0.21.2"}
weaver_resolver = { git = "https://github.com/open-telemetry/weaver.git", tag = "v0.21.2"}
weaver_semconv = { git = "https://github.com/open-telemetry/weaver.git", tag = "v0.21.2"}
xxhash-rust = { version = "0.8", features = ["xxh3"] }
@JakeDern (Contributor, Author) Mar 27, 2026

This has a BSL 1.0 license. It seems to be pretty permissive, and I think it would be safe to allow: https://github.com/doumanash/xxhash-rust?tab=BSL-1.0-1-ov-file

If not, we can try to find a different library for a 128-bit hash, or we can compute two independent 64-bit hashes and combine them, which is what the Go implementation does.
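As a hedged sketch of that fallback: the names, the salting scheme, and the use of the standard library's DefaultHasher below are all illustrative assumptions, not the Go implementation's actual hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hypothetical sketch: build a 128-bit identity out of two independent
// 64-bit hashes. DefaultHasher (SipHash with fixed keys) stands in for
// whatever 64-bit hash the processor would actually use.
fn hash64_with_salt(salt: u8, data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(salt); // a different salt byte decorrelates the two hashes
    h.write(data);
    h.finish()
}

fn combined_hash128(data: &[u8]) -> u128 {
    let lo = u128::from(hash64_with_salt(0, data));
    let hi = u128::from(hash64_with_salt(1, data));
    (hi << 64) | lo
}

fn main() {
    // Deterministic: the same input always yields the same 128-bit value.
    assert_eq!(
        combined_hash128(b"metric-stream-identity"),
        combined_hash128(b"metric-stream-identity")
    );
}
```

A real implementation would more likely use a properly seeded hash with two distinct seeds rather than a salted prefix; the point is only that two 64-bit digests can be packed into one u128.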

@JakeDern (Contributor, Author)

To clarify: BSL 1.0 in this case is "Boost 1.0" (the Boost Software License), not the "Business Source License".

@JakeDern (Contributor, Author)

@jmacd Any restrictions you know of on Boost 1.0 license? Our internal guidance allows it explicitly; I can add it to the list if there are no other restrictions.

Contributor

Yes, I think this is OK, however please also note:

  • The issue [Rust-CI Follow-up] Check with CNCF legal over use of Unicode license #313 is gaining weight. I think we should go ahead, and if someone thinks we should change this, address it later. I think BSL-1.0 is OK, however we have to make a couple of updates.
  • The current NOTICE file is a little unclear; we might call this THIRD_PARTY_NOTICES.txt. We should add the BSL-1.0 requirement for xxhash-rs into this file.
  • The df_engine I think has to comply with this as well; we need to add a --license mode that prints the inlined notice.

@JakeDern (Contributor, Author)

Took care of the first two points in this PR; maybe we can follow up with another PR to add the --license mode for df_engine, if that's OK.

@lquerel (Contributor) left a comment

I think we should revisit how the hashed value is serialized to avoid collisions.

Comment on lines +324 to +328
for attr in &buf.entries {
    buf.buf.push(HashTag::Key as u8);
    buf.buf.extend_from_slice(attr.key());
    write_attr_value(&mut buf.buf, attr);
}
Contributor

Here we have a collision issue. I'm not talking about a probabilistic collision; it's a deterministic alias caused by an ambiguous encoding. With this algorithm:

{ a = bytes([F4 62 F5]) } and { a = bytes([]), b = empty }

will have the same encoding byte for byte. Adding the length as a prefix for each encoded value will probably fix this issue.
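The ambiguity, and why a length prefix resolves it, can be shown with a small self-contained sketch. The tag constants and function names below are illustrative, not the processor's actual code, and the colliding pair is a variant of the example above:

```rust
// Illustrative tag bytes in the style of the scheme under discussion.
const KEY_TAG: u8 = 0xF4;
const BYTES_TAG: u8 = 0xF6;

// Ambiguous scheme: value bytes are appended raw, so they can mimic the
// tag/key bytes of a following entry.
fn encode_ambiguous(attrs: &[(&[u8], &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    for (key, value) in attrs {
        out.push(KEY_TAG);
        out.extend_from_slice(key);
        out.push(BYTES_TAG);
        out.extend_from_slice(value);
    }
    out
}

// Length-prefixed scheme: a 4-byte length delimits each key and value, so
// no byte of one field can be confused with the start of the next.
fn encode_prefixed(attrs: &[(&[u8], &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    for (key, value) in attrs {
        out.push(KEY_TAG);
        out.extend_from_slice(&(key.len() as u32).to_le_bytes());
        out.extend_from_slice(key);
        out.push(BYTES_TAG);
        out.extend_from_slice(&(value.len() as u32).to_le_bytes());
        out.extend_from_slice(value);
    }
    out
}

fn main() {
    // { a = bytes([F4 62 F6]) } vs { a = bytes([]), b = bytes([]) }
    let m1: &[(&[u8], &[u8])] = &[(&[0x61], &[0xF4, 0x62, 0xF6])];
    let m2: &[(&[u8], &[u8])] = &[(&[0x61], &[]), (&[0x62], &[])];
    assert_eq!(encode_ambiguous(m1), encode_ambiguous(m2)); // deterministic alias
    assert_ne!(encode_prefixed(m1), encode_prefixed(m2)); // lengths disambiguate
}
```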

@JakeDern (Contributor, Author)

Yikes, yeah, let me think about how to avoid this. We should probably also patch this in Go, because I adapted the scheme from there and it has the same issue: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/pdatautil/hash.go

@JakeDern (Contributor, Author) Mar 28, 2026

They actually write a suffix as well, which I originally missed, but I don't think it matters. Slightly tweaked collision for the FF suffix:

{ a = bytes([FF F4 62 F6 BB]) } =>        F4 61 F6 FF F4 62 F6 BB FF
{ a = bytes([]),  b = bytes([BB]) } =>    F4 61 F6 FF F4 62 F6 BB FF

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/1ff44f3cc3104d0cc3660945a90f24dcf52efbdc/pkg/pdatautil/hash.go#L173-L178

(updated with an easier example)
(updated again, because I forgot a byte, but it doesn't matter)

@JakeDern (Contributor, Author) Mar 28, 2026

I think the length prefix for the byte arrays will work, but I'm also wondering if we should solve this for keys or not. I think if keys are guaranteed to be UTF-8 and we shift the tags into the F5-FF range then we'd be ok there. Otherwise it's trivial to create the same collision:

{ "a\xF6" = bytes([]) } =>   F4 61 F6 F6 FF
{ "a" = bytes([F6]) } =>     F4 61 F6 F6 FF

But there are a few layers to consider on the "guaranteed to be UTF-8" part...

  1. Our views currently define Strings as pub type Str<'src> = &'src [u8];. It's reasonable to argue that we therefore can't assume anything about encoding because anyone could implement the trait as narrowly as it's defined.

  2. Ignoring that for a second, let's look at the existing implementation and ask: "can non-UTF-8 characters make it to this point in the program?"

For Arrow we have some guarantees, as the source data was already parsed as a string:

#[inline(always)]
pub(crate) fn get_attribute_key<'a>(
    attr_key: &'a StringArrayAccessor<'a>,
    row_idx: usize,
) -> Option<&'a [u8]> {
    attr_key.str_at(row_idx).map(|s| s.as_bytes())
}

For OTLP, I'm not sure we have such a guarantee because we're just slicing some byte buffer. You could argue that protobuf defines this field as a string and therefore these bytes must be UTF-8, but someone could create a buggy or malicious implementation that does not enforce this, so we need to check before assuming:

    fn key(&self) -> otap_df_pdata_views::views::common::Str<'_> {
        loop {
            if let Some((start, end)) = from_option_nonzero_range_to_primitive(self.key_range.get())
            {
                return &self.buf[start..end];
            } else if self.pos.get() >= self.buf.len() {
                break;
            } else {
                self.advance();
            }
        }

        // return empty string when cannot read key
        &[]
    }

I think my opinion here is that someone could manufacture collisions, which may or may not hurt depending on the scenario, so we should either:

  1. Redefine the view type as pub type Str<'src> = &'src str; OR
  2. Put a length field in front of the keys as well

Note: The OpenTelemetry specification only says that attribute names must be valid Unicode but explicitly calls out there's no canonical encoding. This is not really relevant here, but just pointing it out: https://opentelemetry.io/docs/specs/semconv/general/naming/#general-naming-considerations

@JakeDern (Contributor, Author)

Pushing a 4-byte length for both string/byte array keys and values for now
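A minimal sketch of that fix, applied to the key-boundary collision above (tag values and names are illustrative, not the actual constants):

```rust
const KEY_TAG: u8 = 0xF4;
const BYTES_TAG: u8 = 0xF6;

// Write tag, then a 4-byte little-endian length, then the raw bytes, so the
// hashed stream always records where each field ends.
fn write_prefixed(out: &mut Vec<u8>, tag: u8, data: &[u8]) {
    out.push(tag);
    out.extend_from_slice(&(data.len() as u32).to_le_bytes());
    out.extend_from_slice(data);
}

fn encode(attrs: &[(&[u8], &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    for (key, value) in attrs {
        write_prefixed(&mut out, KEY_TAG, key);
        write_prefixed(&mut out, BYTES_TAG, value);
    }
    out
}

fn main() {
    // { "a\xF6" = bytes([]) } vs { "a" = bytes([F6]) } no longer collide:
    let m1: &[(&[u8], &[u8])] = &[(&[0x61, 0xF6], &[])];
    let m2: &[(&[u8], &[u8])] = &[(&[0x61], &[0xF6])];
    assert_ne!(encode(m1), encode(m2));
}
```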

@JakeDern (Contributor, Author) commented Mar 27, 2026

@lquerel Here is a list of things not included in this PR that I currently have marked for follow-up, in very approximate order:

  • Support both payload types
  • Support passing through unsupported metrics immediately (we're currently dropping them)
  • Re-use the view objects between batches
  • ID overflow handling
  • Add support for Array and Map attribute types (views and hashing)
  • Add exemplar support
  • Add benchmarks
  • Add settings for bounding aggregator memory
  • Add ack/nack support
  • Support for additional types of metrics if required
  • Support for passthrough settings to exclude certain types from aggregations (gauges, others)

@jmacd (Contributor) left a comment
Looks good

4 participants