Add benchmark overview doc #1528
# OpenTelemetry Arrow Performance Summary

## Overview

The OpenTelemetry Arrow (OTel Arrow) project is currently in **Phase 2**,
building an end-to-end Arrow-based telemetry pipeline in Rust. Phase 1 focused
on collector-to-collector traffic compression using the OTAP protocol, achieving
significant network bandwidth savings. Phase 2 expands this foundation by
implementing the entire in-process pipeline using Apache Arrow's columnar
format, targeting substantial improvements in data processing efficiency while
maintaining the network efficiency gains from Phase 1.

The dataflow engine (df-engine), implemented in Rust, provides predictable
performance characteristics and efficient resource utilization across varying
load conditions. This columnar approach is expected to offer substantial
advantages over traditional row-oriented telemetry pipelines in terms of CPU
efficiency, memory usage, and throughput.

This document presents key performance metrics across different load scenarios
and test configurations.

### Test Environment

All performance tests are executed on a dedicated bare-metal compute instance
with the following specifications:

- **CPU**: 64 cores (x86-64 architecture)
- **Memory**: 512 GB RAM
- **Platform**: Oracle Bare Metal Instance
- **OS**: Oracle Linux 8

This consistent, dedicated environment ensures reproducible results and allows
for comprehensive testing across various CPU core configurations (1, 4, and 8
cores) by constraining the df-engine to specific core allocations.

### Performance Metrics

#### Idle State Performance

Baseline resource consumption with no active telemetry traffic.

| Metric | Value |
|--------|-------|
| CPU Usage | TBD |
| Memory Usage | TBD |

These baseline metrics validate that the engine maintains a minimal resource
footprint when idle, ensuring efficient operation in environments with variable
telemetry loads.

#### Standard Load Performance

Resource utilization at 100,000 log records per second (100K logs/sec). Tests
are conducted with four different batch sizes to demonstrate the impact of
batching on performance.

**Test Parameters:**

- Total input load: 100,000 log records/second
- Average log record size: 1 KB
- Batch sizes tested: 10, 100, 1000, and 10000 records per request

This wide range of batch sizes evaluates performance across diverse deployment
scenarios. Small batches (10-100) represent edge collectors or real-time
streaming requirements, while large batches (1000-10000) represent gateway
collectors and high-throughput aggregation points. This approach ensures a fair
assessment, highlighting both the overhead for small batches and the significant
efficiency gains inherent to Arrow's columnar format at larger batch sizes.

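The batch sizes above translate directly into request rates at the fixed 100K
logs/sec input load. The following sketch makes that arithmetic explicit; the
printed rates are derived values, not measured results:

```rust
// Request rates implied by each tested batch size at the fixed
// 100K logs/sec input load (simple arithmetic, not measurements).
fn main() {
    let total_logs_per_sec = 100_000u32;
    for batch_size in [10u32, 100, 1_000, 10_000] {
        // At a fixed log rate, larger batches mean fewer requests/sec,
        // amortizing per-request overhead across more records.
        let requests_per_sec = total_logs_per_sec / batch_size;
        println!("batch {batch_size:>5} -> {requests_per_sec:>6} requests/sec");
    }
}
```

So a batch size of 10 implies 10,000 requests/sec of per-request overhead,
while a batch size of 10000 implies only 10 requests/sec for the same log
volume.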
##### Standard Load - OTAP -> OTAP (Native Protocol)

| CPU Cores | Batch Size | CPU Usage | Memory Usage |
|-----------|------------|-----------|--------------|
| 1 Core | 10/batch | TBD | TBD |
| 1 Core | 100/batch | TBD | TBD |
| 1 Core | 1000/batch | TBD | TBD |
| 1 Core | 10000/batch | TBD | TBD |
| 4 Cores | 10/batch | TBD | TBD |
| 4 Cores | 100/batch | TBD | TBD |
| 4 Cores | 1000/batch | TBD | TBD |
| 4 Cores | 10000/batch | TBD | TBD |
| 8 Cores | 10/batch | TBD | TBD |
| 8 Cores | 100/batch | TBD | TBD |
| 8 Cores | 1000/batch | TBD | TBD |
| 8 Cores | 10000/batch | TBD | TBD |

This represents the optimal scenario where the df-engine operates with its
native protocol end-to-end, eliminating protocol conversion overhead. The
thread-per-core architecture is designed to scale linearly across CPU cores
without contention, allowing the engine to be sized for specific deployment
requirements.

##### Standard Load - OTLP -> OTAP (Protocol Conversion)

| CPU Cores | Batch Size | CPU Usage | Memory Usage |
|-----------|------------|-----------|--------------|
| 1 Core | 10/batch | TBD | TBD |
| 1 Core | 100/batch | TBD | TBD |
| 1 Core | 1000/batch | TBD | TBD |
| 4 Cores | 10/batch | TBD | TBD |
| 4 Cores | 100/batch | TBD | TBD |
| 4 Cores | 1000/batch | TBD | TBD |
| 4 Cores | 10000/batch | TBD | TBD |
| 8 Cores | 10/batch | TBD | TBD |
| 8 Cores | 100/batch | TBD | TBD |
| 8 Cores | 1000/batch | TBD | TBD |
| 8 Cores | 10000/batch | TBD | TBD |
| 1 Core | 10000/batch | TBD | TBD |

This scenario represents the common case where OpenTelemetry SDK clients emit
OTLP (not yet capable of OTAP), and the df-engine converts to OTAP for egress.
It exercises backward compatibility and measures the cost of protocol
conversion while retaining the same scaling characteristics across CPU cores.

#### Saturation Performance

Behavior at maximum capacity when physical resource limits are reached.

##### Saturation Load - OTAP -> OTAP (Native Protocol)

| CPU Cores | Maximum Sustained Throughput | Throughput / Core | Memory Usage |
|-----------|------------------------------|-------------------|--------------|
| 1 Core | TBD | TBD | TBD |
| 4 Cores | TBD | TBD | TBD |
| 8 Cores | TBD | TBD | TBD |

##### Saturation Load - OTLP -> OTAP (Protocol Conversion)

| CPU Cores | Maximum Sustained Throughput | Throughput / Core | Memory Usage |
|-----------|------------------------------|-------------------|--------------|
| 1 Core | TBD | TBD | TBD |
| 4 Cores | TBD | TBD | TBD |
| 8 Cores | TBD | TBD | TBD |

Saturation testing validates the engine's stability under extreme load. The
df-engine is designed to exhibit well-defined behavior when operating at
capacity, maintaining predictable performance without degradation or
instability. These results capture the maximum throughput achievable with
different CPU core allocations. The **Throughput / Core** metric provides a key
efficiency indicator for capacity planning.

<!--TODO: Document what is the behavior - is it applying backpressure
(`wait_for_result` feature)? or dropping items and keeping internal metric
about it.-->

### Architecture

The OTel Arrow dataflow engine is built in Rust to achieve high throughput and
low latency. Its columnar data representation and zero-copy processing
capabilities enable efficient handling of telemetry data at scale.

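The row-versus-columnar distinction can be illustrated with a minimal sketch.
The type names below are hypothetical, not the engine's actual structures; the
point is that a columnar batch keeps one contiguous buffer per field, so
scanning a single field touches only that field's memory:

```rust
// Row-oriented layout: each record is an independent unit,
// as in traditional pdata-style pipelines.
struct LogRecordRow {
    timestamp: u64,
    severity: u8,
    body: String,
}

// Columnar layout: one contiguous buffer per field, as in Arrow.
struct LogBatchColumns {
    timestamps: Vec<u64>,
    severities: Vec<u8>,
    bodies: Vec<String>,
}

fn main() {
    let rows = vec![
        LogRecordRow { timestamp: 1, severity: 9, body: "a".into() },
        LogRecordRow { timestamp: 2, severity: 13, body: "b".into() },
    ];

    // Transpose rows into a columnar batch.
    let mut batch = LogBatchColumns {
        timestamps: Vec::new(),
        severities: Vec::new(),
        bodies: Vec::new(),
    };
    for r in rows {
        batch.timestamps.push(r.timestamp);
        batch.severities.push(r.severity);
        batch.bodies.push(r.body);
    }

    // Scanning one column reads only that column's contiguous memory:
    // the cache-friendly access pattern columnar formats exploit.
    let max_severity = batch.severities.iter().copied().max().unwrap();
    println!("max severity = {max_severity}");
}
```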
#### Thread-Per-Core Design

The df-engine supports a configurable runtime execution model, using a
**thread-per-core architecture** that eliminates traditional concurrency
overhead. This design allows:

- **CPU Affinity Control**: Pipelines can be pinned to specific CPU cores or
  groups through configuration
- **NUMA Optimization**: Memory and CPU assignments can be coordinated for
  Non-Uniform Memory Access (NUMA) architectures
- **Workload Isolation**: Different telemetry signals or tenants can be assigned
  to dedicated CPU resources, preventing resource contention
- **Reduced Synchronization**: Thread-per-core design minimizes lock contention
  and context switching overhead

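The shape of a thread-per-core layout can be sketched with the standard library
alone: one worker per available core, each owning its own state so workers
share no locks. This is an illustrative sketch, not the df-engine's
implementation; a real engine additionally pins each thread to a core via OS
affinity APIs, which `std` does not expose:

```rust
use std::thread;

fn main() {
    // One worker per available core (falls back to 1 if detection fails).
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    let handles: Vec<_> = (0..cores)
        .map(|core_id| {
            thread::spawn(move || {
                // Each worker owns its data exclusively, so there is no
                // lock contention and no cross-thread synchronization
                // on the hot path.
                let local_work: u64 = (0..1_000u64).sum();
                (core_id, local_work)
            })
        })
        .collect();

    // Results are collected only once, at join time.
    for h in handles {
        let (id, work) = h.join().unwrap();
        println!("worker {id} completed {work} units");
    }
}
```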
For detailed technical documentation, see the [OTAP Dataflow Engine
Documentation](../rust/otap-dataflow/README.md) and [Phase 2
Design](phase2-design.md).

---

## Comparative Analysis: OTel Arrow vs OpenTelemetry Collector

### Methodology

To provide a fair and meaningful comparison between the OTel Arrow dataflow
engine and the OpenTelemetry Collector, we use **Syslog (UDP/TCP)** as the
ingress protocol for both systems.

#### Rationale for Syslog-Based Comparison

Syslog was specifically chosen as the input protocol because:

1. **Neutral Ground**: Syslog is neither OTLP (OpenTelemetry Protocol) nor OTAP
   (OpenTelemetry Arrow Protocol), ensuring neither system has a native
   protocol advantage
2. **Real-World Relevance**: Syslog is widely deployed in production
   environments, particularly for log aggregation from network devices, legacy
   systems, and infrastructure components
3. **Conversion Overhead**: Both systems must perform meaningful work to convert
   incoming Syslog messages into their internal representations:
   - **OTel Collector**: Converts to Go-based `pdata` (protocol data) structures
   - **OTel Arrow**: Converts to Arrow-based columnar memory format
4. **Complete Pipeline Test**: This approach validates the full pipeline
   efficiency, including parsing, transformation, and serialization stages

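The ingress work described in point 3 can be illustrated with a minimal sketch
of the first step either system performs on a raw syslog line: extracting the
RFC 3164 PRI value, which packs facility and severity into one number. The
helper below is hypothetical and not taken from either codebase:

```rust
// Extract (facility, severity, remainder) from a syslog line's
// "<PRI>" prefix. Per RFC 3164, PRI = facility * 8 + severity.
fn parse_priority(line: &str) -> Option<(u8, u8, &str)> {
    let rest = line.strip_prefix('<')?;
    let end = rest.find('>')?;
    let pri: u8 = rest[..end].parse().ok()?;
    Some((pri / 8, pri % 8, &rest[end + 1..]))
}

fn main() {
    // Example line from RFC 3164: PRI 34 = facility 4 (auth), severity 2.
    let line = "<34>Oct 11 22:14:15 host su: 'su root' failed";
    if let Some((facility, severity, msg)) = parse_priority(line) {
        println!("facility={facility} severity={severity} msg={msg}");
    }
}
```

Both systems do this parsing and then diverge: the Collector materializes the
fields into `pdata` records, while the df-engine appends them to Arrow columns.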
The output protocols are set to each system's native format: OTLP for the
OpenTelemetry Collector and OTAP for the OTel Arrow engine, ensuring optimal
egress performance for each.

### Performance Comparison

#### Baseline (Idle State)

| Metric | OTel Collector | OTel Arrow | Improvement |
|--------|----------------|------------|-------------|
| CPU Usage | TBD | TBD | TBD |
| Memory Usage | TBD | TBD | TBD |

#### Standard Load (100K Syslog Messages/sec)

| Metric | OTel Collector | OTel Arrow | Improvement |
|--------|----------------|------------|-------------|
| CPU Usage | TBD | TBD | TBD |
| Memory Usage | TBD | TBD | TBD |
| Network Egress | TBD | TBD | TBD |
| Latency (p50) | TBD | TBD | TBD |
| Latency (p99) | TBD | TBD | TBD |
| Throughput (messages/sec) | TBD | TBD | TBD |

#### Saturation

| Metric | OTel Collector | OTel Arrow | Improvement |
|--------|----------------|------------|-------------|
| Maximum Sustained Throughput | TBD | TBD | TBD |
| Throughput / Core | TBD | TBD | TBD |
| CPU at Saturation | TBD | TBD | TBD |
| Memory at Saturation | TBD | TBD | TBD |
| Behavior Under Overload | TBD | TBD | TBD |

### Key Findings

To be populated with analysis once benchmark data is available.

The comparative analysis will demonstrate:

- Relative efficiency of Arrow-based columnar processing vs traditional
  row-oriented data structures
- Memory allocation patterns and garbage collection impact (Rust vs Go)
- Throughput and latency characteristics under varying load conditions

---

## Additional Resources

- [Detailed Benchmark Results from Phase 2](benchmarks.md)
- [Phase 1 Benchmark Results](benchmarks-phase1.md)
- [OTAP Dataflow Engine Documentation](../rust/otap-dataflow/README.md)
- [Project Phases Overview](project-phases.md)