Skip to content

chit786/instrumentation-score

Instrumentation Score Service

Evaluate and improve your Prometheus metrics quality with automated scoring

A production-ready tool that analyzes Prometheus metrics against instrumentation best practices, providing actionable insights to improve observability quality, reduce costs, and maintain healthy metrics.

CI Go License: MIT Spec Compliant


🎬 Demo

demo.mp4

Demo showing metrics analysis, scoring, and HTML report generation


Table of Contents


πŸ“– About

Spec-Compliant Implementation

This project implements the Instrumentation Score specification - originally designed for OpenTelemetry (OTLP), now extended to support Prometheus-compatible metrics.

Why this matters:

  • πŸ“Š Standardized quality measurement across observability stacks
  • 🎯 Proven scoring methodology from the OpenTelemetry community
  • πŸ”„ Vendor-neutral, community-driven approach

Key Features

  • βœ… Automated Scoring: Quality scores (0-100) per job/service
  • βœ… Declarative Rules: Define custom rules in YAML
  • βœ… Multi-Format Output: HTML, JSON, Text, Prometheus metrics
  • βœ… Cost Estimation: Calculate storage costs based on cardinality
  • βœ… S3 Integration: Store and retrieve reports from S3
  • βœ… CI/CD Ready: Easy integration with GitHub Actions, Jenkins, etc.
  • βœ… Performance Tuned: Configurable concurrency for optimal speed

πŸš€ Quick Start

1. Set Credentials

export login="user:api_key"
export url="https://your-prometheus-instance.com/api/prom"

2. Analyze Metrics

/instrumentation-score analyze \
  --output-dir=reports/adaptive \
  --retry-failures-count=3 \
  --batch-rps-limit 1000 \ # number of metrics to allow in batch for single promql query. 
  --batch-mode adaptive \
  --batch-interval-ms 2000 \
  --batch-concurrency 1 \
  --jobs-concurrency 2
  # --collect-label-cardinality \ enable if you want per label cardinality to be calculated
  # --label-cardinality-concurrency 5

3. Evaluate & Get Scores

# All jobs with HTML report
instrumentation-score evaluate \
  --job-dir reports/adaptive \
  --output html \
  --html-file report.html \ # a detailed html report for issues in metrics
  --show-costs \
  --cost-unit-price 0.00615 # $6.15 per 1000 metric

4. View Results

Open report.html for an interactive dashboard with:

  • πŸ“Š Quality scores per job (0-100)
  • πŸ’° Cost breakdown
  • 🎯 Per-metric failure details
  • πŸ’‘ Actionable recommendations

πŸ“¦ Installation

Option 1: Pre-built Binary (Recommended)

Download from the releases page:

# Linux (amd64)
wget https://github.com/chit786/instrumentation-score/releases/latest/download/instrumentation-score-linux-amd64
chmod +x instrumentation-score-linux-amd64
sudo mv instrumentation-score-linux-amd64 /usr/local/bin/instrumentation-score

# macOS (Apple Silicon)
wget https://github.com/chit786/instrumentation-score/releases/latest/download/instrumentation-score-darwin-arm64
chmod +x instrumentation-score-darwin-arm64
sudo mv instrumentation-score-darwin-arm64 /usr/local/bin/instrumentation-score

# macOS (Intel)
wget https://github.com/chit786/instrumentation-score/releases/latest/download/instrumentation-score-darwin-amd64
chmod +x instrumentation-score-darwin-amd64
sudo mv instrumentation-score-darwin-amd64 /usr/local/bin/instrumentation-score

Option 2: Docker

docker pull ghcr.io/chit786/instrumentation-score:latest

Option 3: Build from Source

git clone https://github.com/chit786/instrumentation-score.git
cd instrumentation-score
go build -o instrumentation-score .

πŸ“‹ Commands

analyze

Collect metrics from Prometheus/Mimir and group by job.

instrumentation-score analyze \
  --output-dir ./reports \
  --collect-label-cardinality \
  --additional-query-filters 'cluster=~"prod.*"'

Required Environment Variables:

  • url: Prometheus API URL
  • login: Basic auth credentials (format: user:password)

Key Flags:

  • --output-dir: Where to save reports (required)
  • --collect-label-cardinality: Enable accurate per-label cardinality (recommended for Mimir)
  • --additional-query-filters: PromQL filters to limit scope
  • --retry-failures-count: Retry attempts for transient failures (default: 2)
  • --s3-upload: Upload results to S3

Output:

  • job_metrics_TIMESTAMP/: Per-job metric files
  • metrics_errors_TIMESTAMP.txt: Error log

evaluate

Evaluate metrics against rules and generate reports.

# Evaluate single job
instrumentation-score evaluate \
  --job-file reports/job_metrics_*/api-service.txt \
  --output html \
  --html-file report.html

# Evaluate all jobs
instrumentation-score evaluate \
  --job-dir reports/job_metrics_*/ \
  --output html,json,prometheus \
  --html-file dashboard.html \
  --json-file results.json \
  --prometheus-file metrics.prom \
  --show-costs \
  --cost-unit-price 0.00615

# Evaluate from S3
instrumentation-score evaluate \
  --s3-source \
  --s3-bucket my-bucket \
  --s3-prefix instrumentation-reports/job_metrics_20251102_160000 \
  --output html \
  --html-file dashboard.html

Key Flags:

  • --rules, -r: Rules configuration file (default: rules_config.yaml)
  • --output, -o: Output formats (comma-separated): text, json, html, prometheus
  • --show-costs: Calculate estimated costs
  • --cost-unit-price: Cost per series/month (e.g., 0.00615 = $6.15/1000 series)
  • --min-score: Highlight jobs below threshold
  • --s3-source: Download source data from S3
  • --s3-upload: Upload evaluation results to S3

βš™οΈ Configuration

Environment Variables

Authentication (Required):

export url="https://your-prometheus-instance.com/api/prom"
export login="user:api_key"

Concurrency Tuning (Optional):

export CONCURRENT_METRICS=5              # Metrics processed in parallel
export CONCURRENT_JOBS=3                 # Job queries per metric
export CONCURRENT_LABEL_CARDINALITY=50   # Label cardinality API calls

S3 Configuration (Optional):

export S3_BUCKET=my-metrics-bucket
export S3_PREFIX=instrumentation-reports
export AWS_REGION=eu-west-1

Performance Presets

Conservative (rate-limited APIs):

export CONCURRENT_METRICS=3
export CONCURRENT_JOBS=2
export CONCURRENT_LABEL_CARDINALITY=25

Balanced (default - most cases):

export CONCURRENT_METRICS=5
export CONCURRENT_JOBS=3
export CONCURRENT_LABEL_CARDINALITY=50

Aggressive (fast networks, high capacity):

export CONCURRENT_METRICS=10
export CONCURRENT_JOBS=5
export CONCURRENT_LABEL_CARDINALITY=100

Command-Line Flags Override

Flags override environment variables:

export CONCURRENT_METRICS=5

instrumentation-score analyze \
  --metrics-concurrency 10  # Uses 10, not 5

πŸ—„οΈ S3 Integration

Upload Analysis Results

# Set AWS credentials (choose one method)
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
# OR
export AWS_PROFILE=production
# OR use IAM role (EC2/ECS/EKS)

# Configure S3
export S3_BUCKET=my-metrics-bucket
export S3_PREFIX=instrumentation-reports

# Analyze and upload
instrumentation-score analyze \
  --output-dir ./reports \
  --s3-upload

Download and Evaluate

instrumentation-score evaluate \
  --s3-source \
  --s3-bucket my-bucket \
  --s3-prefix instrumentation-reports/job_metrics_20251102_160000 \
  --output html \
  --html-file dashboard.html

S3 Structure

s3://bucket/prefix/
β”œβ”€β”€ job_metrics_20251102_160000/    # Raw metrics
β”‚   β”œβ”€β”€ job1.txt
β”‚   β”œβ”€β”€ job2.txt
β”‚   └── ...
β”œβ”€β”€ metrics_errors_20251102_160000.txt
└── evaluations/                    # Evaluation results
    └── run-id/
        β”œβ”€β”€ dashboard.html
        β”œβ”€β”€ report.json
        └── manifest.json

πŸ“ Rule System

How It Works

Rules are defined in rules_config.yaml and evaluated against your metrics.

Example Rule:

- rule_id: "PROM-MET-02"
  description: "Metrics must maintain bounded cardinality"
  impact: "Critical"  # Weight: 40
  validators:
    - name: "cardinality_check"
      type: "cardinality"
      data_source: "cardinality"
      conditions:
        - field: "count"
          operator: "lt"
          value: 10000

Impact Levels

Impact Weight Use Case
Critical 40 Security, compliance, system stability
Important 30 Significant quality impact
Normal 20 Standard best practices
Low 10 Nice-to-have improvements

Score Calculation

Uses the Instrumentation Score specification formula:

Score = (Ξ£(P_i Γ— W_i) / Ξ£(T_i Γ— W_i)) Γ— 100

Where:
  P_i = Metrics passed for impact level i
  T_i = Total metrics for impact level i
  W_i = Weight for impact level i

Example:

Job: api-service (100 metrics)

Rule Results:
- PROM-MET-01 (Important, W=30): 95/100 passed
- PROM-MET-02 (Critical, W=40):  80/100 passed
- PROM-MET-03 (Important, W=30): 90/100 passed

Calculation:
Numerator   = (95Γ—30) + (80Γ—40) + (90Γ—30) = 8,750
Denominator = (100Γ—30) + (100Γ—40) + (100Γ—30) = 10,000

Score = (8,750 / 10,000) Γ— 100 = 87.5% 🟒 Good

Creating Custom Rules

See FRAMEWORK.md for detailed guide on creating custom rules.


πŸ“Š Output Formats

Text (Terminal)

instrumentation-score evaluate \
  --job-file reports/job_metrics_*/api-service.txt \
  --output text

Output:

=== Instrumentation Score Report for Job: api-service ===

Total Metrics: 45
Instrumentation Score: 97.63%

Rule Evaluation Results:
------------------------
Rule PROM-MET-01 (Important): 44/45 metrics passed (97.8%)
  Failed metrics: http_request_duration
Rule PROM-MET-02 (Critical): 45/45 metrics passed (100.0%)

JSON

instrumentation-score evaluate \
  --job-dir reports/job_metrics_*/ \
  --output json \
  --json-file results.json

HTML (Interactive Dashboard)

instrumentation-score evaluate \
  --job-dir reports/job_metrics_*/ \
  --output html \
  --html-file report.html \
  --show-costs \
  --cost-unit-price 0.00615

Features:

  • πŸ“Š Sortable job list
  • πŸ’° Cost breakdown
  • πŸ” Searchable metrics
  • πŸ“ˆ Per-metric drill-down
  • πŸ’‘ Failure reasons

Prometheus Metrics

instrumentation-score evaluate \
  --job-dir reports/job_metrics_*/ \
  --output prometheus \
  --prometheus-file metrics.prom

Exports:

  • instrumentation_quality_score{job="..."}
  • instrumentation_rule_metrics_total{job="...",rule_id="...",impact="..."}
  • instrumentation_rule_metrics_failed_total{job="...",rule_id="...",impact="..."}

πŸ”„ CI/CD Integration

GitHub Actions

name: Instrumentation Score
on:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM

jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.21'
      
      - name: Build
        run: go build -o instrumentation-score .
      
      - name: Analyze Metrics
        env:
          url: ${{ secrets.PROMETHEUS_URL }}
          login: ${{ secrets.PROMETHEUS_LOGIN }}
          CONCURRENT_METRICS: 10
          CONCURRENT_LABEL_CARDINALITY: 75
        run: |
          ./instrumentation-score analyze \
            --output-dir ./reports \
            --collect-label-cardinality
      
      - name: Generate Report
        run: |
          ./instrumentation-score evaluate \
            --job-dir ./reports/job_metrics_*/ \
            --output html,json \
            --html-file report.html \
            --json-file results.json \
            --show-costs \
            --cost-unit-price 0.00615
      
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: instrumentation-report
          path: |
            report.html
            results.json

Docker

FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o instrumentation-score .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/instrumentation-score .
COPY rules_config.yaml .
ENTRYPOINT ["./instrumentation-score"]
docker build -t instrumentation-score .

docker run \
  -e url="https://your-prometheus-instance.com/api/prom" \
  -e login="user:api_key" \
  -e CONCURRENT_METRICS=10 \
  -v $(pwd)/reports:/reports \
  instrumentation-score \
  analyze --output-dir /reports --collect-label-cardinality

Kubernetes CronJob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: instrumentation-score
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: instrumentation-score
            image: ghcr.io/chit786/instrumentation-score:latest
            env:
            - name: CONCURRENT_METRICS
              value: "10"
            - name: CONCURRENT_LABEL_CARDINALITY
              value: "75"
            - name: url
              valueFrom:
                secretKeyRef:
                  name: prometheus-creds
                  key: url
            - name: login
              valueFrom:
                secretKeyRef:
                  name: prometheus-creds
                  key: login
            args:
            - analyze
            - --output-dir
            - /reports
            - --collect-label-cardinality
            - --s3-upload
          restartPolicy: OnFailure

πŸ”§ Troubleshooting

Rate Limit Errors (429)

Problem: Too many concurrent requests.

Solution: Reduce concurrency:

export CONCURRENT_METRICS=3
export CONCURRENT_JOBS=2
export CONCURRENT_LABEL_CARDINALITY=25

Slow Collection

Problem: Collection takes too long.

Solution: Increase concurrency or add filters:

# Increase concurrency
export CONCURRENT_METRICS=10
export CONCURRENT_LABEL_CARDINALITY=100

# Or filter metrics
instrumentation-score analyze \
  --additional-query-filters 'cluster=~"prod.*"'

Missing Label Cardinality

Problem: HTML report shows estimates instead of actual values.

Solution: Enable label cardinality collection:

instrumentation-score analyze \
  --collect-label-cardinality

Note: Requires Grafana Mimir or Grafana Cloud. Not supported by vanilla Prometheus.

S3 Upload Fails

Problem: "no credentials" error.

Solution: Set AWS credentials:

export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
# OR
export AWS_PROFILE=production

Environment Variables Not Working

Problem: Settings not taking effect.

Solution: Check precedence (flags override env vars):

# Verify env vars are set
env | grep CONCURRENT

# Use explicit flags
instrumentation-score analyze --metrics-concurrency 10

πŸ“š Additional Resources


🀝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

This project follows the Instrumentation Score specification. When creating rules, please follow the spec's rule format for consistency.


πŸ“ License

This project is licensed under the MIT License. See LICENSE for details.


πŸ™ Acknowledgments

About

Prometheus metrics instrumentation quality & cost analyzer with automated scoring

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors