Evaluate and improve your Prometheus metrics quality with automated scoring
A production-ready tool that analyzes Prometheus metrics against instrumentation best practices, providing actionable insights to improve observability quality, reduce costs, and maintain healthy metrics.
demo.mp4
Demo showing metrics analysis, scoring, and HTML report generation
- About
- Quick Start
- Installation
- Commands
- Configuration
- S3 Integration
- Rule System
- Output Formats
- CI/CD Integration
- Troubleshooting
This project implements the Instrumentation Score specification - originally designed for OpenTelemetry (OTLP), now extended to support Prometheus-compatible metrics.
Why this matters:
- 📊 Standardized quality measurement across observability stacks
- 🎯 Proven scoring methodology from the OpenTelemetry community
- 🌍 Vendor-neutral, community-driven approach
- ✅ Automated Scoring: Quality scores (0-100) per job/service
- ✅ Declarative Rules: Define custom rules in YAML
- ✅ Multi-Format Output: HTML, JSON, Text, Prometheus metrics
- ✅ Cost Estimation: Calculate storage costs based on cardinality
- ✅ S3 Integration: Store and retrieve reports from S3
- ✅ CI/CD Ready: Easy integration with GitHub Actions, Jenkins, etc.
- ✅ Performance Tuned: Configurable concurrency for optimal speed
```shell
export login="user:api_key"
export url="https://your-prometheus-instance.com/api/prom"

instrumentation-score analyze \
  --output-dir=reports/adaptive \
  --retry-failures-count=3 \
  --batch-rps-limit 1000 \
  --batch-mode adaptive \
  --batch-interval-ms 2000 \
  --batch-concurrency 1 \
  --jobs-concurrency 2

# --batch-rps-limit caps how many metrics are batched into a single PromQL query.
# Add --collect-label-cardinality to compute per-label cardinality,
# and --label-cardinality-concurrency 5 to tune its parallelism.

# All jobs with HTML report (report.html details per-metric issues)
instrumentation-score evaluate \
  --job-dir reports/adaptive \
  --output html \
  --html-file report.html \
  --show-costs \
  --cost-unit-price 0.00615   # $6.15 per 1,000 series per month
```

Open report.html for an interactive dashboard with:
- 📊 Quality scores per job (0-100)
- 💰 Cost breakdown
- 🎯 Per-metric failure details
- 💡 Actionable recommendations
Download from the releases page:
```shell
# Linux (amd64)
wget https://github.com/chit786/instrumentation-score/releases/latest/download/instrumentation-score-linux-amd64
chmod +x instrumentation-score-linux-amd64
sudo mv instrumentation-score-linux-amd64 /usr/local/bin/instrumentation-score

# macOS (Apple Silicon)
wget https://github.com/chit786/instrumentation-score/releases/latest/download/instrumentation-score-darwin-arm64
chmod +x instrumentation-score-darwin-arm64
sudo mv instrumentation-score-darwin-arm64 /usr/local/bin/instrumentation-score

# macOS (Intel)
wget https://github.com/chit786/instrumentation-score/releases/latest/download/instrumentation-score-darwin-amd64
chmod +x instrumentation-score-darwin-amd64
sudo mv instrumentation-score-darwin-amd64 /usr/local/bin/instrumentation-score
```

Docker:

```shell
docker pull ghcr.io/chit786/instrumentation-score:latest
```

Build from source:

```shell
git clone https://github.com/chit786/instrumentation-score.git
cd instrumentation-score
go build -o instrumentation-score .
```

Collect metrics from Prometheus/Mimir and group them by job.
```shell
instrumentation-score analyze \
  --output-dir ./reports \
  --collect-label-cardinality \
  --additional-query-filters 'cluster=~"prod.*"'
```

Required Environment Variables:
- url: Prometheus API URL
- login: Basic auth credentials (format: user:password)
Key Flags:
- --output-dir: Where to save reports (required)
- --collect-label-cardinality: Enable accurate per-label cardinality (recommended for Mimir)
- --additional-query-filters: PromQL filters to limit scope
- --retry-failures-count: Retry attempts for transient failures (default: 2)
- --s3-upload: Upload results to S3
Output:
- job_metrics_TIMESTAMP/: Per-job metric files
- metrics_errors_TIMESTAMP.txt: Error log
Evaluate metrics against rules and generate reports.
```shell
# Evaluate single job
instrumentation-score evaluate \
  --job-file reports/job_metrics_*/api-service.txt \
  --output html \
  --html-file report.html

# Evaluate all jobs
instrumentation-score evaluate \
  --job-dir reports/job_metrics_*/ \
  --output html,json,prometheus \
  --html-file dashboard.html \
  --json-file results.json \
  --prometheus-file metrics.prom \
  --show-costs \
  --cost-unit-price 0.00615

# Evaluate from S3
instrumentation-score evaluate \
  --s3-source \
  --s3-bucket my-bucket \
  --s3-prefix instrumentation-reports/job_metrics_20251102_160000 \
  --output html \
  --html-file dashboard.html
```

Key Flags:
- --rules, -r: Rules configuration file (default: rules_config.yaml)
- --output, -o: Output formats (comma-separated): text, json, html, prometheus
- --show-costs: Calculate estimated costs
- --cost-unit-price: Cost per series per month (e.g., 0.00615 = $6.15 per 1,000 series)
- --min-score: Highlight jobs below this threshold
- --s3-source: Download source data from S3
- --s3-upload: Upload evaluation results to S3
Authentication (Required):
```shell
export url="https://your-prometheus-instance.com/api/prom"
export login="user:api_key"
```

Concurrency Tuning (Optional):

```shell
export CONCURRENT_METRICS=5              # Metrics processed in parallel
export CONCURRENT_JOBS=3                 # Job queries per metric
export CONCURRENT_LABEL_CARDINALITY=50   # Label cardinality API calls
```

S3 Configuration (Optional):

```shell
export S3_BUCKET=my-metrics-bucket
export S3_PREFIX=instrumentation-reports
export AWS_REGION=eu-west-1
```

Conservative (rate-limited APIs):

```shell
export CONCURRENT_METRICS=3
export CONCURRENT_JOBS=2
export CONCURRENT_LABEL_CARDINALITY=25
```

Balanced (default, fits most cases):

```shell
export CONCURRENT_METRICS=5
export CONCURRENT_JOBS=3
export CONCURRENT_LABEL_CARDINALITY=50
```

Aggressive (fast networks, high capacity):

```shell
export CONCURRENT_METRICS=10
export CONCURRENT_JOBS=5
export CONCURRENT_LABEL_CARDINALITY=100
```

Flags override environment variables:

```shell
export CONCURRENT_METRICS=5
instrumentation-score analyze \
  --metrics-concurrency 10   # Uses 10, not 5
```

```shell
# Set AWS credentials (choose one method)
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
# OR
export AWS_PROFILE=production
# OR use an IAM role (EC2/ECS/EKS)

# Configure S3
export S3_BUCKET=my-metrics-bucket
export S3_PREFIX=instrumentation-reports

# Analyze and upload
instrumentation-score analyze \
  --output-dir ./reports \
  --s3-upload

# Evaluate from S3
instrumentation-score evaluate \
  --s3-source \
  --s3-bucket my-bucket \
  --s3-prefix instrumentation-reports/job_metrics_20251102_160000 \
  --output html \
  --html-file dashboard.html
```

S3 layout:

```
s3://bucket/prefix/
├── job_metrics_20251102_160000/          # Raw metrics
│   ├── job1.txt
│   ├── job2.txt
│   └── ...
├── metrics_errors_20251102_160000.txt
└── evaluations/                          # Evaluation results
    └── run-id/
        ├── dashboard.html
        ├── report.json
        └── manifest.json
```
Rules are defined in rules_config.yaml and evaluated against your metrics.
Example Rule:
```yaml
- rule_id: "PROM-MET-02"
  description: "Metrics must maintain bounded cardinality"
  impact: "Critical" # Weight: 40
  validators:
    - name: "cardinality_check"
      type: "cardinality"
      data_source: "cardinality"
      conditions:
        - field: "count"
          operator: "lt"
          value: 10000
```

| Impact | Weight | Use Case |
|---|---|---|
| Critical | 40 | Security, compliance, system stability |
| Important | 30 | Significant quality impact |
| Normal | 20 | Standard best practices |
| Low | 10 | Nice-to-have improvements |
Uses the Instrumentation Score specification formula:
Score = (Σ(P_i × W_i) / Σ(T_i × W_i)) × 100
Where:
P_i = Metrics passed for impact level i
T_i = Total metrics for impact level i
W_i = Weight for impact level i
Example:
Job: api-service (100 metrics)
Rule Results:
- PROM-MET-01 (Important, W=30): 95/100 passed
- PROM-MET-02 (Critical, W=40): 80/100 passed
- PROM-MET-03 (Important, W=30): 90/100 passed
Calculation:
Numerator = (95×30) + (80×40) + (90×30) = 8,750
Denominator = (100×30) + (100×40) + (100×30) = 10,000
Score = (8,750 / 10,000) × 100 = 87.5% 🟢 Good
See FRAMEWORK.md for detailed guide on creating custom rules.
```shell
instrumentation-score evaluate \
  --job-file reports/job_metrics_*/api-service.txt \
  --output text
```

Output:
```
=== Instrumentation Score Report for Job: api-service ===
Total Metrics: 45
Instrumentation Score: 97.63%

Rule Evaluation Results:
------------------------
Rule PROM-MET-01 (Important): 44/45 metrics passed (97.8%)
  Failed metrics: http_request_duration
Rule PROM-MET-02 (Critical): 45/45 metrics passed (100.0%)
```
```shell
instrumentation-score evaluate \
  --job-dir reports/job_metrics_*/ \
  --output json \
  --json-file results.json
```

```shell
instrumentation-score evaluate \
  --job-dir reports/job_metrics_*/ \
  --output html \
  --html-file report.html \
  --show-costs \
  --cost-unit-price 0.00615
```

Features:
- 📊 Sortable job list
- 💰 Cost breakdown
- 🔍 Searchable metrics
- 📈 Per-metric drill-down
- 💡 Failure reasons
```shell
instrumentation-score evaluate \
  --job-dir reports/job_metrics_*/ \
  --output prometheus \
  --prometheus-file metrics.prom
```

Exports:
- instrumentation_quality_score{job="..."}
- instrumentation_rule_metrics_total{job="...",rule_id="...",impact="..."}
- instrumentation_rule_metrics_failed_total{job="...",rule_id="...",impact="..."}
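Because the score is exported as a regular Prometheus metric, it can feed standard alerting. A sketch of an alerting rule on the exported gauge (the group name, 70-point threshold, and duration are illustrative choices, not defaults of this tool):

```yaml
groups:
  - name: instrumentation-quality
    rules:
      - alert: LowInstrumentationScore
        expr: instrumentation_quality_score < 70
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Instrumentation score for {{ $labels.job }} is below 70"
```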
```yaml
name: Instrumentation Score
on:
  schedule:
    - cron: '0 2 * * *' # Daily at 2 AM
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.21'
      - name: Build
        run: go build -o instrumentation-score .
      - name: Analyze Metrics
        env:
          url: ${{ secrets.PROMETHEUS_URL }}
          login: ${{ secrets.PROMETHEUS_LOGIN }}
          CONCURRENT_METRICS: 10
          CONCURRENT_LABEL_CARDINALITY: 75
        run: |
          ./instrumentation-score analyze \
            --output-dir ./reports \
            --collect-label-cardinality
      - name: Generate Report
        run: |
          ./instrumentation-score evaluate \
            --job-dir ./reports/job_metrics_*/ \
            --output html,json \
            --html-file report.html \
            --json-file results.json \
            --show-costs \
            --cost-unit-price 0.00615
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: instrumentation-report
          path: |
            report.html
            results.json
```

```dockerfile
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o instrumentation-score .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/instrumentation-score .
COPY rules_config.yaml .
ENTRYPOINT ["./instrumentation-score"]
```

```shell
docker build -t instrumentation-score .

docker run \
  -e url="https://your-prometheus-instance.com/api/prom" \
  -e login="user:api_key" \
  -e CONCURRENT_METRICS=10 \
  -v $(pwd)/reports:/reports \
  instrumentation-score \
  analyze --output-dir /reports --collect-label-cardinality
```

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: instrumentation-score
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: instrumentation-score
              image: ghcr.io/chit786/instrumentation-score:latest
              env:
                - name: CONCURRENT_METRICS
                  value: "10"
                - name: CONCURRENT_LABEL_CARDINALITY
                  value: "75"
                - name: url
                  valueFrom:
                    secretKeyRef:
                      name: prometheus-creds
                      key: url
                - name: login
                  valueFrom:
                    secretKeyRef:
                      name: prometheus-creds
                      key: login
              args:
                - analyze
                - --output-dir
                - /reports
                - --collect-label-cardinality
                - --s3-upload
          restartPolicy: OnFailure
```

Problem: Too many concurrent requests.
Solution: Reduce concurrency:
```shell
export CONCURRENT_METRICS=3
export CONCURRENT_JOBS=2
export CONCURRENT_LABEL_CARDINALITY=25
```

Problem: Collection takes too long.
Solution: Increase concurrency or add filters:
```shell
# Increase concurrency
export CONCURRENT_METRICS=10
export CONCURRENT_LABEL_CARDINALITY=100

# Or filter metrics
instrumentation-score analyze \
  --additional-query-filters 'cluster=~"prod.*"'
```

Problem: HTML report shows estimates instead of actual values.
Solution: Enable label cardinality collection:
```shell
instrumentation-score analyze \
  --collect-label-cardinality
```

Note: Requires Grafana Mimir or Grafana Cloud; not supported by vanilla Prometheus.
Problem: "no credentials" error.
Solution: Set AWS credentials:
```shell
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
# OR
export AWS_PROFILE=production
```

Problem: Settings not taking effect.
Solution: Check precedence (flags override env vars):
```shell
# Verify env vars are set
env | grep CONCURRENT

# Use explicit flags
instrumentation-score analyze --metrics-concurrency 10
```

- FRAMEWORK.md - Complete guide to creating custom rules
- CONTRIBUTING.md - Contribution guidelines
- Instrumentation Score Spec - Official specification
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
This project follows the Instrumentation Score specification. When creating rules, please follow the spec's rule format for consistency.
This project is licensed under the MIT License. See LICENSE for details.
- Instrumentation Score Specification by OllyGarden
- Prometheus and Grafana communities
- Grafana Mimir for cardinality APIs