
NeuralOps

The Intelligent AI Inference Gateway

Stop paying premium rates for every AI request. NeuralOps automatically routes each prompt to the cheapest model capable of answering it, saving up to 85% on LLM costs without sacrificing quality.


FastAPI React Python Groq License PRs Welcome


๐Ÿ† 2nd Runner Up โ€” โ‚น5000 Prize โ€” Built in 36 hours at hackathon


Features · Demo · Quick Start · Pricing · Architecture · API Reference · Roadmap


🧠 The Problem

Every company using AI APIs today pays a flat premium rate for every single request, regardless of whether that request needs frontier intelligence or could be answered by a model 15x cheaper.

A user asks: "What is the capital of France?"

| Without NeuralOps | With NeuralOps |
| --- | --- |
| Routes to premium model → $0.000018 | Routes to Llama 3.1 8B → $0.0000012 |
| 93% of cost wasted | 93% saved automatically |

This waste compounds at scale:

| Monthly AI Spend | Wasted (est. 70%) | Annual Waste |
| --- | --- | --- |
| $5,000 | $3,500 | $42,000 |
| $50,000 | $35,000 | $420,000 |
| $500,000 | $350,000 | $4,200,000 |
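For reference, the arithmetic behind the table (the 70% waste rate is the document's own estimate; the function name is just for illustration):

```python
def annual_waste(monthly_spend: float, waste_rate: float = 0.70) -> float:
    """Estimated dollars overspent per year at the assumed waste rate."""
    return monthly_spend * waste_rate * 12

# Reproduces the first table row: $5,000/month of spend
print(round(annual_waste(5_000)))   # 42000
```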

NeuralOps fixes this automatically, with zero changes to your existing code.


✨ Features

🎯 Intelligent Routing Engine

Every prompt is classified as SIMPLE, MEDIUM, or COMPLEX using a fast LLM classifier, then routed to the optimal cost-tier model. No configuration needed.

⚔️ Battle Mode

Send the same prompt to all 3 models simultaneously. See responses, latency, cost, and composite scores side-by-side. NeuralOps picks the winner using complexity-aware scoring.

📊 Real-Time Dashboard

Live WebSocket-powered stats: total requests, cumulative savings, routing distribution, and a per-request activity feed. Zero page refreshes.

🗺️ Traffic Flow Visualization

Interactive node graph showing how requests flow from classifier → router → model tier, with live health indicators per node.

📈 Insights Panel

Deep analytics on request history: complexity distribution, per-model savings breakdown, a NeuralOps Intelligence Report with automatic recommendations, and an activity timeline.

💰 ROI Calculator

Input your monthly AI spend to get projected savings, annual ROI, payback period, and the NeuralOps fee. Pure math, no API calls.

🛡️ Self-Healing System

Simulate model outages by toggling health states. The router automatically falls back through a priority chain, so requests always reach a healthy model with zero downtime.

🤖 Three-Layer Fallback Classifier

If the LLM classifier fails, a rule-based keyword fallback classifies the prompt; if that also fails, a hardcoded COMPLEX safety fallback routes it to the premium tier. Quality is never compromised.
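The three layers can be sketched as follows; the keyword lists and function names here are illustrative assumptions, not the repo's actual implementation:

```python
# Illustrative sketch of the three-layer classifier fallback; the helper
# names and keyword heuristics below are assumptions, not NeuralOps code.

COMPLEX_KEYWORDS = {"design", "architecture", "distributed", "optimize", "prove"}
MEDIUM_KEYWORDS = {"explain", "compare", "summarize", "how"}

def rule_based_classify(text: str) -> str:
    """Layer 2: keyword heuristics used when the LLM classifier fails."""
    words = set(text.lower().split())
    if words & COMPLEX_KEYWORDS:
        return "COMPLEX"
    if words & MEDIUM_KEYWORDS or len(text) > 200:
        return "MEDIUM"
    return "SIMPLE"

def classify(text: str, llm_classify=None) -> str:
    """Layer 1: LLM call; layer 2: rules; layer 3: COMPLEX safety net."""
    if llm_classify is not None:
        try:
            return llm_classify(text)       # Layer 1: fast LLM classifier
        except Exception:
            pass                            # fall through on any API failure
    try:
        return rule_based_classify(text)    # Layer 2: keyword rules
    except Exception:
        return "COMPLEX"                    # Layer 3: never downgrade quality

print(classify("What is 2+2?"))   # SIMPLE
```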


🎬 Demo

Smart Routing: Same API, Different Models

# Simple question → Economy model (Llama 3.1 8B)
curl -X POST http://localhost:8000/route \
  -H "Content-Type: application/json" \
  -d '{"text": "What is 2+2?"}'

# Response includes:
# "tier": "economy"
# "savings_percentage": 93.3
# "latency_ms": 147

# Complex question → Premium model (Qwen 3 32B)
curl -X POST http://localhost:8000/route \
  -H "Content-Type: application/json" \
  -d '{"text": "Design a distributed payment system for 10 million users"}'

# Response includes:
# "tier": "premium"
# "complexity": "COMPLEX"
# "latency_ms": 4821

Battle Mode: Let Models Compete

curl -X POST http://localhost:8000/battle \
  -H "Content-Type: application/json" \
  -d '{"text": "Explain how transformers work in ML"}'

# All 3 models respond in parallel
# NeuralOps picks the winner
# You see cost, latency, quality score for each

🚀 Quick Start

Prerequisites

  • Python 3.9+ and Node.js 18+
  • A Groq API key (free at console.groq.com)

1. Clone

git clone https://github.com/het2576/NeuralOps.git
cd NeuralOps

2. Backend Setup

cd backend

# Create virtual environment
python -m venv .venv
source .venv/bin/activate      # macOS/Linux
# .venv\Scripts\activate       # Windows

# Install dependencies
pip install fastapi uvicorn aiosqlite python-dotenv groq httpx

# Configure environment
cp .env.example .env
# Add your GROQ_API_KEY to .env

# Start server
uvicorn main:app --reload --port 8000

3. Frontend Setup

# New terminal
cd frontend
npm install
npm run dev

Open http://localhost:3000 🎉

4. Seed Demo Data (Optional)

# Populates dashboard with ~300 realistic requests
cd backend
python seed_data.py

Environment Variables

Create backend/.env:

GROQ_API_KEY=gsk_...                      # Required
CLASSIFIER_MODEL=llama-3.1-8b-instant     # Classifier model
CHEAP_MODEL=llama-3.1-8b-instant          # Economy tier
MID_MODEL=llama-3.3-70b-versatile         # Standard tier
PREMIUM_MODEL=qwen/qwen3-32b              # Premium tier
DATABASE_URL=neuralops.db                 # SQLite path

💸 Pricing

We earn only when you save.

NeuralOps charges 3% of your monthly savings and nothing more. Zero risk for you.

| Monthly AI Spend | Est. Savings | NeuralOps Fee | Your Net Gain |
| --- | --- | --- | --- |
| $1,000 | ~$700 | $21 | $679 |
| $10,000 | ~$7,000 | $210 | $6,790 |
| $50,000 | ~$35,000 | $1,050 | $33,950 |
| $500,000 | ~$350,000 | $10,500 | $339,500 |

Minimum fee: $99/month. If NeuralOps doesn't save you money, you don't pay. Simple.
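The fee model reduces to a few lines. Note the $99/month minimum would dominate the table's first row; the sketch below applies the floor as an optional parameter, since the table itself shows the plain 3%:

```python
def neuralops_fee(monthly_savings: float, minimum: float = 0.0) -> float:
    """3% of monthly savings; pass minimum=99.0 to apply the stated floor."""
    if monthly_savings <= 0:
        return 0.0                               # no savings means no fee
    return max(0.03 * monthly_savings, minimum)

def net_gain(monthly_savings: float) -> float:
    """What you keep after the fee (plain 3%, as in the table)."""
    return monthly_savings - neuralops_fee(monthly_savings)

print(round(net_gain(7_000), 2))   # 6790.0
```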


๐Ÿ—๏ธ Architecture

┌──────────────────────────────────────────────────────────────┐
│                 FRONTEND  (React 19 + Vite)                  │
│                                                              │
│  ┌───────────┐ ┌────────────┐ ┌──────────┐ ┌────────────┐    │
│  │ Dashboard │ │Battle Mode │ │ Traffic  │ │  Insights  │    │
│  │(Recharts) │ │            │ │   Map    │ │            │    │
│  └───────────┘ └────────────┘ └──────────┘ └────────────┘    │
│  ┌──────────────────┐  ┌─────────────────────────────────┐   │
│  │  ROI Calculator  │  │       Self-Healing Panel        │   │
│  └──────────────────┘  └─────────────────────────────────┘   │
└─────────────────────────────┬────────────────────────────────┘
                              │ REST + WebSocket
┌─────────────────────────────▼────────────────────────────────┐
│                      BACKEND  (FastAPI)                      │
│                                                              │
│  ┌────────────┐  ┌──────────┐  ┌────────────────────────┐    │
│  │ Classifier │→ │  Router  │→ │      Cost Tracker      │    │
│  │ (Llama 8B) │  │  Engine  │  │                        │    │
│  └────────────┘  └──────────┘  └────────────────────────┘    │
│  ┌────────────┐  ┌──────────┐  ┌────────────────────────┐    │
│  │ Rule-Based │  │Self-Heal │  │  SQLite (aiosqlite)    │    │
│  │  Fallback  │  │ Manager  │  │                        │    │
│  └────────────┘  └──────────┘  └────────────────────────┘    │
└────────┬──────────────┬───────────────────┬──────────────────┘
         │              │                   │
   ┌─────▼──────┐ ┌─────▼──────┐ ┌──────────▼────────┐
   │ Llama 3.1  │ │ Llama 3.3  │ │    Qwen 3 32B     │
   │     8B     │ │    70B     │ │                   │
   │  Economy   │ │  Standard  │ │      Premium      │
   │  $0.06/1M  │ │  $0.59/1M  │ │     $0.90/1M      │
   └────────────┘ └────────────┘ └───────────────────┘
         └──────────────┴───────────────────┘
                     Groq API

Request Flow

1.  POST /route  ←  User prompt arrives
2.  Classifier   →  SIMPLE / MEDIUM / COMPLEX + confidence score
3.  Router       →  Selects cheapest healthy model
4.  Groq API     →  Model generates response
5.  Cost Tracker →  Calculates actual cost vs premium baseline
6.  SQLite       →  Request saved to database
7.  WebSocket    →  Dashboard updated in real time
8.  Response     →  Returned with full metadata

Fallback Chain

| Complexity | Primary | Fallback 1 | Fallback 2 |
| --- | --- | --- | --- |
| SIMPLE | Economy | Standard | Premium |
| MEDIUM | Standard | Premium | Economy |
| COMPLEX | Premium | Standard | Economy |

If all models fail → an honest 503 with retry guidance.
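The chain is just a priority list walked until a healthy tier is found. A minimal sketch, where the tier names mirror the table but the function and exception names are illustrative, not the repo's actual API:

```python
# Fallback chains from the table above; names below are hypothetical.
FALLBACK_CHAIN = {
    "SIMPLE":  ["economy", "standard", "premium"],
    "MEDIUM":  ["standard", "premium", "economy"],
    "COMPLEX": ["premium", "standard", "economy"],
}

class AllModelsDown(Exception):
    """Surfaces as the honest 503 with retry guidance."""

def pick_tier(complexity: str, health: dict) -> str:
    """Walk the priority chain until a healthy tier is found."""
    for tier in FALLBACK_CHAIN[complexity]:
        if health.get(tier, False):
            return tier
    raise AllModelsDown("all model tiers are unhealthy, retry later")

# Economy is down: a SIMPLE prompt falls back to the standard tier
print(pick_tier("SIMPLE", {"economy": False, "standard": True, "premium": True}))   # standard
```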


🧮 How Costs Are Calculated

Latency

import time

start = time.time()
response = await call_model(prompt, model)
latency_ms = (time.time() - start) * 1000

Cost Per Request

tokens = response.usage.prompt_tokens + response.usage.completion_tokens

PRICE_PER_TOKEN = {
    "llama-3.1-8b-instant":    0.06   / 1_000_000,
    "llama-3.3-70b-versatile": 0.59   / 1_000_000,
    "qwen/qwen3-32b":          0.90   / 1_000_000,
}

actual_cost   = tokens * PRICE_PER_TOKEN[model_used]
baseline_cost = tokens * PRICE_PER_TOKEN["qwen/qwen3-32b"]
savings       = baseline_cost - actual_cost
savings_pct   = (savings / baseline_cost) * 100
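Plugging the demo's 30-token request (12 input + 18 output tokens on the economy model) into the formulas above reproduces the 93.3% savings figure from the sample response:

```python
PRICE_PER_TOKEN = {
    "llama-3.1-8b-instant": 0.06 / 1_000_000,   # economy tier
    "qwen/qwen3-32b":       0.90 / 1_000_000,   # premium baseline
}

tokens = 12 + 18                                # input + output tokens

actual_cost   = tokens * PRICE_PER_TOKEN["llama-3.1-8b-instant"]
baseline_cost = tokens * PRICE_PER_TOKEN["qwen/qwen3-32b"]
savings_pct   = (baseline_cost - actual_cost) / baseline_cost * 100

print(round(savings_pct, 1))                    # 93.3
```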

Battle Mode Scoring

Complexity-aware composite score:

| Complexity | Cost Weight | Speed Weight | Quality Weight |
| --- | --- | --- | --- |
| SIMPLE | 80% | 15% | 5% |
| MEDIUM | 15% | 15% | 70% |
| COMPLEX | 5% | 5% | 90% |
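The weights apply as a simple weighted sum. In the sketch below, only the weights come from the table; the 0-to-1 normalization convention (1 = cheapest / fastest / best answer) is an assumption about how the scores are prepared:

```python
WEIGHTS = {  # (cost, speed, quality) weights from the table above
    "SIMPLE":  (0.80, 0.15, 0.05),
    "MEDIUM":  (0.15, 0.15, 0.70),
    "COMPLEX": (0.05, 0.05, 0.90),
}

def composite_score(complexity: str, cost: float, speed: float, quality: float) -> float:
    """Each input is normalized to 0..1, where 1 = cheapest/fastest/best."""
    w_cost, w_speed, w_quality = WEIGHTS[complexity]
    return w_cost * cost + w_speed * speed + w_quality * quality

# For a SIMPLE prompt, a cheap decent answer beats an expensive perfect one
cheap   = composite_score("SIMPLE", cost=1.0, speed=0.8, quality=0.6)
premium = composite_score("SIMPLE", cost=0.1, speed=0.4, quality=1.0)
print(cheap > premium)   # True
```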

📡 API Reference

POST /route

Route a prompt to the optimal model.

Request:

{ "text": "Your prompt here" }

Response:

{
  "request_id": "uuid",
  "response": "Model response text",
  "model_used": "Llama 3.1 8B",
  "tier": "economy",
  "complexity": "SIMPLE",
  "confidence": 0.97,
  "routing_reason": "Single factual question",
  "latency_ms": 147,
  "input_tokens": 12,
  "output_tokens": 18,
  "actual_cost": 0.0000018,
  "cost_without_neuralops": 0.000027,
  "savings": 0.0000252,
  "savings_percentage": 93.3,
  "is_fallback": false
}

POST /battle

Run the same prompt against all 3 models simultaneously.

GET /stats

Aggregate statistics: total requests, savings, model distribution.

GET /history?limit=50&offset=0

Paginated request history.

GET /health

Current health state of all model tiers.

POST /health/toggle

Toggle a model's health state (simulate outage/recovery).

{ "model_key": "llama-3.1-8b-instant", "healthy": false }

WebSocket /ws

Real-time event stream:

  • new_request: fired on every routed request
  • stats_update: updated aggregate stats
  • health_change: model health state changes

๐Ÿ“ Project Structure

NeuralOps/
├── backend/
│   ├── main.py              # FastAPI app, routes, WebSocket
│   ├── classifier.py        # LLM + rule-based prompt classifier
│   ├── router.py            # Routing table, health, fallback chains
│   ├── model_client.py      # Async Groq API calls
│   ├── cost_tracker.py      # Token cost math + savings calculation
│   ├── database.py          # SQLite schema + async CRUD
│   ├── models.py            # Pydantic schemas
│   ├── seed_data.py         # Demo data generator
│   └── .env.example         # Environment template
│
├── frontend/
│   ├── src/
│   │   ├── App.jsx                    # Navigation + WebSocket state
│   │   └── components/
│   │       ├── Dashboard.jsx          # Live stats + request feed
│   │       ├── BattleMode.jsx         # Model comparison arena
│   │       ├── TrafficFlow.jsx        # Live traffic visualization
│   │       ├── Insights.jsx           # Analytics + intelligence report
│   │       ├── ROICalculator.jsx      # Savings projection tool
│   │       ├── SelfHealingPanel.jsx   # Health toggle simulation
│   │       └── ui/                    # Card, Badge, shared primitives
│   ├── index.html
│   └── vite.config.js
│
└── README.md

๐Ÿ–ผ๏ธ Screenshots

Dashboard

Battle Mode

Traffic Flow

ROI Calculator


๐Ÿ›ฃ๏ธ Roadmap

v1.1: Production Ready

  • PostgreSQL (replace SQLite)
  • API key auth per user
  • OpenAI + Anthropic model support
  • Rate limiting + usage quotas
  • Deploy to Railway + Vercel

v1.2: Developer SDK

  • npm install neuralops-sdk
  • Python SDK (pip install neuralops)
  • 2-line integration for any app
  • Webhook support

v2.0: SaaS Platform

  • Multi-tenant dashboard
  • Custom routing rules per customer
  • Quality guarantee SLAs
  • Usage-based billing (3% of savings, min $99/month)
  • On-premise deployment (Ollama)

๐Ÿค Contributing

Contributions are welcome! Please open an issue first to discuss what you'd like to change.

# Fork the repo
# Create your branch
git checkout -b feature/your-feature

# Make changes + commit
git commit -m "feat: your feature description"

# Push and open a PR
git push origin feature/your-feature

📜 License

MIT License; see LICENSE for details.


Built with ⚡ by the NeuralOps Team

If this saved you money, give it a ⭐

Report Bug · Request Feature · Discussions
