Stop paying premium rates for every AI request. NeuralOps automatically routes each prompt to the cheapest model capable of answering it, saving up to 85% on LLM costs without sacrificing quality.
🏆 2nd Runner-Up · ₹5000 Prize · Built in 36 hours at a hackathon
Features · Demo · Quick Start · Pricing · Architecture · API Reference · Roadmap
Every company using AI APIs today pays a flat premium rate for every single request, regardless of whether that request needs frontier intelligence or could be answered by a model 15x cheaper.
A user asks: "What is the capital of France?"
| Without NeuralOps | With NeuralOps |
|---|---|
| Routes to premium model → $0.000018 | Routes to Llama 3.1 8B → $0.0000012 |
| 93% of cost wasted | 93% saved automatically |
This waste compounds at scale:
| Monthly AI Spend | Wasted (est. 70%) | Annual Waste |
|---|---|---|
| $5,000 | $3,500 | $42,000 |
| $50,000 | $35,000 | $420,000 |
| $500,000 | $350,000 | $4,200,000 |
NeuralOps fixes this automatically, with zero changes to your existing code.
Every prompt is classified as SIMPLE, MEDIUM, or COMPLEX using a fast LLM classifier, then routed to the optimal cost-tier model. No configuration needed.
Send the same prompt to all 3 models simultaneously. See responses, latency, cost, and composite scores side-by-side. NeuralOps picks the winner using complexity-aware scoring.
Live WebSocket-powered stats: total requests, cumulative savings, routing distribution, and a per-request activity feed. Zero page refreshes.
Interactive node graph showing how requests flow from classifier → router → model tier, with live health indicators per node.
Deep analytics on request history: complexity distribution, per-model savings breakdown, NeuralOps Intelligence Report with automatic recommendations, and activity timeline.
Enter your monthly AI spend to get projected savings, annual ROI, payback period, and the NeuralOps fee. Pure math, no API calls.
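The projection math is simple enough to sketch. This assumes the ~70% average savings rate and the 3%-of-savings fee used in the pricing table; the calculator's actual formulas may differ:

```python
def roi_projection(monthly_spend: float, savings_rate: float = 0.70,
                   fee_rate: float = 0.03) -> dict:
    """Project monthly savings and net gain (rates are assumed averages)."""
    savings = monthly_spend * savings_rate   # estimated gross savings
    fee = savings * fee_rate                 # NeuralOps keeps 3% of savings
    net_gain = savings - fee
    return {
        "savings": savings,
        "fee": fee,
        "net_gain": net_gain,
        "annual_net_gain": net_gain * 12,
    }

# $10,000/month spend -> $7,000 saved, $210 fee, $6,790 net gain
print(roi_projection(10_000))
```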
Simulate model outages by toggling health states. The router automatically falls back through a priority chain, so requests are always routed to a healthy model with zero downtime.
If the AI classifier fails: rule-based keyword fallback → hardcoded COMPLEX safety fallback. Quality is never compromised.
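A minimal sketch of that three-stage chain. The keyword lists here are illustrative placeholders; the real heuristics live in `classifier.py`:

```python
# Hypothetical keyword hints; the real lists are in classifier.py.
SIMPLE_HINTS = ("what is", "define", "capital of")
COMPLEX_HINTS = ("design", "architecture", "distributed", "prove")

def rule_based_classify(text: str) -> str:
    """Stage 2: cheap keyword heuristic, no API call."""
    lowered = text.lower()
    if any(k in lowered for k in COMPLEX_HINTS):
        return "COMPLEX"
    if any(k in lowered for k in SIMPLE_HINTS):
        return "SIMPLE"
    return "MEDIUM"

def classify(text: str, llm_classify=None) -> str:
    """Three stages: LLM classifier -> keyword rules -> COMPLEX safety default."""
    if llm_classify is not None:
        try:
            return llm_classify(text)
        except Exception:
            pass  # fall through to the rule-based stage
    try:
        return rule_based_classify(text)
    except Exception:
        # Safety default: when in doubt, never downgrade answer quality.
        return "COMPLEX"
```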
```bash
# Simple question → Economy model (Llama 3.1 8B)
curl -X POST http://localhost:8000/route \
  -H "Content-Type: application/json" \
  -d '{"text": "What is 2+2?"}'

# Response includes:
#   "tier": "economy"
#   "savings_percentage": 93.3
#   "latency_ms": 147
```

```bash
# Complex question → Premium model (Qwen 3 32B)
curl -X POST http://localhost:8000/route \
  -H "Content-Type: application/json" \
  -d '{"text": "Design a distributed payment system for 10 million users"}'

# Response includes:
#   "tier": "premium"
#   "complexity": "COMPLEX"
#   "latency_ms": 4821
```

```bash
curl -X POST http://localhost:8000/battle \
  -H "Content-Type: application/json" \
  -d '{"text": "Explain how transformers work in ML"}'

# All 3 models respond in parallel
# NeuralOps picks the winner
# You see cost, latency, and quality score for each
```

- Python 3.11+
- Node.js 18+
- Groq API Key (free at console.groq.com)
```bash
git clone https://github.com/YOUR_GITHUB_USERNAME/NeuralOps.git
cd NeuralOps
```

```bash
cd backend

# Create virtual environment
python -m venv .venv
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows

# Install dependencies
pip install fastapi uvicorn aiosqlite python-dotenv groq httpx

# Configure environment
cp .env.example .env
# Add your GROQ_API_KEY to .env

# Start server
uvicorn main:app --reload --port 8000
```

```bash
# New terminal
cd frontend
npm install
npm run dev
```

Open http://localhost:3000 in your browser.
```bash
# Populates the dashboard with ~300 realistic requests
cd backend
python seed_data.py
```

Create `backend/.env`:
```bash
GROQ_API_KEY=gsk_...                   # Required
CLASSIFIER_MODEL=llama-3.1-8b-instant  # Classifier model
CHEAP_MODEL=llama-3.1-8b-instant       # Economy tier
MID_MODEL=llama-3.3-70b-versatile      # Standard tier
PREMIUM_MODEL=qwen/qwen3-32b           # Premium tier
DATABASE_URL=neuralops.db              # SQLite path
```

We earn only when you save.
NeuralOps charges 3% of your monthly savings, nothing more. Zero risk for you.
| Monthly AI Spend | Est. Savings | NeuralOps Fee | Your Net Gain |
|---|---|---|---|
| $1,000 | ~$700 | $21 | $679 |
| $10,000 | ~$7,000 | $210 | $6,790 |
| $50,000 | ~$35,000 | $1,050 | $33,950 |
| $500,000 | ~$350,000 | $10,500 | $339,500 |
Minimum fee: $99/month. If NeuralOps doesn't save you money, you don't pay. Simple.
```
┌────────────────────────────────────────────────────────────────┐
│                   FRONTEND (React 19 + Vite)                   │
│                                                                │
│  ┌───────────┐  ┌────────────┐  ┌──────────┐  ┌────────────┐   │
│  │ Dashboard │  │Battle Mode │  │ Traffic  │  │  Insights  │   │
│  │(Recharts) │  │            │  │   Map    │  │            │   │
│  └───────────┘  └────────────┘  └──────────┘  └────────────┘   │
│  ┌──────────────────┐  ┌──────────────────────────────────┐    │
│  │  ROI Calculator  │  │        Self-Healing Panel        │    │
│  └──────────────────┘  └──────────────────────────────────┘    │
└──────────────────────────┬─────────────────────────────────────┘
                           │ REST + WebSocket
┌──────────────────────────┴─────────────────────────────────────┐
│                        BACKEND (FastAPI)                       │
│                                                                │
│  ┌────────────┐   ┌──────────┐   ┌────────────────────────┐    │
│  │ Classifier │──▶│  Router  │──▶│      Cost Tracker      │    │
│  │ (Llama 8B) │   │  Engine  │   │                        │    │
│  └────────────┘   └──────────┘   └────────────────────────┘    │
│  ┌────────────┐   ┌──────────┐   ┌────────────────────────┐    │
│  │ Rule-Based │   │Self-Heal │   │   SQLite (aiosqlite)   │    │
│  │  Fallback  │   │ Manager  │   │                        │    │
│  └────────────┘   └──────────┘   └────────────────────────┘    │
└────────┬───────────────┬──────────────────┬────────────────────┘
         │               │                  │
  ┌──────▼──────┐ ┌──────▼──────┐ ┌─────────▼───────┐
  │  Llama 3.1  │ │  Llama 3.3  │ │   Qwen 3 32B    │
  │     8B      │ │     70B     │ │                 │
  │   Economy   │ │  Standard   │ │    Premium      │
  │  $0.06/1M   │ │  $0.59/1M   │ │    $0.90/1M     │
  └──────┬──────┘ └──────┬──────┘ └─────────┬───────┘
         └───────────────┴──────────────────┘
                      Groq API
```
1. POST /route → user prompt arrives
2. Classifier → SIMPLE / MEDIUM / COMPLEX + confidence score
3. Router → selects the cheapest healthy model
4. Groq API → model generates the response
5. Cost Tracker → calculates actual cost vs the premium baseline
6. SQLite → request saved to the database
7. WebSocket → dashboard updated in real time
8. Response → returned with full metadata
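The eight steps above can be sketched end to end. This is an illustrative outline with injected stand-in callables, not the project's actual signatures; the real handler in `backend/main.py` is async:

```python
import time
import uuid

def handle_route(text, classify, select_model, call_model,
                 track_cost, save, broadcast):
    """Sketch of the /route lifecycle with injected dependencies."""
    complexity = classify(text)              # 2. SIMPLE / MEDIUM / COMPLEX
    model = select_model(complexity)         # 3. cheapest healthy model
    start = time.time()
    response = call_model(text, model)       # 4. model generates the response
    latency_ms = (time.time() - start) * 1000
    costs = track_cost(response, model)      # 5. actual cost vs premium baseline
    record = {
        "request_id": str(uuid.uuid4()),
        "complexity": complexity,
        "model": model,
        "response": response,
        "latency_ms": latency_ms,
        **costs,
    }
    save(record)                             # 6. persist to SQLite
    broadcast(record)                        # 7. push to dashboards via WebSocket
    return record                            # 8. full metadata back to the caller
```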
| Complexity | Primary | Fallback 1 | Fallback 2 |
|---|---|---|---|
| SIMPLE | Economy | Standard | Premium |
| MEDIUM | Standard | Premium | Economy |
| COMPLEX | Premium | Standard | Economy |
If all models fail, the API returns an honest 503 with retry guidance.
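In code, the fallback logic reduces to a walk down the priority chain. A sketch under that assumption; the real implementation in `router.py` also consults the self-healing manager for per-model health:

```python
FALLBACK_CHAINS = {
    "SIMPLE":  ["economy", "standard", "premium"],
    "MEDIUM":  ["standard", "premium", "economy"],
    "COMPLEX": ["premium", "standard", "economy"],
}

def select_model(complexity: str, health: dict) -> str:
    """Return the first healthy tier in the priority chain for this complexity."""
    for tier in FALLBACK_CHAINS[complexity]:
        if health.get(tier, False):
            return tier
    # All tiers down: surface an honest 503 to the caller.
    raise RuntimeError("503: all model tiers unhealthy, retry shortly")
```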
```python
start = time.time()
response = await call_model(prompt, model)
latency_ms = (time.time() - start) * 1000

tokens = response.usage.prompt_tokens + response.usage.completion_tokens

PRICE_PER_TOKEN = {
    "llama-3.1-8b-instant": 0.06 / 1_000_000,
    "llama-3.3-70b-versatile": 0.59 / 1_000_000,
    "qwen/qwen3-32b": 0.90 / 1_000_000,
}

actual_cost = tokens * PRICE_PER_TOKEN[model_used]
baseline_cost = tokens * PRICE_PER_TOKEN["qwen/qwen3-32b"]
savings = baseline_cost - actual_cost
savings_pct = (savings / baseline_cost) * 100
```

Complexity-aware composite score:
| Complexity | Cost Weight | Speed Weight | Quality Weight |
|---|---|---|---|
| SIMPLE | 80% | 15% | 5% |
| MEDIUM | 15% | 15% | 70% |
| COMPLEX | 5% | 5% | 90% |
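Applying the weights is a plain weighted sum. A sketch assuming each per-model score has already been normalized to the 0..1 range (higher is better); the project's actual normalization may differ:

```python
WEIGHTS = {
    "SIMPLE":  {"cost": 0.80, "speed": 0.15, "quality": 0.05},
    "MEDIUM":  {"cost": 0.15, "speed": 0.15, "quality": 0.70},
    "COMPLEX": {"cost": 0.05, "speed": 0.05, "quality": 0.90},
}

def composite_score(complexity: str, cost_score: float,
                    speed_score: float, quality_score: float) -> float:
    """Weighted sum of normalized 0..1 scores; weights depend on complexity."""
    w = WEIGHTS[complexity]
    return (w["cost"] * cost_score
            + w["speed"] * speed_score
            + w["quality"] * quality_score)
```

A cheap, fast model with a mediocre answer wins SIMPLE battles; for COMPLEX prompts, answer quality dominates.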
Route a prompt to the optimal model.
Request:

```json
{ "text": "Your prompt here" }
```

Response:
```json
{
  "request_id": "uuid",
  "response": "Model response text",
  "model_used": "Llama 3.1 8B",
  "tier": "economy",
  "complexity": "SIMPLE",
  "confidence": 0.97,
  "routing_reason": "Single factual question",
  "latency_ms": 147,
  "input_tokens": 12,
  "output_tokens": 18,
  "actual_cost": 0.0000018,
  "cost_without_neuralops": 0.000027,
  "savings": 0.0000252,
  "savings_percentage": 93.3,
  "is_fallback": false
}
```

Run the same prompt against all 3 models simultaneously.
Aggregate statistics: total requests, savings, model distribution.
Paginated request history.
Current health state of all model tiers.
Toggle a model's health state (simulate outage/recovery).
```json
{ "model_key": "llama-3.1-8b-instant", "healthy": false }
```

Real-time event stream:

- `new_request`: fired on every routed request
- `stats_update`: updated aggregate stats
- `health_change`: model health state changes
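A consumer only needs to branch on the event type. A stdlib-only sketch of that dispatch; the payload field names beyond the three event types are assumptions, and any WebSocket client library can feed frames into it:

```python
import json

def handle_event(raw: str) -> str:
    """Turn one JSON event frame into a log line (payload fields assumed)."""
    event = json.loads(raw)
    kind = event.get("type", "unknown")
    if kind == "new_request":
        return f"routed to {event.get('tier')} ({event.get('savings_percentage')}% saved)"
    if kind == "stats_update":
        return f"{event.get('total_requests')} requests so far"
    if kind == "health_change":
        return f"{event.get('model_key')} healthy={event.get('healthy')}"
    return f"unhandled event type: {kind}"
```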
```
NeuralOps/
├── backend/
│   ├── main.py              # FastAPI app, routes, WebSocket
│   ├── classifier.py        # LLM + rule-based prompt classifier
│   ├── router.py            # Routing table, health, fallback chains
│   ├── model_client.py      # Async Groq API calls
│   ├── cost_tracker.py      # Token cost math + savings calculation
│   ├── database.py          # SQLite schema + async CRUD
│   ├── models.py            # Pydantic schemas
│   ├── seed_data.py         # Demo data generator
│   └── .env.example         # Environment template
│
├── frontend/
│   ├── src/
│   │   ├── App.jsx          # Navigation + WebSocket state
│   │   └── components/
│   │       ├── Dashboard.jsx        # Live stats + request feed
│   │       ├── BattleMode.jsx       # Model comparison arena
│   │       ├── TrafficFlow.jsx      # Live traffic visualization
│   │       ├── Insights.jsx         # Analytics + intelligence report
│   │       ├── ROICalculator.jsx    # Savings projection tool
│   │       ├── SelfHealingPanel.jsx # Health toggle simulation
│   │       └── ui/                  # Card, Badge, shared primitives
│   ├── index.html
│   └── vite.config.js
│
└── README.md
```
- PostgreSQL (replace SQLite)
- API key auth per user
- OpenAI + Anthropic model support
- Rate limiting + usage quotas
- Deploy to Railway + Vercel
- JavaScript SDK (`npm install neuralops-sdk`)
- Python SDK (`pip install neuralops`)
- 2-line integration for any app
- Webhook support
- Multi-tenant dashboard
- Custom routing rules per customer
- Quality guarantee SLAs
- Usage-based billing (3% of savings, min $99/month)
- On-premise deployment (Ollama)
Contributions are welcome! Please open an issue first to discuss what you'd like to change.
```bash
# Fork the repo
# Create your branch
git checkout -b feature/your-feature

# Make changes + commit
git commit -m "feat: your feature description"

# Push and open a PR
git push origin feature/your-feature
```

MIT License. See LICENSE for details.
Built with ⚡ by the NeuralOps Team
If this saved you money, give it a ⭐



