AI Integration

Nova — Personal AI Assistant

A self-hosted AI assistant with 56 integrated tools, mobile and voice access, and a routing layer that resolves ~90% of queries on local hardware, keeping cloud spend at lunch-money levels.

Duration

8 months in production

Cloud spend

~$50/mo (vs. an estimated $500+/mo on equivalent SaaS)

Stack

Node.js · React Native · Ollama (local LLM) · Claude API · WebSocket · Docker

The Problem

Off-the-shelf AI assistants (ChatGPT, Claude, Gemini, etc.) all share three frustrations: they don't know your stack, your data leaves your network, and the bill scales with usage. For someone running a home automation system, a media server, a trading bot, four niche websites, and 48 mobile apps, the per-token costs alone would have been $500+/month. The privacy story would have been worse — every config file, every email draft, every scratchpad would touch a cloud provider.

The Solution

Nova is a custom AI assistant that runs as a containerized service on a Dell R7525 with 3× NVIDIA Tesla V100 GPUs (96 GB VRAM total). It exposes a WebSocket API consumed by a React Native mobile app and a web dashboard. The architectural bet is three-tier model routing:

  1. Tier 1 — Regex & deterministic dispatch. Commands like "lights off" or "what's the cache fill?" hit a router that maps directly to a tool call. Latency: <50ms. Cost: zero.
  2. Tier 2 — Local LLM (Ollama, qwen3-coder-next, 80B MoE). Anything ambiguous goes to the local model with the full toolset attached. Latency: 1-3s. Cost: zero (already paid for the GPUs).
  3. Tier 3 — Cloud API (Claude / GPT). Only when the local model self-reports low confidence or the task requires the very best reasoning. Latency: 2-5s. Cost: ~$0.02-0.20 per call.
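
The three tiers above can be sketched as a single routing function. This is a minimal illustration, not Nova's actual code: the tool names, the `callTool` stub, the `ModelReply` shape, and the 0.6 confidence threshold are all assumptions made for the example.

```typescript
type Tier = 1 | 2 | 3;

interface RoutedReply {
  tier: Tier;
  text: string;
}

interface ModelReply {
  text: string;
  confidence: number; // assumed self-reported score in the range 0..1
}

// Any async function that answers a query; real ones would wrap Ollama or a cloud API.
type Model = (query: string) => Promise<ModelReply>;

// Tier 1: regex patterns mapped straight to a (hypothetical) tool name.
const deterministicRoutes: Array<{ pattern: RegExp; tool: string }> = [
  { pattern: /^lights off$/i, tool: "home.lights_off" },
  { pattern: /^what's the cache fill\??$/i, tool: "server.cache_fill" },
];

// Stand-in for real tool execution.
function callTool(tool: string): string {
  return `dispatched ${tool}`;
}

const CONFIDENCE_FLOOR = 0.6; // assumed threshold for escalating to Tier 3

async function route(query: string, localModel: Model, cloudModel: Model): Promise<RoutedReply> {
  const trimmed = query.trim();

  // Tier 1: deterministic dispatch, no model involved.
  for (const { pattern, tool } of deterministicRoutes) {
    if (pattern.test(trimmed)) return { tier: 1, text: callTool(tool) };
  }

  // Tier 2: local model; keep its answer unless it reports low confidence.
  const local = await localModel(trimmed);
  if (local.confidence >= CONFIDENCE_FLOOR) return { tier: 2, text: local.text };

  // Tier 3: escalate to the cloud model.
  const cloud = await cloudModel(trimmed);
  return { tier: 3, text: cloud.text };
}
```

Returning the tier alongside the text is what makes per-tier accounting possible later; the caller does not need to know which path was taken to display the answer.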

The router records which tier handled each query and writes a daily breakdown. In practice, ~90% of traffic stops at Tier 1 or 2, leaving Tier 3 for the truly hard requests where paying for Claude actually moves the needle.
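
A daily breakdown like the one described above could be computed along these lines. The `TierEvent` shape and both function names are hypothetical, and grouping by UTC calendar day is an assumption made to keep the sketch short.

```typescript
type Tier = 1 | 2 | 3;

interface TierEvent {
  tier: Tier; // which tier produced the final answer
  at: Date;   // when the query was handled
}

// Groups events by UTC calendar day and counts how many each tier handled.
function dailyBreakdown(events: TierEvent[]): Map<string, Record<Tier, number>> {
  const days = new Map<string, Record<Tier, number>>();
  for (const { tier, at } of events) {
    const day = at.toISOString().slice(0, 10); // "YYYY-MM-DD"
    const counts = days.get(day) ?? { 1: 0, 2: 0, 3: 0 };
    counts[tier] += 1;
    days.set(day, counts);
  }
  return days;
}

// Share of queries that never reached the cloud tier (Tier 1 + Tier 2).
function localShare(counts: Record<Tier, number>): number {
  const total = counts[1] + counts[2] + counts[3];
  return total === 0 ? 0 : (counts[1] + counts[2]) / total;
}
```

Tracking `localShare` per day makes it easy to spot a regression, for example a tool description change that suddenly pushes more traffic to Tier 3.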

What Got Integrated

56 tools across categories:

  • Home & infrastructure — Home Assistant control, Docker container management, Unraid status, server health.
  • Personal data — Calendar, email triage, password vault read, contact lookup, document search across Nextcloud.
  • Build & dev — App build status, deploy triggers, Gitea repo browsing, log tailing.
  • Finance & research — Stock quotes, portfolio diffing, SEC filing pulls, scheduled cron monitors with push alerts.
  • Communication — Push notifications to phone, email composition with attachments, SMS via Twilio.

Results

  • Cloud cost: ~$50/month instead of an estimated $500+/month with an equivalent cloud-only setup.
  • Latency: Sub-second response for ~70% of queries (Tier 1 + warm Tier 2).
  • Privacy: Personal data never leaves the LAN unless Tier 3 is invoked, and Tier 3 calls are logged and reviewable.
  • Reliability: 99.4% uptime over the last 6 months, including two server reboots and one GPU driver upgrade.
  • Replaced: ChatGPT Plus, Google Assistant, three separate dashboard apps, a $20/mo notification service, manual SSH-into-server workflows.
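
For a sense of scale, the cost and uptime figures above can be turned into rough bounds. This is back-of-the-envelope arithmetic under two stated assumptions: that the whole ~$50 goes to Tier 3 calls at the quoted ~$0.02-0.20 per call, and that "6 months" means a 183-day window.

```typescript
// Amounts in cents so the division stays exact.
const monthlyBudgetCents = 5000;                 // ~$50/month
const costPerCallCents = { low: 2, high: 20 };   // ~$0.02 to $0.20 per Tier 3 call

// Assumption: the entire budget is spent on Tier 3 calls.
const maxCallsPerMonth = monthlyBudgetCents / costPerCallCents.low;  // 2500 cheap calls
const minCallsPerMonth = monthlyBudgetCents / costPerCallCents.high; // 250 expensive calls

// Assumption: a 183-day window for "the last 6 months".
const windowHours = 183 * 24;                    // 4392 hours
const downtimeHours = windowHours * (1 - 0.994); // roughly 26 hours

console.log({ minCallsPerMonth, maxCallsPerMonth, downtimeHours: downtimeHours.toFixed(1) });
```

In other words, the budget covers somewhere between a few hundred and a few thousand cloud calls a month, and 99.4% uptime corresponds to roughly a day of cumulative downtime over the window.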

What Surprised Us

Tool definitions matter more than prompts. The single biggest accuracy gain came from rewriting tool descriptions — not refining the system prompt. A tool description that says "use this when the user mentions X" pulls correct invocations dramatically better than vague capability lists.
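
To make the contrast concrete, here is a hypothetical pair of definitions for the same tool. The `ToolDefinition` shape, the tool name, and the wording are illustrative assumptions rather than Nova's real definitions; the `statesTrigger` helper is a crude check one could run over a tool catalog.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, { type: string; description: string }>;
}

// Vague: names a capability but gives the model no cue for when to pick it.
const vague: ToolDefinition = {
  name: "docker_status",
  description: "Docker container management.",
  parameters: {},
};

// Specific: leads with the trigger, then says what comes back.
const specific: ToolDefinition = {
  name: "docker_status",
  description:
    "Use this when the user mentions a container, says a service is down, " +
    "or asks whether something is running. Returns each container's name and state.",
  parameters: {
    container: { type: "string", description: "Optional container name; omit to list all." },
  },
};

// Flags definitions whose description does not open with a trigger phrase.
function statesTrigger(tool: ToolDefinition): boolean {
  return /^use this when\b/i.test(tool.description);
}

console.log(statesTrigger(vague), statesTrigger(specific)); // false true
```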

WebSocket beat polling and beat raw HTTP for mobile. A persistent WebSocket connection lets the assistant push progress updates ("checking your inbox… found 3 new") instead of going silent for 5 seconds while the model thinks. UX delta is large; engineering effort was small.
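
The push pattern described above can be sketched without committing to a particular WebSocket library. The message shapes, the `Step` interface, and `answerWithProgress` are assumptions for illustration; `send` stands for whatever writes one text frame to the open connection.

```typescript
type ServerMessage =
  | { type: "progress"; text: string }
  | { type: "final"; text: string };

// Anything that transmits one text frame over the open connection.
type Send = (frame: string) => void;

interface Step {
  label: string;              // shown to the user while the step runs
  run: () => Promise<string>; // hypothetical tool or model call
}

// Emits a progress frame before each step, then one final frame with the joined results.
async function answerWithProgress(send: Send, steps: Step[]): Promise<void> {
  const push = (message: ServerMessage): void => send(JSON.stringify(message));
  const results: string[] = [];
  for (const step of steps) {
    push({ type: "progress", text: step.label });
    results.push(await step.run());
  }
  push({ type: "final", text: results.join(" ") });
}
```

With the `ws` package on the server, `send` could be as simple as `(frame) => socket.send(frame)`; the mobile client then renders `progress` frames as status text and replaces them when the `final` frame arrives.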

Local models are good enough for ~90% of personal-assistant work. The hard part isn't model capability; it's tool routing, prompt engineering, and keeping the conversation context tight. Throwing GPT-4o at a problem that a warm qwen3-coder-next handles in 800ms locally is bad engineering.

What We'd Build For You

Most companies don't need 96 GB of GPU memory, but they almost certainly have:

  • A bunch of internal SaaS subscriptions whose data could be queried by an LLM if it had API access.
  • A handful of repetitive workflows (status reports, ticket triage, customer FAQ) that an assistant could front-line.
  • A privacy or compliance constraint that makes "send everything to OpenAI" a non-starter.

For ~$5K-$15K we can build a stripped-down Nova for your stack: 5-15 tools, your authentication, your model preference (cloud, local, or hybrid), deployed to your infrastructure, with documentation for your team to extend.

Interested?

See the services page for engagement options, or email charles@griswoldlabs.com with a one-paragraph description of your use case.