4 min read

airgap: a customer-support bot that runs on the device

  • ai
  • on-device
  • react-native
  • case-study

airgap is a React Native framework for shipping a customer-support bot that runs on the device. The default model is Gemma 4 E2B: around 2.4 GB at Q3_K_S, under 1.5 GB of runtime RAM, and a 128K context window. No cloud calls. Seven industry templates ship in the repo: telecom, banking, healthcare, airline, insurance, electric utility, water utility. Pick one, edit one JSON file, deploy.
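As a rough sketch, that one JSON file might look like the following. Only the llm.mode key is confirmed later in this post; the template key and its values are an illustrative assumption, not the repo's confirmed schema:

```json
{
  "template": "telecom",
  "llm": {
    "mode": "demo"
  }
}
```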

The case for on-device

The default architecture for an LLM-driven support agent is server-side: an API behind your BFF that proxies to OpenAI or Anthropic. That works, scales, and lets you swap models. It also has obvious costs. Every conversation goes off the device, you pay per token, and you are the data controller for whatever the user typed into the chat.

For regulated industries (banking, healthcare, utilities, insurance) that data-controller status is consequential. You are responsible for the contents of every chat for as long as you keep it. You are responsible for the latency budget when the user is in low signal or roaming. You are also the one who has to explain, after the next Moffatt v. Air Canada style precedent, why your bot promised something the policy does not actually offer.

Running the model on the device flips most of those defaults. The prompt and the response never leave the phone. Latency is consistent because there is no network. The knowledge base (KB) is bundled and versioned, so there is a verifiable source for every grounded answer. The token cost is zero.

The cost is that the model is smaller and the install is large. A ~2.4 GB download is a real ask against a phone's storage budget. There is also a hardware floor: Gemma 4 E2B at Q3_K_S runs on most current Android devices and on physical iPhones; iOS simulators do not have a working Metal path through llama.rn, which is why the iOS demo runs in demo mode on the simulator and only switches to real inference on a physical device.
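The simulator fallback can be expressed as one pure decision function. This is a minimal sketch, not the repo's actual code; the function name and the RuntimeEnv shape are assumptions (in a real app, isSimulator might come from something like react-native-device-info's isEmulator()):

```typescript
type LlmMode = "demo" | "prefer-offline";

interface RuntimeEnv {
  platform: "ios" | "android";
  // True when running on an iOS simulator / Android emulator.
  isSimulator: boolean;
}

// Force demo mode on iOS simulators, where llama.rn has no working
// Metal path; real inference only runs on physical devices.
function resolveLlmMode(configured: LlmMode, env: RuntimeEnv): LlmMode {
  if (env.platform === "ios" && env.isSimulator) return "demo";
  return configured;
}
```

Keeping the check in one place means the rest of the chat pipeline never needs to know whether it is talking to the real model or the demo formatter.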

What shipping one looks like

A fresh clone of the repo runs in demo mode by default. No model download, no cloud calls. The chat formats top-K KB hits as a streamed reply with simulated thinking pauses, so a fresh emulator launch reaches the chat screen in under five minutes. That matters because most evaluation of an on-device chatbot fails at “the hardware is too slow to download the model and run it before I lose patience.” Demo mode lets a stakeholder see the UX before committing to the model download.
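The demo-mode formatter can be sketched as an async generator: no model, just the top-K KB hits turned into a reply and yielded in small chunks with pauses. The names and chunking granularity here are illustrative, not the repo's implementation:

```typescript
interface KbHit {
  title: string;
  snippet: string;
}

// Demo mode: no model call. Format the top-K KB hits as a reply and
// yield it word by word, sleeping between chunks to simulate thinking.
async function* demoReply(
  hits: KbHit[],
  k = 3,
  pauseMs = 40
): AsyncGenerator<string> {
  for (const hit of hits.slice(0, k)) {
    const text = `From "${hit.title}": ${hit.snippet}\n`;
    for (const word of text.split(" ")) {
      yield word + " ";
      await new Promise((resolve) => setTimeout(resolve, pauseMs));
    }
  }
}
```

The chat UI consumes this stream through the same interface as real inference, which is what lets one screen serve both modes.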

To switch to real on-device inference, flip llm.mode from demo to prefer-offline in airgap.config.json, run ./scripts/pull-dev-model.sh (or trigger the in-app onboarding download), and the same chat is now running against Gemma on the device.
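The mode flip itself is a one-key change in airgap.config.json (surrounding keys elided; only the llm.mode key is taken from the post):

```json
{
  "llm": {
    "mode": "prefer-offline"
  }
}
```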

Branding is a separate step. ./scripts/setup.sh is an interactive wizard that handles company name, colors, hotline, and native package id, separately for Android and iOS, with no manual Gradle or Xcode editing.

Trust signals

Every grounded answer surfaces up to three citation chips below the bubble: category > title. Tap a chip and the source drawer opens to the full KB doc that grounded the answer. This is the single most useful UI affordance the project ships. Customers do not need to trust the model. They need to verify the answer against the source the model used. Citation chips make verification one tap away.
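Building the chip labels is a one-liner over the grounding set. A minimal sketch, with an assumed KbDoc shape and function name (the tap-to-open-drawer behavior is UI code and omitted here):

```typescript
interface KbDoc {
  category: string;
  title: string;
  body: string;
}

// Up to three citation chips per grounded answer, labeled "category > title".
// Tapping a chip would open the source drawer on the full doc.
function citationChips(grounding: KbDoc[], max = 3): string[] {
  return grounding.slice(0, max).map((doc) => `${doc.category} > ${doc.title}`);
}
```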

The KBs in the seven templates are real. Between 40 and 118 documents each, with real tool definitions and real safety policies for the vertical. The safety layer is deterministic, not prompted. A policy file says “the bot must not promise X without Y” and the response is rejected before it streams. Prompt-engineered safety is fragile. A policy file is a policy file.
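A deterministic "must not promise X without Y" rule reduces to pattern matching on the drafted reply before anything streams. This sketch is an assumption about how such a policy file could be evaluated, not the repo's actual rule engine or schema:

```typescript
// One deterministic rule: a reply that matches `promise` but not
// `condition` is rejected before any token streams to the user.
interface PolicyRule {
  id: string;
  promise: RegExp;
  condition: RegExp;
}

// Returns the id of the first violated rule, or null if the reply is safe.
function violatesPolicy(reply: string, rules: PolicyRule[]): string | null {
  for (const rule of rules) {
    if (rule.promise.test(reply) && !rule.condition.test(reply)) {
      return rule.id;
    }
  }
  return null;
}
```

Because the check runs on the final text with no model in the loop, the same input always produces the same verdict, which is the point of "a policy file is a policy file."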

State

204 Jest tests passing, 236 journey tests passing, CI green on Android API 36 and iOS 26+. Active.

Repo: github.com/xmpuspus/airgap. Live showcase: xmpuspus.github.io/airgap.