whisper-kb
Zero-drop dictation TUI for whisper.cpp
High-performance terminal interface for hands-free dictation with guaranteed character accuracy
A terminal user interface for whisper.cpp dictation with one non-negotiable requirement: zero character drops. Every word you speak appears exactly as transcribed, at typing speed.
Problem
Voice dictation tools exist, but each tends to fall short in at least one way:
- Cloud-dependent (privacy, latency)
- GUI-only (doesn’t integrate with terminal workflows)
- Unreliable (dropped characters, garbled output)
For hands-free coding and writing, you need local processing, terminal integration, and absolute reliability.
Architecture
Service-oriented async TUI with strict separation:
┌─────────────────────────────────────────────────────────┐
│                    Textual TUI Layer                    │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   │
│   │ Output      │   │ Queue       │   │ Status Bar  │   │
│   │ History     │   │ Panel       │   │ + Levels    │   │
│   └─────────────┘   └─────────────┘   └─────────────┘   │
├─────────────────────────────────────────────────────────┤
│                      Service Layer                      │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   │
│   │ Audio       │   │ Transcribe  │   │ Type        │   │
│   │ Capture     │   │ Queue       │   │ Simulator   │   │
│   │ (GStreamer) │   │             │   │ (xdotool)   │   │
│   └─────────────┘   └─────────────┘   └─────────────┘   │
├─────────────────────────────────────────────────────────┤
│                   Legacy Bridge Layer                   │
│         (Proven client_lite scripts preserved)          │
└─────────────────────────────────────────────────────────┘
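The service layer in the diagram can be pictured as three coroutines joined by queues. The sketch below is illustrative only: the function names (`capture`, `transcribe`, `type_out`, `run_pipeline`) are hypothetical, and the transcription step is a placeholder that decodes bytes instead of calling whisper.cpp.

```python
import asyncio


async def capture(audio_out: asyncio.Queue, clips: list[bytes]) -> None:
    """Stand-in for the audio capture service: emits pre-recorded clips."""
    for clip in clips:
        await audio_out.put(clip)
    await audio_out.put(None)  # sentinel: no more audio


async def transcribe(audio_in: asyncio.Queue, text_out: asyncio.Queue) -> None:
    """Stand-in for the transcribe queue: converts clips to text, in order."""
    while (clip := await audio_in.get()) is not None:
        await text_out.put(clip.decode("utf-8"))  # placeholder for whisper.cpp
    await text_out.put(None)  # pass the sentinel downstream


async def type_out(text_in: asyncio.Queue, typed: list[str]) -> None:
    """Stand-in for the type simulator: records what would be typed."""
    while (text := await text_in.get()) is not None:
        typed.append(text)


async def run_pipeline(clips: list[bytes]) -> list[str]:
    """Run all three services concurrently and return the typed segments."""
    audio_q: asyncio.Queue = asyncio.Queue()
    text_q: asyncio.Queue = asyncio.Queue()
    typed: list[str] = []
    await asyncio.gather(
        capture(audio_q, clips),
        transcribe(audio_q, text_q),
        type_out(text_q, typed),
    )
    return typed
```

Because each stage only touches its own queues, a slow transcription cannot block audio capture, and segments reach the typer in the order they were recorded.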
Key Pattern: Integration via Preservation
The typing system uses battle-tested legacy scripts rather than reimplementing. New TUI wraps proven code.
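One way to wrap an existing script without blocking the TUI is to run it as an async subprocess. This is a minimal sketch, not the project's actual bridge: `run_legacy_script` is a hypothetical name, and `STAND_IN` is a small echo command used here in place of a real client_lite script.

```python
import asyncio
import sys

# Hypothetical stand-in for a legacy script: echoes stdin back to stdout.
STAND_IN = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


async def run_legacy_script(argv: list[str], text: str) -> str:
    """Feed text to an existing script on stdin without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, err = await proc.communicate(text.encode("utf-8"))
    if proc.returncode != 0:
        # Explicit error boundary: surface the failure instead of dropping text.
        detail = err.decode("utf-8", errors="replace")
        raise RuntimeError(f"legacy script exited with {proc.returncode}: {detail}")
    return out.decode("utf-8")
```

The legacy code stays untouched. The wrapper only adds the async call and an explicit failure path.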
Performance Requirements
| Metric | Target | Rationale |
|---|---|---|
| Character drop rate | 0% | Non-negotiable for trust |
| Typing speed | 15 ms/char | Feels natural, doesn’t trigger rate limits |
| Error recovery | 90% | Failed transcriptions recoverable via re-record |
| Workflow interruption | <5% | Down from ~30% with basic client |
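At 15 ms per character the typer emits roughly 66 characters per second. The sketch below shows one way to pace output and confirm that nothing was dropped. The names `type_text` and `CHAR_DELAY_S` are hypothetical, and `emit` is an assumed callback standing in for whatever actually sends keystrokes.

```python
import asyncio
from typing import Callable

CHAR_DELAY_S = 0.015  # 15 ms per character, the target in the table above


async def type_text(
    text: str,
    emit: Callable[[str], None],
    delay_s: float = CHAR_DELAY_S,
) -> int:
    """Emit one character at a time with a fixed delay; return the count sent."""
    sent = 0
    for ch in text:
        emit(ch)
        sent += 1
        await asyncio.sleep(delay_s)  # yields to the event loop between characters
    return sent
```

A caller can check the zero-drop requirement by collecting the emitted characters and comparing them with the input:

```python
received: list[str] = []
count = asyncio.run(type_text("hello world", received.append))
assert count == len("hello world")
assert "".join(received) == "hello world"
```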
Constitutional Development
The project enforces design principles programmatically:
constitution.md defines:
- Non-blocking I/O everywhere
- State-machine-first design
- Explicit error boundaries
- Performance thresholds
tests/constitutional/ validates every change against the constitution. These are automated gates, not guidelines.
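    # Example constitutional test
    import inspect

    def test_no_blocking_io_in_handlers():
        """All event handlers must be async."""
        # get_all_handlers() is assumed to come from the project's test helpers
        for handler in get_all_handlers():
            assert inspect.iscoroutinefunction(handler), f"{handler!r} is not async"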
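The "state-machine-first" principle can be illustrated with an explicit transition table. The states and allowed transitions below are assumptions chosen for illustration, not taken from the project's constitution.md. The point is that an illegal transition raises instead of passing silently.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()
    TYPING = auto()
    ERROR = auto()


# Hypothetical transition table: each state maps to the states it may enter.
TRANSITIONS: dict[State, set[State]] = {
    State.IDLE: {State.RECORDING},
    State.RECORDING: {State.TRANSCRIBING, State.ERROR},
    State.TRANSCRIBING: {State.TYPING, State.ERROR},
    State.TYPING: {State.IDLE, State.ERROR},
    State.ERROR: {State.IDLE, State.RECORDING},  # recovery via re-record
}


class InvalidTransition(Exception):
    """Raised when code attempts a transition the table does not allow."""


class Session:
    def __init__(self) -> None:
        self.state = State.IDLE

    def advance(self, new_state: State) -> None:
        """Move to new_state, or raise if the table forbids it."""
        if new_state not in TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state.name} -> {new_state.name}")
        self.state = new_state
```

A constitutional test could then assert that every state has an entry in the table and that the error state always has a way out.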
Current Status
Why This Exists
Hands-free input isn’t a nice-to-have—it’s accessibility. But accessibility tools need to be rock-solid. This project proves you can have local, private, terminal-native dictation that actually works. The constitutional development pattern ensures it stays working.