Multi-SSD Inference Appliance for Personal AI Labs

C6/10 · March 22, 2026
What: A purpose-built hardware+software appliance that stripes MoE model weights across multiple NVMe SSDs (or Intel Optane) to achieve 30-50 tokens/second on giant models without expensive GPU memory.
Signal: Multiple commenters are independently converging on the idea that parallelizing weight reads across SSDs, or using cheaper persistent memory like Optane, could dramatically improve throughput; people with modest GPU setups (e.g., dual RTX 3090s) want a practical path to big-model inference.
Why Now: NVMe SSD bandwidth has hit 15 GB/s on consumer hardware, Optane is available at $1/GB on secondary markets, and open MoE models are approaching the 400B-parameter mark; the cost/performance crossover for SSD-based inference is happening right now (a back-of-envelope check follows the source line below).
Market: AI researchers, personal labs, and small companies wanting local inference; hundreds of thousands of potential buyers at $500-2000 price points; no one sells a turnkey SSD-striping inference solution today.
Moat: Hardware-software co-design with validated configurations and optimized firmware creates switching costs and a testing/certification moat that pure-software plays lack.
Source: Flash-MoE: Running a 397B Parameter Model on a Laptop · 365 pts · March 22, 2026
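
To sanity-check the headline rate: with expert weights striped across several drives, generation is bandwidth-bound by how many cold expert bytes each token must pull from disk. Below is a minimal back-of-envelope sketch; the drive count, per-drive bandwidth, and per-token cold-read figures are illustrative assumptions, not measurements from the linked article.

```python
# Back-of-envelope check of the bandwidth-bound token rate for SSD-striped
# MoE inference. All numbers are illustrative assumptions.

def tokens_per_second_ceiling(num_ssds: int,
                              gbps_per_ssd: float,
                              cold_gb_per_token: float) -> float:
    """Upper bound on tokens/s when expert fetches are the bottleneck.

    cold_gb_per_token: expert weight bytes (in GB) that miss the RAM/VRAM
    cache and must be read from disk for each generated token.
    """
    aggregate_gbps = num_ssds * gbps_per_ssd  # ideal striping, zero overhead
    return aggregate_gbps / cold_gb_per_token

# Assumed setup: 4 consumer NVMe drives at 15 GB/s each, a 4-bit MoE whose
# router activates several experts per token, of which ~2 miss the cache at
# ~0.6 GB per quantized expert -> ~1.2 GB of cold reads per token.
print(tokens_per_second_ceiling(4, 15.0, 1.2))  # -> 50.0 tokens/s
```

Under those assumed figures the ceiling lands at 50 tokens/s, which is consistent with the 30-50 tokens/second target once striping overhead and imperfect caching eat into the ideal number.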

More ideas from March 22, 2026

SSD-Optimized Local LLM Inference Engine (P7/10): A commercial inference runtime that lets developers and power users run 300B+ parameter models on consumer hardware by streaming sparse MoE weights from SSD through optimized GPU compute pipelines.
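
The core engineering trick in such a runtime is overlapping disk reads with GPU compute. A minimal sketch of that overlap follows; `read_experts` and `run_layer` are hypothetical placeholders, not a real API, and in a real MoE the next layer's routing depends on the current layer's output, so the prefetch would have to be speculative or driven by a routing predictor.

```python
from concurrent.futures import ThreadPoolExecutor

def read_experts(layer: int, expert_ids: list[int]) -> bytes:
    # Placeholder: a real engine would issue O_DIRECT reads striped
    # across the NVMe drives for exactly these experts.
    return bytes(len(expert_ids))

def run_layer(layer: int, weights: bytes) -> None:
    # Placeholder: a real engine would launch this layer's GPU kernels.
    pass

def generate_token(routing: list[list[int]]) -> None:
    """routing[i] = expert ids assumed chosen for layer i (speculative in
    a real MoE, where routing depends on the previous layer's output)."""
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_experts, 0, routing[0])
        for layer in range(len(routing)):
            weights = pending.result()       # block until this layer's reads land
            if layer + 1 < len(routing):     # kick off next layer's reads early...
                pending = io.submit(read_experts, layer + 1, routing[layer + 1])
            run_layer(layer, weights)        # ...so compute and I/O overlap

generate_token([[0, 3], [1, 2], [4, 5]])
```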
Mobile GPU LLM Inference Optimizer (C5/10): An inference SDK that brings MoE expert-streaming techniques to mobile GPUs (Adreno, Mali, Apple A-series), enabling usable on-device inference of large models on phones and tablets.
SSD Wear-Aware AI Workload Manager (C5/10): A system utility that monitors and intelligently manages SSD wear from AI inference workloads, implementing caching strategies, wear leveling across drives, and lifetime predictions specific to LLM usage patterns.
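
Inference traffic is read-dominated and NAND wear comes almost entirely from writes (cache fills, expert re-packing), so the useful prediction is rated endurance divided by daily write volume. A toy sketch of the estimate such a utility might surface; the TBW rating and workload figures are assumptions.

```python
# Naive drive-lifetime estimate from a TBW (terabytes written) endurance
# rating. Reads barely wear NAND; only the write volume matters here.

def years_of_life(tbw_rating_tb: float, tb_written_per_day: float) -> float:
    """Remaining life = rated endurance / daily write volume, in years."""
    return tbw_rating_tb / tb_written_per_day / 365

# Assumed: a 2 TB consumer drive rated ~1200 TBW, and an AI workload that
# rewrites a 50 GB expert cache ~4x/day, i.e. ~0.2 TB written per day.
print(f"{years_of_life(1200, 0.2):.1f} years")  # -> 16.4 years
```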
Offline-First Personal Knowledge Server with Local AI (P5/10): A plug-and-play appliance that packages curated knowledge bases (Wikipedia, maps, tutorials, medical references) with a local LLM for natural-language querying, designed to work entirely without internet access.
Turnkey Offline Knowledge Kit for Old Devices (C5/10): A lightweight app that packages Wikipedia, OpenStreetMap, survival guides, and tutorial videos into a single installable bundle optimized for old Android tablets and low-end hardware.
CRDT-Native Version Control System for AI-Heavy Teams (P6/10): A developer-friendly version control system built on CRDT fundamentals that handles concurrent edits from both humans and AI agents without blocking merges.
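
What makes CRDT merges non-blocking is that merge is a pure, commutative, associative function of replica states, so edits from humans and agents combine in any order with no conflict-resolution step. A minimal sketch using a last-writer-wins map; a real VCS would need richer CRDTs for text and directory trees.

```python
# LWW-map merge: per key, keep the entry with the later timestamp.
# Each value is a (timestamp, payload) pair; ties fall through to the
# payload comparison, so merge stays deterministic on every replica.

def merge(a: dict, b: dict) -> dict:
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry > out[key]:
            out[key] = entry
    return out

human = {"src/main.py": (5, "refactor"), "README.md": (3, "typo fix")}
agent = {"src/main.py": (7, "add tests"), "docs/api.md": (4, "generated")}

assert merge(human, agent) == merge(agent, human)  # commutative: no blocking
print(merge(human, agent))
```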