Feb 2025 · 6 min read

Running a Local LLM on an Old Phone

Turning retired hardware into a high-performance AI edge server with local inference, automated reporting, and real-time monitoring — for free.

The Hidden Power in Your Drawer

Most old smartphones represent a missed opportunity. Despite their cracked screens and dated cameras, they remain impressive pieces of engineering: an ARM64 architecture, power-efficient cores, a built-in battery that doubles as a UPS, and integrated wireless connectivity.

My Huawei nova 7i was collecting dust. By repurposing it as an always-on AI server, I managed to eliminate cloud dependency for basic inference tasks while breathing new life into "e-waste." No cloud bills, no privacy concerns, just a pure local ecosystem.

Zero Latency/Cost

No API fees or network delays from external providers.

Total Privacy

Your data stays on-device and never leaves your local network.

Sustainable Dev

Repurposing hardware reduces environmental impact.

Full Linux Stack

Native Linux performance via Termux & ARM64.

Architecture Flow

A multi-layered pipeline running entirely on the Kirin 810 SoC.

Edge Node (online)
- Termux: Linux environment
- llama-server: Qwen 2.5 (1.5B)
- PicoClaw: AI gateway
- Telegram bot
- Dashboard
- Cloudflare

Technical Implementation

Phase 01

The Linux Foundation

Deploying a Linux ecosystem on Android begins with Termux. I opted for the F-Droid distribution to ensure full access to the package repository. Integrating Termux:API was critical — it enables low-level hardware access, allowing my scripts to monitor thermal throttling and power states in real time.

terminal — termux
pkg update && pkg upgrade
pkg install termux-api nodejs git cmake clang
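As a sketch of the thermal-monitoring idea: on the device, the reading would come from Termux:API's `termux-battery-status` (which prints JSON including a `temperature` field). The helper name and the threshold below are my own illustrations, not values from the original setup.

```shell
# Hypothetical helper: decide whether to pause inference when the phone runs hot.
# On-device, the input would come from Termux:API, e.g.:
#   temp=$(termux-battery-status | jq -r .temperature)
check_thermal() {
  temp_c="$1"               # current temperature in °C (may have a decimal part)
  limit="${2:-42}"          # illustrative throttle threshold in °C
  temp_int="${temp_c%.*}"   # POSIX sh has no float compare; keep the integer part
  if [ "$temp_int" -ge "$limit" ]; then
    echo "throttle"         # e.g. stop forwarding requests to llama-server
  else
    echo "ok"
  fi
}
```

A cron job can call such a check before each inference batch and simply skip the run when it returns `throttle`, letting the passively cooled SoC recover.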
Phase 02

Inference Engine: llama.cpp

The choice of inference engine is vital. llama.cpp provides native C++ performance on ARM64. I compiled it with 4-core parallelization optimized for the Kirin 810 architecture. For the model, Qwen 2.5 (1.5B) at Q4_K_M quantization offers the best balance between tokens per second (TPS) and contextual intelligence.

terminal — termux
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && cmake -B build
cmake --build build --config Release -j4
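Once built, the model is served over a local HTTP API with the `llama-server` binary. A sketch of the launch command follows; the GGUF path, context size, and port are illustrative placeholders, not the author's exact values.

```shell
# Sketch: assemble the llama-server launch command for a 4-thread ARM64 run.
# The model path and port are placeholders for illustration.
MODEL="$HOME/models/qwen2.5-1.5b-instruct-q4_k_m.gguf"

llama_cmd() {
  # -m: GGUF model file, -t: threads (match the 4 performance cores),
  # -c: context window, --host/--port: local HTTP endpoint
  printf '%s -m %s -t 4 -c 2048 --host 127.0.0.1 --port %s' \
    "./build/bin/llama-server" "$1" "$2"
}

# On the phone you would run the assembled command, e.g.:
#   eval "$(llama_cmd "$MODEL" 8080)"
```

`llama-server` exposes an OpenAI-compatible chat endpoint, so a gateway like PicoClaw can talk to it with any plain HTTP client.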
Phase 03

System Orchestration

Automation is handled via standard Linux cron jobs. The Node.js worker triggers PicoClaw at scheduled intervals, processes the model output, and pushes rich Markdown reports to my Telegram bot. This allows for scheduled news aggregation and personal task management without manual intervention.

terminal — termux
0 */6 * * * node ~/scripts/generate-intel-report.js
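The delivery step can be sketched as a small shell helper around the Telegram Bot API's `sendMessage` method (the endpoint shape is Telegram's documented one; the `TELEGRAM_TOKEN` and `CHAT_ID` variables are placeholders you would export in the cron environment).

```shell
# Sketch: push a Markdown report to a Telegram chat from the cron job.
api_url() {
  # Telegram Bot API method URL: https://api.telegram.org/bot<token>/<method>
  printf 'https://api.telegram.org/bot%s/%s' "$1" "$2"
}

send_report() {
  # --data-urlencode keeps Markdown punctuation intact in the POST body
  curl -s "$(api_url "$TELEGRAM_TOKEN" sendMessage)" \
    -d chat_id="$CHAT_ID" \
    -d parse_mode="Markdown" \
    --data-urlencode text="$1"
}
```

In this setup the Node.js worker would format the model output and hand the final string to something like `send_report`, keeping the delivery path free of any third-party SDK.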
Figure 1.0: Scheduled intelligence reports delivered via the Telegram API.
Figure 2.0: Real-time Node.js + Alpine.js monitoring dashboard.

The Future of Edge AI

Running local LLMs on repurposed mobile hardware isn't just a fun experiment — it's a blueprint for the future of distributed computing. As models become more efficient and NPU (Neural Processing Unit) access becomes more standardized in Linux-on-Android environments, the "phone in your drawer" might just become the most valuable server in your stack.

Zero cloud cost, zero data leakage, and zero recurring fees. The intelligence is now local.