
Release 6: Agentic Deploy Pipeline and Module Health Guard

Status: Complete | Sprints: 8 | GitHub Release tag: v1.6.0

Release 6 builds the development infrastructure that makes AI-collaborative development practical at scale: a reproducible uv-based toolchain, a fast PC inner loop, compact artifacts, USB-hub parallel flash, and an MCP server so Claude can drive the full build/flash/test cycle from within a conversation. The second half adds runtime safety and observability: a proactive heap guard, graceful allocation refusal, and per-module health dots in the UI.


Release Overview

| Area | Highlights |
|---|---|
| Toolchain | pyproject.toml + uv.lock; pre-commit (clang-format + ruff); core_only CMake PAL boundary check |
| Serial hygiene | TIMING gated on 20% change or 60 s silence; MemLive edge detector; GET /api/log ring buffer |
| Fast inner loop | all_pc.py (build + unittest + PC livetest, ~73 s); all_devices.py (flash + ESP32 livetest) |
| Compact artifacts | live-results-*.json 85% smaller; live-results-all.json merged across devices |
| Status docs | docs/status/index.md (per-device table); live-results.md cross-device matrix |
| USB hub scaling | "group" field; parallel flash + test (--workers N); device-type split in live-results.md |
| MCP server | deploy/mcp_server.py with 8 tools; .mcp.json; Claude drives build/flash/test without leaving the chat |
| Memory guard | pal::check_alloc(); disableSelf()/setupOk_; green/red dot with health tooltip in UI |

Sprint 1: Foundation Tooling

Goal: reproducible Python environment and compile-enforced PAL boundary so every subsequent sprint inherits clean tooling.

  • pyproject.toml + uv.lock: pins all transitive deps; uv sync --extra dev is the complete setup step.
  • .pre-commit-config.yaml: clang-format (C++) + ruff (Python); uv run pre-commit install wires into git hooks.
  • core_only CMake OBJECT target: compiles 6 PAL-free core headers with error-triggering stubs; PAL violations become compile errors.
  • CI lint job updated to uv sync + uv run pre-commit run --all-files.
  • timeMicros() removed from Timing.h; 9 callers migrated to pal::micros().
  • PAL discovery: StatefulModule.h had hidden pal:: calls via a transitive include; an explicit #include "pal/Pal.h" was added to surface the dependency. Remaining violations in Scheduler.h/cpp, ModuleManager.cpp and PhysMap.h are documented and backlogged.
| Metric | Value |
|---|---|
| Tests | 357/357 (unchanged) |
| core_only PAL-clean headers | 6 (Module.h, Timing.h, Logger.h, KvStore.h, Coord3D.h, BuildInfo.h) |
| uv.lock | 11 packages pinned |

Sprint 2: Serial Output Discipline

Goal: remove periodic serial noise; log state changes, not heartbeats.

  • Scheduler.cpp: TIMING logged only when totalMsPerTick changes by more than 20% or after 60 s silence (lastTimingTotalMs_ / lastTimingLogUs_ fields).
  • MemLive edge detector: memWarnActive_ bool; logs on rising edge (WARNING) and falling edge (OK) only.
  • printNetworkInfo() change-gated via lastNetReport_ cache.
  • GET /api/log: 64-entry ring buffer (8 KB static); logPush() feeds both ring and WebSocket; logClearRing() for test isolation.
  • Ring reduced from the planned 256 entries to 64: at 128 B per entry, 256 entries (32 KB) would take ~50% of classic ESP32 free heap.
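The three Sprint 2 mechanisms are small state machines. The Python sketch below models their decision logic only; it is not the C++ in Scheduler.cpp or the logger. TimingGate, EdgeDetector and LogRing are hypothetical names, the 20% change is assumed to be measured against the last logged total, and the 128-byte slot size is inferred from 8 KB / 64 entries.

```python
from collections import deque
from typing import Deque, List, Optional

CHANGE_THRESHOLD = 0.20            # log TIMING when the tick cost moves by more than 20%
SILENCE_US = 60 * 1_000_000        # ... or when nothing has been logged for 60 s
RING_ENTRIES = 64
ENTRY_BYTES = 128                  # assumed slot size: 8192 B / 64 entries
STATIC_BYTES = RING_ENTRIES * ENTRY_BYTES


class TimingGate:
    """Models lastTimingTotalMs_ / lastTimingLogUs_: log on a large change or long silence."""

    def __init__(self) -> None:
        self.last_total_ms: Optional[float] = None
        self.last_log_us = 0

    def should_log(self, total_ms: float, now_us: int) -> bool:
        if self.last_total_ms is None:
            changed = True                                   # first sample always logs
        elif self.last_total_ms == 0:
            changed = total_ms != 0
        else:
            delta = abs(total_ms - self.last_total_ms) / self.last_total_ms
            changed = delta > CHANGE_THRESHOLD
        if changed or now_us - self.last_log_us >= SILENCE_US:
            self.last_total_ms, self.last_log_us = total_ms, now_us
            return True
        return False


class EdgeDetector:
    """Models memWarnActive_: report only the transitions, never the steady state."""

    def __init__(self) -> None:
        self.active = False

    def update(self, low_memory: bool) -> Optional[str]:
        if low_memory == self.active:
            return None                                      # no edge: stay quiet
        self.active = low_memory
        return "WARNING" if low_memory else "OK"             # rising / falling edge


class LogRing:
    """Models the /api/log ring: fixed capacity, oldest entry overwritten first."""

    def __init__(self, capacity: int = RING_ENTRIES) -> None:
        self._buf: Deque[str] = deque(maxlen=capacity)

    def push(self, line: str) -> None:
        # Assumption: a fixed C slot truncates long lines, leaving room for a NUL.
        raw = line.encode("utf-8")[: ENTRY_BYTES - 1]
        self._buf.append(raw.decode("utf-8", errors="ignore"))

    def snapshot(self) -> List[str]:
        return list(self._buf)                               # oldest first

    def clear(self) -> None:
        self._buf.clear()                                    # counterpart of logClearRing()
```

All three keep memory use constant: two scalars for the gate, one bool for the detector, and a bounded deque for the ring, so none of them grows with uptime.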
| Metric | Value |
|---|---|
| Tests | 361/361 (+4 ring buffer cases in test_logger.cpp) |
| Static RAM cost | +8192 B (ring) + ~104 B (4 Scheduler fields) |

Sprint 3: Fast Inner Loop

Goal: split all.py so the PC development loop completes without touching hardware.

  • deploy/all_pc.py: build.py -target pc + unittest.py + livetest.py -type pc; ~73 s warm build.
  • deploy/all_devices.py: build esp32 envs + flash + optional flashfs + mem capture + livetest esp32 + summarise.
  • deploy/all.py: thin delegation wrapper; --flashfs forwarded.
  • deploy/_lib.py: run_step() and wait_for_esp32s() shared helpers.
  • py = ["uv", "run"] in all three scripts: the project venv is always active, and CI and local invocations are byte-for-byte identical. The MCP server (Sprint 7) inherits the same isolation for free.
  • CI PC job: 4 manual steps replaced by uv run deploy/all_pc.py.
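The shared-runner idea can be sketched as below. deploy/_lib.py does provide run_step(), but its actual signature is not shown in these notes; the prefix parameter, pc_steps() and run_pc_loop() are hypothetical, the deploy/ paths for build.py and unittest.py are assumed, and stopping at the first failure is this sketch's choice rather than documented behaviour.

```python
import subprocess
import sys
import time
from typing import List, Sequence, Tuple

PY: List[str] = ["uv", "run"]       # mirrors py = ["uv", "run"] in the deploy scripts


def run_step(name: str, cmd: Sequence[str], prefix: Sequence[str] = PY) -> bool:
    """Run one step behind the uv prefix, report wall-clock time, return True on exit 0."""
    start = time.monotonic()
    result = subprocess.run([*prefix, *cmd])
    elapsed = time.monotonic() - start
    status = "ok" if result.returncode == 0 else f"FAILED (exit {result.returncode})"
    print(f"[{name}] {status} in {elapsed:.1f} s", file=sys.stderr)
    return result.returncode == 0


def pc_steps() -> List[Tuple[str, List[str]]]:
    """The three all_pc.py steps listed above (script locations assumed)."""
    return [
        ("build", ["deploy/build.py", "-target", "pc"]),
        ("unittest", ["deploy/unittest.py"]),
        ("livetest", ["deploy/livetest.py", "-type", "pc"]),
    ]


def run_pc_loop() -> int:
    """Run the steps in order, stopping at the first failure; 0 means all passed."""
    steps = pc_steps()
    passed = 0
    for name, cmd in steps:
        if not run_step(name, cmd):
            break
        passed += 1
    print(f"{passed}/{len(steps)} steps passed", file=sys.stderr)
    return 0 if passed == len(steps) else 1
```

Because the prefix is a list rather than a shell string, CI, a local terminal and an MCP tool all end up executing the same argument vector.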
| Metric | Value |
|---|---|
| Tests | 361/361 (unchanged) |
| all_pc.py wall clock | 73 s (3/3 steps passed) |

Sprint 4: Compact Log Files

Goal: reduce artifact footprint so deploy/test/ stays readable as device count grows.

  • deploy/live_suite.py R.to_dict(): the assertions key is omitted for all-pass tests; when a test fails, only its failing assertions are included.
  • deploy/livetest.py merge step: _merge_results() writes deploy/live/live-results-all.json (array of device entries, "current": true/false); copies to live-results-all-last-good.json on all-pass.
  • deploy/summarise.py: reads live-results-all.json; backward-compatible fallback to per-device files.
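Both size reductions come down to two small transformations. In this sketch LiveTestResult and merge_results() are hypothetical stand-ins for R.to_dict() and _merge_results(); the record layout (name, passed, per-assertion passed flags, a device key per entry) is assumed, and so is the reading of "current" as "refreshed by this run".

```python
import json
import shutil
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class LiveTestResult:
    name: str
    passed: bool
    assertions: List[Dict[str, Any]] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        d: Dict[str, Any] = {"name": self.name, "passed": self.passed}
        if not self.passed:
            # All-pass tests carry no assertions key; failing tests keep only their failures.
            d["assertions"] = [a for a in self.assertions if not a.get("passed", False)]
        return d


def merge_results(out_dir: Path, entries: List[Dict[str, Any]], current: List[str]) -> bool:
    """Write one JSON array for all devices; snapshot it as last-good if everything passed."""
    merged = [dict(e, current=e["device"] in current) for e in entries]
    all_path = out_dir / "live-results-all.json"
    all_path.write_text(json.dumps(merged, indent=2))
    all_pass = all(t["passed"] for e in merged for t in e.get("tests", []))
    if all_pass:
        shutil.copyfile(all_path, out_dir / "live-results-all-last-good.json")
    return all_pass
```

The size win comes almost entirely from to_dict(): a passing run of a large suite serialises one short object per test instead of every assertion.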
| Metric | Value |
|---|---|
| Tests | 361/361 (unchanged) |
| live-results-pc.json | 810 lines / 21 KB before; 119 lines / 3 KB after (85% reduction) |

Sprint 5: Consolidated Status Docs

Goal: single cross-device test matrix that stays readable at 16 devices.

  • docs/status/index.md: one row per device, columns for unit tests, live tests, heap_free, fps; 12 lines at 3 devices.
  • docs/status/live-results.md: all devices in one matrix (rows = tests, columns = devices); device-type groups.
  • deploy-summary.md: compressed from 132 to 10 lines (header block + pipeline table only).
  • test-results.md: unchanged — full per-file test list preserved for discoverability and module doc anchor links.
  • heap_free_kb and fps added to device_info in live_suite.py; stored in live-results-all.json.
  • Also fixed: two ESP32 CI regressions from Sprint 3's lib_ldf_mode change. The fixes switch to lib_compat_mode = soft with an explicit lib_ignore = RPAsyncTCP, ESPAsyncTCP, and make esp32_footprint.py return gracefully when size lines are missing.
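A rows = tests, columns = devices matrix is a pivot of the merged results. This sketch assumes each merged entry is a device name plus a list of {name, passed} tests; render_matrix() and the PASS/FAIL/blank cell convention are hypothetical, and device-type grouping is left out.

```python
from typing import Any, Dict, List


def render_matrix(entries: List[Dict[str, Any]]) -> str:
    """Pivot per-device results into one table: one row per test, one column per device."""
    devices = [e["device"] for e in entries]
    cells: Dict[str, Dict[str, str]] = {}
    for e in entries:
        for t in e.get("tests", []):
            cells.setdefault(t["name"], {})[e["device"]] = "PASS" if t["passed"] else "FAIL"
    lines = [
        "| Test | " + " | ".join(devices) + " |",
        "|---" * (len(devices) + 1) + "|",
    ]
    for test in sorted(cells):
        row = [cells[test].get(d, "") for d in devices]   # blank: test not run on that device
        lines.append(f"| {test} | " + " | ".join(row) + " |")
    return "\n".join(lines)
```

Adding a device adds one column rather than a new section, which is why this layout grows much more slowly than a per-device report.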
| Metric | Value |
|---|---|
| Tests | 361/361 (unchanged) |
| docs/status/index.md | 12 lines for 3 devices (target < 80) |

Sprint 6: USB Hub Scaling

Goal: parallel flash and test across N devices; group filter for targeting a subset.

  • devicelist.json: "group" field added to all test:true devices.
  • select() bug fixed: default_test_true and filters now ANDed independently (was silently OR-ing).
  • flash.py: --workers N (default 4); _flash_one(d, esptool) -> bool; ThreadPoolExecutor replaces sequential loop.
  • livetest.py: --workers N (default 4); _run_esp32_test(d) -> bool; ThreadPoolExecutor for both ESP32 paths.
  • all_devices.py: --group <name> + --workers N; esp32_envs derived from filtered device set (build and flash always in sync).
  • all_pc.py: summarise.py added as step 4.
  • summarise.py: live-results.md split by device type; _short_chip() strips Rev suffix.
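The select() fix and the parallel flash can be sketched together. Devices are modelled as dicts with name, test and group keys (only test and group appear above; name is assumed), the filter set is reduced to a single group filter for brevity, and flash_one is a hypothetical stand-in for _flash_one(d, esptool) that does not call esptool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

Device = Dict[str, Any]


def select(devices: List[Device], default_test_true: bool = True,
           group: Optional[str] = None) -> List[Device]:
    """Every active condition must hold (AND); the old bug let either one admit a device."""
    out = []
    for d in devices:
        if default_test_true and not d.get("test", False):
            continue
        if group is not None and d.get("group") != group:
            continue
        out.append(d)
    return out


def flash_all(devices: List[Device], flash_one: Callable[[Device], bool],
              workers: int = 4) -> Dict[str, bool]:
    """Flash devices concurrently; each worker returns True on success."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(flash_one, devices)          # map preserves input order
        return {d["name"]: ok for d, ok in zip(devices, results)}
```

Threads are sufficient here because each worker spends its time waiting on a serial port or an esptool subprocess, so the GIL is not the bottleneck.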
| Metric | Value |
|---|---|
| Tests | 361/361 (unchanged) |

Sprint 7: MCP Server

Goal: wrap the stable script set as MCP tools so an AI agent can trigger builds, flash firmware, and read test results without leaving the conversation.

  • pyproject.toml: mcp = ["mcp>=1.0"] optional dependency; uv run --extra mcp handles install automatically.
  • deploy/mcp_server.py: FastMCP server; 8 tools (run_all_pc, run_all_devices, run_build, run_flash, run_livetest, run_summarise, read_status, list_devices). Each spawns uv run deploy/<script>.py and returns combined stdout+stderr; non-zero exit appends [exit N].
  • .mcp.json: project-level config; Claude Code picks it up without manual setup.
  • Tool schema derived automatically from type hints and docstrings by FastMCP.
  • --merge-ports deferred: device-management utility, not a pipeline step for an AI agent.
| Metric | Value |
|---|---|
| Tests | 361/361 (unchanged) |
| Tools | 8 tools; list_devices smoke test passes |

What this enables: the full development loop (propose change, implement, build, flash two devices in parallel, run live test suite, read results, diagnose failures, iterate) runs without leaving the chat window.
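A minimal sketch of the tool pattern, assuming the FastMCP class shipped in the official mcp Python SDK (mcp.server.fastmcp, present in recent 1.x releases). Only two of the eight tools are shown; run_command(), script_cmd(), build_server() and the tool parameters are hypothetical. The mcp import sits inside build_server() so the helpers work without the optional extra installed.

```python
import subprocess
from typing import List


def script_cmd(script: str, *args: str) -> List[str]:
    """Every tool shells out through uv, so it runs in the same locked environment."""
    return ["uv", "run", f"deploy/{script}.py", *args]


def run_command(cmd: List[str]) -> str:
    """Run a command, return stdout and stderr interleaved, and flag a non-zero exit."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    output = proc.stdout
    if proc.returncode != 0:
        output += f"\n[exit {proc.returncode}]"
    return output


def build_server():
    from mcp.server.fastmcp import FastMCP      # optional dependency: uv run --extra mcp

    server = FastMCP("deploy")

    @server.tool()
    def run_all_pc() -> str:
        """Build for PC, run unit tests and the PC live test; return the combined log."""
        return run_command(script_cmd("all_pc"))

    @server.tool()
    def run_flash(workers: int = 4) -> str:
        """Flash all test devices in parallel using the given number of workers."""
        return run_command(script_cmd("flash", "--workers", str(workers)))

    return server
```

Calling build_server().run() serves over stdio, the transport a .mcp.json entry typically launches; FastMCP builds each tool's schema from the function's type hints and docstring.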


Sprint 8: Memory Guard

Goal: prevent a large module allocation from silently exhausting the heap and leaving the HTTP server unreachable.

  • pal::check_alloc(bytes, reserve_bytes) -> bool: returns false if free_heap_bytes() - bytes < reserve_bytes; PAL_HEAP_RESERVE_BYTES = 90 KB (calibrated from live device failure: covers server.begin ~28 KB + WiFi buffers ~24 KB + headroom + fragmentation dead space); always true on PC.
  • StatefulModule: setupOk_ bool + disableSelf() helper + setupOk() accessor; runSetup() resets to true; runLoop() skips loop() when !setupOk_.
  • EffectsLayer::allocate_(): the guard runs before psram_malloc; if the allocation is refused and a fallback buffer exists, the layer silently keeps its previous size; disableSelf() is called only when no fallback exists.
  • DriverLayer::allocate_(): allocate-before-free; null check; refused alloc keeps previous buffer.
  • DriverLayer::onChildrenReady(): proactive check_alloc on the new geometry; on failure, disableSelf() is called on the most recently added layout child, so the red dot lands on the layout that caused the problem rather than on EffectsLayer.
  • ArtNetOutModule: disableSelf() + early return on null pkt_.
  • GET /api/modules: "health" and "setup_ok" fields added to every module entry.
  • Frontend: green/red dot (&#9679;) next to each module name; the tooltip shows the healthReport() string, making this the first place healthReport() output is visible in the UI.
  • Unified [MemBoot]/[MemLive] log format: DeltaKB = FreeKB (frag=X%, largest=YKB) across all emit sites.
  • uptime_s control uses "time" uiType; fmtTime() helper in app.js formats as Nd Nh Nm Ns.
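The guard and the setupOk_ contract can be modelled in a few lines. This is a Python sketch, not firmware code: the free heap is passed in explicitly instead of calling free_heap_bytes(), Module and its methods are hypothetical stand-ins for StatefulModule, and the condition is rewritten as bytes + reserve <= free. That form is equivalent for non-negative sizes and, assuming the C++ operands are unsigned, avoids the wrap-around that free - bytes would hit whenever a request exceeds the free heap.

```python
from typing import Callable

PAL_HEAP_RESERVE_BYTES = 90 * 1024


def check_alloc(nbytes: int, free_heap: int, reserve: int = PAL_HEAP_RESERVE_BYTES) -> bool:
    """False when the allocation would leave less than `reserve` bytes free."""
    # Equivalent to "refuse if free - bytes < reserve", but written without a
    # subtraction so it cannot underflow if ported to unsigned C++ arithmetic.
    return nbytes + reserve <= free_heap


class Module:
    """Models StatefulModule: setupOk_, disableSelf(), runSetup() and runLoop()."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.setup_ok = True
        self.loop_count = 0

    def disable_self(self) -> None:
        self.setup_ok = False                  # surfaces as a red dot in the UI

    def run_setup(self, setup: Callable[["Module"], None]) -> None:
        self.setup_ok = True                   # runSetup() resets the flag first
        setup(self)

    def run_loop(self) -> None:
        if not self.setup_ok:
            return                             # a disabled module's loop() is skipped
        self.loop_count += 1
```

Resetting the flag in run_setup() means a module refused once gets a clean retry on the next setup pass, for example after the user shrinks a layout.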
| Metric | Value |
|---|---|
| Tests | 364/364 (+3: disableSelf, setupOk reset, DriverLayer refusal) |
| Reserve calibration | 90 KB: 60 KB proved insufficient on MM-C1BC (server.begin costs 28 KB after guard runs) |

Post-sprint esp32dev (MM-C1BC, no PSRAM) memory floor:

| Stage | Free | Largest block | Frag |
|---|---|---|---|
| After module setup | ~235 KB | 108 KB | 55% |
| After server.begin | ~149 KB | 80 KB | 47% |
| Live steady-state | 109-126 KB | 60-72 KB | 41-48% |

The live floor of ~109 KB sits only ~19 KB above the 90 KB reserve: enough to run, but with no room for additional module allocations. A dual check_alloc guard (total free + largest block) and WiFi buffer tuning are tracked in the Backlog.
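The dual guard mentioned above might look like the following sketch. check_alloc_dual() and frag_percent() are hypothetical names, not existing PAL functions, and frag is assumed to mean 1 - largest/free, which reproduces the memory-floor table to within about a point (1 - 80/149 ≈ 46% against the reported 47%).

```python
def frag_percent(free_bytes: int, largest_block: int) -> int:
    """Share of free heap that lies outside the largest contiguous block, in percent."""
    if free_bytes == 0:
        return 0
    return round(100 * (1 - largest_block / free_bytes))


def check_alloc_dual(nbytes: int, free_bytes: int, largest_block: int,
                     reserve: int = 90 * 1024) -> bool:
    """Refuse if the reserve would be breached OR no single block can hold the request."""
    leaves_reserve = nbytes + reserve <= free_bytes
    fits_contiguously = nbytes <= largest_block
    return leaves_reserve and fits_contiguously
```

A 70 KB request with 300 KB free but a 60 KB largest block passes a total-free check yet fails the contiguity check, which is exactly the case fragmentation creates.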

Retrospective: Release 6 complete

What was proven:

  • uv run + uv.lock makes every script invocation reproducible across contributor machines, CI, and MCP tool calls.
  • The MCP server closes the agentic feedback loop: Claude can drive build, flash, test, and diagnosis without terminal context switches.
  • Pre-checking heap with check_alloc before a large psram_malloc prevents the half-initialised module state that previously left HTTP unreachable.
  • Red dot attribution (layout child, not EffectsLayer) required live device debugging: the disableSelf() placement reflects operator intuition rather than code topology.

Watch points going into Release 7:

  • 8 KB log ring buffer is a meaningful static cost on non-PSRAM esp32dev — consider halving to 4 KB.
  • StatefulModule.h PAL violations (backlogged in Sprint 1) block full core_only coverage.
  • MCP tools return output only after a long flash run has finished; streaming progress (ctx.report_progress()) is the next usability improvement.
  • OTA flash (no USB cable) not yet implemented; flash.py still requires a serial port.