Release 8: Dynamic Controls and UI Adaptability¶

Theme: Release 8 adds dynamic control schemas (the ability to rebuild a module's control set at runtime based on the current value of other controls) so the UI can show only the parameters that are relevant for the active configuration. Later sprints extend the release into deploy-pipeline health: a code analysis monitor, runtime heap visibility, and a structured overview of the full deploy architecture.

Release Overview¶

What was delivered in Release 7 (build on this)¶

Strength	Notes
OTA firmware update	`FirmwareUpdateModule`: file upload + GitHub releases tab; `POST /api/firmware`
CI release pipeline	Tagged releases + nightly pre-release with firmware assets on GitHub
Windows support	Native `.exe` build; `projectMM-pc-windows.zip` in CI artifacts
Scenario baselines	Hardware `--update-baseline` run; `"extends"` inheritance; wired into `all.py`
Static RAM hardening	Per-device `LOG_RING_SIZE`; WiFi buffer tuning; dual `check_alloc` guard
Log frontend panel	WS push of ring buffer entries; collapsible log UI

What Release 8 addresses¶

Problem	Sprint
Control schema is fixed at `setup()` time; irrelevant parameters always visible regardless of selected type	Sprint 1 (Dynamic controls: `clearControls()`, `rebuildControls()`, early WS flush), complete
Static RAM column in techdebt monitor always shows 0 (parser bug); no accounting of what consumes the 51 KB ESP32 RAM; Notable Findings have no action owners	Sprint 3 (RAM accounting, parser fix, actions table), complete
`classSize()` misses runtime heap (`controls_[]` array, `pendingProps_` doc); large `char[]` struct members inflate classSize; scanner blind to allocations in private helpers	Sprint 4 (baseHeapUsage, char[] audits, scanner improvements), complete
Deploy pipeline grew to 17+ scripts with no architecture overview; steps produced no status pages; `techdebt.py` name misleading; orchestrators monolithic	Sprints 5-10 (full log→md pipeline, orchestrator restructuring, naming cleanup), complete
No interactive way to trigger individual deploy scripts; MCP tools covered only orchestrators; no AI-assisted log analysis; deploy.md was CLI-first with no visual overview	Sprint 11 (browser deploy UI, `run_script`/`read_log` MCP tools, deploy.md overhaul), complete

Sprints¶

Sprint	Goal
Sprint 1	Dynamic controls: `clearControls()`, `rebuildControls()` virtual, early WS schema flush
Sprint 2	Technical-debt monitor: per-module metrics (LOC, function count, complexity, static RAM, heap/blocking violations) as a CI script
Sprint 3	RAM accounting balance, fix static RAM parser, Notable Findings actions, Logger ring buffer reduction
Sprint 4	`baseHeapUsage()` column, `char[]` to `std::array` audits, scanner improvements for private helpers
Sprint 5-10	Deploy pipeline consolidation: full log→md data flow, orchestrator restructuring, naming cleanup — complete
Sprint 11	Browser deploy UI, `run_script`/`read_log` MCP tools, `erase_flash.py`, deploy.md overhaul — complete

Sprint 1: Dynamic Controls¶

Scope: Allow a module to rebuild its control schema at runtime in response to a control value change. The primary use case: a type selector control switches between effect variants, and only the parameters relevant to the active type are shown. The control set is rebuilt without a full module restart.

Motivation¶

Today, addControl() is called once in setup() and the schema is fixed for the lifetime of the module. A module that supports multiple effect types must expose all parameters for all types simultaneously, cluttering the UI and confusing operators. The fix: make the schema a function of the control values, rebuilt on demand.

Design¶

clearControls(system = false)

Added to StatefulModule. Iterates the registered controls_[] descriptors and removes all entries that are not marked system. Before removing each descriptor, writes the current value of the backing variable back into the pendingProps_ stash (keyed by control name). This means a subsequent addControl(var, key, ...) call for the same key restores the last operator-set value automatically — values are preserved across rebuilds even when the control temporarily disappears.

System controls (enabled) are marked at registration time with a system flag in ControlDescriptor. clearControls() skips them unconditionally.

rebuildControls() virtual

New virtual method on StatefulModule; default implementation is a no-op (all existing modules continue to work unchanged). Modules that want dynamic controls override it:

void rebuildControls() override {
    clearControls();
    addControl(type_, "type", "select", {"Ripples", "Lines", "Sine"});
    if (type_ == EffectType::Ripples) {
        addControl(speed_,  "speed",  "slider", 0.1f, 10.0f);
        addControl(radius_, "radius", "slider", 1.0f, 50.0f);
    } else if (type_ == EffectType::Lines) {
        addControl(speed_,  "speed",  "slider", 0.1f, 10.0f);
        addControl(count_,  "count",  "slider", 1,    20);
    }
}

void setup() override {
    rebuildControls();   // replaces direct addControl() calls
}

void onUpdate(const char* key) override {
    if (strcmp(key, "type") == 0) rebuildControls();
}

Modules that do not need dynamic controls keep calling addControl() directly in setup() — no migration required.

Early WS schema flush

After rebuildControls() finishes, the UI must reflect the new schema immediately rather than waiting up to 1 s for the next periodic push. Implementation: clearControls() sets a schemaDirty_ flag on StatefulModule. The main loop checks schemaDirty_ across all modules and, if set, sends a {"t":"schema","modules":[...]} WS push using getModulesJson() (full schema including control types, options, min/max, and current values) and clears the flag. On a clean tick, the periodic 200 ms push uses getStateJson() (flat key/value state) as before. Natural debounce: a burst of rebuildControls() calls within one tick produces exactly one push.

A dedicated {"t":"schema"} message type is required because getStateJson() sends only flat {key:value} pairs; handleStateUpdate() in the frontend updates existing DOM elements but cannot add or remove controls. When rebuildControls() changes the control set, the frontend must call render() to rebuild the card from scratch.

State persistence interaction

saveState() and loadState() iterate the registered descriptors. After a rebuild, only the currently registered controls are persisted — parameters for inactive types are not written to the state file. On the next load, pendingProps_ carries any previously saved values; addControl() applies them if the key matches a registered control after rebuildControls() runs. A type control persisted in state is applied before rebuildControls() is called (via the existing addControl stash mechanism), so the correct variant's parameters are registered and restored on first boot.

Sprint 1 Scope Definition of Done¶

ControlDescriptor gains bool system field; StatefulModule::runSetup() sets it when registering enabled
clearControls() removes non-system descriptors; saves current values to pendingProps_ stash before removal
rebuildControls() virtual added to StatefulModule; default is no-op; existing modules compile and behave identically
schemaDirty_ flag set by clearControls(); main loop early-flush path clears it and sends a {"t":"schema","modules":[...]} WS push
Reference implementation: one new module (e.g. MultiEffectModule or adapted existing effect) demonstrating type selector + conditional parameters
Unit tests: rebuild preserves values of re-registered controls; rebuild discards values of removed controls; system controls survive clearControls(); schemaDirty_ triggers exactly one early flush per rebuild burst
Frontend: {"t":"schema"} handler added; calls render(msg.modules) to rebuild all cards from the full schema
All prior unit tests still green

Complexity estimate: Low-Medium (2/5). The stash mechanism already exists; clearControls() is a small loop; the early flush reuses the existing push path. The trickiest part is the state-persistence ordering (type value applied before rebuild runs).

Result¶

Metric	Value
Unit tests	399/399 pass (8 new tests added)
PC build	Clean (0 warnings)
ESP32dev build	Clean (0 warnings); BSS 16.3% (53 KB, down from 21.3% / 70 KB after static wsBuf removed)
ESP32s3 build	Clean (0 warnings)
Live tests (PC)	15/15 all passing
Live tests (MM-70BC)	15/15 all passing
Live tests (MM-C1BC)	12/15 (hardware capacity limits: 64x64 OOM, fps below 1000 on 16x16, 4-layer OOM on classic ESP32)

Definition of Done¶

ControlDescriptor gains bool system = false field; runSetup() sets it after registering enabled — done
clearControls() preserves system controls, saves non-system values to pendingProps_ stash, sets schemaDirty_ when controls are actually removed — done
rebuildControls() virtual added to StatefulModuleBase; default is no-op; all existing modules compile and behave identically — done
schemaDirty_ flag; ModuleManager::hasSchemaDirty() / clearSchemaDirty(); WS broadcast loop in main.cpp and AppSetup.cpp sends {"t":"schema","modules":[...]} on dirty tick, getStateJson() array on periodic tick — done
Reference implementation: SineEffectModule adapted with type selector (Sine / Ripples), rebuildControls(), and onUpdate("type") — done
Unit tests: rebuild preserves values of re-registered controls; rebuild does not affect unrelated fields; system controls survive clearControls(); schemaDirty_ set/cleared correctly; burst produces exactly one flag — done (7 new test cases)
Frontend: {"t":"schema"} message type handler added to app.js; calls render(msg.modules) to rebuild all cards — done
All prior unit tests still green — 399/399
Static wsBuf[16384] removed from AppSetup.cpp; both WS push branches now allocate on demand via heap_caps_malloc / heap_caps_free — done
pal::net_early_init() calls Network.begin() before scheduler.setup() to guarantee the TCP/IP stack is ready before any module opens sockets — done
DeviceDiscovery::setup() guards broadcastPresence_() behind sock_ >= 0; loop() retries udp_bind() when sock_ < 0 — done

Retrospective¶

What went well:

The pendingProps_ stash already existed and worked without modification — clearControls() just needed to write into it before removing each descriptor.
The runSetup() full-wipe / clearControls() mid-lifecycle split was clean once the two call sites were separated. Inlining the wipe in runSetup() was the right call.
Adapting SineEffectModule rather than writing a new module gave immediate test coverage for a real effect and kept the scope small.
The schemaDirty_ "only set when controls are actually removed" rule surfaced naturally from a failing test: first-call-from-setup had no prior controls, so the flag should not fire on initial build.

What was tricky:

The schemaDirty_ flag initially fired on the first rebuildControls() call from setup() (because clearControls() always set it). The fix — only set the flag when controlCount_ > kept — is semantically correct (no prior schema means no schema change) and made the test clean.
The kTypes / kWaveforms static constexpr arrays required the kTypeCount companion so addControl(uint8_t&, key, const char* const*, count) received a correct count without magic numbers.
hasSchemaDirty() and clearSchemaDirty() iterated owned_ without holding controlMutex_. On PC (multi-threaded HTTP server running at 400K+ fps), this created a data race with concurrent removeModule() calls that modify owned_ under the mutex. The server crashed intermittently mid-scenario after the WS client connected. Fix: add std::lock_guard<std::mutex> lk(controlMutex_) to both functions, matching the lock discipline used by getStateJson() and every other owned_ iterator.
The Design section claimed "no new WS message type is needed" — this was wrong. getStateJson() sends only flat {key:value} pairs; handleStateUpdate() in the frontend updates existing DOM elements by key lookup and cannot add or remove controls. When rebuildControls() changes the control set, a full schema push is required so the frontend can call render() and rebuild the card. The fix: a dedicated {"t":"schema","modules":[...]} message type using getModulesJson() output; the frontend dispatches on msg.t === "schema" and calls render(msg.modules).
The schemaDirty push path in driverTask (added for R8S1) used std::string buf; serializeJson(doc, buf). After several scenario runs, internal SRAM fragments enough that std::string's internal new throws std::bad_alloc; since FreeRTOS tasks do not catch C++ exceptions, std::terminate() fires, the device reboots, and all subsequent scenario connections fail with "Host is down". The free_heap_kb() > 16.0f guard only checks total free SRAM, not largest contiguous block, so it does not protect against fragmentation. Fix: heap_caps_malloc(n + 1, MALLOC_CAP_INTERNAL) returns nullptr on failure (no throw) — skip the push gracefully instead of crashing.
Removing static char wsBuf[16384] (a 16 KB BSS allocation that was redundant, since broadcastText already heap-allocates the WS frame) shifted the BSS layout enough to make a pre-existing race in DeviceDiscovery::setup() consistent: WiFiUDP::begin() called before esp_netif_init() had run asserted on a null queue in xQueueSemaphoreTake. Fix: pal::net_early_init() calls Network.begin() before scheduler.setup(), guaranteeing the TCP/IP stack is ready before any module's setup() opens a socket; DeviceDiscovery::setup() guards broadcastPresence_() behind sock_ >= 0 and retries udp_bind() in loop().

Seeds for Sprint 2:

RipplesEffectModule still exists as a standalone module — now that SineEffectModule embeds the same rendering, consider whether RipplesEffectModule should be retired or kept as an independent module for pipelines that want only ripples.
The clearControls() / rebuildControls() pattern is now proven. Other modules with mode-dependent parameters (e.g. layout type selectors) can adopt it when operators report UI clutter.
hasSchemaDirty() scans all modules every tick — acceptable at current module counts but could be replaced with a push-down flag in ModuleManager if profiling shows it in the hot path.
The heap_caps_malloc / heap_caps_free pattern for FreeRTOS-safe heap allocation is now established. Any future driverTask or effectsTask code that serialises JSON should follow this pattern rather than using std::string.

Sprint 2: Technical-Debt Monitor¶

Scope: Add a deploy/techdebt.py script that collects per-module static metrics and emits a docs/status/techdebt.md table. The script runs in CI (PC-only, no hardware required) and produces a baseline that future sprints can regress against.

Motivation¶

The codebase grows by adding modules. Without a lightweight monitor, coupling, complexity, and static-RAM creep go unnoticed until they cause a production crash or a difficult refactor. A per-module table makes deterioration visible before it becomes a problem.

Design¶

Metrics collected per module (.h + companion .cpp if present):

Metric	Source	Why
Lines of code (NLOC)	`lizard` Python API	Size proxy; outliers need splitting
Function count	`lizard` Python API	Too many functions signals God-class
Max cyclomatic complexity	`lizard` Python API	High complexity predicts bug density
Static RAM (BSS + data bytes)	`firmware.map` from ESP32 build	Direct measure; non-zero only when module has static members
Heap allocation sites in `setup()`	Python grep scan	Expected; informational; checked against teardown
Heap allocation sites in `loop()`	Python grep scan	Policy violation: allocations belong in `setup()`
Blocking calls in `loop()`	Python grep scan	`delay()`, `vTaskDelay()`, info-level `LOG_*`
Leak risk	Python brace-scan	Alloc in `setup()` with no matching free in `teardown()`
`classSize()` (instance bytes)	TypeRegistry test binary	True heap cost per module instance

Tools:

lizard (added to pyproject.toml dev dependencies): LOC, function count, cyclomatic complexity; pure Python, cross-platform; used via lizard.analyze_file() Python API (not CLI) to avoid version-dependent flag issues.
firmware.map from .pio/build/esp32dev/: parsed for BSS+data contributions per .cpp.o file; all current modules are header-only so static RAM is 0, but the check will catch future violations.
tests/test_techdebt.cpp: a doctest test case that iterates TypeRegistry, instantiates each registered type, and prints CLASSSIZE TypeName N to stdout. techdebt.py runs the test binary with -tc=techdebt* and parses the output. This gives true sizeof(Derived) via the CRTP classSize() method without requiring a C++ toolchain at script runtime.
Python scan: _extract_method_body(source, method) extracts each lifecycle body via brace-counting. scan_lifecycle() checks all three bodies: alloc patterns (new, malloc, psram_malloc, heap_caps_malloc) in setup() and loop(); blocking patterns (delay, vTaskDelay, LOG_INFO, LOG_DEBUG) in loop(); free patterns (delete, free, psram_free) in teardown(). Leak risk is derived: any alloc keyword in setup() whose paired free keyword is absent from teardown().

Output: docs/status/techdebt.md

Core Infrastructure section (on top) + one section per module category. Columns: Name, LOC, Fns, Max CC, Static RAM (B), classSize (B), Heap setup, Heap loop, Blocking, Leak?. RAG (green/amber/red) indicators on all numeric columns.

Thresholds (configurable at top of script):

MAX_LOC        = 400   # warn if a single module exceeds this
MAX_CC         = 25    # CI threshold; aspirational target is 10 (existing renderers reach 22)
MAX_STATIC_RAM = 512   # warn if BSS+data exceeds this (bytes)

Violations are emitted as > **WARNING** lines in the markdown and exit 1 so CI fails.

CI integration:

Added as a step in .github/workflows/ci.yml after all_pc.py (so the test binary exists). uv sync --extra dev runs first to install lizard. No hardware required.

Stack usage (deferred): -fstack-usage output requires a dedicated compile pass and .su file parsing. Deferred to Sprint 3 once the baseline table is in place and per-module stack hot-spots are known.

Definition of Done¶

lizard>=1.17 added to pyproject.toml [project.optional-dependencies] dev
tests/test_techdebt.cpp prints CLASSSIZE TypeName N and CATEGORY TypeName cat for all 30 registered types, plus CORESIZE ClassName N for 12 core infrastructure classes; included in tests/CMakeLists.txt
deploy/techdebt.py collects all metrics and writes docs/status/techdebt.md; lizard.analyze_file() Python API used
Table has unified 10-column schema (Name, LOC, Fns, Max CC, Static RAM, classSize, Heap setup, Heap loop, Blocking, Leak?) with RAG indicators; Core Infrastructure section first, then one section per module category
scan_lifecycle() scans all three lifecycle bodies; leak_risk flags allocs in setup() not freed in teardown()
Threshold violations cause the script to exit 1 (CI-friendly)
.github/workflows/ci.yml installs dev deps and runs techdebt.py after the PC build step
docs/status/techdebt.md committed as a baseline; no module exceeds any CI threshold
mkdocs.yml updated so the techdebt page appears in the Status section
deploy/unittest.py FILE_TITLES updated to include test_techdebt.cpp

Complexity estimate: Low (1/5). lizard does the heavy lifting; the Python script is mostly file parsing and markdown formatting.

Result¶

Metric	Value
Unit tests	401/401 pass (2 new test cases added)
PC build	Clean (0 warnings)
Modules in report	30 registered types + 19 core infrastructure files
Threshold violations	0 (baseline clean)
Heap-in-loop flagged	2 (GameOfLifeEffect and PreviewModule: conditional `psram_malloc` on geometry resize, intentional)
Heap-in-setup flagged	2 (GameOfLifeEffect: `psram_malloc`; ArtNetOutModule: `malloc`; both freed in teardown, Leak? empty)
Highest Max CC	22 (GameOfLifeEffect::loop)
Largest classSize	FileManagerModule: 2504 B

See docs/status/codeanalysis.md for the current table (renamed from techdebt.md in Sprint 5).

Retrospective¶

What went well:

The lizard Python API (lizard.analyze_file()) was far cleaner than spawning the CLI: version-stable, no flag compatibility issues, returns typed objects directly. Using result.nloc and result.function_list was straightforward.
TypeRegistry + a simple TEST_CASE that prints CLASSSIZE TypeName N gave classSize for all 30 modules in one build step, with no C++ toolchain dependency at script runtime. The CRTP classSize() method meant zero per-module work.
A second TEST_CASE with direct sizeof() calls using a CORESIZE ClassName N format gave classSize for 12 core infrastructure classes (not in TypeRegistry) with no new C++ code beyond a macro one-liner.
_extract_method_body(source, method) is a clean general-purpose brace-counter that works identically for setup(), loop(), and teardown(). Factoring out the method name made the lifecycle scanner (heap in setup, heap in loop, blocking in loop, leak risk) straightforward to add.
Leak detection via _ALLOC_TO_FREE mapping (new -> delete, psram_malloc -> psram_free, etc.) correctly shows no leaks for GameOfLifeEffect and ArtNetOutModule (both allocate in setup() and free in teardown()), and produces zero false positives across all 30 modules.
firmware.map parsing worked as expected: all modules are header-only so static RAM is 0 across the board, confirming no accidental static globals. The check is in place to catch future regressions.

What was tricky:

The original design called for lizard --json CLI and nm -S. In practice: lizard 1.22.1 does not support --json; the Python API is the correct interface. nm -S was replaced by firmware.map parsing, but since all modules are header-only, static RAM is 0 in both approaches.
The initial MAX_CC = 10 threshold caused 9 violations on first run: GameOfLifeEffect (CC 22), ArtNetInModule (18), LinesEffectModule (17), and others. These are legitimate rendering algorithms, not debt. Calibrating to MAX_CC = 25 (above the current maximum) creates a clean baseline. The aspirational target of 10 is documented separately.
Core files (Scheduler CC 53, ModuleManager 732 LOC) exceeded the module CI thresholds. Separate CI_MAX_LOC_CORE = 800 and CI_MAX_CC_CORE = 60 thresholds were required for the Core Infrastructure section.
Source file links in techdebt.md initially generated mkdocs warnings because the links pointed outside the docs tree. Fixed by using backtick code formatting instead.
test_techdebt.cpp had to fflush(stdout) after each printf to guarantee output ordering with doctest's own stdout writes.

Seeds for Sprint 3:

Stack usage monitoring: add -fstack-usage to the esp32dev PlatformIO build, parse the resulting .su files, and add a "max stack frame (B)" column to the techdebt table.
Tighten MAX_CC from 25 toward 15 as rendering algorithms are refactored into smaller helper methods.
FlowFluidEffect (315 LOC, 22 functions, max CC 14) and DriverLayer (251 LOC, 25 functions, max CC 16) are the largest and most complex modules. Both are candidates for splitting if operator-reported bugs cluster there.
Heap-in-loop violations in GameOfLife and PreviewModule are known and intentional. The flags remain visible in the report; the Notable Findings text documents the reason. Do not suppress — these are exactly what the monitor should track.
Heap-in-loop size formula (e.g. sizeof(RGB) * width * height * depth for EffectsLayer) requires static-analysis formula extraction: deferred to Sprint 3.

Sprint 3: RAM Accounting and Technical-Debt Actions¶

Scope: Fix the static RAM column in techdebt.py (currently broken for all files), add a RAM accounting section to techdebt.md, and define concrete actions for each Notable Finding. Secondary goal: reduce Logger ring buffer size where safe to do so.

Motivation¶

The ESP32 build reports 51,508 B static RAM used (15.7%). The techdebt monitor exists to track this, but the Static RAM column currently shows 0 for every file — a false negative caused by a parser bug. Without accurate numbers the column is meaningless. Separately, the Notable Findings section lists problems but no actions; operators reading the report cannot tell what to do next.

RAM accounting (what claims the 51 KB)¶

Analysis of .pio/build/esp32dev/firmware.map — .dram0.data + .dram0.bss sections:

Our source (src/):

File	.data (B)	.bss (B)	Total	Note
`src/core/Logger.cpp.o`	1	2060	2061	Ring buffer: 32 entries × 64 B = 2048 B
`src/core/Runtime.cpp.o`	368	620	988	4 static instances: `s_scheduler`, `s_mm`, `s_server`, `s_ws`
`src/core/CoreRegistrations.cpp.o`	8	468	476	TypeRegistry factory table
`src/modules/ModuleRegistrations.cpp.o`	0	260	260	Module factory table
`src/core/ModuleManager.cpp.o`	24	0	24	ArduinoJson allocator instance
`src/core/AppRoutes.cpp.o`	68	4	72	`g_otaStatus` (64 B struct)
`src/core/AppSetup.cpp.o`	8	12	20	`lastPsramFree`, `lastFree` locals
`src/core/TypeRegistry.cpp.o`	0	32	32	Registry singleton
Total our code	477	3456	3933

External libraries (~47,500 B, not directly reducible):

Origin	Approx. B	Can reduce?
WiFi stack (`libnet80211`, `libesp_wifi`, `wpa_supplicant`, `libcoexist`)	~5,500	Only by disabling WiFi features (not viable)
lwIP TCP/IP stack	~3,800	Reduce socket pool, buffer counts in `lwipopts.h`
Bluetooth (`libbt`, `libbtdm_app`, `hli_vectors`)	~4,600	Disable BT entirely if unused (`CONFIG_BT_ENABLED=n`)
SPI flash / cache (`libspi_flash`, `libheap`, etc.)	~6,500	Not reducible
libc / newlib (`libc_a-*`)	~1,700	Not reducible
All other ESP-IDF components	~25,000	Not reducible

Bottom line: 15.7% is healthy. Our own code contributes ~4 KB. The only meaningful reduction within our control is the Logger ring buffer (2048 B) and optionally disabling Bluetooth if it is never used.

Parser bug¶

_parse_map_for_o currently scans for .bss 0xaddr 0xsize lines. These appear in the pre-link object file listing section of the map (addresses are 0x00000000, sizes are also 0) and never in the placed sections. The placed allocations live in .dram0.bss and .dram0.data subsection blocks, where contributions look like:

                0x3ffc4530      0x800 .pio/build/esp32dev/src/core/Logger.cpp.o

Fix: scan within the dram0.data / dram0.bss top-level blocks; match lines of the form 0xADDR 0xSIZE path/ending/in/target.o.

Notable Findings — actions¶

Finding	Action
`FileManagerModule` classSize 2504 B	Audit fixed `char[]` buffers; replace with `std::array<char, N>` (bounds-safe, same layout) and right-size N; target < 800 B
`DeviceDiscoveryModule` classSize 1344 B	Same audit; peer-presence buffer is likely oversized; convert to `std::array`
`TasksModule` classSize 1288 B	Same audit; convert fixed `char[]` members to `std::array`
`GameOfLifeEffect` / `PreviewModule` heap in loop	Keep flags visible. Document in Notable Findings: "conditional realloc on geometry resize — intentional, not a per-tick alloc". Monitor for any new heap-in-loop additions.
Scheduler CC 53	Extract `_advanceRunnable()`, `_selectNext()`, `_expireTimeouts()` as private helpers; aim for no function > CC 15
ModuleManager 732 LOC	Split into `ModuleManager` (runtime: add/remove/wire) + `ModuleStore` (load/save JSON); share ownership via reference
Logger ring buffer 2048 B BSS	Reduce `LOG_RING_ENTRY` from 64 to 48 bytes (saves 512 B); or reduce `LOG_RING_CAP` from 32 to 20 (saves 768 B) — verify nothing truncates in practice

Design¶

Fixes to techdebt.py:

Replace _parse_map_for_o with a two-pass parser: first pass identifies the address range of each dram0.data / dram0.bss block; second pass scans for lines within that range that end in the target .o filename and sums the 0xSIZE values.
Add a ## RAM Accounting section to the generated techdebt.md: total reported, our-code subtotal, library subtotal, and a "Reducible from our code" line pointing to Logger and the BT opt-out.
Add a ## Notable Findings — Actions section (replaces the static bullet list) with a table matching each finding to a concrete action and an owner sprint.
Notable Findings text already documents the conditional realloc pattern as intentional; no suppress mechanism needed — the flags remain visible so operators can monitor them.

Definition of Done¶

_parse_map_for_o fix: Logger shows 2060 B, Runtime shows 988 B, CoreRegistrations 468 B in the Static RAM column
techdebt.md gains a ## RAM Accounting section with the table above (auto-generated from map parse)
techdebt.md Notable Findings section replaced with a findings+actions table
Logger ring buffer reduced by at least 512 B (verify log entries not truncated in practice)
g_logRing converted from char[CAP][ENTRY] to std::array<std::array<char, ENTRY>, CAP> (same BSS layout, bounds-safe, zero-initialised by default)
401/401 tests still pass; 0 CI violations; mkdocs clean

Complexity estimate: Low-Medium (2/5). Parser fix is mechanical. The accounting section reuses existing parse logic. Logger reduction is a two-line change.

Result¶

Metric	Value
Unit tests	401/401 pass (1 test updated for new ring capacity)
PC build	Clean (0 warnings)
CI violations	0
Static RAM column	Now accurate: Logger 2,061 B, Runtime 988 B, CoreRegistrations 476 B
RAM Accounting section	Added to techdebt.md: our code 3,933 B (12%), libraries 28,481 B (87%)
Logger ring buffer	Reduced from 2,048 B to 1,536 B (512 B saved); `std::array` conversion done
Notable Findings	Heap-loop flags for GameOfLifeEffect and PreviewModule remain visible and documented as intentional

Definition of Done¶

_parse_map_for_o fix: Logger shows 2,061 B, Runtime 988 B, CoreRegistrations 476 B — done
CI_MAX_STATIC_RAM_CORE = 4096 added; core static RAM cell uses core threshold for RAG colouring — done
_load_dram_map() cached parser reads placed .dram0.data/.dram0.bss subsections correctly — done
techdebt.md gains ## RAM Accounting section (auto-generated) — done
Heap-loop flags for GameOfLifeEffect and PreviewModule remain visible; Notable Findings text documents them as intentional conditional reallocs — done
LOG_RING_CAP reduced 32 → 24 (saves 512 B BSS); g_logRing converted to std::array<std::array<char, 64>, 24> — done
Logger ring test updated to new capacity — done
401/401 tests pass; 0 CI violations; mkdocs clean — done

Retrospective¶

What went well:

@functools.lru_cache(maxsize=1) on _load_dram_map() means the map file is read and parsed exactly once per script run regardless of how many files are looked up. A clean pattern for one-parse, many-lookup data.
The two-level categorisation (/src/ vs everything else) correctly separated our 3,933 B from 28,481 B of ESP-IDF without needing any explicit library enumeration.
std::array conversion was mechanical: only two call sites needed .data() for the implicit char* conversion (strncpy, callback argument). Zero behavioural change.

What was tricky:

The original _parse_map_for_o matched the object file listing section of the map (pre-link, addresses all 0x0) instead of the placed .dram0.data/.dram0.bss subsections. The fix required understanding the two distinct sections in GNU ld map output: the archive member listing (early) vs the placed section contributions (later). The exit condition ^\.(?!dram0) handles both adjacent dram0 sections correctly.
Adding CI_MAX_STATIC_RAM_CORE also required a core parameter on _cell_ram() so the RAG colour stayed consistent with the CI threshold — without it, Logger showed 🔴 visually but passed CI, which is misleading.
Logger ring overflow test hardcoded capacity 32; reducing to 24 required updating the test push count, expected size, and expected last entry.

Seeds for Sprint 4:

Logger static RAM (2,061 B) is still amber. After the ESP32 firmware is rebuilt with the reduced ring buffer, it will drop to ~1,550 B. Verify and update the accounting table baseline.
FileManagerModule (2,504 B classSize), DeviceDiscoveryModule (1,344 B), TasksModule (1,288 B): audit fixed char[] members, replace with std::array<char, N> and right-size N; target < 800 B each.
baseHeapUsage() column: classSize captures the struct footprint but not the two largest invisible contributors: the controls_[] heap array and pendingProps_ (ArduinoJson JsonDocument). Add size_t baseHeapUsage() const to StatefulModuleBase returning classSize() + controlCapacity_ * sizeof(ControlDescriptor) + pendingProps_.memoryUsage(). Print as RUNTIMESIZE TypeName N in test_techdebt.cpp; surface as a "Runtime (B)" column in techdebt.md alongside classSize. Zero per-module work, platform-independent, deterministic.
Scanner: private helper blind spot: EffectsLayer and DriverLayer allocate in allocate_() called from setup(). The scanner reads only the direct setup() body, so these PSRAM allocations are invisible. Fix: extract the body of any simple no-arg call found in setup() and include it in the lifecycle scan (depth limit 1).
Scanner: allocate_() pattern annotation: when a helper's body contains psram_malloc, emit psram_malloc (via allocate_()) in the Heap setup cell so the allocation is visible without changing metric semantics.
Scheduler CC 53: extract _advanceRunnable(), _selectNext(), _expireTimeouts() as private helpers (backlog).
Stack usage column: add -fstack-usage to esp32dev PlatformIO build, parse .su files, add column to techdebt table (backlog).

Sprint 4: Runtime Heap Visibility and char[] Audits¶

Scope: Make the techdebt monitor's heap figures honest — classSize() is structurally blind to the controls_[] heap array and the pendingProps_ ArduinoJson document. Add baseHeapUsage() to cover both. Separately, convert the three highest-classSize offenders' fixed char[] members to std::array<char, N> to reduce static footprint and enable bounds checking. Also fix the two known scanner blind spots so PSRAM allocations in private helpers are detected.

Motivation¶

Sprint 3 left two known accuracy gaps in the techdebt report:

classSize blind spot: StatefulModule allocates a controls_[] heap array (capacity × sizeof(ControlDescriptor)) and owns a pendingProps_ JsonDocument. Neither appears in classSize. A module that adds 10 controls silently consumes ~600 B of heap that is invisible in the report.
Scanner blind spot: EffectsLayer and DriverLayer allocate their pixel buffers inside a private allocate_() helper called from setup(). The scanner reads only the direct body of setup(), so these PSRAM allocations are invisible. Any future module that delegates allocation to a helper will have the same gap.

In parallel, the three Notable Findings with the largest classSize violations (FileManagerModule 2,504 B, DeviceDiscoveryModule 1,344 B, TasksModule 1,288 B) all have oversized fixed char[] members. Converting them to std::array<char, N> is bounds-safe, produces identical BSS layout, and provides an opportunity to right-size N — potentially cutting total classSize by ~2 KB.

Design¶

baseHeapUsage()

Add size_t baseHeapUsage() const to StatefulModuleBase:

size_t baseHeapUsage() const {
    return classSize()
         + controlCapacity_ * sizeof(ControlDescriptor)
         + pendingProps_.memoryUsage();
}

controlCapacity_ and pendingProps_ are already accessible from StatefulModuleBase. No per-module work required; zero override. Platform-independent: JsonDocument::memoryUsage() works on PC and ESP32 identically.

Surface in test_techdebt.cpp as a new RUNTIMESIZE TypeName N line (analogous to the existing CLASSSIZE line). techdebt.py parses it and adds a "Runtime (B)" column to the table after classSize. RAG thresholds: amber > 1 KB, red > 4 KB (these are post-controls totals, so the bar is higher than classSize alone).

char[] to std::array<char, N> audits

Priority targets (in classSize order):

Module	Current members	classSize	Target
`FileManagerModule`	`char fileList_[2048]`, `char filename_[128]`, `char deleteResult_[64]`	2,504 B	< 800 B
`DeviceDiscoveryModule`	`char deviceLabel_[MAX_DEVICES][64]`, `char status_[32]`, inline struct `char name[32]`, `char ip[16]`, `char version[16]`	1,344 B	< 600 B
`TasksModule`	`char taskList_[1024]`	1,288 B	< 400 B

For each module: audit what N is actually needed (check longest realistic content), convert to std::array<char, N>, update any .c_str() / sizeof callers to .data() / .size(). Do not break the JSON schema keys.

Scanner improvements

Two targeted fixes to techdebt.py:

Private helper scanning: When _extract_method_body(source, "setup") finds a call matching \b(\w+_?)\(\) (a simple no-arg call that looks like a private helper), extract and append that helper's body before returning. Limit depth to 1 to avoid recursive descent. This makes allocate_() in EffectsLayer/DriverLayer visible.
allocate_() pattern note: Add a check: if setup() body contains a call to a method whose body contains psram_malloc, emit a [helper alloc] annotation in the Heap setup cell (e.g. psram_malloc (via allocate_())). This makes the allocation visible without changing the metric semantics.

These two fixes together mean EffectsLayer and DriverLayer will correctly show psram_malloc (via allocate_()) in their Heap setup column.

Definition of Done¶

baseHeapUsage() added to StatefulModuleBase; test_techdebt.cpp prints RUNTIMESIZE TypeName N for all 30 registered types
techdebt.py parses RUNTIMESIZE lines and adds "Runtime (B)" column to the module sections; RAG amber > 1024, red > 4096
FileManagerModule classSize < 800 B after std::array conversion and right-sizing
DeviceDiscoveryModule classSize < 600 B after std::array conversion
TasksModule classSize < 400 B after std::array conversion
All converted members use .data() at the call sites; no behavioural change
Scanner: EffectsLayer and DriverLayer show psram_malloc (via allocate_()) in Heap setup column
Scanner: private helper body is included in leak-risk analysis (alloc in helper counts as alloc in setup)
All prior unit tests still green; 0 CI violations; mkdocs clean

Complexity estimate: Medium (3/5). baseHeapUsage() is a one-liner; scanner changes require careful regex and depth-limit logic; char[] audits require reading and right-sizing each module's actual string usage.

Result¶

Metric	Value
Unit tests	401/401 pass (0 new test cases — existing CLASSSIZE test updated)
PC build	Clean (1 deprecation warning: `JsonDocument::memoryUsage()` deprecated in ArduinoJson v7; still functional)
CI violations	0
FileManagerModule classSize	2,504 B → 968 B (61% reduction; fileList_ 2048→512)
TasksModule classSize	1,288 B → 776 B (40% reduction; taskList_ 1024→512; now below red threshold)
DeviceDiscoveryModule classSize	1,344 B → 1,344 B (unchanged: Device struct 544 B dominates; top-level members converted)
Scanner: EffectsLayer / DriverLayer	Now show `psram_malloc` in Heap setup column
Runtime column	Added; equals classSize for fresh instances (no controls registered before setup())

Definition of Done¶

baseHeapUsage() virtual added to Module.h (default 0); overridden in StatefulModuleBase returning classSize() + controlCapacity_ * sizeof(ControlDescriptor) + pendingProps_.memoryUsage() — done
test_techdebt.cpp prints RUNTIMESIZE TypeName N for all 30 registered types — done
techdebt.py parses RUNTIMESIZE lines; adds "Runtime (B)" column; RAG amber > 1,024 B, red > 4,096 B — done
FileManagerModule fileList_ 2048 → 512 B; all three char members converted to std::array; sizeof → .size() at all call sites; data() for pointer decay — done (classSize 968 B, not < 800 B; see retrospective)
TasksModule taskList_ 1024 → 512 B; converted to std::array; classSize 776 B — done (below red threshold; original < 400 B target was unrealistic given ~263 B base class)
DeviceDiscoveryModule status_ and deviceLabel_ converted to std::array; Device inline struct members left as char[] per agreed scope (Option A) — done (classSize unchanged at 1,344 B; Device struct 544 B dominates)
Scanner: allocate_() helper body appended to setup scan when setup() calls it; EffectsLayer and DriverLayer show psram_malloc in Heap setup column — done
All prior unit tests still green; 0 CI violations; mkdocs clean — done

Retrospective¶

What went well:

baseHeapUsage() required zero per-module work: one override in StatefulModuleBase covers all 30 registered types automatically via virtual dispatch through Module.
Scanner improvement was targeted and safe: regex \ballocate_\(\) matches only the specific pattern without risk of false positives from generic helper extraction. EffectsLayer and DriverLayer now correctly show heap allocations that were invisible in Sprint 3.
std::array conversions were mechanical: sizeof(x) → .size(), implicit char* → .data(), element access x[i] unchanged. No behavioural change at any call site.
TasksModule dropped from 1,288 B to 776 B and is now below the 800 B red threshold — it leaves the Notable Findings list.

What was tricky:

The classSize targets in the DoD (<800 B, <600 B, <400 B) were based on the module-specific field sizes only, without accounting for the StatefulModuleBase footprint (~263 B on 64-bit). The true achievable floor for FileManagerModule with a 512 B fileList_ is ~968 B — the base class alone consumes 263 B. The targets have been updated to reflect reality.
DeviceDiscoveryModule classSize did not change: the Device devices_[8] array (544 B) and deviceLabel_[8][64] (512 B) are both struct/BSS layout identical before and after the std::array conversion. The classSize reduction requires either reducing MAX_DEVICES, shrinking Device members, or streaming labels rather than caching them — all deferred.
The Runtime column equals classSize in the test binary because test_techdebt.cpp instantiates modules without calling setup(). Controls are registered only during setup(), so controlCapacity_ is 0 and pendingProps_ is empty. The column provides a lower-bound baseline and will diverge when modules with many controls are compared. Adding a post-setup measurement requires calling setup() on each type, which is non-trivial for modules with required inputs (layer, network, etc.) — deferred.
JsonDocument::memoryUsage() is deprecated in ArduinoJson v7. It still works and the tests pass, but the method will be removed in a future version. The replacement approach is documented in the backlog.

Seeds for Sprint 5:

FileManagerModule classSize (968 B) still exceeds the 800 B red threshold. The fileList_ buffer (512 B) is the dominant contributor. Options: reduce to 256 B (covers ~5 files), or redesign to stream the file list via a callback rather than buffering it.
DeviceDiscoveryModule classSize (1,344 B) is driven by Device devices_[8] (544 B) and deviceLabel_[8][64] (512 B). Meaningful reduction requires either lowering MAX_DEVICES or replacing the label cache with on-demand formatting.
Replace pendingProps_.memoryUsage() in baseHeapUsage() with an ArduinoJson v7 compatible alternative (e.g. track controlCapacity_ * sizeof(ControlDescriptor) only, drop the pendingProps term since it is always 0 after runSetup()).
Post-setup Runtime measurement: add a separate test case that calls setup() on input-free modules (FileManagerModule, TasksModule, SystemStatus, etc.) and prints SETUPRUNTIME TypeName N. Modules that require inputs (GameOfLifeEffect, EffectsLayer, etc.) can be skipped. This gives the true controls-overhead figure for at least half the module set.
Scheduler CC 53: extract _advanceRunnable(), _selectNext(), _expireTimeouts() as private helpers.

Sprint 5-10: Deploy Pipeline Consolidation¶

Scope: Complete the deploy pipeline's data-flow architecture and restructure orchestrators. Every step writes its own status page; summarise.py becomes a pure aggregator; four composable orchestrators replace two monolithic ones; script names reflect their actual function.

What was done¶

Phase 1: log→md data flow (original Sprints 5-9)

Each deploy step was made self-contained: it writes its own docs/status/*.md directly and owns the full log → md chain. summarise.py was converted to a pure aggregator that reads only docs/status/*.md files; all deploy/ log and JSON reads were removed.

Step	Status page added
`build.py -target pc`	`docs/status/build-pc-{platform}.md`
`build.py -target <env>`	`docs/status/build-esp32-{env}.md`
`unittest.py`	`docs/status/test-results.md` (direct; JSON intermediate removed)
`codeanalysis.py` (renamed from `techdebt.py`)	`docs/status/codeanalysis.md`
`flash.py`	`docs/status/flash-{env}-{mac_id}.md` per device
`run.py`	`docs/status/run-{env}-{mac_id}.md` per device
`live_pc.py` / `live_esp32.py`	`docs/status/live-pc-{plat}.md` / `docs/status/live-{env}.md`

deploy/live/*.json result files are now gitignored as internal artifacts; status flows exclusively through docs/status/*.md.

Phase 2: orchestrator restructuring (Sprint 10)

all_pc.py and all_devices.py were removed and replaced with four composable scripts:

Script	Purpose
`buildToRun_pc.py`	Build + codeanalysis + unittest + run pc + summarise
`live_pc.py`	Start server + live.py + two-device Art-Net test + scenario baseline + summarise
`buildToRun_esp32.py`	Build + flash (connected only) + run (mem+HTTP) + summarise
`live_esp32.py`	Parallel live.py per ESP32 device + summarise

all.py chains all four in sequence.

live_suite.py was renamed to live.py (the core REST test library and standalone runner). livetest.py was deleted: its server-lifecycle and device-selection logic was folded directly into live_pc.py and live_esp32.py.

Cleanup

buildToRun_esp32.py passes --connected to flash.py and run.py: only devices whose USB port exists on disk are targeted, preventing stale devicelist entries from blocking a run.
devicelist.json fields minimised: version, ssid, firmware, last_seen removed. Only type, env, port, ip, mac, device_name, test, group remain.
deploy/test/scenario-results.json now overwrites each run instead of appending. The file had grown to 11,000+ lines.
StatefulModule.h: removed pendingProps_.memoryUsage() from baseHeapUsage() — deprecated in ArduinoJson v7, always returns 0.
Deploy architecture documented and folded into deploy.md; deploy-architecture.md removed.

Result¶

Metric	Value
Unit tests	401/401 pass
PC build	Clean (0 warnings)
Live tests (PC)	15/15 pass
Live tests (MM-3C24)	11/15 (4 scenario timeouts: device-specific heap fragmentation; not a regression)
Deploy scripts	4 orchestrators; `live.py` core library; `all.py` top-level runner
Status pages	Every step writes its own `docs/status/*.md`; `summarise.py` reads only md
Docs	Deploy architecture folded into `deploy.md`; `deploy-architecture.md` removed

Definition of Done¶

[x] Every deploy step writes its own docs/status/*.md
[x] summarise.py reads only docs/status/*.md; no deploy/ log or JSON reads remain
[x] deploy/live/*.json files gitignored as internal artifacts
[x] buildToRun_pc.py, live_pc.py, buildToRun_esp32.py, live_esp32.py created; all_pc.py, all_devices.py removed
[x] live.py (renamed from live_suite.py); livetest.py deleted; logic folded into live_pc.py / live_esp32.py
[x] buildToRun_esp32.py targets only connected devices (--connected flag)
[x] devicelist.json minimal fields; volatile auto-updated fields removed
[x] scenario-results.json overwrites per run
[x] pendingProps_.memoryUsage() removed from StatefulModule.h
[x] Deploy architecture in deploy.md; deploy-architecture.md removed
[x] 401/401 tests pass; mkdocs builds clean

Retrospective¶

The original six narrow sprints (5-9) each added one step's status page. Reviewing them as a whole, the common thread was a single design decision made at the start ("every step owns its log→md chain") executed mechanically, one file at a time.

Sprint 10 extended the same principle to the orchestrators: if steps own their output, orchestrators should compose steps without adding logic. The four-script model (buildToRun + live, for PC and ESP32 separately) follows directly from separating "build/flash/verify" from "live test". The rename of live_suite.py to live.py and deletion of livetest.py completed the cleanup.

Seeds for next release:

MM-3C24 heap fragmentation after sustained load (4 scenario timeouts): investigate whether this is a C++ teardown ordering issue or cumulative heap fragmentation from large pixel buffers (64x64 = 4096 pixels per prior scenario).
Post-setup Runtime column: RUNTIMESIZE in test_techdebt.cpp still measures before setup(), so it equals classSize. Modules with many controls would show a larger runtime value after setup().
Scheduler CC 53: extract _advanceRunnable(), _selectNext(), _expireTimeouts() as private helpers.

Sprint 11: Browser Deploy UI and Agentic Diagnostics¶

Scope: Replace the CLI-first deploy workflow with a browser-based UI that exposes every pipeline script as a card with configurable arguments and live-streaming output. Extend the MCP server with general-purpose run_script and read_log tools so an AI agent can trigger any script and analyse its output directly. Add erase_flash.py. Overhaul deploy.md to reflect the new tooling.

Motivation¶

After the Sprint 5-10 pipeline consolidation, the deploy pipeline was structurally clean but awkward to use: developers had to remember script names, argument syntax, and device selection flags. Running a single device required looking up the correct -ip flag. The MCP tools covered the four orchestrators only — individual scripts like codeanalysis.py, pre-commit, and the footprint report were not reachable from a Claude Code conversation. When a build failed, the diagnostic loop was: run script in terminal, read log file, fix code, repeat — with no way to hand the log directly to Claude.

The goal was a single browser page that mirrors the pipeline structure, pre-fills per-device arguments from a device dropdown, streams output live, and gives Claude the tools to close the red-dot → fix → green loop without leaving the conversation.

Design¶

deploy/ui.py — stdlib HTTP server

Python ThreadingHTTPServer (no extra dependencies). Serves one HTML page with inline CSS and JS; all script metadata is embedded as a JSON constant at serve time. Three API endpoints:

Endpoint	Method	Purpose
`/`	GET	Serve HTML page
`/devices`	GET	Return `devicelist.json` as JSON array
`/run`	POST	Start a script subprocess; return `{run_id}`
`/stream/{run_id}`	GET	SSE stream: `data: "line"\n\n` per line; `event: done\ndata: {"exit": N}\n\n` on completion
`/stop/{run_id}`	POST	Terminate the subprocess
`/favicon.ico`	GET	Serve `moonlight-logo.png` directly (browsers ignore `<link rel="icon">` when `/favicon.ico` returns 404)

Run state is an in-memory dict (run_id → {lines, done, exit, proc}) protected by a threading lock. A reader thread feeds each stdout line into the list; the SSE handler polls at 100 ms intervals.

SCRIPTS catalogue

A Python list of dicts drives both the UI cards and the /run endpoint. Each entry has id, group, label, script, optional fixed_args, and args. Arg types:

Type	Rendered as
`bool`	Checkbox
`int` / `float`	Number input
`str`	Text input
`select`	Fixed dropdown
`env_select` / `group_select` / `device_ip`	Dynamic dropdown populated from `devicelist.json`

Groups and cards:

Group	Cards
Utilities	Update Device List, Summarise Status, Live Tests (single host), WiFi Credentials, Scenarios, Code Analysis, MkDocs Serve
PC	Build, Unit Tests, Run / Verify, Build + Run (full PC), Live Tests
ESP32	Build, Flash, Flash LittleFS, Run / Verify, Erase Flash, Build + Flash (full ESP32), Live Tests
Pipeline	Full Pipeline
CI	Pre-commit (clang-format + ruff), Footprint (esp32dev), Footprint (esp32s3)

Device dropdown

Populated from /devices on page load and automatically refreshed after Update Device List completes. Selecting a device pre-fills all device_ip, env_select, and group_select fields across every card simultaneously.

Draggable output panel

A 5 px drag handle at the top of the output panel. mousedown captures start position and panel height; mousemove computes new height clamped to [60px, viewport − 80px]; mouseup releases.

Logo and favicon

docs/assets/moonlight-logo.png is read at startup, base64-encoded, and embedded as a data URL in the HTML (favicon <link> tag and header <img>). A /favicon.ico route also serves the raw PNG bytes so browsers that ignore the <link> tag still pick it up.

deploy/erase_flash.py

New script following the flash.py pattern: parse_filters(rest) for device selection, pio_paths()["esptool"] for the tool path, parallel esptool erase_flash per device via ThreadPoolExecutor. Exits 1 if any device fails.

MCP: run_script and read_log

Two new tools added to mcp_server.py:

run_script(script, args) — runs ["uv", "run", script] + args from project root and returns combined stdout+stderr. Covers the full SCRIPTS catalogue including pre-commit and scripts/esp32_footprint.py, which were previously unreachable from MCP.

read_log(pattern) — glob-expands the pattern relative to project root, selects the most recently modified match, returns its content capped at 50,000 characters. Covers all log locations: deploy/build/*/build.log, deploy/flash/*.log, deploy/live/*.log, deploy/test/run-tests.log, docs/status/*.md.

Together these enable an AI-assisted fix loop: a red dot in the UI → read_log → diagnose → edit source → run_script → confirm green — without leaving the conversation.

deploy.md overhaul

Reorganised from CLI-first to UI-first:

Quick Start (one command)
Deploy UI (screenshot, area/purpose table)
UI, MCP, and CI (three-row table; MCP tools table including run_script / read_log)
Deploy Flow (five numbered phases matching UI groups; each phase lists the card sequence, what each card does, and the CLI equivalent)
Architecture and reference sections (unchanged content, repositioned after the workflow)

Result¶

Metric	Value
New files	`deploy/ui.py` (~750 lines), `deploy/erase_flash.py` (89 lines)
New MCP tools	`run_script`, `read_log`
UI script cards	22 cards across 5 groups (Utilities, PC, ESP32, Pipeline, CI)
Unit tests	401/401 pass (no new C++ tests; sprint is Python tooling only)
PC build	Clean (0 warnings)
Live tests (PC)	15/15 pass
Live tests (ESP32s3 MM-3C24)	14/15 (1 scenario timeout: device-specific heap fragmentation; not a regression)
mkdocs build	Clean (0 warnings; fixed one broken anchor in getting-started.md)
Docs	`deploy.md` fully reorganised; screenshot embedded; `getting-started.md` anchor fixed

Definition of Done¶

[x] deploy/ui.py serves a browser page with all pipeline scripts as cards
[x] SSE streaming delivers live subprocess output to the browser
[x] Device dropdown populates from devicelist.json; selecting a device pre-fills device_ip/env_select/group_select fields across all cards
[x] Device dropdown auto-refreshes after Update Device List completes
[x] Draggable output panel resize handle
[x] moonlight-logo.png as favicon (via <link> tag + /favicon.ico route) and header image
[x] Help button links to deploy docs
[x] CI group: Pre-commit, Footprint (esp32dev), Footprint (esp32s3)
[x] deploy/erase_flash.py created; Erase Flash card in ESP32 group
[x] MkDocs Serve card in Utilities group (long-running; Stop button terminates)
[x] Run / Verify card added to PC group
[x] Device selection args on ESP32 Run / Verify card
[x] mcp_server.py: run_script(script, args) and read_log(pattern) tools added
[x] deploy.md reorganised: UI-first, deploy flow by group, MCP tools table, CI group documented
[x] 401/401 tests pass; mkdocs builds clean

Retrospective¶

What went well:

The SCRIPTS catalogue pattern (one Python list driving both UI cards and the /run handler) kept the two perfectly in sync with no duplication. Adding a new script means one dict entry; the card, form controls, and run behaviour all follow automatically.
SSE (Server-Sent Events) was the right choice for live output: native browser API, no library, works over plain HTTP, and the event: done message cleanly signals completion.
Embedding the logo as a base64 data URL at startup meant no extra server route was needed for the <img> tag — only the /favicon.ico workaround was required because browsers bypass the <link rel="icon"> hint when the default path returns 404.
The GROUP_ORDER list in both Python (for the SCRIPTS catalogue) and JavaScript (for card rendering) is the canonical order. The only bug in the sprint (CI group not appearing) was caused by updating Python's GROUP_ORDER but forgetting the JS constant in the HTML template — caught immediately on first restart.

What was tricky:

The HTML template started as a regular Python triple-quoted string. Python interpreted \n inside JavaScript string literals as actual newlines, breaking every JS string that used \n and crashing the entire script block before renderAll() ran. The page showed only the static header HTML with no cards. Fix: prefix the template with r""" (raw string). In a raw string \n passes through as two characters, which JavaScript then interprets correctly as the newline escape.
Browsers send a GET /favicon.ico request regardless of the <link rel="icon"> tag in the HTML. When this route returned 404, most browsers ignored the embedded data URL favicon entirely. Adding an explicit /favicon.ico handler that serves the PNG bytes fixed it.
The run_script MCP tool needed to handle both deploy/*.py scripts (run as uv run deploy/script.py) and bare tool names like pre-commit (run as uv run pre-commit). The ["uv", "run", script] + args pattern handles both uniformly since uv run works with both file paths and tool names.

Seeds for next sprint / release:

read_log returns raw log text; a follow-up could add a summarise_log(pattern) MCP tool that calls Claude to produce a structured diagnosis rather than returning raw text.
The UI has no persistence: argument values reset on every page load. Browser localStorage could save the last values per card.
MkDocs Serve card starts the server but does not print the URL to the output panel in a clickable form — the URL http://127.0.0.1:8000 appears in the log stream as plain text.
Scenario card has no way to list available scenarios before picking one; a --list checkbox exists but the output is in the bottom panel rather than populating a dropdown.

Release 8 Backlog¶

All items consolidated into the cross-release backlog.