Skip to content

Release 8: Dynamic Controls and UI Adaptability

Theme: Release 8 adds dynamic control schemas (the ability to rebuild a module's control set at runtime based on the current value of other controls) so the UI can show only the parameters that are relevant for the active configuration. Later sprints extend the release into deploy-pipeline health: a code analysis monitor, runtime heap visibility, and a structured overview of the full deploy architecture.


Release Overview

What was delivered in Release 7 (build on this)

Strength Notes
OTA firmware update FirmwareUpdateModule: file upload + GitHub releases tab; POST /api/firmware
CI release pipeline Tagged releases + nightly pre-release with firmware assets on GitHub
Windows support Native .exe build; projectMM-pc-windows.zip in CI artifacts
Scenario baselines Hardware --update-baseline run; "extends" inheritance; wired into all.py
Static RAM hardening Per-device LOG_RING_SIZE; WiFi buffer tuning; dual check_alloc guard
Log frontend panel WS push of ring buffer entries; collapsible log UI

What Release 8 addresses

Problem Sprint
Control schema is fixed at setup() time; irrelevant parameters always visible regardless of selected type Sprint 1 (Dynamic controls: clearControls(), rebuildControls(), early WS flush), complete
Static RAM column in techdebt monitor always shows 0 (parser bug); no accounting of what consumes the 51 KB ESP32 RAM; Notable Findings have no action owners Sprint 3 (RAM accounting, parser fix, actions table), complete
classSize() misses runtime heap (controls_[] array, pendingProps_ doc); large char[] struct members inflate classSize; scanner blind to allocations in private helpers Sprint 4 (baseHeapUsage, char[] audits, scanner improvements), complete
Deploy pipeline grew to 17+ scripts with no architecture overview; steps produced no status pages; techdebt.py name misleading; orchestrators monolithic Sprints 5-10 (full log→md pipeline, orchestrator restructuring, naming cleanup), complete
No interactive way to trigger individual deploy scripts; MCP tools covered only orchestrators; no AI-assisted log analysis; deploy.md was CLI-first with no visual overview Sprint 11 (browser deploy UI, run_script/read_log MCP tools, deploy.md overhaul), complete

Sprints

Sprint Goal
Sprint 1 Dynamic controls: clearControls(), rebuildControls() virtual, early WS schema flush
Sprint 2 Technical-debt monitor: per-module metrics (LOC, function count, complexity, static RAM, heap/blocking violations) as a CI script
Sprint 3 RAM accounting balance, fix static RAM parser, Notable Findings actions, Logger ring buffer reduction
Sprint 4 baseHeapUsage() column, char[] to std::array audits, scanner improvements for private helpers
Sprint 5-10 Deploy pipeline consolidation: full log→md data flow, orchestrator restructuring, naming cleanup — complete
Sprint 11 Browser deploy UI, run_script/read_log MCP tools, erase_flash.py, deploy.md overhaul — complete

Sprint 1: Dynamic Controls

Scope: Allow a module to rebuild its control schema at runtime in response to a control value change. The primary use case: a type selector control switches between effect variants, and only the parameters relevant to the active type are shown. The control set is rebuilt without a full module restart.

Motivation

Today, addControl() is called once in setup() and the schema is fixed for the lifetime of the module. A module that supports multiple effect types must expose all parameters for all types simultaneously, cluttering the UI and confusing operators. The fix: make the schema a function of the control values, rebuilt on demand.

Design

clearControls(system = false)

Added to StatefulModule. Iterates the registered controls_[] descriptors and removes all entries that are not marked system. Before removing each descriptor, writes the current value of the backing variable back into the pendingProps_ stash (keyed by control name). This means a subsequent addControl(var, key, ...) call for the same key restores the last operator-set value automatically — values are preserved across rebuilds even when the control temporarily disappears.

System controls (enabled) are marked at registration time with a system flag in ControlDescriptor. clearControls() skips them unconditionally.

rebuildControls() virtual

New virtual method on StatefulModule; default implementation is a no-op (all existing modules continue to work unchanged). Modules that want dynamic controls override it:

void rebuildControls() override {
    clearControls();
    addControl(type_, "type", "select", {"Ripples", "Lines", "Sine"});
    if (type_ == EffectType::Ripples) {
        addControl(speed_,  "speed",  "slider", 0.1f, 10.0f);
        addControl(radius_, "radius", "slider", 1.0f, 50.0f);
    } else if (type_ == EffectType::Lines) {
        addControl(speed_,  "speed",  "slider", 0.1f, 10.0f);
        addControl(count_,  "count",  "slider", 1,    20);
    }
}

void setup() override {
    rebuildControls();   // replaces direct addControl() calls
}

void onUpdate(const char* key) override {
    if (strcmp(key, "type") == 0) rebuildControls();
}

Modules that do not need dynamic controls keep calling addControl() directly in setup() — no migration required.

Early WS schema flush

After rebuildControls() finishes, the UI must reflect the new schema immediately rather than waiting up to 1 s for the next periodic push. Implementation: clearControls() sets a schemaDirty_ flag on StatefulModule. The main loop checks schemaDirty_ across all modules and, if set, sends a {"t":"schema","modules":[...]} WS push using getModulesJson() (full schema including control types, options, min/max, and current values) and clears the flag. On a clean tick, the periodic 200 ms push uses getStateJson() (flat key/value state) as before. Natural debounce: a burst of rebuildControls() calls within one tick produces exactly one push.

A dedicated {"t":"schema"} message type is required because getStateJson() sends only flat {key:value} pairs; handleStateUpdate() in the frontend updates existing DOM elements but cannot add or remove controls. When rebuildControls() changes the control set, the frontend must call render() to rebuild the card from scratch.

State persistence interaction

saveState() and loadState() iterate the registered descriptors. After a rebuild, only the currently registered controls are persisted — parameters for inactive types are not written to the state file. On the next load, pendingProps_ carries any previously saved values; addControl() applies them if the key matches a registered control after rebuildControls() runs. A type control persisted in state is applied before rebuildControls() is called (via the existing addControl stash mechanism), so the correct variant's parameters are registered and restored on first boot.

Sprint 1 Scope Definition of Done

  • ControlDescriptor gains bool system field; StatefulModule::runSetup() sets it when registering enabled
  • clearControls() removes non-system descriptors; saves current values to pendingProps_ stash before removal
  • rebuildControls() virtual added to StatefulModule; default is no-op; existing modules compile and behave identically
  • schemaDirty_ flag set by clearControls(); main loop early-flush path clears it and sends a {"t":"schema","modules":[...]} WS push
  • Reference implementation: one new module (e.g. MultiEffectModule or adapted existing effect) demonstrating type selector + conditional parameters
  • Unit tests: rebuild preserves values of re-registered controls; rebuild discards values of removed controls; system controls survive clearControls(); schemaDirty_ triggers exactly one early flush per rebuild burst
  • Frontend: {"t":"schema"} handler added; calls render(msg.modules) to rebuild all cards from the full schema
  • All prior unit tests still green

Complexity estimate: Low-Medium (2/5). The stash mechanism already exists; clearControls() is a small loop; the early flush reuses the existing push path. The trickiest part is the state-persistence ordering (type value applied before rebuild runs).


Result

Metric Value
Unit tests 399/399 pass (8 new tests added)
PC build Clean (0 warnings)
ESP32dev build Clean (0 warnings); BSS 16.3% (53 KB, down from 21.3% / 70 KB after static wsBuf removed)
ESP32s3 build Clean (0 warnings)
Live tests (PC) 15/15 all passing
Live tests (MM-70BC) 15/15 all passing
Live tests (MM-C1BC) 12/15 (hardware capacity limits: 64x64 OOM, fps below 1000 on 16x16, 4-layer OOM on classic ESP32)

Definition of Done

  • ControlDescriptor gains bool system = false field; runSetup() sets it after registering enableddone
  • clearControls() preserves system controls, saves non-system values to pendingProps_ stash, sets schemaDirty_ when controls are actually removed — done
  • rebuildControls() virtual added to StatefulModuleBase; default is no-op; all existing modules compile and behave identically — done
  • schemaDirty_ flag; ModuleManager::hasSchemaDirty() / clearSchemaDirty(); WS broadcast loop in main.cpp and AppSetup.cpp sends {"t":"schema","modules":[...]} on dirty tick, getStateJson() array on periodic tick — done
  • Reference implementation: SineEffectModule adapted with type selector (Sine / Ripples), rebuildControls(), and onUpdate("type")done
  • Unit tests: rebuild preserves values of re-registered controls; rebuild does not affect unrelated fields; system controls survive clearControls(); schemaDirty_ set/cleared correctly; burst produces exactly one flag — done (7 new test cases)
  • Frontend: {"t":"schema"} message type handler added to app.js; calls render(msg.modules) to rebuild all cards — done
  • All prior unit tests still green — 399/399
  • Static wsBuf[16384] removed from AppSetup.cpp; both WS push branches now allocate on demand via heap_caps_malloc / heap_caps_freedone
  • pal::net_early_init() calls Network.begin() before scheduler.setup() to guarantee the TCP/IP stack is ready before any module opens sockets — done
  • DeviceDiscovery::setup() guards broadcastPresence_() behind sock_ >= 0; loop() retries udp_bind() when sock_ < 0done

Retrospective

What went well:

  • The pendingProps_ stash already existed and worked without modification — clearControls() just needed to write into it before removing each descriptor.
  • The runSetup() full-wipe / clearControls() mid-lifecycle split was clean once the two call sites were separated. Inlining the wipe in runSetup() was the right call.
  • Adapting SineEffectModule rather than writing a new module gave immediate test coverage for a real effect and kept the scope small.
  • The schemaDirty_ "only set when controls are actually removed" rule surfaced naturally from a failing test: first-call-from-setup had no prior controls, so the flag should not fire on initial build.

What was tricky:

  • The schemaDirty_ flag initially fired on the first rebuildControls() call from setup() (because clearControls() always set it). The fix — only set the flag when controlCount_ > kept — is semantically correct (no prior schema means no schema change) and made the test clean.
  • The kTypes / kWaveforms static constexpr arrays required the kTypeCount companion so addControl(uint8_t&, key, const char* const*, count) received a correct count without magic numbers.
  • hasSchemaDirty() and clearSchemaDirty() iterated owned_ without holding controlMutex_. On PC (multi-threaded HTTP server running at 400K+ fps), this created a data race with concurrent removeModule() calls that modify owned_ under the mutex. The server crashed intermittently mid-scenario after the WS client connected. Fix: add std::lock_guard<std::mutex> lk(controlMutex_) to both functions, matching the lock discipline used by getStateJson() and every other owned_ iterator.
  • The Design section claimed "no new WS message type is needed" — this was wrong. getStateJson() sends only flat {key:value} pairs; handleStateUpdate() in the frontend updates existing DOM elements by key lookup and cannot add or remove controls. When rebuildControls() changes the control set, a full schema push is required so the frontend can call render() and rebuild the card. The fix: a dedicated {"t":"schema","modules":[...]} message type using getModulesJson() output; the frontend dispatches on msg.t === "schema" and calls render(msg.modules).
  • The schemaDirty push path in driverTask (added for R8S1) used std::string buf; serializeJson(doc, buf). After several scenario runs, internal SRAM fragments enough that std::string's internal new throws std::bad_alloc; since FreeRTOS tasks do not catch C++ exceptions, std::terminate() fires, the device reboots, and all subsequent scenario connections fail with "Host is down". The free_heap_kb() > 16.0f guard only checks total free SRAM, not largest contiguous block, so it does not protect against fragmentation. Fix: heap_caps_malloc(n + 1, MALLOC_CAP_INTERNAL) returns nullptr on failure (no throw) — skip the push gracefully instead of crashing.
  • Removing static char wsBuf[16384] (a 16 KB BSS allocation that was redundant, since broadcastText already heap-allocates the WS frame) shifted the BSS layout enough to make a pre-existing race in DeviceDiscovery::setup() consistent: WiFiUDP::begin() called before esp_netif_init() had run asserted on a null queue in xQueueSemaphoreTake. Fix: pal::net_early_init() calls Network.begin() before scheduler.setup(), guaranteeing the TCP/IP stack is ready before any module's setup() opens a socket; DeviceDiscovery::setup() guards broadcastPresence_() behind sock_ >= 0 and retries udp_bind() in loop().

Seeds for Sprint 2:

  • RipplesEffectModule still exists as a standalone module — now that SineEffectModule embeds the same rendering, consider whether RipplesEffectModule should be retired or kept as an independent module for pipelines that want only ripples.
  • The clearControls() / rebuildControls() pattern is now proven. Other modules with mode-dependent parameters (e.g. layout type selectors) can adopt it when operators report UI clutter.
  • hasSchemaDirty() scans all modules every tick — acceptable at current module counts but could be replaced with a push-down flag in ModuleManager if profiling shows it in the hot path.
  • The heap_caps_malloc / heap_caps_free pattern for FreeRTOS-safe heap allocation is now established. Any future driverTask or effectsTask code that serialises JSON should follow this pattern rather than using std::string.

Sprint 2: Technical-Debt Monitor

Scope: Add a deploy/techdebt.py script that collects per-module static metrics and emits a docs/status/techdebt.md table. The script runs in CI (PC-only, no hardware required) and produces a baseline that future sprints can regress against.

Motivation

The codebase grows by adding modules. Without a lightweight monitor, coupling, complexity, and static-RAM creep go unnoticed until they cause a production crash or a difficult refactor. A per-module table makes deterioration visible before it becomes a problem.

Design

Metrics collected per module (.h + companion .cpp if present):

Metric Source Why
Lines of code (NLOC) lizard Python API Size proxy; outliers need splitting
Function count lizard Python API Too many functions signals God-class
Max cyclomatic complexity lizard Python API High complexity predicts bug density
Static RAM (BSS + data bytes) firmware.map from ESP32 build Direct measure; non-zero only when module has static members
Heap allocation sites in setup() Python grep scan Expected; informational; checked against teardown
Heap allocation sites in loop() Python grep scan Policy violation: allocations belong in setup()
Blocking calls in loop() Python grep scan delay(), vTaskDelay(), info-level LOG_*
Leak risk Python brace-scan Alloc in setup() with no matching free in teardown()
classSize() (instance bytes) TypeRegistry test binary True heap cost per module instance

Tools:

  • lizard (added to pyproject.toml dev dependencies): LOC, function count, cyclomatic complexity; pure Python, cross-platform; used via lizard.analyze_file() Python API (not CLI) to avoid version-dependent flag issues.
  • firmware.map from .pio/build/esp32dev/: parsed for BSS+data contributions per .cpp.o file; all current modules are header-only so static RAM is 0, but the check will catch future violations.
  • tests/test_techdebt.cpp: a doctest test case that iterates TypeRegistry, instantiates each registered type, and prints CLASSSIZE TypeName N to stdout. techdebt.py runs the test binary with -tc=techdebt* and parses the output. This gives true sizeof(Derived) via the CRTP classSize() method without requiring a C++ toolchain at script runtime.
  • Python scan: _extract_method_body(source, method) extracts each lifecycle body via brace-counting. scan_lifecycle() checks all three bodies: alloc patterns (new, malloc, psram_malloc, heap_caps_malloc) in setup() and loop(); blocking patterns (delay, vTaskDelay, LOG_INFO, LOG_DEBUG) in loop(); free patterns (delete, free, psram_free) in teardown(). Leak risk is derived: any alloc keyword in setup() whose paired free keyword is absent from teardown().

Output: docs/status/techdebt.md

Core Infrastructure section (on top) + one section per module category. Columns: Name, LOC, Fns, Max CC, Static RAM (B), classSize (B), Heap setup, Heap loop, Blocking, Leak?. RAG (green/amber/red) indicators on all numeric columns.

Thresholds (configurable at top of script):

MAX_LOC        = 400   # warn if a single module exceeds this
MAX_CC         = 25    # CI threshold; aspirational target is 10 (existing renderers reach 22)
MAX_STATIC_RAM = 512   # warn if BSS+data exceeds this (bytes)

Violations are emitted as > **WARNING** lines in the markdown and exit 1 so CI fails.

CI integration:

Added as a step in .github/workflows/ci.yml after all_pc.py (so the test binary exists). uv sync --extra dev runs first to install lizard. No hardware required.

Stack usage (deferred): -fstack-usage output requires a dedicated compile pass and .su file parsing. Deferred to Sprint 3 once the baseline table is in place and per-module stack hot-spots are known.

Definition of Done

  • lizard>=1.17 added to pyproject.toml [project.optional-dependencies] dev
  • tests/test_techdebt.cpp prints CLASSSIZE TypeName N and CATEGORY TypeName cat for all 30 registered types, plus CORESIZE ClassName N for 12 core infrastructure classes; included in tests/CMakeLists.txt
  • deploy/techdebt.py collects all metrics and writes docs/status/techdebt.md; lizard.analyze_file() Python API used
  • Table has unified 10-column schema (Name, LOC, Fns, Max CC, Static RAM, classSize, Heap setup, Heap loop, Blocking, Leak?) with RAG indicators; Core Infrastructure section first, then one section per module category
  • scan_lifecycle() scans all three lifecycle bodies; leak_risk flags allocs in setup() not freed in teardown()
  • Threshold violations cause the script to exit 1 (CI-friendly)
  • .github/workflows/ci.yml installs dev deps and runs techdebt.py after the PC build step
  • docs/status/techdebt.md committed as a baseline; no module exceeds any CI threshold
  • mkdocs.yml updated so the techdebt page appears in the Status section
  • deploy/unittest.py FILE_TITLES updated to include test_techdebt.cpp

Complexity estimate: Low (1/5). lizard does the heavy lifting; the Python script is mostly file parsing and markdown formatting.


Result

Metric Value
Unit tests 401/401 pass (2 new test cases added)
PC build Clean (0 warnings)
Modules in report 30 registered types + 19 core infrastructure files
Threshold violations 0 (baseline clean)
Heap-in-loop flagged 2 (GameOfLifeEffect and PreviewModule: conditional psram_malloc on geometry resize, intentional)
Heap-in-setup flagged 2 (GameOfLifeEffect: psram_malloc; ArtNetOutModule: malloc; both freed in teardown, Leak? empty)
Highest Max CC 22 (GameOfLifeEffect::loop)
Largest classSize FileManagerModule: 2504 B

See docs/status/codeanalysis.md for the current table (renamed from techdebt.md in Sprint 5).


Retrospective

What went well:

  • The lizard Python API (lizard.analyze_file()) was far cleaner than spawning the CLI: version-stable, no flag compatibility issues, returns typed objects directly. Using result.nloc and result.function_list was straightforward.
  • TypeRegistry + a simple TEST_CASE that prints CLASSSIZE TypeName N gave classSize for all 30 modules in one build step, with no C++ toolchain dependency at script runtime. The CRTP classSize() method meant zero per-module work.
  • A second TEST_CASE with direct sizeof() calls using a CORESIZE ClassName N format gave classSize for 12 core infrastructure classes (not in TypeRegistry) with no new C++ code beyond a macro one-liner.
  • _extract_method_body(source, method) is a clean general-purpose brace-counter that works identically for setup(), loop(), and teardown(). Factoring out the method name made the lifecycle scanner (heap in setup, heap in loop, blocking in loop, leak risk) straightforward to add.
  • Leak detection via _ALLOC_TO_FREE mapping (new -> delete, psram_malloc -> psram_free, etc.) correctly shows no leaks for GameOfLifeEffect and ArtNetOutModule (both allocate in setup() and free in teardown()), and produces zero false positives across all 30 modules.
  • firmware.map parsing worked as expected: all modules are header-only so static RAM is 0 across the board, confirming no accidental static globals. The check is in place to catch future regressions.

What was tricky:

  • The original design called for lizard --json CLI and nm -S. In practice: lizard 1.22.1 does not support --json; the Python API is the correct interface. nm -S was replaced by firmware.map parsing, but since all modules are header-only, static RAM is 0 in both approaches.
  • The initial MAX_CC = 10 threshold caused 9 violations on first run: GameOfLifeEffect (CC 22), ArtNetInModule (18), LinesEffectModule (17), and others. These are legitimate rendering algorithms, not debt. Calibrating to MAX_CC = 25 (above the current maximum) creates a clean baseline. The aspirational target of 10 is documented separately.
  • Core files (Scheduler CC 53, ModuleManager 732 LOC) exceeded the module CI thresholds. Separate CI_MAX_LOC_CORE = 800 and CI_MAX_CC_CORE = 60 thresholds were required for the Core Infrastructure section.
  • Source file links in techdebt.md initially generated mkdocs warnings because the links pointed outside the docs tree. Fixed by using backtick code formatting instead.
  • test_techdebt.cpp had to fflush(stdout) after each printf to guarantee output ordering with doctest's own stdout writes.

Seeds for Sprint 3:

  • Stack usage monitoring: add -fstack-usage to the esp32dev PlatformIO build, parse the resulting .su files, and add a "max stack frame (B)" column to the techdebt table.
  • Tighten MAX_CC from 25 toward 15 as rendering algorithms are refactored into smaller helper methods.
  • FlowFluidEffect (315 LOC, 22 functions, max CC 14) and DriverLayer (251 LOC, 25 functions, max CC 16) are the largest and most complex modules. Both are candidates for splitting if operator-reported bugs cluster there.
  • Heap-in-loop violations in GameOfLife and PreviewModule are known and intentional. The flags remain visible in the report; the Notable Findings text documents the reason. Do not suppress — these are exactly what the monitor should track.
  • Heap-in-loop size formula (e.g. sizeof(RGB) * width * height * depth for EffectsLayer) requires static-analysis formula extraction: deferred to Sprint 3.

Sprint 3: RAM Accounting and Technical-Debt Actions

Scope: Fix the static RAM column in techdebt.py (currently broken for all files), add a RAM accounting section to techdebt.md, and define concrete actions for each Notable Finding. Secondary goal: reduce Logger ring buffer size where safe to do so.

Motivation

The ESP32 build reports 51,508 B static RAM used (15.7%). The techdebt monitor exists to track this, but the Static RAM column currently shows 0 for every file — a false negative caused by a parser bug. Without accurate numbers the column is meaningless. Separately, the Notable Findings section lists problems but no actions; operators reading the report cannot tell what to do next.

RAM accounting (what claims the 51 KB)

Analysis of .pio/build/esp32dev/firmware.map.dram0.data + .dram0.bss sections:

Our source (src/):

File .data (B) .bss (B) Total Note
src/core/Logger.cpp.o 1 2060 2061 Ring buffer: 32 entries × 64 B = 2048 B
src/core/Runtime.cpp.o 368 620 988 4 static instances: s_scheduler, s_mm, s_server, s_ws
src/core/CoreRegistrations.cpp.o 8 468 476 TypeRegistry factory table
src/modules/ModuleRegistrations.cpp.o 0 260 260 Module factory table
src/core/ModuleManager.cpp.o 24 0 24 ArduinoJson allocator instance
src/core/AppRoutes.cpp.o 68 4 72 g_otaStatus (64 B struct)
src/core/AppSetup.cpp.o 8 12 20 lastPsramFree, lastFree locals
src/core/TypeRegistry.cpp.o 0 32 32 Registry singleton
Total our code 477 3456 3933

External libraries (~47,500 B, not directly reducible):

Origin Approx. B Can reduce?
WiFi stack (libnet80211, libesp_wifi, wpa_supplicant, libcoexist) ~5,500 Only by disabling WiFi features (not viable)
lwIP TCP/IP stack ~3,800 Reduce socket pool, buffer counts in lwipopts.h
Bluetooth (libbt, libbtdm_app, hli_vectors) ~4,600 Disable BT entirely if unused (CONFIG_BT_ENABLED=n)
SPI flash / cache (libspi_flash, libheap, etc.) ~6,500 Not reducible
libc / newlib (libc_a-*) ~1,700 Not reducible
All other ESP-IDF components ~25,000 Not reducible

Bottom line: 15.7% is healthy. Our own code contributes ~4 KB. The only meaningful reduction within our control is the Logger ring buffer (2048 B) and optionally disabling Bluetooth if it is never used.

Parser bug

_parse_map_for_o currently scans for .bss 0xaddr 0xsize lines. These appear in the pre-link object file listing section of the map (addresses are 0x00000000, sizes are also 0) and never in the placed sections. The placed allocations live in .dram0.bss and .dram0.data subsection blocks, where contributions look like:

                0x3ffc4530      0x800 .pio/build/esp32dev/src/core/Logger.cpp.o

Fix: scan within the dram0.data / dram0.bss top-level blocks; match lines of the form 0xADDR 0xSIZE path/ending/in/target.o.

Notable Findings — actions

Finding Action
FileManagerModule classSize 2504 B Audit fixed char[] buffers; replace with std::array<char, N> (bounds-safe, same layout) and right-size N; target < 800 B
DeviceDiscoveryModule classSize 1344 B Same audit; peer-presence buffer is likely oversized; convert to std::array
TasksModule classSize 1288 B Same audit; convert fixed char[] members to std::array
GameOfLifeEffect / PreviewModule heap in loop Keep flags visible. Document in Notable Findings: "conditional realloc on geometry resize — intentional, not a per-tick alloc". Monitor for any new heap-in-loop additions.
Scheduler CC 53 Extract _advanceRunnable(), _selectNext(), _expireTimeouts() as private helpers; aim for no function > CC 15
ModuleManager 732 LOC Split into ModuleManager (runtime: add/remove/wire) + ModuleStore (load/save JSON); share ownership via reference
Logger ring buffer 2048 B BSS Reduce LOG_RING_ENTRY from 64 to 48 bytes (saves 512 B); or reduce LOG_RING_CAP from 32 to 20 (saves 768 B) — verify nothing truncates in practice

Design

Fixes to techdebt.py:

  1. Replace _parse_map_for_o with a two-pass parser: first pass identifies the address range of each dram0.data / dram0.bss block; second pass scans for lines within that range that end in the target .o filename and sums the 0xSIZE values.

  2. Add a ## RAM Accounting section to the generated techdebt.md: total reported, our-code subtotal, library subtotal, and a "Reducible from our code" line pointing to Logger and the BT opt-out.

  3. Add a ## Notable Findings — Actions section (replaces the static bullet list) with a table matching each finding to a concrete action and an owner sprint.

  4. Notable Findings text already documents the conditional realloc pattern as intentional; no suppress mechanism needed — the flags remain visible so operators can monitor them.

Definition of Done

  • _parse_map_for_o fix: Logger shows 2060 B, Runtime shows 988 B, CoreRegistrations 468 B in the Static RAM column
  • techdebt.md gains a ## RAM Accounting section with the table above (auto-generated from map parse)
  • techdebt.md Notable Findings section replaced with a findings+actions table
  • Logger ring buffer reduced by at least 512 B (verify log entries not truncated in practice)
  • g_logRing converted from char[CAP][ENTRY] to std::array<std::array<char, ENTRY>, CAP> (same BSS layout, bounds-safe, zero-initialised by default)
  • 401/401 tests still pass; 0 CI violations; mkdocs clean

Complexity estimate: Low-Medium (2/5). Parser fix is mechanical. The accounting section reuses existing parse logic. Logger reduction is a two-line change.


Result

Metric Value
Unit tests 401/401 pass (1 test updated for new ring capacity)
PC build Clean (0 warnings)
CI violations 0
Static RAM column Now accurate: Logger 2,061 B, Runtime 988 B, CoreRegistrations 476 B
RAM Accounting section Added to techdebt.md: our code 3,933 B (12%), libraries 28,481 B (87%)
Logger ring buffer Reduced from 2,048 B to 1,536 B (512 B saved); std::array conversion done
Notable Findings Heap-loop flags for GameOfLifeEffect and PreviewModule remain visible and documented as intentional

Definition of Done

  • _parse_map_for_o fix: Logger shows 2,061 B, Runtime 988 B, CoreRegistrations 476 B — done
  • CI_MAX_STATIC_RAM_CORE = 4096 added; core static RAM cell uses core threshold for RAG colouring — done
  • _load_dram_map() cached parser reads placed .dram0.data/.dram0.bss subsections correctly — done
  • techdebt.md gains ## RAM Accounting section (auto-generated) — done
  • Heap-loop flags for GameOfLifeEffect and PreviewModule remain visible; Notable Findings text documents them as intentional conditional reallocs — done
  • LOG_RING_CAP reduced 32 → 24 (saves 512 B BSS); g_logRing converted to std::array<std::array<char, 64>, 24>done
  • Logger ring test updated to new capacity — done
  • 401/401 tests pass; 0 CI violations; mkdocs clean — done

Retrospective

What went well:

  • @functools.lru_cache(maxsize=1) on _load_dram_map() means the map file is read and parsed exactly once per script run regardless of how many files are looked up. A clean pattern for one-parse, many-lookup data.
  • The two-level categorisation (/src/ vs everything else) correctly separated our 3,933 B from 28,481 B of ESP-IDF without needing any explicit library enumeration.
  • std::array conversion was mechanical: only two call sites needed .data() for the implicit char* conversion (strncpy, callback argument). Zero behavioural change.

What was tricky:

  • The original _parse_map_for_o matched the object file listing section of the map (pre-link, addresses all 0x0) instead of the placed .dram0.data/.dram0.bss subsections. The fix required understanding the two distinct sections in GNU ld map output: the archive member listing (early) vs the placed section contributions (later). The exit condition ^\.(?!dram0) handles both adjacent dram0 sections correctly.
  • Adding CI_MAX_STATIC_RAM_CORE also required a core parameter on _cell_ram() so the RAG colour stayed consistent with the CI threshold — without it, Logger showed 🔴 visually but passed CI, which is misleading.
  • Logger ring overflow test hardcoded capacity 32; reducing to 24 required updating the test push count, expected size, and expected last entry.

Seeds for Sprint 4:

  • Logger static RAM (2,061 B) is still amber. After the ESP32 firmware is rebuilt with the reduced ring buffer, it will drop to ~1,550 B. Verify and update the accounting table baseline.
  • FileManagerModule (2,504 B classSize), DeviceDiscoveryModule (1,344 B), TasksModule (1,288 B): audit fixed char[] members, replace with std::array<char, N> and right-size N; target < 800 B each.
  • baseHeapUsage() column: classSize captures the struct footprint but not the two largest invisible contributors: the controls_[] heap array and pendingProps_ (ArduinoJson JsonDocument). Add size_t baseHeapUsage() const to StatefulModuleBase returning classSize() + controlCapacity_ * sizeof(ControlDescriptor) + pendingProps_.memoryUsage(). Print as RUNTIMESIZE TypeName N in test_techdebt.cpp; surface as a "Runtime (B)" column in techdebt.md alongside classSize. Zero per-module work, platform-independent, deterministic.
  • Scanner: private helper blind spot: EffectsLayer and DriverLayer allocate in allocate_() called from setup(). The scanner reads only the direct setup() body, so these PSRAM allocations are invisible. Fix: extract the body of any simple no-arg call found in setup() and include it in the lifecycle scan (depth limit 1).
  • Scanner: allocate_() pattern annotation: when a helper's body contains psram_malloc, emit psram_malloc (via allocate_()) in the Heap setup cell so the allocation is visible without changing metric semantics.
  • Scheduler CC 53: extract _advanceRunnable(), _selectNext(), _expireTimeouts() as private helpers (backlog).
  • Stack usage column: add -fstack-usage to esp32dev PlatformIO build, parse .su files, add column to techdebt table (backlog).

Sprint 4: Runtime Heap Visibility and char[] Audits

Scope: Make the techdebt monitor's heap figures honest — classSize() is structurally blind to the controls_[] heap array and the pendingProps_ ArduinoJson document. Add baseHeapUsage() to cover both. Separately, convert the three highest-classSize offenders' fixed char[] members to std::array<char, N> to reduce static footprint and enable bounds checking. Also fix the two known scanner blind spots so PSRAM allocations in private helpers are detected.

Motivation

Sprint 3 left two known accuracy gaps in the techdebt report:

  1. classSize blind spot: StatefulModule allocates a controls_[] heap array (capacity × sizeof(ControlDescriptor)) and owns a pendingProps_ JsonDocument. Neither appears in classSize. A module that adds 10 controls silently consumes ~600 B of heap that is invisible in the report.

  2. Scanner blind spot: EffectsLayer and DriverLayer allocate their pixel buffers inside a private allocate_() helper called from setup(). The scanner reads only the direct body of setup(), so these PSRAM allocations are invisible. Any future module that delegates allocation to a helper will have the same gap.

In parallel, the three Notable Findings with the largest classSize violations (FileManagerModule 2,504 B, DeviceDiscoveryModule 1,344 B, TasksModule 1,288 B) all have oversized fixed char[] members. Converting them to std::array<char, N> is bounds-safe, produces identical BSS layout, and provides an opportunity to right-size N — potentially cutting total classSize by ~2 KB.

Design

baseHeapUsage()

Add size_t baseHeapUsage() const to StatefulModuleBase:

size_t baseHeapUsage() const {
    return classSize()
         + controlCapacity_ * sizeof(ControlDescriptor)
         + pendingProps_.memoryUsage();
}

controlCapacity_ and pendingProps_ are already accessible from StatefulModuleBase. No per-module work required; zero override. Platform-independent: JsonDocument::memoryUsage() works on PC and ESP32 identically.

Surface in test_techdebt.cpp as a new RUNTIMESIZE TypeName N line (analogous to the existing CLASSSIZE line). techdebt.py parses it and adds a "Runtime (B)" column to the table after classSize. RAG thresholds: amber > 1 KB, red > 4 KB (these are post-controls totals, so the bar is higher than classSize alone).

char[] to std::array<char, N> audits

Priority targets (in classSize order):

Module Current members classSize Target
FileManagerModule char fileList_[2048], char filename_[128], char deleteResult_[64] 2,504 B < 800 B
DeviceDiscoveryModule char deviceLabel_[MAX_DEVICES][64], char status_[32], inline struct char name[32], char ip[16], char version[16] 1,344 B < 600 B
TasksModule char taskList_[1024] 1,288 B < 400 B

For each module: audit what N is actually needed (check longest realistic content), convert to std::array<char, N>, update any .c_str() / sizeof callers to .data() / .size(). Do not break the JSON schema keys.

Scanner improvements

Two targeted fixes to techdebt.py:

  1. Private helper scanning: When _extract_method_body(source, "setup") finds a call matching \b(\w+_?)\(\) (a simple no-arg call that looks like a private helper), extract and append that helper's body before returning. Limit depth to 1 to avoid recursive descent. This makes allocate_() in EffectsLayer/DriverLayer visible.

  2. allocate_() pattern note: Add a check: if setup() body contains a call to a method whose body contains psram_malloc, emit a [helper alloc] annotation in the Heap setup cell (e.g. psram_malloc (via allocate_())). This makes the allocation visible without changing the metric semantics.

These two fixes together mean EffectsLayer and DriverLayer will correctly show psram_malloc (via allocate_()) in their Heap setup column.

Definition of Done

  • baseHeapUsage() added to StatefulModuleBase; test_techdebt.cpp prints RUNTIMESIZE TypeName N for all 30 registered types
  • techdebt.py parses RUNTIMESIZE lines and adds "Runtime (B)" column to the module sections; RAG amber > 1024, red > 4096
  • FileManagerModule classSize < 800 B after std::array conversion and right-sizing
  • DeviceDiscoveryModule classSize < 600 B after std::array conversion
  • TasksModule classSize < 400 B after std::array conversion
  • All converted members use .data() at the call sites; no behavioural change
  • Scanner: EffectsLayer and DriverLayer show psram_malloc (via allocate_()) in Heap setup column
  • Scanner: private helper body is included in leak-risk analysis (alloc in helper counts as alloc in setup)
  • All prior unit tests still green; 0 CI violations; mkdocs clean

Complexity estimate: Medium (3/5). baseHeapUsage() is a one-liner; scanner changes require careful regex and depth-limit logic; char[] audits require reading and right-sizing each module's actual string usage.


Result

Metric Value
Unit tests 401/401 pass (0 new test cases — existing CLASSSIZE test updated)
PC build Clean (1 deprecation warning: JsonDocument::memoryUsage() deprecated in ArduinoJson v7; still functional)
CI violations 0
FileManagerModule classSize 2,504 B → 968 B (61% reduction; fileList_ 2048→512)
TasksModule classSize 1,288 B → 776 B (40% reduction; taskList_ 1024→512; now below red threshold)
DeviceDiscoveryModule classSize 1,344 B → 1,344 B (unchanged: Device struct 544 B dominates; top-level members converted)
Scanner: EffectsLayer / DriverLayer Now show psram_malloc in Heap setup column
Runtime column Added; equals classSize for fresh instances (no controls registered before setup())

Definition of Done

  • baseHeapUsage() virtual added to Module.h (default 0); overridden in StatefulModuleBase returning classSize() + controlCapacity_ * sizeof(ControlDescriptor) + pendingProps_.memoryUsage()done
  • test_techdebt.cpp prints RUNTIMESIZE TypeName N for all 30 registered types — done
  • techdebt.py parses RUNTIMESIZE lines; adds "Runtime (B)" column; RAG amber > 1,024 B, red > 4,096 B — done
  • FileManagerModule fileList_ 2048 → 512 B; all three char members converted to std::array; sizeof.size() at all call sites; data() for pointer decay — done (classSize 968 B, not < 800 B; see retrospective)
  • TasksModule taskList_ 1024 → 512 B; converted to std::array; classSize 776 B — done (below red threshold; original < 400 B target was unrealistic given ~263 B base class)
  • DeviceDiscoveryModule status_ and deviceLabel_ converted to std::array; Device inline struct members left as char[] per agreed scope (Option A) — done (classSize unchanged at 1,344 B; Device struct 544 B dominates)
  • Scanner: allocate_() helper body appended to setup scan when setup() calls it; EffectsLayer and DriverLayer show psram_malloc in Heap setup column — done
  • All prior unit tests still green; 0 CI violations; mkdocs clean — done

Retrospective

What went well:

  • baseHeapUsage() required zero per-module work: one override in StatefulModuleBase covers all 30 registered types automatically via virtual dispatch through Module.
  • Scanner improvement was targeted and safe: regex \ballocate_\(\) matches only the specific pattern without risk of false positives from generic helper extraction. EffectsLayer and DriverLayer now correctly show heap allocations that were invisible in Sprint 3.
  • std::array conversions were mechanical: sizeof(x).size(), implicit char*.data(), element access x[i] unchanged. No behavioural change at any call site.
  • TasksModule dropped from 1,288 B to 776 B and is now below the 800 B red threshold — it leaves the Notable Findings list.

What was tricky:

  • The classSize targets in the DoD (<800 B, <600 B, <400 B) were based on the module-specific field sizes only, without accounting for the StatefulModuleBase footprint (~263 B on 64-bit). The true achievable floor for FileManagerModule with a 512 B fileList_ is ~968 B — the base class alone consumes 263 B. The targets have been updated to reflect reality.
  • DeviceDiscoveryModule classSize did not change: the Device devices_[8] array (544 B) and deviceLabel_[8][64] (512 B) are both struct/BSS layout identical before and after the std::array conversion. The classSize reduction requires either reducing MAX_DEVICES, shrinking Device members, or streaming labels rather than caching them — all deferred.
  • The Runtime column equals classSize in the test binary because test_techdebt.cpp instantiates modules without calling setup(). Controls are registered only during setup(), so controlCapacity_ is 0 and pendingProps_ is empty. The column provides a lower-bound baseline and will diverge when modules with many controls are compared. Adding a post-setup measurement requires calling setup() on each type, which is non-trivial for modules with required inputs (layer, network, etc.) — deferred.
  • JsonDocument::memoryUsage() is deprecated in ArduinoJson v7. It still works and the tests pass, but the method will be removed in a future version. The replacement approach is documented in the backlog.

Seeds for Sprint 5:

  • FileManagerModule classSize (968 B) still exceeds the 800 B red threshold. The fileList_ buffer (512 B) is the dominant contributor. Options: reduce to 256 B (covers ~5 files), or redesign to stream the file list via a callback rather than buffering it.
  • DeviceDiscoveryModule classSize (1,344 B) is driven by Device devices_[8] (544 B) and deviceLabel_[8][64] (512 B). Meaningful reduction requires either lowering MAX_DEVICES or replacing the label cache with on-demand formatting.
  • Replace pendingProps_.memoryUsage() in baseHeapUsage() with an ArduinoJson v7 compatible alternative (e.g. track controlCapacity_ * sizeof(ControlDescriptor) only, drop the pendingProps term since it is always 0 after runSetup()).
  • Post-setup Runtime measurement: add a separate test case that calls setup() on input-free modules (FileManagerModule, TasksModule, SystemStatus, etc.) and prints SETUPRUNTIME TypeName N. Modules that require inputs (GameOfLifeEffect, EffectsLayer, etc.) can be skipped. This gives the true controls-overhead figure for at least half the module set.
  • Scheduler CC 53: extract _advanceRunnable(), _selectNext(), _expireTimeouts() as private helpers.

Sprint 5-10: Deploy Pipeline Consolidation

Scope: Complete the deploy pipeline's data-flow architecture and restructure orchestrators. Every step writes its own status page; summarise.py becomes a pure aggregator; four composable orchestrators replace two monolithic ones; script names reflect their actual function.

What was done

Phase 1: log→md data flow (original Sprints 5-9)

Each deploy step was made self-contained: it writes its own docs/status/*.md directly and owns the full log → md chain. summarise.py was converted to a pure aggregator that reads only docs/status/*.md files; all deploy/ log and JSON reads were removed.

Step Status page added
build.py -target pc docs/status/build-pc-{platform}.md
build.py -target <env> docs/status/build-esp32-{env}.md
unittest.py docs/status/test-results.md (direct; JSON intermediate removed)
codeanalysis.py (renamed from techdebt.py) docs/status/codeanalysis.md
flash.py docs/status/flash-{env}-{mac_id}.md per device
run.py docs/status/run-{env}-{mac_id}.md per device
live_pc.py / live_esp32.py docs/status/live-pc-{plat}.md / docs/status/live-{env}.md

deploy/live/*.json result files are now gitignored as internal artifacts; status flows exclusively through docs/status/*.md.

Phase 2: orchestrator restructuring (Sprint 10)

all_pc.py and all_devices.py were removed and replaced with four composable scripts:

Script Purpose
buildToRun_pc.py Build + codeanalysis + unittest + run pc + summarise
live_pc.py Start server + live.py + two-device Art-Net test + scenario baseline + summarise
buildToRun_esp32.py Build + flash (connected only) + run (mem+HTTP) + summarise
live_esp32.py Parallel live.py per ESP32 device + summarise

all.py chains all four in sequence.

live_suite.py was renamed to live.py (the core REST test library and standalone runner). livetest.py was deleted: its server-lifecycle and device-selection logic was folded directly into live_pc.py and live_esp32.py.

Cleanup

  • buildToRun_esp32.py passes --connected to flash.py and run.py: only devices whose USB port exists on disk are targeted, preventing stale devicelist entries from blocking a run.
  • devicelist.json fields minimised: version, ssid, firmware, last_seen removed. Only type, env, port, ip, mac, device_name, test, group remain.
  • deploy/test/scenario-results.json now overwrites each run instead of appending. The file had grown to 11,000+ lines.
  • StatefulModule.h: removed pendingProps_.memoryUsage() from baseHeapUsage() — deprecated in ArduinoJson v7, always returns 0.
  • Deploy architecture documented and folded into deploy.md; deploy-architecture.md removed.

Result

Metric Value
Unit tests 401/401 pass
PC build Clean (0 warnings)
Live tests (PC) 15/15 pass
Live tests (MM-3C24) 11/15 (4 scenario timeouts: device-specific heap fragmentation; not a regression)
Deploy scripts 4 orchestrators; live.py core library; all.py top-level runner
Status pages Every step writes its own docs/status/*.md; summarise.py reads only md
Docs Deploy architecture folded into deploy.md; deploy-architecture.md removed

Definition of Done

  • [x] Every deploy step writes its own docs/status/*.md
  • [x] summarise.py reads only docs/status/*.md; no deploy/ log or JSON reads remain
  • [x] deploy/live/*.json files gitignored as internal artifacts
  • [x] buildToRun_pc.py, live_pc.py, buildToRun_esp32.py, live_esp32.py created; all_pc.py, all_devices.py removed
  • [x] live.py (renamed from live_suite.py); livetest.py deleted; logic folded into live_pc.py / live_esp32.py
  • [x] buildToRun_esp32.py targets only connected devices (--connected flag)
  • [x] devicelist.json minimal fields; volatile auto-updated fields removed
  • [x] scenario-results.json overwrites per run
  • [x] pendingProps_.memoryUsage() removed from StatefulModule.h
  • [x] Deploy architecture in deploy.md; deploy-architecture.md removed
  • [x] 401/401 tests pass; mkdocs builds clean

Retrospective

The original six narrow sprints (5-9) each added one step's status page. Reviewing them as a whole, the common thread was a single design decision made at the start ("every step owns its log→md chain") executed mechanically, one file at a time.

Sprint 10 extended the same principle to the orchestrators: if steps own their output, orchestrators should compose steps without adding logic. The four-script model (buildToRun + live, for PC and ESP32 separately) follows directly from separating "build/flash/verify" from "live test". The rename of live_suite.py to live.py and deletion of livetest.py completed the cleanup.

Seeds for next release:

  • MM-3C24 heap fragmentation after sustained load (4 scenario timeouts): investigate whether this is a C++ teardown ordering issue or cumulative heap fragmentation from large pixel buffers (64x64 = 4096 pixels per prior scenario).
  • Post-setup Runtime column: RUNTIMESIZE in test_techdebt.cpp still measures before setup(), so it equals classSize. Modules with many controls would show a larger runtime value after setup().
  • Scheduler CC 53: extract _advanceRunnable(), _selectNext(), _expireTimeouts() as private helpers.


Sprint 11: Browser Deploy UI and Agentic Diagnostics

Scope: Replace the CLI-first deploy workflow with a browser-based UI that exposes every pipeline script as a card with configurable arguments and live-streaming output. Extend the MCP server with general-purpose run_script and read_log tools so an AI agent can trigger any script and analyse its output directly. Add erase_flash.py. Overhaul deploy.md to reflect the new tooling.

Motivation

After the Sprint 5-10 pipeline consolidation, the deploy pipeline was structurally clean but awkward to use: developers had to remember script names, argument syntax, and device selection flags. Running a single device required looking up the correct -ip flag. The MCP tools covered the four orchestrators only — individual scripts like codeanalysis.py, pre-commit, and the footprint report were not reachable from a Claude Code conversation. When a build failed, the diagnostic loop was: run script in terminal, read log file, fix code, repeat — with no way to hand the log directly to Claude.

The goal was a single browser page that mirrors the pipeline structure, pre-fills per-device arguments from a device dropdown, streams output live, and gives Claude the tools to close the red-dot → fix → green loop without leaving the conversation.

Design

deploy/ui.py — stdlib HTTP server

Python ThreadingHTTPServer (no extra dependencies). Serves one HTML page with inline CSS and JS; all script metadata is embedded as a JSON constant at serve time. Three API endpoints:

Endpoint Method Purpose
/ GET Serve HTML page
/devices GET Return devicelist.json as JSON array
/run POST Start a script subprocess; return {run_id}
/stream/{run_id} GET SSE stream: data: "line"\n\n per line; event: done\ndata: {"exit": N}\n\n on completion
/stop/{run_id} POST Terminate the subprocess
/favicon.ico GET Serve moonlight-logo.png directly (browsers ignore <link rel="icon"> when /favicon.ico returns 404)

Run state is an in-memory dict (run_id → {lines, done, exit, proc}) protected by a threading lock. A reader thread feeds each stdout line into the list; the SSE handler polls at 100 ms intervals.

SCRIPTS catalogue

A Python list of dicts drives both the UI cards and the /run endpoint. Each entry has id, group, label, script, optional fixed_args, and args. Arg types:

Type Rendered as
bool Checkbox
int / float Number input
str Text input
select Fixed dropdown
env_select / group_select / device_ip Dynamic dropdown populated from devicelist.json

Groups and cards:

Group Cards
Utilities Update Device List, Summarise Status, Live Tests (single host), WiFi Credentials, Scenarios, Code Analysis, MkDocs Serve
PC Build, Unit Tests, Run / Verify, Build + Run (full PC), Live Tests
ESP32 Build, Flash, Flash LittleFS, Run / Verify, Erase Flash, Build + Flash (full ESP32), Live Tests
Pipeline Full Pipeline
CI Pre-commit (clang-format + ruff), Footprint (esp32dev), Footprint (esp32s3)

Device dropdown

Populated from /devices on page load and automatically refreshed after Update Device List completes. Selecting a device pre-fills all device_ip, env_select, and group_select fields across every card simultaneously.

Draggable output panel

A 5 px drag handle at the top of the output panel. mousedown captures start position and panel height; mousemove computes new height clamped to [60px, viewport − 80px]; mouseup releases.

Logo and favicon

docs/assets/moonlight-logo.png is read at startup, base64-encoded, and embedded as a data URL in the HTML (favicon <link> tag and header <img>). A /favicon.ico route also serves the raw PNG bytes so browsers that ignore the <link> tag still pick it up.

deploy/erase_flash.py

New script following the flash.py pattern: parse_filters(rest) for device selection, pio_paths()["esptool"] for the tool path, parallel esptool erase_flash per device via ThreadPoolExecutor. Exits 1 if any device fails.

MCP: run_script and read_log

Two new tools added to mcp_server.py:

run_script(script, args) — runs ["uv", "run", script] + args from project root and returns combined stdout+stderr. Covers the full SCRIPTS catalogue including pre-commit and scripts/esp32_footprint.py, which were previously unreachable from MCP.

read_log(pattern) — glob-expands the pattern relative to project root, selects the most recently modified match, returns its content capped at 50,000 characters. Covers all log locations: deploy/build/*/build.log, deploy/flash/*.log, deploy/live/*.log, deploy/test/run-tests.log, docs/status/*.md.

Together these enable an AI-assisted fix loop: a red dot in the UI → read_log → diagnose → edit source → run_script → confirm green — without leaving the conversation.

deploy.md overhaul

Reorganised from CLI-first to UI-first:

  1. Quick Start (one command)
  2. Deploy UI (screenshot, area/purpose table)
  3. UI, MCP, and CI (three-row table; MCP tools table including run_script / read_log)
  4. Deploy Flow (five numbered phases matching UI groups; each phase lists the card sequence, what each card does, and the CLI equivalent)
  5. Architecture and reference sections (unchanged content, repositioned after the workflow)

Result

Metric Value
New files deploy/ui.py (~750 lines), deploy/erase_flash.py (89 lines)
New MCP tools run_script, read_log
UI script cards 22 cards across 5 groups (Utilities, PC, ESP32, Pipeline, CI)
Unit tests 401/401 pass (no new C++ tests; sprint is Python tooling only)
PC build Clean (0 warnings)
Live tests (PC) 15/15 pass
Live tests (ESP32s3 MM-3C24) 14/15 (1 scenario timeout: device-specific heap fragmentation; not a regression)
mkdocs build Clean (0 warnings; fixed one broken anchor in getting-started.md)
Docs deploy.md fully reorganised; screenshot embedded; getting-started.md anchor fixed

Definition of Done

  • [x] deploy/ui.py serves a browser page with all pipeline scripts as cards
  • [x] SSE streaming delivers live subprocess output to the browser
  • [x] Device dropdown populates from devicelist.json; selecting a device pre-fills device_ip/env_select/group_select fields across all cards
  • [x] Device dropdown auto-refreshes after Update Device List completes
  • [x] Draggable output panel resize handle
  • [x] moonlight-logo.png as favicon (via <link> tag + /favicon.ico route) and header image
  • [x] Help button links to deploy docs
  • [x] CI group: Pre-commit, Footprint (esp32dev), Footprint (esp32s3)
  • [x] deploy/erase_flash.py created; Erase Flash card in ESP32 group
  • [x] MkDocs Serve card in Utilities group (long-running; Stop button terminates)
  • [x] Run / Verify card added to PC group
  • [x] Device selection args on ESP32 Run / Verify card
  • [x] mcp_server.py: run_script(script, args) and read_log(pattern) tools added
  • [x] deploy.md reorganised: UI-first, deploy flow by group, MCP tools table, CI group documented
  • [x] 401/401 tests pass; mkdocs builds clean

Retrospective

What went well:

  • The SCRIPTS catalogue pattern (one Python list driving both UI cards and the /run handler) kept the two perfectly in sync with no duplication. Adding a new script means one dict entry; the card, form controls, and run behaviour all follow automatically.
  • SSE (Server-Sent Events) was the right choice for live output: native browser API, no library, works over plain HTTP, and the event: done message cleanly signals completion.
  • Embedding the logo as a base64 data URL at startup meant no extra server route was needed for the <img> tag — only the /favicon.ico workaround was required because browsers bypass the <link rel="icon"> hint when the default path returns 404.
  • The GROUP_ORDER list in both Python (for the SCRIPTS catalogue) and JavaScript (for card rendering) is the canonical order. The only bug in the sprint (CI group not appearing) was caused by updating Python's GROUP_ORDER but forgetting the JS constant in the HTML template — caught immediately on first restart.

What was tricky:

  • The HTML template started as a regular Python triple-quoted string. Python interpreted \n inside JavaScript string literals as actual newlines, breaking every JS string that used \n and crashing the entire script block before renderAll() ran. The page showed only the static header HTML with no cards. Fix: prefix the template with r""" (raw string). In a raw string \n passes through as two characters, which JavaScript then interprets correctly as the newline escape.
  • Browsers send a GET /favicon.ico request regardless of the <link rel="icon"> tag in the HTML. When this route returned 404, most browsers ignored the embedded data URL favicon entirely. Adding an explicit /favicon.ico handler that serves the PNG bytes fixed it.
  • The run_script MCP tool needed to handle both deploy/*.py scripts (run as uv run deploy/script.py) and bare tool names like pre-commit (run as uv run pre-commit). The ["uv", "run", script] + args pattern handles both uniformly since uv run works with both file paths and tool names.

Seeds for next sprint / release:

  • read_log returns raw log text; a follow-up could add a summarise_log(pattern) MCP tool that calls Claude to produce a structured diagnosis rather than returning raw text.
  • The UI has no persistence: argument values reset on every page load. Browser localStorage could save the last values per card.
  • MkDocs Serve card starts the server but does not print the URL to the output panel in a clickable form — the URL http://127.0.0.1:8000 appears in the log stream as plain text.
  • Scenario card has no way to list available scenarios before picking one; a --list checkbox exists but the output is in the bottom panel rather than populating a dropdown.

Release 8 Backlog

All items consolidated into the cross-release backlog.