Release 8: Dynamic Controls and UI Adaptability¶
Theme: Release 8 adds dynamic control schemas (the ability to rebuild a module's control set at runtime based on the current value of other controls) so the UI can show only the parameters that are relevant for the active configuration. Later sprints extend the release into deploy-pipeline health: a code analysis monitor, runtime heap visibility, and a structured overview of the full deploy architecture.
Release Overview¶
What was delivered in Release 7 (build on this)¶
| Strength | Notes |
|---|---|
| OTA firmware update | FirmwareUpdateModule: file upload + GitHub releases tab; POST /api/firmware |
| CI release pipeline | Tagged releases + nightly pre-release with firmware assets on GitHub |
| Windows support | Native .exe build; projectMM-pc-windows.zip in CI artifacts |
| Scenario baselines | Hardware --update-baseline run; "extends" inheritance; wired into all.py |
| Static RAM hardening | Per-device LOG_RING_SIZE; WiFi buffer tuning; dual check_alloc guard |
| Log frontend panel | WS push of ring buffer entries; collapsible log UI |
What Release 8 addresses¶
| Problem | Sprint |
|---|---|
Control schema is fixed at setup() time; irrelevant parameters always visible regardless of selected type |
Sprint 1 (Dynamic controls: clearControls(), rebuildControls(), early WS flush), complete |
| Static RAM column in techdebt monitor always shows 0 (parser bug); no accounting of what consumes the 51 KB ESP32 RAM; Notable Findings have no action owners | Sprint 3 (RAM accounting, parser fix, actions table), complete |
classSize() misses runtime heap (controls_[] array, pendingProps_ doc); large char[] struct members inflate classSize; scanner blind to allocations in private helpers |
Sprint 4 (baseHeapUsage, char[] audits, scanner improvements), complete |
Deploy pipeline grew to 17+ scripts with no architecture overview; steps produced no status pages; techdebt.py name misleading; orchestrators monolithic |
Sprints 5-10 (full log→md pipeline, orchestrator restructuring, naming cleanup), complete |
| No interactive way to trigger individual deploy scripts; MCP tools covered only orchestrators; no AI-assisted log analysis; deploy.md was CLI-first with no visual overview | Sprint 11 (browser deploy UI, run_script/read_log MCP tools, deploy.md overhaul), complete |
Sprints¶
| Sprint | Goal |
|---|---|
| Sprint 1 | Dynamic controls: clearControls(), rebuildControls() virtual, early WS schema flush |
| Sprint 2 | Technical-debt monitor: per-module metrics (LOC, function count, complexity, static RAM, heap/blocking violations) as a CI script |
| Sprint 3 | RAM accounting balance, fix static RAM parser, Notable Findings actions, Logger ring buffer reduction |
| Sprint 4 | baseHeapUsage() column, char[] to std::array audits, scanner improvements for private helpers |
| Sprint 5-10 | Deploy pipeline consolidation: full log→md data flow, orchestrator restructuring, naming cleanup — complete |
| Sprint 11 | Browser deploy UI, run_script/read_log MCP tools, erase_flash.py, deploy.md overhaul — complete |
Sprint 1: Dynamic Controls¶
Scope: Allow a module to rebuild its control schema at runtime in response to a control value change. The primary use case: a
typeselector control switches between effect variants, and only the parameters relevant to the active type are shown. The control set is rebuilt without a full module restart.
Motivation¶
Today, addControl() is called once in setup() and the schema is fixed for the lifetime of the module. A module that supports multiple effect types must expose all parameters for all types simultaneously, cluttering the UI and confusing operators. The fix: make the schema a function of the control values, rebuilt on demand.
Design¶
clearControls(system = false)
Added to StatefulModule. Iterates the registered controls_[] descriptors and removes all entries that are not marked system. Before removing each descriptor, writes the current value of the backing variable back into the pendingProps_ stash (keyed by control name). This means a subsequent addControl(var, key, ...) call for the same key restores the last operator-set value automatically — values are preserved across rebuilds even when the control temporarily disappears.
System controls (enabled) are marked at registration time with a system flag in ControlDescriptor. clearControls() skips them unconditionally.
rebuildControls() virtual
New virtual method on StatefulModule; default implementation is a no-op (all existing modules continue to work unchanged). Modules that want dynamic controls override it:
void rebuildControls() override {
clearControls();
addControl(type_, "type", "select", {"Ripples", "Lines", "Sine"});
if (type_ == EffectType::Ripples) {
addControl(speed_, "speed", "slider", 0.1f, 10.0f);
addControl(radius_, "radius", "slider", 1.0f, 50.0f);
} else if (type_ == EffectType::Lines) {
addControl(speed_, "speed", "slider", 0.1f, 10.0f);
addControl(count_, "count", "slider", 1, 20);
}
}
void setup() override {
rebuildControls(); // replaces direct addControl() calls
}
void onUpdate(const char* key) override {
if (strcmp(key, "type") == 0) rebuildControls();
}
Modules that do not need dynamic controls keep calling addControl() directly in setup() — no migration required.
Early WS schema flush
After rebuildControls() finishes, the UI must reflect the new schema immediately rather than waiting up to 1 s for the next periodic push. Implementation: clearControls() sets a schemaDirty_ flag on StatefulModule. The main loop checks schemaDirty_ across all modules and, if set, sends a {"t":"schema","modules":[...]} WS push using getModulesJson() (full schema including control types, options, min/max, and current values) and clears the flag. On a clean tick, the periodic 200 ms push uses getStateJson() (flat key/value state) as before. Natural debounce: a burst of rebuildControls() calls within one tick produces exactly one push.
A dedicated {"t":"schema"} message type is required because getStateJson() sends only flat {key:value} pairs; handleStateUpdate() in the frontend updates existing DOM elements but cannot add or remove controls. When rebuildControls() changes the control set, the frontend must call render() to rebuild the card from scratch.
State persistence interaction
saveState() and loadState() iterate the registered descriptors. After a rebuild, only the currently registered controls are persisted — parameters for inactive types are not written to the state file. On the next load, pendingProps_ carries any previously saved values; addControl() applies them if the key matches a registered control after rebuildControls() runs. A type control persisted in state is applied before rebuildControls() is called (via the existing addControl stash mechanism), so the correct variant's parameters are registered and restored on first boot.
Sprint 1 Scope Definition of Done¶
ControlDescriptorgainsbool systemfield;StatefulModule::runSetup()sets it when registeringenabledclearControls()removes non-system descriptors; saves current values topendingProps_stash before removalrebuildControls()virtual added toStatefulModule; default is no-op; existing modules compile and behave identicallyschemaDirty_flag set byclearControls(); main loop early-flush path clears it and sends a{"t":"schema","modules":[...]}WS push- Reference implementation: one new module (e.g.
MultiEffectModuleor adapted existing effect) demonstrating type selector + conditional parameters - Unit tests: rebuild preserves values of re-registered controls; rebuild discards values of removed controls; system controls survive
clearControls();schemaDirty_triggers exactly one early flush per rebuild burst - Frontend:
{"t":"schema"}handler added; callsrender(msg.modules)to rebuild all cards from the full schema - All prior unit tests still green
Complexity estimate: Low-Medium (2/5). The stash mechanism already exists; clearControls() is a small loop; the early flush reuses the existing push path. The trickiest part is the state-persistence ordering (type value applied before rebuild runs).
Result¶
| Metric | Value |
|---|---|
| Unit tests | 399/399 pass (8 new tests added) |
| PC build | Clean (0 warnings) |
| ESP32dev build | Clean (0 warnings); BSS 16.3% (53 KB, down from 21.3% / 70 KB after static wsBuf removed) |
| ESP32s3 build | Clean (0 warnings) |
| Live tests (PC) | 15/15 all passing |
| Live tests (MM-70BC) | 15/15 all passing |
| Live tests (MM-C1BC) | 12/15 (hardware capacity limits: 64x64 OOM, fps below 1000 on 16x16, 4-layer OOM on classic ESP32) |
Definition of Done¶
ControlDescriptorgainsbool system = falsefield;runSetup()sets it after registeringenabled— doneclearControls()preserves system controls, saves non-system values topendingProps_stash, setsschemaDirty_when controls are actually removed — donerebuildControls()virtual added toStatefulModuleBase; default is no-op; all existing modules compile and behave identically — doneschemaDirty_flag;ModuleManager::hasSchemaDirty()/clearSchemaDirty(); WS broadcast loop inmain.cppandAppSetup.cppsends{"t":"schema","modules":[...]}on dirty tick,getStateJson()array on periodic tick — done- Reference implementation:
SineEffectModuleadapted withtypeselector (Sine / Ripples),rebuildControls(), andonUpdate("type")— done - Unit tests: rebuild preserves values of re-registered controls; rebuild does not affect unrelated fields; system controls survive
clearControls();schemaDirty_set/cleared correctly; burst produces exactly one flag — done (7 new test cases) - Frontend:
{"t":"schema"}message type handler added toapp.js; callsrender(msg.modules)to rebuild all cards — done - All prior unit tests still green — 399/399
- Static
wsBuf[16384]removed fromAppSetup.cpp; both WS push branches now allocate on demand viaheap_caps_malloc/heap_caps_free— done pal::net_early_init()callsNetwork.begin()beforescheduler.setup()to guarantee the TCP/IP stack is ready before any module opens sockets — doneDeviceDiscovery::setup()guardsbroadcastPresence_()behindsock_ >= 0;loop()retriesudp_bind()whensock_ < 0— done
Retrospective¶
What went well:
- The
pendingProps_stash already existed and worked without modification —clearControls()just needed to write into it before removing each descriptor. - The
runSetup()full-wipe /clearControls()mid-lifecycle split was clean once the two call sites were separated. Inlining the wipe inrunSetup()was the right call. - Adapting
SineEffectModulerather than writing a new module gave immediate test coverage for a real effect and kept the scope small. - The
schemaDirty_"only set when controls are actually removed" rule surfaced naturally from a failing test: first-call-from-setup had no prior controls, so the flag should not fire on initial build.
What was tricky:
- The
schemaDirty_flag initially fired on the firstrebuildControls()call fromsetup()(becauseclearControls()always set it). The fix — only set the flag whencontrolCount_ > kept— is semantically correct (no prior schema means no schema change) and made the test clean. - The
kTypes/kWaveformsstatic constexpr arrays required thekTypeCountcompanion soaddControl(uint8_t&, key, const char* const*, count)received a correct count without magic numbers. hasSchemaDirty()andclearSchemaDirty()iteratedowned_without holdingcontrolMutex_. On PC (multi-threaded HTTP server running at 400K+ fps), this created a data race with concurrentremoveModule()calls that modifyowned_under the mutex. The server crashed intermittently mid-scenario after the WS client connected. Fix: addstd::lock_guard<std::mutex> lk(controlMutex_)to both functions, matching the lock discipline used bygetStateJson()and every otherowned_iterator.- The Design section claimed "no new WS message type is needed" — this was wrong.
getStateJson()sends only flat{key:value}pairs;handleStateUpdate()in the frontend updates existing DOM elements by key lookup and cannot add or remove controls. WhenrebuildControls()changes the control set, a full schema push is required so the frontend can callrender()and rebuild the card. The fix: a dedicated{"t":"schema","modules":[...]}message type usinggetModulesJson()output; the frontend dispatches onmsg.t === "schema"and callsrender(msg.modules). - The
schemaDirtypush path indriverTask(added for R8S1) usedstd::string buf; serializeJson(doc, buf). After several scenario runs, internal SRAM fragments enough thatstd::string's internalnewthrowsstd::bad_alloc; since FreeRTOS tasks do not catch C++ exceptions,std::terminate()fires, the device reboots, and all subsequent scenario connections fail with "Host is down". Thefree_heap_kb() > 16.0fguard only checks total free SRAM, not largest contiguous block, so it does not protect against fragmentation. Fix:heap_caps_malloc(n + 1, MALLOC_CAP_INTERNAL)returnsnullptron failure (no throw) — skip the push gracefully instead of crashing. - Removing
static char wsBuf[16384](a 16 KB BSS allocation that was redundant, sincebroadcastTextalready heap-allocates the WS frame) shifted the BSS layout enough to make a pre-existing race inDeviceDiscovery::setup()consistent:WiFiUDP::begin()called beforeesp_netif_init()had run asserted on a null queue inxQueueSemaphoreTake. Fix:pal::net_early_init()callsNetwork.begin()beforescheduler.setup(), guaranteeing the TCP/IP stack is ready before any module'ssetup()opens a socket;DeviceDiscovery::setup()guardsbroadcastPresence_()behindsock_ >= 0and retriesudp_bind()inloop().
Seeds for Sprint 2:
RipplesEffectModulestill exists as a standalone module — now thatSineEffectModuleembeds the same rendering, consider whetherRipplesEffectModuleshould be retired or kept as an independent module for pipelines that want only ripples.- The
clearControls()/rebuildControls()pattern is now proven. Other modules with mode-dependent parameters (e.g. layout type selectors) can adopt it when operators report UI clutter. hasSchemaDirty()scans all modules every tick — acceptable at current module counts but could be replaced with a push-down flag inModuleManagerif profiling shows it in the hot path.- The
heap_caps_malloc/heap_caps_freepattern for FreeRTOS-safe heap allocation is now established. Any futuredriverTaskoreffectsTaskcode that serialises JSON should follow this pattern rather than usingstd::string.
Sprint 2: Technical-Debt Monitor¶
Scope: Add a
deploy/techdebt.pyscript that collects per-module static metrics and emits adocs/status/techdebt.mdtable. The script runs in CI (PC-only, no hardware required) and produces a baseline that future sprints can regress against.
Motivation¶
The codebase grows by adding modules. Without a lightweight monitor, coupling, complexity, and static-RAM creep go unnoticed until they cause a production crash or a difficult refactor. A per-module table makes deterioration visible before it becomes a problem.
Design¶
Metrics collected per module (.h + companion .cpp if present):
| Metric | Source | Why |
|---|---|---|
| Lines of code (NLOC) | lizard Python API |
Size proxy; outliers need splitting |
| Function count | lizard Python API |
Too many functions signals God-class |
| Max cyclomatic complexity | lizard Python API |
High complexity predicts bug density |
| Static RAM (BSS + data bytes) | firmware.map from ESP32 build |
Direct measure; non-zero only when module has static members |
Heap allocation sites in setup() |
Python grep scan | Expected; informational; checked against teardown |
Heap allocation sites in loop() |
Python grep scan | Policy violation: allocations belong in setup() |
Blocking calls in loop() |
Python grep scan | delay(), vTaskDelay(), info-level LOG_* |
| Leak risk | Python brace-scan | Alloc in setup() with no matching free in teardown() |
classSize() (instance bytes) |
TypeRegistry test binary | True heap cost per module instance |
Tools:
lizard(added topyproject.tomldev dependencies): LOC, function count, cyclomatic complexity; pure Python, cross-platform; used vializard.analyze_file()Python API (not CLI) to avoid version-dependent flag issues.firmware.mapfrom.pio/build/esp32dev/: parsed for BSS+data contributions per.cpp.ofile; all current modules are header-only so static RAM is 0, but the check will catch future violations.tests/test_techdebt.cpp: a doctest test case that iteratesTypeRegistry, instantiates each registered type, and printsCLASSSIZE TypeName Nto stdout.techdebt.pyruns the test binary with-tc=techdebt*and parses the output. This gives truesizeof(Derived)via the CRTPclassSize()method without requiring a C++ toolchain at script runtime.- Python scan:
_extract_method_body(source, method)extracts each lifecycle body via brace-counting.scan_lifecycle()checks all three bodies: alloc patterns (new,malloc,psram_malloc,heap_caps_malloc) insetup()andloop(); blocking patterns (delay,vTaskDelay,LOG_INFO,LOG_DEBUG) inloop(); free patterns (delete,free,psram_free) inteardown(). Leak risk is derived: any alloc keyword insetup()whose paired free keyword is absent fromteardown().
Output: docs/status/techdebt.md
Core Infrastructure section (on top) + one section per module category. Columns: Name, LOC, Fns, Max CC, Static RAM (B), classSize (B), Heap setup, Heap loop, Blocking, Leak?. RAG (green/amber/red) indicators on all numeric columns.
Thresholds (configurable at top of script):
MAX_LOC = 400 # warn if a single module exceeds this
MAX_CC = 25 # CI threshold; aspirational target is 10 (existing renderers reach 22)
MAX_STATIC_RAM = 512 # warn if BSS+data exceeds this (bytes)
Violations are emitted as > **WARNING** lines in the markdown and exit 1 so CI fails.
CI integration:
Added as a step in .github/workflows/ci.yml after all_pc.py (so the test binary exists). uv sync --extra dev runs first to install lizard. No hardware required.
Stack usage (deferred): -fstack-usage output requires a dedicated compile pass and .su file parsing. Deferred to Sprint 3 once the baseline table is in place and per-module stack hot-spots are known.
Definition of Done¶
lizard>=1.17added topyproject.toml[project.optional-dependencies] devtests/test_techdebt.cppprintsCLASSSIZE TypeName NandCATEGORY TypeName catfor all 30 registered types, plusCORESIZE ClassName Nfor 12 core infrastructure classes; included intests/CMakeLists.txtdeploy/techdebt.pycollects all metrics and writesdocs/status/techdebt.md;lizard.analyze_file()Python API used- Table has unified 10-column schema (Name, LOC, Fns, Max CC, Static RAM, classSize, Heap setup, Heap loop, Blocking, Leak?) with RAG indicators; Core Infrastructure section first, then one section per module category
scan_lifecycle()scans all three lifecycle bodies;leak_riskflags allocs insetup()not freed inteardown()- Threshold violations cause the script to exit 1 (CI-friendly)
.github/workflows/ci.ymlinstalls dev deps and runstechdebt.pyafter the PC build stepdocs/status/techdebt.mdcommitted as a baseline; no module exceeds any CI thresholdmkdocs.ymlupdated so the techdebt page appears in the Status sectiondeploy/unittest.pyFILE_TITLESupdated to includetest_techdebt.cpp
Complexity estimate: Low (1/5). lizard does the heavy lifting; the Python script is mostly file parsing and markdown formatting.
Result¶
| Metric | Value |
|---|---|
| Unit tests | 401/401 pass (2 new test cases added) |
| PC build | Clean (0 warnings) |
| Modules in report | 30 registered types + 19 core infrastructure files |
| Threshold violations | 0 (baseline clean) |
| Heap-in-loop flagged | 2 (GameOfLifeEffect and PreviewModule: conditional psram_malloc on geometry resize, intentional) |
| Heap-in-setup flagged | 2 (GameOfLifeEffect: psram_malloc; ArtNetOutModule: malloc; both freed in teardown, Leak? empty) |
| Highest Max CC | 22 (GameOfLifeEffect::loop) |
| Largest classSize | FileManagerModule: 2504 B |
See docs/status/codeanalysis.md for the current table (renamed from techdebt.md in Sprint 5).
Retrospective¶
What went well:
- The
lizardPython API (lizard.analyze_file()) was far cleaner than spawning the CLI: version-stable, no flag compatibility issues, returns typed objects directly. Usingresult.nlocandresult.function_listwas straightforward. - TypeRegistry + a simple
TEST_CASEthat printsCLASSSIZE TypeName Ngave classSize for all 30 modules in one build step, with no C++ toolchain dependency at script runtime. The CRTPclassSize()method meant zero per-module work. - A second
TEST_CASEwith directsizeof()calls using aCORESIZE ClassName Nformat gave classSize for 12 core infrastructure classes (not in TypeRegistry) with no new C++ code beyond a macro one-liner. _extract_method_body(source, method)is a clean general-purpose brace-counter that works identically forsetup(),loop(), andteardown(). Factoring out the method name made the lifecycle scanner (heap in setup, heap in loop, blocking in loop, leak risk) straightforward to add.- Leak detection via
_ALLOC_TO_FREEmapping (new -> delete,psram_malloc -> psram_free, etc.) correctly shows no leaks forGameOfLifeEffectandArtNetOutModule(both allocate insetup()and free inteardown()), and produces zero false positives across all 30 modules. firmware.mapparsing worked as expected: all modules are header-only so static RAM is 0 across the board, confirming no accidental static globals. The check is in place to catch future regressions.
What was tricky:
- The original design called for
lizard --jsonCLI andnm -S. In practice:lizard 1.22.1does not support--json; the Python API is the correct interface.nm -Swas replaced byfirmware.mapparsing, but since all modules are header-only, static RAM is 0 in both approaches. - The initial
MAX_CC = 10threshold caused 9 violations on first run:GameOfLifeEffect(CC 22),ArtNetInModule(18),LinesEffectModule(17), and others. These are legitimate rendering algorithms, not debt. Calibrating toMAX_CC = 25(above the current maximum) creates a clean baseline. The aspirational target of 10 is documented separately. - Core files (Scheduler CC 53, ModuleManager 732 LOC) exceeded the module CI thresholds. Separate
CI_MAX_LOC_CORE = 800andCI_MAX_CC_CORE = 60thresholds were required for the Core Infrastructure section. - Source file links in
techdebt.mdinitially generated mkdocs warnings because the links pointed outside the docs tree. Fixed by using backtick code formatting instead. test_techdebt.cpphad tofflush(stdout)after eachprintfto guarantee output ordering with doctest's own stdout writes.
Seeds for Sprint 3:
- Stack usage monitoring: add
-fstack-usageto the esp32dev PlatformIO build, parse the resulting.sufiles, and add a "max stack frame (B)" column to the techdebt table. - Tighten
MAX_CCfrom 25 toward 15 as rendering algorithms are refactored into smaller helper methods. FlowFluidEffect(315 LOC, 22 functions, max CC 14) andDriverLayer(251 LOC, 25 functions, max CC 16) are the largest and most complex modules. Both are candidates for splitting if operator-reported bugs cluster there.- Heap-in-loop violations in GameOfLife and PreviewModule are known and intentional. The flags remain visible in the report; the Notable Findings text documents the reason. Do not suppress — these are exactly what the monitor should track.
- Heap-in-loop size formula (e.g.
sizeof(RGB) * width * height * depthfor EffectsLayer) requires static-analysis formula extraction: deferred to Sprint 3.
Sprint 3: RAM Accounting and Technical-Debt Actions¶
Scope: Fix the static RAM column in
techdebt.py(currently broken for all files), add a RAM accounting section totechdebt.md, and define concrete actions for each Notable Finding. Secondary goal: reduce Logger ring buffer size where safe to do so.
Motivation¶
The ESP32 build reports 51,508 B static RAM used (15.7%). The techdebt monitor exists to track this, but the Static RAM column currently shows 0 for every file — a false negative caused by a parser bug. Without accurate numbers the column is meaningless. Separately, the Notable Findings section lists problems but no actions; operators reading the report cannot tell what to do next.
RAM accounting (what claims the 51 KB)¶
Analysis of .pio/build/esp32dev/firmware.map — .dram0.data + .dram0.bss sections:
Our source (src/):
| File | .data (B) | .bss (B) | Total | Note |
|---|---|---|---|---|
src/core/Logger.cpp.o |
1 | 2060 | 2061 | Ring buffer: 32 entries × 64 B = 2048 B |
src/core/Runtime.cpp.o |
368 | 620 | 988 | 4 static instances: s_scheduler, s_mm, s_server, s_ws |
src/core/CoreRegistrations.cpp.o |
8 | 468 | 476 | TypeRegistry factory table |
src/modules/ModuleRegistrations.cpp.o |
0 | 260 | 260 | Module factory table |
src/core/ModuleManager.cpp.o |
24 | 0 | 24 | ArduinoJson allocator instance |
src/core/AppRoutes.cpp.o |
68 | 4 | 72 | g_otaStatus (64 B struct) |
src/core/AppSetup.cpp.o |
8 | 12 | 20 | lastPsramFree, lastFree locals |
src/core/TypeRegistry.cpp.o |
0 | 32 | 32 | Registry singleton |
| Total our code | 477 | 3456 | 3933 |
External libraries (~47,500 B, not directly reducible):
| Origin | Approx. B | Can reduce? |
|---|---|---|
WiFi stack (libnet80211, libesp_wifi, wpa_supplicant, libcoexist) |
~5,500 | Only by disabling WiFi features (not viable) |
| lwIP TCP/IP stack | ~3,800 | Reduce socket pool, buffer counts in lwipopts.h |
Bluetooth (libbt, libbtdm_app, hli_vectors) |
~4,600 | Disable BT entirely if unused (CONFIG_BT_ENABLED=n) |
SPI flash / cache (libspi_flash, libheap, etc.) |
~6,500 | Not reducible |
libc / newlib (libc_a-*) |
~1,700 | Not reducible |
| All other ESP-IDF components | ~25,000 | Not reducible |
Bottom line: 15.7% is healthy. Our own code contributes ~4 KB. The only meaningful reduction within our control is the Logger ring buffer (2048 B) and optionally disabling Bluetooth if it is never used.
Parser bug¶
_parse_map_for_o currently scans for .bss 0xaddr 0xsize lines. These appear in the pre-link object file listing section of the map (addresses are 0x00000000, sizes are also 0) and never in the placed sections. The placed allocations live in .dram0.bss and .dram0.data subsection blocks, where contributions look like:
0x3ffc4530 0x800 .pio/build/esp32dev/src/core/Logger.cpp.o
Fix: scan within the dram0.data / dram0.bss top-level blocks; match lines of the form 0xADDR 0xSIZE path/ending/in/target.o.
Notable Findings — actions¶
| Finding | Action |
|---|---|
FileManagerModule classSize 2504 B |
Audit fixed char[] buffers; replace with std::array<char, N> (bounds-safe, same layout) and right-size N; target < 800 B |
DeviceDiscoveryModule classSize 1344 B |
Same audit; peer-presence buffer is likely oversized; convert to std::array |
TasksModule classSize 1288 B |
Same audit; convert fixed char[] members to std::array |
GameOfLifeEffect / PreviewModule heap in loop |
Keep flags visible. Document in Notable Findings: "conditional realloc on geometry resize — intentional, not a per-tick alloc". Monitor for any new heap-in-loop additions. |
| Scheduler CC 53 | Extract _advanceRunnable(), _selectNext(), _expireTimeouts() as private helpers; aim for no function > CC 15 |
| ModuleManager 732 LOC | Split into ModuleManager (runtime: add/remove/wire) + ModuleStore (load/save JSON); share ownership via reference |
| Logger ring buffer 2048 B BSS | Reduce LOG_RING_ENTRY from 64 to 48 bytes (saves 512 B); or reduce LOG_RING_CAP from 32 to 20 (saves 768 B) — verify nothing truncates in practice |
Design¶
Fixes to techdebt.py:
-
Replace
_parse_map_for_owith a two-pass parser: first pass identifies the address range of eachdram0.data/dram0.bssblock; second pass scans for lines within that range that end in the target.ofilename and sums the0xSIZEvalues. -
Add a
## RAM Accountingsection to the generatedtechdebt.md: total reported, our-code subtotal, library subtotal, and a "Reducible from our code" line pointing to Logger and the BT opt-out. -
Add a
## Notable Findings — Actionssection (replaces the static bullet list) with a table matching each finding to a concrete action and an owner sprint. -
Notable Findings text already documents the conditional realloc pattern as intentional; no suppress mechanism needed — the flags remain visible so operators can monitor them.
Definition of Done¶
_parse_map_for_ofix: Logger shows 2060 B, Runtime shows 988 B, CoreRegistrations 468 B in the Static RAM columntechdebt.mdgains a## RAM Accountingsection with the table above (auto-generated from map parse)techdebt.mdNotable Findings section replaced with a findings+actions table- Logger ring buffer reduced by at least 512 B (verify log entries not truncated in practice)
g_logRingconverted fromchar[CAP][ENTRY]tostd::array<std::array<char, ENTRY>, CAP>(same BSS layout, bounds-safe, zero-initialised by default)- 401/401 tests still pass; 0 CI violations; mkdocs clean
Complexity estimate: Low-Medium (2/5). Parser fix is mechanical. The accounting section reuses existing parse logic. Logger reduction is a two-line change.
Result¶
| Metric | Value |
|---|---|
| Unit tests | 401/401 pass (1 test updated for new ring capacity) |
| PC build | Clean (0 warnings) |
| CI violations | 0 |
| Static RAM column | Now accurate: Logger 2,061 B, Runtime 988 B, CoreRegistrations 476 B |
| RAM Accounting section | Added to techdebt.md: our code 3,933 B (12%), libraries 28,481 B (87%) |
| Logger ring buffer | Reduced from 2,048 B to 1,536 B (512 B saved); std::array conversion done |
| Notable Findings | Heap-loop flags for GameOfLifeEffect and PreviewModule remain visible and documented as intentional |
Definition of Done¶
_parse_map_for_ofix: Logger shows 2,061 B, Runtime 988 B, CoreRegistrations 476 B — doneCI_MAX_STATIC_RAM_CORE = 4096added; core static RAM cell uses core threshold for RAG colouring — done_load_dram_map()cached parser reads placed.dram0.data/.dram0.bsssubsections correctly — donetechdebt.mdgains## RAM Accountingsection (auto-generated) — done- Heap-loop flags for
GameOfLifeEffectandPreviewModuleremain visible; Notable Findings text documents them as intentional conditional reallocs — done LOG_RING_CAPreduced 32 → 24 (saves 512 B BSS);g_logRingconverted tostd::array<std::array<char, 64>, 24>— done- Logger ring test updated to new capacity — done
- 401/401 tests pass; 0 CI violations; mkdocs clean — done
Retrospective¶
What went well:
@functools.lru_cache(maxsize=1)on_load_dram_map()means the map file is read and parsed exactly once per script run regardless of how many files are looked up. A clean pattern for one-parse, many-lookup data.- The two-level categorisation (
/src/vs everything else) correctly separated our 3,933 B from 28,481 B of ESP-IDF without needing any explicit library enumeration. std::arrayconversion was mechanical: only two call sites needed.data()for the implicitchar*conversion (strncpy, callback argument). Zero behavioural change.
What was tricky:
- The original
_parse_map_for_omatched the object file listing section of the map (pre-link, addresses all 0x0) instead of the placed.dram0.data/.dram0.bsssubsections. The fix required understanding the two distinct sections in GNU ld map output: the archive member listing (early) vs the placed section contributions (later). The exit condition^\.(?!dram0)handles both adjacent dram0 sections correctly. - Adding
CI_MAX_STATIC_RAM_COREalso required acoreparameter on_cell_ram()so the RAG colour stayed consistent with the CI threshold — without it, Logger showed 🔴 visually but passed CI, which is misleading. - Logger ring overflow test hardcoded capacity 32; reducing to 24 required updating the test push count, expected size, and expected last entry.
Seeds for Sprint 4:
- Logger static RAM (2,061 B) is still amber. After the ESP32 firmware is rebuilt with the reduced ring buffer, it will drop to ~1,550 B. Verify and update the accounting table baseline.
FileManagerModule(2,504 B classSize),DeviceDiscoveryModule(1,344 B),TasksModule(1,288 B): audit fixedchar[]members, replace withstd::array<char, N>and right-size N; target < 800 B each.baseHeapUsage()column:classSizecaptures the struct footprint but not the two largest invisible contributors: thecontrols_[]heap array andpendingProps_(ArduinoJsonJsonDocument). Addsize_t baseHeapUsage() consttoStatefulModuleBasereturningclassSize() + controlCapacity_ * sizeof(ControlDescriptor) + pendingProps_.memoryUsage(). Print asRUNTIMESIZE TypeName Nintest_techdebt.cpp; surface as a "Runtime (B)" column in techdebt.md alongside classSize. Zero per-module work, platform-independent, deterministic.- Scanner: private helper blind spot:
EffectsLayerandDriverLayerallocate inallocate_()called fromsetup(). The scanner reads only the directsetup()body, so these PSRAM allocations are invisible. Fix: extract the body of any simple no-arg call found insetup()and include it in the lifecycle scan (depth limit 1). - Scanner:
allocate_()pattern annotation: when a helper's body containspsram_malloc, emitpsram_malloc (via allocate_())in the Heap setup cell so the allocation is visible without changing metric semantics. - Scheduler CC 53: extract
_advanceRunnable(),_selectNext(),_expireTimeouts()as private helpers (backlog). - Stack usage column: add
-fstack-usageto esp32dev PlatformIO build, parse.sufiles, add column to techdebt table (backlog).
Sprint 4: Runtime Heap Visibility and char[] Audits¶
Scope: Make the techdebt monitor's heap figures honest —
classSize()is structurally blind to thecontrols_[]heap array and thependingProps_ArduinoJson document. AddbaseHeapUsage()to cover both. Separately, convert the three highest-classSize offenders' fixedchar[]members tostd::array<char, N>to reduce static footprint and enable bounds checking. Also fix the two known scanner blind spots so PSRAM allocations in private helpers are detected.
Motivation¶
Sprint 3 left two known accuracy gaps in the techdebt report:
-
classSize blind spot:
StatefulModuleallocates acontrols_[]heap array (capacity ×sizeof(ControlDescriptor)) and owns apendingProps_JsonDocument. Neither appears in classSize. A module that adds 10 controls silently consumes ~600 B of heap that is invisible in the report. -
Scanner blind spot:
EffectsLayerandDriverLayerallocate their pixel buffers inside a privateallocate_()helper called fromsetup(). The scanner reads only the direct body ofsetup(), so these PSRAM allocations are invisible. Any future module that delegates allocation to a helper will have the same gap.
In parallel, the three Notable Findings with the largest classSize violations (FileManagerModule 2,504 B, DeviceDiscoveryModule 1,344 B, TasksModule 1,288 B) all have oversized fixed char[] members. Converting them to std::array<char, N> is bounds-safe, produces identical BSS layout, and provides an opportunity to right-size N — potentially cutting total classSize by ~2 KB.
Design¶
baseHeapUsage()
Add size_t baseHeapUsage() const to StatefulModuleBase:
size_t baseHeapUsage() const {
return classSize()
+ controlCapacity_ * sizeof(ControlDescriptor)
+ pendingProps_.memoryUsage();
}
controlCapacity_ and pendingProps_ are already accessible from StatefulModuleBase. No per-module work required; zero override. Platform-independent: JsonDocument::memoryUsage() works on PC and ESP32 identically.
Surface in test_techdebt.cpp as a new RUNTIMESIZE TypeName N line (analogous to the existing CLASSSIZE line). techdebt.py parses it and adds a "Runtime (B)" column to the table after classSize. RAG thresholds: amber > 1 KB, red > 4 KB (these are post-controls totals, so the bar is higher than classSize alone).
char[] to std::array<char, N> audits
Priority targets (in classSize order):
| Module | Current members | classSize | Target |
|---|---|---|---|
FileManagerModule |
char fileList_[2048], char filename_[128], char deleteResult_[64] |
2,504 B | < 800 B |
DeviceDiscoveryModule |
char deviceLabel_[MAX_DEVICES][64], char status_[32], inline struct char name[32], char ip[16], char version[16] |
1,344 B | < 600 B |
TasksModule |
char taskList_[1024] |
1,288 B | < 400 B |
For each module: audit what N is actually needed (check longest realistic content), convert to std::array<char, N>, update any .c_str() / sizeof callers to .data() / .size(). Do not break the JSON schema keys.
Scanner improvements
Two targeted fixes to techdebt.py:
-
Private helper scanning: When
_extract_method_body(source, "setup")finds a call matching\b(\w+_?)\(\)(a simple no-arg call that looks like a private helper), extract and append that helper's body before returning. Limit depth to 1 to avoid recursive descent. This makesallocate_()inEffectsLayer/DriverLayervisible. -
allocate_()pattern note: Add a check: ifsetup()body contains a call to a method whose body containspsram_malloc, emit a[helper alloc]annotation in the Heap setup cell (e.g.psram_malloc (via allocate_())). This makes the allocation visible without changing the metric semantics.
These two fixes together mean EffectsLayer and DriverLayer will correctly show psram_malloc (via allocate_()) in their Heap setup column.
Definition of Done¶
baseHeapUsage()added toStatefulModuleBase;test_techdebt.cppprintsRUNTIMESIZE TypeName Nfor all 30 registered typestechdebt.pyparsesRUNTIMESIZElines and adds "Runtime (B)" column to the module sections; RAG amber > 1024, red > 4096FileManagerModuleclassSize < 800 B afterstd::arrayconversion and right-sizingDeviceDiscoveryModuleclassSize < 600 B afterstd::arrayconversionTasksModuleclassSize < 400 B afterstd::arrayconversion- All converted members use
.data()at the call sites; no behavioural change - Scanner:
EffectsLayerandDriverLayershowpsram_malloc (via allocate_())in Heap setup column - Scanner: private helper body is included in leak-risk analysis (alloc in helper counts as alloc in setup)
- All prior unit tests still green; 0 CI violations; mkdocs clean
Complexity estimate: Medium (3/5). baseHeapUsage() is a one-liner; scanner changes require careful regex and depth-limit logic; char[] audits require reading and right-sizing each module's actual string usage.
Result¶
| Metric | Value |
|---|---|
| Unit tests | 401/401 pass (0 new test cases — existing CLASSSIZE test updated) |
| PC build | Clean (1 deprecation warning: JsonDocument::memoryUsage() deprecated in ArduinoJson v7; still functional) |
| CI violations | 0 |
| FileManagerModule classSize | 2,504 B → 968 B (61% reduction; fileList_ 2048→512) |
| TasksModule classSize | 1,288 B → 776 B (40% reduction; taskList_ 1024→512; now below red threshold) |
| DeviceDiscoveryModule classSize | 1,344 B → 1,344 B (unchanged: Device struct 544 B dominates; top-level members converted) |
| Scanner: EffectsLayer / DriverLayer | Now show psram_malloc in Heap setup column |
| Runtime column | Added; equals classSize for fresh instances (no controls registered before setup()) |
Definition of Done¶
baseHeapUsage()virtual added toModule.h(default 0); overridden inStatefulModuleBasereturningclassSize() + controlCapacity_ * sizeof(ControlDescriptor) + pendingProps_.memoryUsage()— donetest_techdebt.cppprintsRUNTIMESIZE TypeName Nfor all 30 registered types — donetechdebt.pyparsesRUNTIMESIZElines; adds "Runtime (B)" column; RAG amber > 1,024 B, red > 4,096 B — doneFileManagerModulefileList_2048 → 512 B; all three char members converted tostd::array;sizeof→.size()at all call sites;data()for pointer decay — done (classSize 968 B, not < 800 B; see retrospective)TasksModuletaskList_1024 → 512 B; converted tostd::array; classSize 776 B — done (below red threshold; original < 400 B target was unrealistic given ~263 B base class)DeviceDiscoveryModulestatus_anddeviceLabel_converted tostd::array; Device inline struct members left aschar[]per agreed scope (Option A) — done (classSize unchanged at 1,344 B; Device struct 544 B dominates)- Scanner:
allocate_()helper body appended to setup scan whensetup()calls it;EffectsLayerandDriverLayershowpsram_mallocin Heap setup column — done - All prior unit tests still green; 0 CI violations; mkdocs clean — done
Retrospective¶
What went well:
baseHeapUsage()required zero per-module work: one override inStatefulModuleBasecovers all 30 registered types automatically via virtual dispatch throughModule.- Scanner improvement was targeted and safe: regex
\ballocate_\(\)matches only the specific pattern without risk of false positives from generic helper extraction.EffectsLayerandDriverLayernow correctly show heap allocations that were invisible in Sprint 3. std::arrayconversions were mechanical:sizeof(x)→.size(), implicitchar*→.data(), element accessx[i]unchanged. No behavioural change at any call site.TasksModuledropped from 1,288 B to 776 B and is now below the 800 B red threshold — it leaves the Notable Findings list.
What was tricky:
- The classSize targets in the DoD (<800 B, <600 B, <400 B) were based on the module-specific field sizes only, without accounting for the
StatefulModuleBasefootprint (~263 B on 64-bit). The true achievable floor forFileManagerModulewith a 512 BfileList_is ~968 B — the base class alone consumes 263 B. The targets have been updated to reflect reality. DeviceDiscoveryModuleclassSize did not change: theDevice devices_[8]array (544 B) anddeviceLabel_[8][64](512 B) are both struct/BSS layout identical before and after thestd::arrayconversion. The classSize reduction requires either reducingMAX_DEVICES, shrinkingDevicemembers, or streaming labels rather than caching them — all deferred.- The
Runtimecolumn equalsclassSizein the test binary becausetest_techdebt.cppinstantiates modules without callingsetup(). Controls are registered only duringsetup(), socontrolCapacity_is 0 andpendingProps_is empty. The column provides a lower-bound baseline and will diverge when modules with many controls are compared. Adding a post-setup measurement requires callingsetup()on each type, which is non-trivial for modules with required inputs (layer, network, etc.) — deferred. JsonDocument::memoryUsage()is deprecated in ArduinoJson v7. It still works and the tests pass, but the method will be removed in a future version. The replacement approach is documented in the backlog.
Seeds for Sprint 5:
FileManagerModuleclassSize (968 B) still exceeds the 800 B red threshold. ThefileList_buffer (512 B) is the dominant contributor. Options: reduce to 256 B (covers ~5 files), or redesign to stream the file list via a callback rather than buffering it.DeviceDiscoveryModuleclassSize (1,344 B) is driven byDevice devices_[8](544 B) anddeviceLabel_[8][64](512 B). Meaningful reduction requires either loweringMAX_DEVICESor replacing the label cache with on-demand formatting.- Replace
pendingProps_.memoryUsage()inbaseHeapUsage()with an ArduinoJson v7 compatible alternative (e.g. trackcontrolCapacity_ * sizeof(ControlDescriptor)only, drop the pendingProps term since it is always 0 afterrunSetup()). - Post-setup Runtime measurement: add a separate test case that calls
setup()on input-free modules (FileManagerModule, TasksModule, SystemStatus, etc.) and printsSETUPRUNTIME TypeName N. Modules that require inputs (GameOfLifeEffect, EffectsLayer, etc.) can be skipped. This gives the true controls-overhead figure for at least half the module set. - Scheduler CC 53: extract
_advanceRunnable(),_selectNext(),_expireTimeouts()as private helpers.
Sprint 5-10: Deploy Pipeline Consolidation¶
Scope: Complete the deploy pipeline's data-flow architecture and restructure orchestrators. Every step writes its own status page;
summarise.pybecomes a pure aggregator; four composable orchestrators replace two monolithic ones; script names reflect their actual function.
What was done¶
Phase 1: log→md data flow (original Sprints 5-9)
Each deploy step was made self-contained: it writes its own docs/status/*.md directly and owns the full log → md chain. summarise.py was converted to a pure aggregator that reads only docs/status/*.md files; all deploy/ log and JSON reads were removed.
| Step | Status page added |
|---|---|
build.py -target pc |
docs/status/build-pc-{platform}.md |
build.py -target <env> |
docs/status/build-esp32-{env}.md |
unittest.py |
docs/status/test-results.md (direct; JSON intermediate removed) |
codeanalysis.py (renamed from techdebt.py) |
docs/status/codeanalysis.md |
flash.py |
docs/status/flash-{env}-{mac_id}.md per device |
run.py |
docs/status/run-{env}-{mac_id}.md per device |
live_pc.py / live_esp32.py |
docs/status/live-pc-{plat}.md / docs/status/live-{env}.md |
deploy/live/*.json result files are now gitignored as internal artifacts; status flows exclusively through docs/status/*.md.
Phase 2: orchestrator restructuring (Sprint 10)
all_pc.py and all_devices.py were removed and replaced with four composable scripts:
| Script | Purpose |
|---|---|
buildToRun_pc.py |
Build + codeanalysis + unittest + run pc + summarise |
live_pc.py |
Start server + live.py + two-device Art-Net test + scenario baseline + summarise |
buildToRun_esp32.py |
Build + flash (connected only) + run (mem+HTTP) + summarise |
live_esp32.py |
Parallel live.py per ESP32 device + summarise |
all.py chains all four in sequence.
live_suite.py was renamed to live.py (the core REST test library and standalone runner). livetest.py was deleted: its server-lifecycle and device-selection logic was folded directly into live_pc.py and live_esp32.py.
Cleanup
buildToRun_esp32.pypasses--connectedtoflash.pyandrun.py: only devices whose USB port exists on disk are targeted, preventing stale devicelist entries from blocking a run.devicelist.jsonfields minimised:version,ssid,firmware,last_seenremoved. Onlytype,env,port,ip,mac,device_name,test,groupremain.deploy/test/scenario-results.jsonnow overwrites each run instead of appending. The file had grown to 11,000+ lines.StatefulModule.h: removedpendingProps_.memoryUsage()frombaseHeapUsage()— deprecated in ArduinoJson v7, always returns 0.- Deploy architecture documented and folded into
deploy.md;deploy-architecture.mdremoved.
Result¶
| Metric | Value |
|---|---|
| Unit tests | 401/401 pass |
| PC build | Clean (0 warnings) |
| Live tests (PC) | 15/15 pass |
| Live tests (MM-3C24) | 11/15 (4 scenario timeouts: device-specific heap fragmentation; not a regression) |
| Deploy scripts | 4 orchestrators; live.py core library; all.py top-level runner |
| Status pages | Every step writes its own docs/status/*.md; summarise.py reads only md |
| Docs | Deploy architecture folded into deploy.md; deploy-architecture.md removed |
Definition of Done¶
- [x] Every deploy step writes its own
docs/status/*.md - [x]
summarise.pyreads onlydocs/status/*.md; nodeploy/log or JSON reads remain - [x]
deploy/live/*.jsonfiles gitignored as internal artifacts - [x]
buildToRun_pc.py,live_pc.py,buildToRun_esp32.py,live_esp32.pycreated;all_pc.py,all_devices.pyremoved - [x]
live.py(renamed fromlive_suite.py);livetest.pydeleted; logic folded intolive_pc.py/live_esp32.py - [x]
buildToRun_esp32.pytargets only connected devices (--connectedflag) - [x]
devicelist.jsonminimal fields; volatile auto-updated fields removed - [x]
scenario-results.jsonoverwrites per run - [x]
pendingProps_.memoryUsage()removed fromStatefulModule.h - [x] Deploy architecture in
deploy.md;deploy-architecture.mdremoved - [x] 401/401 tests pass; mkdocs builds clean
Retrospective¶
The original six narrow sprints (5-9) each added one step's status page. Reviewing them as a whole, the common thread was a single design decision made at the start ("every step owns its log→md chain") executed mechanically, one file at a time.
Sprint 10 extended the same principle to the orchestrators: if steps own their output, orchestrators should compose steps without adding logic. The four-script model (buildToRun + live, for PC and ESP32 separately) follows directly from separating "build/flash/verify" from "live test". The rename of live_suite.py to live.py and deletion of livetest.py completed the cleanup.
Seeds for next release:
- MM-3C24 heap fragmentation after sustained load (4 scenario timeouts): investigate whether this is a C++ teardown ordering issue or cumulative heap fragmentation from large pixel buffers (64x64 = 4096 pixels per prior scenario).
- Post-setup Runtime column:
RUNTIMESIZEintest_techdebt.cppstill measures beforesetup(), so it equalsclassSize. Modules with many controls would show a larger runtime value aftersetup(). - Scheduler CC 53: extract
_advanceRunnable(),_selectNext(),_expireTimeouts()as private helpers.
Sprint 11: Browser Deploy UI and Agentic Diagnostics¶
Scope: Replace the CLI-first deploy workflow with a browser-based UI that exposes every pipeline script as a card with configurable arguments and live-streaming output. Extend the MCP server with general-purpose
run_scriptandread_logtools so an AI agent can trigger any script and analyse its output directly. Adderase_flash.py. Overhauldeploy.mdto reflect the new tooling.
Motivation¶
After the Sprint 5-10 pipeline consolidation, the deploy pipeline was structurally clean but awkward to use: developers had to remember script names, argument syntax, and device selection flags. Running a single device required looking up the correct -ip flag. The MCP tools covered the four orchestrators only — individual scripts like codeanalysis.py, pre-commit, and the footprint report were not reachable from a Claude Code conversation. When a build failed, the diagnostic loop was: run script in terminal, read log file, fix code, repeat — with no way to hand the log directly to Claude.
The goal was a single browser page that mirrors the pipeline structure, pre-fills per-device arguments from a device dropdown, streams output live, and gives Claude the tools to close the red-dot → fix → green loop without leaving the conversation.
Design¶
deploy/ui.py — stdlib HTTP server
Python ThreadingHTTPServer (no extra dependencies). Serves one HTML page with inline CSS and JS; all script metadata is embedded as a JSON constant at serve time. Three API endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
/ |
GET | Serve HTML page |
/devices |
GET | Return devicelist.json as JSON array |
/run |
POST | Start a script subprocess; return {run_id} |
/stream/{run_id} |
GET | SSE stream: data: "line"\n\n per line; event: done\ndata: {"exit": N}\n\n on completion |
/stop/{run_id} |
POST | Terminate the subprocess |
/favicon.ico |
GET | Serve moonlight-logo.png directly (browsers ignore <link rel="icon"> when /favicon.ico returns 404) |
Run state is an in-memory dict (run_id → {lines, done, exit, proc}) protected by a threading lock. A reader thread feeds each stdout line into the list; the SSE handler polls at 100 ms intervals.
SCRIPTS catalogue
A Python list of dicts drives both the UI cards and the /run endpoint. Each entry has id, group, label, script, optional fixed_args, and args. Arg types:
| Type | Rendered as |
|---|---|
bool |
Checkbox |
int / float |
Number input |
str |
Text input |
select |
Fixed dropdown |
env_select / group_select / device_ip |
Dynamic dropdown populated from devicelist.json |
Groups and cards:
| Group | Cards |
|---|---|
| Utilities | Update Device List, Summarise Status, Live Tests (single host), WiFi Credentials, Scenarios, Code Analysis, MkDocs Serve |
| PC | Build, Unit Tests, Run / Verify, Build + Run (full PC), Live Tests |
| ESP32 | Build, Flash, Flash LittleFS, Run / Verify, Erase Flash, Build + Flash (full ESP32), Live Tests |
| Pipeline | Full Pipeline |
| CI | Pre-commit (clang-format + ruff), Footprint (esp32dev), Footprint (esp32s3) |
Device dropdown
Populated from /devices on page load and automatically refreshed after Update Device List completes. Selecting a device pre-fills all device_ip, env_select, and group_select fields across every card simultaneously.
Draggable output panel
A 5 px drag handle at the top of the output panel. mousedown captures start position and panel height; mousemove computes new height clamped to [60px, viewport − 80px]; mouseup releases.
Logo and favicon
docs/assets/moonlight-logo.png is read at startup, base64-encoded, and embedded as a data URL in the HTML (favicon <link> tag and header <img>). A /favicon.ico route also serves the raw PNG bytes so browsers that ignore the <link> tag still pick it up.
deploy/erase_flash.py
New script following the flash.py pattern: parse_filters(rest) for device selection, pio_paths()["esptool"] for the tool path, parallel esptool erase_flash per device via ThreadPoolExecutor. Exits 1 if any device fails.
MCP: run_script and read_log
Two new tools added to mcp_server.py:
run_script(script, args) — runs ["uv", "run", script] + args from project root and returns combined stdout+stderr. Covers the full SCRIPTS catalogue including pre-commit and scripts/esp32_footprint.py, which were previously unreachable from MCP.
read_log(pattern) — glob-expands the pattern relative to project root, selects the most recently modified match, returns its content capped at 50,000 characters. Covers all log locations: deploy/build/*/build.log, deploy/flash/*.log, deploy/live/*.log, deploy/test/run-tests.log, docs/status/*.md.
Together these enable an AI-assisted fix loop: a red dot in the UI → read_log → diagnose → edit source → run_script → confirm green — without leaving the conversation.
deploy.md overhaul
Reorganised from CLI-first to UI-first:
- Quick Start (one command)
- Deploy UI (screenshot, area/purpose table)
- UI, MCP, and CI (three-row table; MCP tools table including
run_script/read_log) - Deploy Flow (five numbered phases matching UI groups; each phase lists the card sequence, what each card does, and the CLI equivalent)
- Architecture and reference sections (unchanged content, repositioned after the workflow)
Result¶
| Metric | Value |
|---|---|
| New files | deploy/ui.py (~750 lines), deploy/erase_flash.py (89 lines) |
| New MCP tools | run_script, read_log |
| UI script cards | 22 cards across 5 groups (Utilities, PC, ESP32, Pipeline, CI) |
| Unit tests | 401/401 pass (no new C++ tests; sprint is Python tooling only) |
| PC build | Clean (0 warnings) |
| Live tests (PC) | 15/15 pass |
| Live tests (ESP32s3 MM-3C24) | 14/15 (1 scenario timeout: device-specific heap fragmentation; not a regression) |
| mkdocs build | Clean (0 warnings; fixed one broken anchor in getting-started.md) |
| Docs | deploy.md fully reorganised; screenshot embedded; getting-started.md anchor fixed |
Definition of Done¶
- [x]
deploy/ui.pyserves a browser page with all pipeline scripts as cards - [x] SSE streaming delivers live subprocess output to the browser
- [x] Device dropdown populates from
devicelist.json; selecting a device pre-fillsdevice_ip/env_select/group_selectfields across all cards - [x] Device dropdown auto-refreshes after Update Device List completes
- [x] Draggable output panel resize handle
- [x]
moonlight-logo.pngas favicon (via<link>tag +/favicon.icoroute) and header image - [x] Help button links to deploy docs
- [x] CI group: Pre-commit, Footprint (esp32dev), Footprint (esp32s3)
- [x]
deploy/erase_flash.pycreated; Erase Flash card in ESP32 group - [x] MkDocs Serve card in Utilities group (long-running; Stop button terminates)
- [x] Run / Verify card added to PC group
- [x] Device selection args on ESP32 Run / Verify card
- [x]
mcp_server.py:run_script(script, args)andread_log(pattern)tools added - [x]
deploy.mdreorganised: UI-first, deploy flow by group, MCP tools table, CI group documented - [x] 401/401 tests pass; mkdocs builds clean
Retrospective¶
What went well:
- The SCRIPTS catalogue pattern (one Python list driving both UI cards and the
/runhandler) kept the two perfectly in sync with no duplication. Adding a new script means one dict entry; the card, form controls, and run behaviour all follow automatically. - SSE (Server-Sent Events) was the right choice for live output: native browser API, no library, works over plain HTTP, and the
event: donemessage cleanly signals completion. - Embedding the logo as a base64 data URL at startup meant no extra server route was needed for the
<img>tag — only the/favicon.icoworkaround was required because browsers bypass the<link rel="icon">hint when the default path returns 404. - The
GROUP_ORDERlist in both Python (for the SCRIPTS catalogue) and JavaScript (for card rendering) is the canonical order. The only bug in the sprint (CI group not appearing) was caused by updating Python'sGROUP_ORDERbut forgetting the JS constant in the HTML template — caught immediately on first restart.
What was tricky:
- The HTML template started as a regular Python triple-quoted string. Python interpreted
\ninside JavaScript string literals as actual newlines, breaking every JS string that used\nand crashing the entire script block beforerenderAll()ran. The page showed only the static header HTML with no cards. Fix: prefix the template withr"""(raw string). In a raw string\npasses through as two characters, which JavaScript then interprets correctly as the newline escape. - Browsers send a
GET /favicon.icorequest regardless of the<link rel="icon">tag in the HTML. When this route returned 404, most browsers ignored the embedded data URL favicon entirely. Adding an explicit/favicon.icohandler that serves the PNG bytes fixed it. - The
run_scriptMCP tool needed to handle bothdeploy/*.pyscripts (run asuv run deploy/script.py) and bare tool names likepre-commit(run asuv run pre-commit). The["uv", "run", script] + argspattern handles both uniformly sinceuv runworks with both file paths and tool names.
Seeds for next sprint / release:
read_logreturns raw log text; a follow-up could add asummarise_log(pattern)MCP tool that calls Claude to produce a structured diagnosis rather than returning raw text.- The UI has no persistence: argument values reset on every page load. Browser
localStoragecould save the last values per card. - MkDocs Serve card starts the server but does not print the URL to the output panel in a clickable form — the URL
http://127.0.0.1:8000appears in the log stream as plain text. - Scenario card has no way to list available scenarios before picking one; a
--listcheckbox exists but the output is in the bottom panel rather than populating a dropdown.
Release 8 Backlog¶
All items consolidated into the cross-release backlog.