After upgrade kexec into v1.13.2, CM5 eMMC takes ~2m13s between the SDHCI
controller registering and mmc0 actually becoming usable. The Talos config
acquire state machine (`acquire.go::stateDisk`) checks STATE in the first
seconds of boot, sees `VolumePhaseMissing`, and transitions one-way to
`stateEmbedded` -> `stateMaintenanceEnter`. When STATE later becomes
ready, the state machine doesn't re-enter `stateDisk`, so the node stays
in maintenance forever despite the on-disk config.yaml being intact.
This patch makes stateDisk tolerate transient phase=missing for up to
5 minutes (stateMissingDiskTimeout) before falling through to embedded.
A 5-second ticker on the outer Run loop ensures the timeout can fire
even when no further volume-status events arrive (e.g. truly missing
STATE on a fresh install).
Validated 2026-05-25 via canonical 3-CP rolling upgrade on a freshly
flashed v1.12.4 home-test cluster: all 3 blades upgraded sequentially
to v1.13.2-7 (this patch), each came back stage=running with config
loaded automatically and k8s Ready within ~5 min, no manual remediation.
See doc-compute-blade-kubernetes/talos-upgrade-validation/session-2026-05-25/E2E-VALIDATED.md.
Fast-init hardware sees no change — STATE reaches ready within seconds
and the existing path runs.
This patch file mirrors a commit that already existed in checkouts/talos
(`a50511de7` — "grub: EFI-at-/boot fallback for BOOT-less SBC layout in
Upgrade path") but was never landed back into patches/siderolabs/talos/.
Extracted with `git format-patch` from the checkout so subsequent
`make patches` runs reproduce the same tree on a fresh clone.
Complements 0005 by handling the Upgrade code-path (in addition to the
fresh-install code-path 0005 already covers) for SBC layouts that don't
have a separate BOOT partition.
Two complementary fixes after end-to-end local installer build:
1. New talos/0001 patch — Replace hack/modules-arm64.txt with the
intersection of upstream's initramfs list and our RPi 6.12.47
build's actual modules (155 entries, down from upstream's 241).
Initramfs target was failing with exit 123 in xargs install -D
because upstream lists modules our kernel doesn't build (SATA,
HID device drivers, some upstream-only crypto helpers).
2. Makefile: add --network=host to the metal docker run.
The installer step already had it, but the metal step did not.
For local-registry builds (REGISTRY=127.0.0.1:5001), the imager
container needs --network=host to reach the host's registry to
pull the overlay image when generating the raw disk image.
Harmless on CI (no behavioural change against docker.io).
Validated locally end-to-end:
- kernel image: 234MB (RPi 6.12.47 with RP1 driver support)
- overlay image: 9.7MB (U-Boot + firmware + DTBs)
- imager image: 346MB
- installer-base: 105MB
- installer: ~100MB
- metal-arm64.raw.zst: 94MB (final flashable disk image)
The v1.13.2 rebase of pkgs 0001 only restored some RP1-related kernel
options (PINCTRL_RP1, COMMON_CLK_RP1, PINCTRL_BCM2712) because those
hunks happened to apply cleanly against upstream v1.13.0's 6.18.24-era
config-arm64. Several others were silently dropped, causing:
ld.lld: error: undefined symbol: rp1_get_platform
at the vmlinux link step (~19 min into local kernel build).
Re-added:
- CONFIG_MFD_RP1=y (defines rp1_get_platform)
- CONFIG_COMMON_CLK_RP1_SDIO=y
- CONFIG_FB_BCM2708=y (RPi framebuffer)
- CONFIG_PWM_PIO_RP1=y (RPi PWM via PIO)
- CONFIG_PWM_BRCMSTB=y (was "not set")
Local build now succeeds: svrnty/talos-rpi5-kernel:v1.13.0-local
loaded into local Docker (234MB).
- Makefile: TALOS_VERSION v1.12.4 -> v1.13.2, PKG_VERSION v1.12.0 -> v1.13.0
- siderolabs/talos 0001 (modules-arm64.txt): removed; hack/modules-arm64.txt
is a CI assertion file with no build-time references. Will be regenerated
from a real RPi 6.12.47 kernel build as a follow-up.
- siderolabs/talos 0005 (BOOT partition GRUB): rebased onto v1.13.2's
Install/Upgrade refactor. installEFI struct field is gone upstream; ported
the BOOT-partition probe + EFI-at-/boot fallback to work with the new
efiFound local var and added a bootFromEFI struct field for runGrubInstall.
- siderolabs/pkgs 0001: rebased onto v1.13.0. Kernel config header bumped
to 6.12.47. config-arm64 not fully regenerated for RPi 6.12.47 yet -- some
upstream v1.13 6.18.x symbols (LIBIE_ADMINQ, IDPF, etc) remain in the file
but the kernel's Kconfig silently drops unknown options during build.
Enable GPIO UART0 on Pi5/CM5 via dtoverlay=uart0-pi5 in
configTxtAppend. Remove the old 0002 patch that targeted the
debug UART (ttyAMA10) — Compute Blade uses GPIO 14/15 (ttyAMA0).
Renumber overlay patches (old 0003 becomes 0002).
Update README with tested serial console docs: wiring diagram,
even parity config, 3.3V requirement, and read-only limitation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document recommended storage layouts per node role:
- Control planes: NVMe boot for fast etcd I/O
- Postgres/storage: eMMC boot + NVMe data at /var/mnt/data
- Compute workers: eMMC only, stateless
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The [pi5]/[all] section headers in configTxtAppend created duplicate
sections when concatenated with the overlay's base config.txt, which
already ends with [pi5]/[all]. The RPi firmware parser choked on the
duplicate headers, preventing NVMe boot on fresh installs.
Remove the section headers — dtparam and overclock settings now land
under the existing [all] scope from the base config.txt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Apply overlay patch 0003 (EFI mount path detection for SBC layouts)
in the build so upgrades write firmware/config.txt to the correct
path. Update README with patch 0003, PCIe Gen 3 in features list,
and expanded PCIe Gen 3 instructions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Config.txt is set correctly at flash time. Upgrades via talosctl
don't override firmware config (overlay writes to wrong path on
SBC layout, which is harmless). Users who need custom config.txt
set it once during initial flash.
- Use configTxtAppend with PCIe Gen 3 + overclock
- Put dtparam=pciex1_gen=3 in [pi5] section
- Remove patch 0003 (SBC overlay upgrade fix) — too risky,
deleted GRUB's BOOTAA64.EFI in v8
- Remove full configTxt replacement mode
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The v8 overlay patch deleted /boot/EFI/ to clean up stale firmware,
but this also removed GRUB's BOOTAA64.EFI, bricking the node.
Fix: keep SBC layout detection (write to /boot/ not /boot/EFI/) but
remove the os.RemoveAll that destroyed GRUB. Stale firmware files in
/boot/EFI/ are harmless.
Re-enable PCIe Gen 3 (dtparam=pciex1_gen=3) and full configTxt mode,
now that the overlay installer correctly writes to the EFI partition
root on SBC layouts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The PCIe Gen 3 changes (dtparam=pciex1_gen=3, full configTxt
replacement, SBC overlay upgrade fix) caused boot failures during
talosctl upgrade on CM5 nodes. Revert to the pre-Gen3 state:
- configTxtAppend (overclock only) instead of full configTxt replacement
- Remove 0003 overlay patch application (kept in patches/ for future use)
PCIe Gen 3 support will be re-added after root cause analysis.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes in one:
1. SBC overlay upgrade path: the overlay installer was always writing
to /boot/EFI, but on SBC layouts (no BOOT partition) the GRUB code
mounts EFI at /boot. Config.txt and firmware ended up in a stale
/boot/EFI/ subdirectory, invisible to the firmware. The installer
now detects the SBC layout and writes to the correct location.
2. PCIe Gen 3: dtparam=pciex1_gen=3 works on CM5 (the DT overrides
exist), so the custom pcie-gen3.dtbo overlay is unnecessary.
Simplified to just use dtparam in config.txt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CM5 DTB (bcm2712-rpi-cm5-cm5io.dtb) lacks the pciex1 alias that
the Pi 5 DTB provides, making dtparam=pciex1_gen=3 silently fail.
Add a custom device tree overlay (pcie-gen3.dtbo) that targets
/axi/pcie@1000110000 directly to set max-link-speed = <3>. The overlay
is embedded in the SBC installer and written to /boot/EFI/overlays/
during install/upgrade.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The dtparam=pciex1_gen=3 was being appended after the [all] section,
but RPi firmware requires PCIe dtparams in the [pi5] section.
Switch from configTxtAppend to full configTxt replacement to control
section ordering. Also add dtparam=pciex1 to explicitly enable the
external PCIe link.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds dtparam=pciex1_gen=3 to config.txt overlay. Benchmarked Gen 2 baseline
on all 3 pg nodes showing consistent ~375 MB/s write throughput, bottlenecked
by the Gen 2 x1 lane limit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On fresh SBC images, the EFI partition has sd-boot UKI files but no
GRUB config. During upgrade, Probe() found sd-boot and used it, which
failed because RPi5/CM5 firmware lacks EFI SetVariableRT support.
Add arm64 guard to Probe(): when no GRUB config is found, skip sd-boot
probing and return a fresh GRUB config. This transitions from sd-boot
to GRUB on the first upgrade from a fresh flash.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Patch 0005 fixes talosctl upgrade on SBC layouts (RPi5/CM5) where
the disk has no separate BOOT (XFS) partition — only EFI (VFAT).
Falls back to mounting EFI at /boot for probe, install, and revert.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Talos assumes bare metal kernels support open_tree on anonymous FS
(added in 6.15). The RPi downstream kernel (6.12.x) does not, causing
shadow bind mount failures for /etc files and cascading network init
failures. This patch removes the InContainer() gate so the capability
check runs on all platforms.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ip6_gre.ko exists in Talos upstream module list (v1.12.4) but not
in the RPi downstream kernel build. Only add it to the removal side
of the patch, not our custom module list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The hardcoded job-level PATH env wiped out nvm/node, breaking
actions/checkout. Use GITHUB_PATH to prepend GNU sed's gnubin
directory while preserving the runner's inherited PATH.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Talos v1.12.4 added kernel/net/ipv6/ip6_gre.ko to modules-arm64.txt.
Update our patch to match. Also silence gmake checkouts-clean stdout
in auto-update.sh to prevent it leaking into GITHUB_OUTPUT.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BSD sed on macOS requires `sed -i ''` but auto-update.sh uses GNU
`sed -i` syntax. The workflows installed gnu-sed via Homebrew but
never added it to PATH, causing "invalid command code M" failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Push the full installer tar with crane first (preserving all layers),
then re-wrap with docker buildx to add provenance and SBOM attestation
for Docker Scout compliance. Buildx can pull the image from the registry
since crane already pushed it, avoiding the docker-container driver
limitation with locally loaded images.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The docker buildx build wrapper with docker-container driver cannot
access locally loaded images, causing it to only capture the first
layer (22MB base) and drop the kernel (~98MB) and overlay (~3MB).
Switch back to crane push which pushes the tar as-is, preserving
all 3 layers. Attestation args remain on actual build steps where
buildx works correctly.
Fixes broken tags: v1.12.3-k6.12.47-3, v1.12.3-k6.12.47-4
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CM5 on Compute Blade doesn't have an SD slot for booting Raspberry Pi
OS. Use rpiboot recovery mode over USB instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Documents the dd + EEPROM configuration approach for booting Talos
from NVMe on RPi5/CM5. Includes BOOT_ORDER, PCIE_PROBE settings,
and optional PCIe Gen 3 configuration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NVMe kernel driver is already built-in (CONFIG_BLK_DEV_NVME=y). The
expected approach is simply dd'ing the metal image to NVMe and setting
EEPROM BOOT_ORDER=0xf416 + PCIE_PROBE=1. Pending hardware validation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The overlay was using console=ttyAMA0 (GPIO 14/15) but the RPi5 debug
UART is ttyAMA10 (JST connector between HDMI ports on Pi5, test pads
TP35/TP36 on CM5). Also adds earlycon for early boot output and disables
GPIO UART on Pi5 in config.txt to avoid U-Boot compatibility issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force GRUB instead of sd-boot on arm64 and pass --no-nvram to
grub-install, working around the SetVariableRT firmware limitation
that prevents in-place upgrades on RPi5/CM5 hardware.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the 16K page override from the kernel patch, preserving
upstream Talos's default 4K pages. RPi5 hardware works correctly
with 4K pages — the RPi Foundation's 16K default is a TLB
performance optimization (~5%), not a hardware requirement.
Benefits:
- Correct memory accounting (4x less overhead per page)
- Full software compatibility (jemalloc, Longhorn, F2FS, etc.)
- No OOM surprises on control-plane nodes
- Aligned with upstream Talos kernel config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Document SetVariableRT upgrade failure, 16K page size implications,
serial console issue, and SBC install disk behavior
- Add production roadmap (4K pages, GRUB boot, serial fix, NVMe)
- Make overlay Go patch conditional: apply only on Go 1.24.x,
skip on 1.25+ where CVEs are already fixed upstream
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Docker Scout requires buildx-style provenance+SBOM, not cosign
attestations. Replace crane push with docker load + buildx build
(--provenance=mode=max --sbom=true) for the installer image. Use
buildx imagetools create for the release tag to preserve attestations.
Remove cosign/syft from CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move CI/CD, runner setup, secrets, and project structure to
TECHNICAL.md. Streamline README as a user-facing guide with
install/upgrade instructions. Fix Docker badges for arm64.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Attach cosign+syft SBOM attestations to crane-pushed installer and
release images to satisfy Docker Scout supply chain policy. Replace
docker tag/push with crane copy for the release target. Remove the
Scout CVE scan target and clean up release notes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add MPL 2.0 LICENSE file for compliance
- Add license section and upstream attribution to README
- Upgrade provenance attestation from mode=min to mode=max
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Patch sbc-raspberrypi5 overlay to use Go 1.24.13 (fixes 1C/7H/12M/1L CVEs)
- Add ATTESTATION_ARGS (--provenance=true --sbom=true) to all buildx targets
- Override upstream --provenance=false via TARGET_ARGS (last flag wins)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>