58 Commits

Author SHA1 Message Date
d260fe20ab Add reference architecture for production CM5 clusters
Document recommended storage layouts per node role:
- Control planes: NVMe boot for fast etcd I/O
- Postgres/storage: eMMC boot + NVMe data at /var/mnt/data
- Compute workers: eMMC only, stateless

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 16:53:19 -05:00
4e867e2055 Fix boot failure: remove duplicate config.txt section headers
All checks were successful
Build Talos CM5 Image / build (push) Successful in 3m38s
The [pi5]/[all] section headers in configTxtAppend created duplicate
sections when concatenated with the overlay's base config.txt, which
already ends with [pi5]/[all]. The RPi firmware parser choked on the
duplicate headers, preventing NVMe boot on fresh installs.

Remove the section headers — dtparam and overclock settings now land
under the existing [all] scope from the base config.txt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:00:54 -05:00
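The failure mode in commit 4e867e2055 can be illustrated with a small check that flags section headers an appended fragment repeats from the base file. This is only a sketch: the file contents below are hypothetical (including the `arm_freq` value), and the check does not reproduce the firmware's actual parsing rules.

```python
import re

# Matches a config.txt section header such as "[pi5]" or "[all]".
HEADER = re.compile(r"^\s*\[([^\]]+)\]\s*$")

def headers(text):
    """Return section names (e.g. 'pi5', 'all') in order of appearance."""
    found = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            found.append(match.group(1))
    return found

def duplicated_headers(base, fragment):
    """Section names that the appended fragment repeats from the base file."""
    seen = set(headers(base))
    return [name for name in headers(fragment) if name in seen]

# Hypothetical contents, for illustration only.
base = "arm_64bit=1\n[pi5]\n[all]\n"
bad_fragment = "[pi5]\ndtparam=pciex1_gen=3\n[all]\narm_freq=2600\n"
good_fragment = "dtparam=pciex1_gen=3\narm_freq=2600\n"

print(duplicated_headers(base, bad_fragment))   # ['pi5', 'all']
print(duplicated_headers(base, good_fragment))  # []
```

A check like this could run before the image build so that a fragment with repeated headers is rejected early rather than discovered at boot.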
d4a55c670c Apply SBC overlay upgrade fix, add PCIe Gen 3 docs
All checks were successful
Build Talos CM5 Image / build (push) Successful in 3m11s
Check Upstream Updates / check-and-build (push) Successful in 5s
Apply overlay patch 0003 (EFI mount path detection for SBC layouts)
in the build so upgrades write firmware/config.txt to the correct
path. Update README with patch 0003, PCIe Gen 3 in features list,
and expanded PCIe Gen 3 instructions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 14:29:17 -05:00
5152b6cb44 Simplify config: use configTxtAppend, drop SBC overlay patch
All checks were successful
Build Talos CM5 Image / build (push) Successful in 2m41s
Check Upstream Updates / check-and-build (push) Successful in 5s
Config.txt is set correctly at flash time. Upgrades via talosctl
don't override firmware config (overlay writes to wrong path on
SBC layout, which is harmless). Users who need custom config.txt
set it once during initial flash.

- Use configTxtAppend with PCIe Gen 3 + overclock
- Put dtparam=pciex1_gen=3 in [pi5] section
- Remove patch 0003 (SBC overlay upgrade fix) as too risky: the v8
  overlay patch deleted GRUB's BOOTAA64.EFI
- Remove full configTxt replacement mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 16:47:42 -05:00
338a2c0021 Fix SBC overlay upgrade: preserve GRUB, enable PCIe Gen 3
All checks were successful
Build Talos CM5 Image / build (push) Successful in 2m58s
The v8 overlay patch deleted /boot/EFI/ to clean up stale firmware,
but this also removed GRUB's BOOTAA64.EFI, bricking the node.

Fix: keep SBC layout detection (write to /boot/ not /boot/EFI/) but
remove the os.RemoveAll that destroyed GRUB. Stale firmware files in
/boot/EFI/ are harmless.

Re-enable PCIe Gen 3 (dtparam=pciex1_gen=3) and full configTxt mode,
now that the overlay installer correctly writes to the EFI partition
root on SBC layouts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 15:50:52 -05:00
b5201f7906 Revert PCIe Gen 3 config: restore safe defaults for public image
All checks were successful
Build Talos CM5 Image / build (push) Successful in 2m55s
Check Upstream Updates / check-and-build (push) Successful in 4s
The PCIe Gen 3 changes (dtparam=pciex1_gen=3, full configTxt
replacement, SBC overlay upgrade fix) caused boot failures during
talosctl upgrade on CM5 nodes. Revert to the pre-Gen3 state:
- configTxtAppend (overclock only) instead of full configTxt replacement
- Remove 0003 overlay patch application (kept in patches/ for future use)

PCIe Gen 3 support will be re-added after root cause analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 20:27:37 -05:00
91d86de629 Fix SBC overlay upgrade + simplify PCIe Gen 3 config
All checks were successful
Build Talos CM5 Image / build (push) Successful in 3m22s
Two fixes in one:

1. SBC overlay upgrade path: the overlay installer was always writing
   to /boot/EFI, but on SBC layouts (no BOOT partition) the GRUB code
   mounts EFI at /boot. Config.txt and firmware ended up in a stale
   /boot/EFI/ subdirectory, invisible to the firmware. The installer
   now detects the SBC layout and writes to the correct location.

2. PCIe Gen 3: dtparam=pciex1_gen=3 works on CM5 (the DT overrides
   exist), so the custom pcie-gen3.dtbo overlay is unnecessary.
   Simplified to just use dtparam in config.txt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 19:45:39 -05:00
3cfbe794f7 Fix PCIe Gen 3 on CM5: custom DT overlay for missing pciex1 alias
All checks were successful
Build Talos CM5 Image / build (push) Successful in 3m20s
The CM5 DTB (bcm2712-rpi-cm5-cm5io.dtb) lacks the pciex1 alias that
the Pi 5 DTB provides, making dtparam=pciex1_gen=3 silently fail.

Add a custom device tree overlay (pcie-gen3.dtbo) that targets
/axi/pcie@1000110000 directly to set max-link-speed = <3>. The overlay
is embedded in the SBC installer and written to /boot/EFI/overlays/
during install/upgrade.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 19:08:26 -05:00
a9cc56e315 Fix PCIe Gen 3: move dtparam into [pi5] section of config.txt
All checks were successful
Build Talos CM5 Image / build (push) Successful in 2m53s
The dtparam=pciex1_gen=3 was being appended after the [all] section,
but RPi firmware requires PCIe dtparams in the [pi5] section.
Switch from configTxtAppend to full configTxt replacement to control
section ordering. Also add dtparam=pciex1 to explicitly enable the
external PCIe link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 18:45:05 -05:00
66a3d11984 Enable PCIe Gen 3 for NVMe: ~800 MB/s vs ~375 MB/s Gen 2
All checks were successful
Build Talos CM5 Image / build (push) Successful in 3m25s
Add dtparam=pciex1_gen=3 to the config.txt overlay. A Gen 2 baseline
benchmarked on all 3 pg nodes showed a consistent ~375 MB/s write
throughput, bottlenecked by the Gen 2 x1 lane limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 18:09:30 -05:00
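For context on the figures in commit 66a3d11984, the theoretical single-lane ceilings follow from the PCIe signalling rate and line encoding (5 GT/s with 8b/10b for Gen 2, 8 GT/s with 128b/130b for Gen 3). The sketch below computes only those ceilings. The ~375 MB/s and ~800 MB/s values are the measured numbers from the commit, and real throughput also depends on protocol overhead and the drive.

```python
def pcie_lane_mb_per_s(gigatransfers_per_s, payload_bits, encoded_bits):
    """Theoretical one-lane, one-direction bandwidth in MB/s (1 MB = 10**6 bytes)."""
    bits_per_s = gigatransfers_per_s * 1e9 * payload_bits / encoded_bits
    return bits_per_s / 8 / 1e6

gen2 = pcie_lane_mb_per_s(5.0, 8, 10)      # Gen 2: 5 GT/s, 8b/10b encoding
gen3 = pcie_lane_mb_per_s(8.0, 128, 130)   # Gen 3: 8 GT/s, 128b/130b encoding

print(f"Gen 2 x1 ceiling: {gen2:.0f} MB/s")  # Gen 2 x1 ceiling: 500 MB/s
print(f"Gen 3 x1 ceiling: {gen3:.0f} MB/s")  # Gen 3 x1 ceiling: 985 MB/s
```

Against these ceilings, the measured values correspond to roughly 75% (Gen 2) and 81% (Gen 3) of the raw link rate.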
754f49c562 Update README: bump example tag to v1.12.4-k6.12.47-4
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:51:17 -05:00
4fed64844a Fix GRUB patch: skip sd-boot probe on arm64 for first upgrade
All checks were successful
Build Talos CM5 Image / build (push) Successful in 4m50s
On fresh SBC images, the EFI partition has sd-boot UKI files but no
GRUB config. During upgrade, Probe() found sd-boot and used it, which
failed because RPi5/CM5 firmware lacks EFI SetVariableRT support.

Add arm64 guard to Probe(): when no GRUB config is found, skip sd-boot
probing and return a fresh GRUB config. This transitions from sd-boot
to GRUB on the first upgrade from a fresh flash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:50:42 -05:00
8c562c7155 Update README: NVMe boot tested on Compute Blade
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:18:24 -05:00
fc020410f1 Update README: in-place upgrades tested, add patches table
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 12:40:13 -05:00
ca36438d12 Add GRUB SBC upgrade patch: handle missing BOOT partition
All checks were successful
Build Talos CM5 Image / build (push) Successful in 3m15s
Patch 0005 fixes talosctl upgrade on SBC layouts (RPi5/CM5) where
the disk has no separate BOOT (XFS) partition — only EFI (VFAT).
Falls back to mounting EFI at /boot for probe, install, and revert.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 11:37:05 -05:00
6cffb4e311 Add opentree fallback patch for RPi downstream kernel (<6.15)
All checks were successful
Build Talos CM5 Image / build (push) Successful in 2m56s
Check Upstream Updates / check-and-build (push) Successful in 4s
Talos assumes bare metal kernels support open_tree on anonymous FS
(added in 6.15). The RPi downstream kernel (6.12.x) does not, causing
shadow bind mount failures for /etc files and cascading network init
failures. This patch removes the InContainer() gate so the capability
check runs on all platforms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 14:25:03 -05:00
Mathias Beaulieu-Duncan
5c81953278 Fix modules patch: ip6_gre.ko not in RPi downstream kernel
All checks were successful
Build Talos CM5 Image / build (push) Successful in 3m17s
ip6_gre.ko exists in the Talos upstream module list (v1.12.4) but not
in the RPi downstream kernel build. Add it only to the removal side
of the patch, not to our custom module list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:55:45 -05:00
Mathias Beaulieu-Duncan
a4e934a4e9 Fix CI PATH: prepend GNU sed via GITHUB_PATH instead of replacing PATH
Some checks failed
Build Talos CM5 Image / build (push) Failing after 28s
The hardcoded job-level PATH env wiped out nvm/node, breaking
actions/checkout. Use GITHUB_PATH to prepend GNU sed's gnubin
directory while preserving the runner's inherited PATH.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:54:34 -05:00
Mathias Beaulieu-Duncan
6c75585c0a Bump upstream: v1.12.4-k6.12.47-1
Some checks failed
Build Talos CM5 Image / build (push) Failing after 1s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:53:12 -05:00
Mathias Beaulieu-Duncan
37f9292ef1 Update arm64 modules patch for Talos v1.12.4 (add ip6_gre)
Talos v1.12.4 added kernel/net/ipv6/ip6_gre.ko to modules-arm64.txt.
Update our patch to match. Also silence gmake checkouts-clean stdout
in auto-update.sh to prevent it leaking into GITHUB_OUTPUT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:50:45 -05:00
Mathias Beaulieu-Duncan
dc37b435c3 Fix GNU sed PATH in CI workflows for macOS runner
BSD sed on macOS requires `sed -i ''` but auto-update.sh uses GNU
`sed -i` syntax. The workflows installed gnu-sed via Homebrew but
never added it to PATH, causing "invalid command code M" failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:45:46 -05:00
Mathias Beaulieu-Duncan
58b9ccb56c Add supply chain attestation to installer image via crane + buildx
Some checks failed
Build Talos CM5 Image / build (push) Successful in 5m19s
Check Upstream Updates / check-and-build (push) Failing after 13s
Push the full installer tar with crane first (preserving all layers),
then re-wrap with docker buildx to add provenance and SBOM attestation
for Docker Scout compliance. Buildx can pull the image from the registry
since crane already pushed it, avoiding the docker-container driver
limitation with locally loaded images.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:57:54 -05:00
Mathias Beaulieu-Duncan
784fb4d5f6 Fix installer image missing kernel and overlay layers
All checks were successful
Build Talos CM5 Image / build (push) Successful in 4m5s
The docker buildx build wrapper with docker-container driver cannot
access locally loaded images, causing it to only capture the first
layer (22MB base) and drop the kernel (~98MB) and overlay (~3MB).

Switch back to crane push which pushes the tar as-is, preserving
all 3 layers. Attestation args remain on actual build steps where
buildx works correctly.

Fixes broken tags: v1.12.3-k6.12.47-3, v1.12.3-k6.12.47-4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:46:10 -05:00
Mathias Beaulieu-Duncan
9c0075057b Use rpiboot for EEPROM config in NVMe guide
CM5 on Compute Blade doesn't have an SD slot for booting Raspberry Pi
OS. Use rpiboot recovery mode over USB instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 20:05:13 -05:00
Mathias Beaulieu-Duncan
5b59f8de8d Add NVMe boot guide (untested) to README
Documents the dd + EEPROM configuration approach for booting Talos
from NVMe on RPi5/CM5. Includes BOOT_ORDER, PCIE_PROBE settings,
and optional PCIe Gen 3 configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 20:00:16 -05:00
Mathias Beaulieu-Duncan
f3132a310e Update NVMe boot status: dd + EEPROM config approach
NVMe kernel driver is already built-in (CONFIG_BLK_DEV_NVME=y). The
expected approach is simply dd'ing the metal image to NVMe and setting
EEPROM BOOT_ORDER=0xf416 + PCIE_PROBE=1. Pending hardware validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 19:57:01 -05:00
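The BOOT_ORDER value in commit f3132a310e can be read as a sequence of 4-bit boot modes, tried from the least significant digit upward. The sketch below decodes it using boot-mode codes as listed in the Raspberry Pi bootloader documentation. The table is deliberately partial, and `decode_boot_order` is a hypothetical helper, not part of any tool in this repository.

```python
# Partial table of boot-mode codes (per the Raspberry Pi bootloader docs).
BOOT_MODES = {
    0x1: "SD card",
    0x2: "network",
    0x3: "RPIBOOT",
    0x4: "USB mass storage",
    0x6: "NVMe",
    0xE: "stop",
    0xF: "restart",
}

def decode_boot_order(value):
    """Return boot modes in the order the bootloader tries them (right to left)."""
    order = []
    while value:
        nibble = value & 0xF
        order.append(BOOT_MODES.get(nibble, f"unknown (0x{nibble:x})"))
        value >>= 4
    return order

print(decode_boot_order(0xF416))
# ['NVMe', 'SD card', 'USB mass storage', 'restart']
```

So 0xf416 tries NVMe first, falls back to SD card and then USB mass storage, and restarts the sequence if none of them boots.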
Mathias Beaulieu-Duncan
970d9685f1 Fix serial console for RPi5/CM5 debug UART (ttyAMA10)
The overlay was using console=ttyAMA0 (GPIO 14/15) but the RPi5 debug
UART is ttyAMA10 (JST connector between HDMI ports on Pi5, test pads
TP35/TP36 on CM5). Also adds earlycon for early boot output and disables
GPIO UART on Pi5 in config.txt to avoid U-Boot compatibility issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 19:47:18 -05:00
Mathias Beaulieu-Duncan
689b9402a8 Add GRUB bootloader patches for talosctl upgrade on RPi5/CM5
All checks were successful
Build Talos CM5 Image / build (push) Successful in 1h4m48s
Force GRUB instead of sd-boot on arm64 and pass --no-nvram to
grub-install, working around the SetVariableRT firmware limitation
that prevents in-place upgrades on RPi5/CM5 hardware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 19:20:18 -05:00
Mathias Beaulieu-Duncan
b1eb322d7b Switch to 4K page size for production readiness
Remove the 16K page override from the kernel patch, preserving
upstream Talos's default 4K pages. RPi5 hardware works correctly
with 4K pages — the RPi Foundation's 16K default is a TLB
performance optimization (~5%), not a hardware requirement.

Benefits:
- Correct memory accounting (4x less overhead per page)
- Full software compatibility (jemalloc, Longhorn, F2FS, etc.)
- No OOM surprises on control-plane nodes
- Aligned with upstream Talos kernel config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:28:22 -05:00
Mathias Beaulieu-Duncan
8178ba195e Add known issues, roadmap, and conditional Go toolchain patch
- Document SetVariableRT upgrade failure, 16K page size implications,
  serial console issue, and SBC install disk behavior
- Add production roadmap (4K pages, GRUB boot, serial fix, NVMe)
- Make overlay Go patch conditional: apply only on Go 1.24.x,
  skip on 1.25+ where CVEs are already fixed upstream

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:05:51 -05:00
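The conditional Go patch in commit 8178ba195e amounts to a version gate. A minimal sketch of that rule follows, assuming the toolchain version is available as a string such as `1.24.13` or `go1.25.0`. The function name is hypothetical, and the repository's actual check may be implemented differently (for example in the Makefile or a shell script).

```python
def needs_go_patch(go_version):
    """True only for Go 1.24.x; 1.25 and later already carry the CVE fixes."""
    parts = go_version.removeprefix("go").split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) == (1, 24)

print(needs_go_patch("1.24.13"))   # True
print(needs_go_patch("go1.25.0"))  # False
```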
Mathias Beaulieu-Duncan
d933444fbc Fix double-v badge bug and add table segment updates in README sync
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:33:33 -05:00
Mathias Beaulieu-Duncan
09addfa626 Auto-update README versions when upstream updates are detected
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:28:40 -05:00
Mathias Beaulieu-Duncan
7fceae1418 Point all version badges to upstream repo main pages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:16:25 -05:00
Mathias Beaulieu-Duncan
6ca561592f Fix RPi kernel badge link — repo has no version-tagged releases
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:14:58 -05:00
Mathias Beaulieu-Duncan
2b2205f503 Link version badges to upstream GitHub releases
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:06:48 -05:00
Mathias Beaulieu-Duncan
6f24c8ef46 Replace cosign with buildx attestations for Docker Scout compliance
All checks were successful
Build Talos CM5 Image / build (push) Successful in 2m49s
Docker Scout requires buildx-style provenance+SBOM, not cosign
attestations. Replace crane push with docker load + buildx build
(--provenance=mode=max --sbom=true) for the installer image. Use
buildx imagetools create for the release tag to preserve attestations.
Remove cosign/syft from CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 17:05:20 -05:00
Mathias Beaulieu-Duncan
2f307aecec Open all external links in new tab (target=_blank)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:57:29 -05:00
Mathias Beaulieu-Duncan
ee085a7606 Replace version table with Docker-style badges for all components
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:56:46 -05:00
Mathias Beaulieu-Duncan
907dd98b24 Split README into user manual and TECHNICAL.md
Move CI/CD, runner setup, secrets, and project structure to
TECHNICAL.md. Streamline README as a user-facing guide with
install/upgrade instructions. Fix Docker badges for arm64.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:55:37 -05:00
Mathias Beaulieu-Duncan
2618de74e8 Update README with Docker Hub badges, version table, and tag format
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:51:04 -05:00
Mathias Beaulieu-Duncan
ba3c42f561 Add SBOM attestations to installer/release images, remove Scout
All checks were successful
Build Talos CM5 Image / build (push) Successful in 7m0s
Attach cosign+syft SBOM attestations to crane-pushed installer and
release images to satisfy Docker Scout supply chain policy. Replace
docker tag/push with crane copy for the release target. Remove the
Scout CVE scan target and clean up release notes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:48:56 -05:00
Mathias Beaulieu-Duncan
44aa3793ee Add LICENSE, update README, upgrade provenance to max-mode
All checks were successful
Build Talos CM5 Image / build (push) Successful in 3m29s
- Add MPL 2.0 LICENSE file for compliance
- Add license section and upstream attribution to README
- Upgrade provenance attestation from mode=min to mode=max

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 15:57:11 -05:00
Mathias Beaulieu-Duncan
5abca73056 Fix 21 Go stdlib CVEs and enable supply chain attestations
All checks were successful
Build Talos CM5 Image / build (push) Successful in 3m26s
- Patch sbc-raspberrypi5 overlay to use Go 1.24.13 (fixes 1C/7H/12M/1L CVEs)
- Add ATTESTATION_ARGS (--provenance=true --sbom=true) to all buildx targets
- Override upstream --provenance=false via TARGET_ARGS (last flag wins)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 15:36:13 -05:00
Mathias Beaulieu-Duncan
0d3941eb91 Add daily auto-update workflow and fix overlay dirty tag
All checks were successful
Build Talos CM5 Image / build (push) Successful in 3m6s
- Rewrite check-upstream.sh to parse RPi kernel version from patch file
- Add auto-update.sh for automated version bumps with patch smoke test
- Rewrite check-updates.yaml as daily auto-build with issue fallback
- Update build.yaml release body to show Talos + kernel versions from tag
- Fix overlay dirty tag: remove --dirty from SBCOVERLAY_TAG git describe
  (the sed rewrite of pkg.yaml is intentional, not an accidental change)

Tag strategy: v{TALOS}-k{KERNEL}-{BUILD} (e.g. v1.12.3-k6.12.47-1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 15:05:46 -05:00
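The tag strategy from commit 0d3941eb91, v{TALOS}-k{KERNEL}-{BUILD}, is regular enough to parse mechanically. The sketch below shows one way to split a tag and derive the next build number. It assumes three-part numeric versions as in the example tag, and the names `ImageTag`, `parse_tag`, and `next_build` are hypothetical rather than taken from auto-update.sh.

```python
import re
from typing import NamedTuple

# v{TALOS}-k{KERNEL}-{BUILD}, e.g. v1.12.3-k6.12.47-1
TAG_RE = re.compile(
    r"^v(?P<talos>\d+\.\d+\.\d+)-k(?P<kernel>\d+\.\d+\.\d+)-(?P<build>\d+)$"
)

class ImageTag(NamedTuple):
    talos: str
    kernel: str
    build: int

def parse_tag(tag):
    """Split a release tag into Talos version, kernel version, and build number."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a v{{TALOS}}-k{{KERNEL}}-{{BUILD}} tag: {tag!r}")
    return ImageTag(match["talos"], match["kernel"], int(match["build"]))

def next_build(tag):
    """Return the tag for the next build of the same Talos/kernel pair."""
    parsed = parse_tag(tag)
    return f"v{parsed.talos}-k{parsed.kernel}-{parsed.build + 1}"

print(parse_tag("v1.12.3-k6.12.47-1"))
# ImageTag(talos='1.12.3', kernel='6.12.47', build=1)
print(next_build("v1.12.4-k6.12.47-3"))
# v1.12.4-k6.12.47-4
```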
Mathias Beaulieu-Duncan
3a824e960f Regenerate talos patch for v1.12.3
Some checks failed
Build Talos CM5 Image / build (push) Failing after 31m33s
Patch was stale — regenerated from the working checkout to match
the v1.12.3 hack/modules-arm64.txt index.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:27:34 -05:00
Mathias Beaulieu-Duncan
f2b8a0ec65 Fix talos patch — restore hack/modules-arm64.txt
Some checks failed
Build Talos CM5 Image / build (push) Failing after 13s
The talos patch was incorrectly replaced with pkgs-repo changes
(Pkgfile, kernel config). Restored the correct patch that modifies
hack/modules-arm64.txt in the talos checkout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:26:12 -05:00
Mathias Beaulieu-Duncan
a3a3881cff Bump RPi kernel to stable_20250916 (6.12.47)
Some checks failed
Build Talos CM5 Image / build (push) Failing after 19s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:11:55 -05:00
Mathias Beaulieu-Duncan
2b5fd0a25e Update patches for Talos v1.12.3 / pkgs v1.12.0
Regenerated patches to match current upstream checkouts:
- pkgs: updated kernel version, checksums, and config-arm64
- talos: reworked to patch Pkgfile, kernel config, and pkg.yaml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:06:01 -05:00
Mathias Beaulieu-Duncan
e98c573bae Add Docker Scout CVE scanning and switch CI to gmake
- Add `scout` Makefile target that scans all 5 pushed images with
  `docker scout quickview` and writes a summary to _out/scout-report.md
- Switch all CI workflow steps from `make` to `gmake` for GNU Make 4.x
- Add brew dependency step for make, gnu-sed, and crane
- Include CVE summary in Gitea release notes via jq JSON escaping
- Update `clean` target to remove _out/ directory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:01:13 -05:00
623c5d3694 Fix Docker Buildx setup for Docker Desktop on macOS
Some checks failed
Build Talos CM5 Image / build (push) Failing after 1s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 18:43:27 -05:00