Adding Windows support to Zarf
tl;dr: A quick dive into some issues making zarf Windows compatible.
Setup Windows dev environment
- Remote desktop into a Windows 11 machine on my network
- Install deps w/ scoop (it's like brew for Windows)
    scoop install git curl make go nodejs k9s kubectl kind gpg
- Startup Docker w/ the WSL2 backend
- Provision a fresh cluster w/
    kind create cluster
- Clone zarf w/ GitHub command-line tool
    gh repo clone defenseunicorns/zarf
    cd zarf
- Setup GPG Git signing - GitHub docs
- Tell Git where to find GPG:
    # replace <user> with your username
    git config --global gpg.program "C:\Users\<user>\scoop\apps\gpg\current\bin\gpg.exe"
- Start developing
Adding a new build target to the Makefile
Adding a new build target to the Makefile was simple.
I wasn't looking to make the Makefile Windows compatible, just create a Windows build target.
Once the binary is built, the experience should be 1:1 on *nix, so there is little need to refactor the build system.
...
ZARF_BIN := ./build/zarf
...
ifeq ($(OS),Windows_NT)
ZARF_BIN := $(addsuffix -windows-amd64.exe,$(ZARF_BIN))
else
...
build-cli-windows-amd: build-injector-registry-amd build-ui
	GOOS=windows GOARCH=amd64 go build -ldflags="$(BUILD_ARGS)" -o build/zarf-windows-amd64.exe main.go
...
build-cli: build-cli-linux-amd build-cli-linux-arm build-cli-mac-intel build-cli-mac-apple build-cli-windows-amd ## Build the CLI
...
At the time of writing, I was only looking to get x86_64 Windows support; ARM is a future ask.
Fixing filepaths
Since Windows uses \ as a path separator, any code doing manual path joins/concatenation needed to be refactored.
Modern Windows is pretty chill about file paths using /, but it's best to be safe and utilize Go's native OS handling.
Examples:
// src/internal/helm/utils.go
import (
	...
	"path/filepath"
	...
)

// StandardName generates a predictable full path for a helm chart for Zarf
func StandardName(destination string, chart types.ZarfChart) string {
	return destination + "/" + chart.Name + "-" + chart.Version
}

// becomes
func StandardName(destination string, chart types.ZarfChart) string {
	return filepath.Join(destination, chart.Name+"-"+chart.Version)
}

// src/internal/packager/validate/validate.go
import (
	...
	"os"
	"path/filepath"
	"strings"
	...
)
...
// add a forward slash to end of path if it doesn't have one
if !strings.HasSuffix(path, "/") {
	path = path + "/"
}
...
// becomes
// add the OS-specific separator to end of path if it doesn't have one
if !strings.HasSuffix(path, string(os.PathSeparator)) {
	path = filepath.Clean(path) + string(os.PathSeparator)
}
Troubleshooting zarf init
At this point compiling works just fine:
$ make build-cli-windows-amd && ls build
Directory: C:\bin\zarf\build
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 10/4/2022 4:42 PM ui
-a--- 10/4/2022 4:42 PM 13119488 zarf-registry-amd64
-a--- 10/4/2022 4:42 PM 101293056 zarf-windows-amd64.exe
The && works because I am using PowerShell version 7+.
But zarf init fails.
$ cd build
$ kind delete cluster && kind create cluster
$ .\zarf-windows-amd64.exe init -l=trace
# very long output omitted for sanity
panic: runtime error: index out of range [0] with length 0
goroutine 1 [running]:
github.com/defenseunicorns/zarf/src/internal/k8s.(*Tunnel).getAttachablePodForService(0xc00068e3f0)
C:/bin/zarf/src/internal/k8s/tunnel.go:449 +0x18d
github.com/defenseunicorns/zarf/src/internal/k8s.(*Tunnel).getAttachablePodForResource(0xc00068e3f0)
C:/bin/zarf/src/internal/k8s/tunnel.go:429 +0xaa
github.com/defenseunicorns/zarf/src/internal/k8s.(*Tunnel).establish(0xc00068e3f0)
C:/bin/zarf/src/internal/k8s/tunnel.go:326 +0x445
github.com/defenseunicorns/zarf/src/internal/k8s.(*Tunnel).Connect(0xc00068e3f0, {0x7ff68c016fdf, 0x8}, 0x0)
C:/bin/zarf/src/internal/k8s/tunnel.go:191 +0x39a
github.com/defenseunicorns/zarf/src/internal/packager.hasSeedImages(0xc0005ec1b0)
C:/bin/zarf/src/internal/packager/injector.go:200 +0xd1
github.com/defenseunicorns/zarf/src/internal/packager.runInjectionMadness({{0xc000a8a390, 0x30}, {0xc00097f300, 0x3e}, {0xc00097f380, 0x3e}, {0xc00097f400, 0x3f}, {0xc00097f480, 0x3b}, ...})
C:/bin/zarf/src/internal/packager/injector.go:104 +0x73f
github.com/defenseunicorns/zarf/src/internal/packager.deployComponents({{0xc000a8a390, 0x30}, {0xc00097f300, 0x3e}, {0xc00097f380, 0x3e}, {0xc00097f400, 0x3f}, {0xc00097f480, 0x3b}, ...}, ...)
C:/bin/zarf/src/internal/packager/deploy.go:158 +0x2fb
github.com/defenseunicorns/zarf/src/internal/packager.Deploy()
C:/bin/zarf/src/internal/packager/deploy.go:112 +0x8d8
github.com/defenseunicorns/zarf/src/cmd.glob..func5(0x7ff68b5dd880?, {0x7ff68c0025d6?, 0x1?, 0x1?})
C:/bin/zarf/src/cmd/initialize.go:120 +0x725
github.com/spf13/cobra.(*Command).execute(0x7ff68b5dd880, {0xc000bc5a70, 0x1, 0x1})
C:/Users/razzle/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0x7ff68b5dbd00)
C:/Users/razzle/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
C:/Users/razzle/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:918
github.com/defenseunicorns/zarf/src/cmd.Execute()
C:/bin/zarf/src/cmd/root.go:49 +0x25
main.main()
C:/bin/zarf/main.go:20 +0x6f
Super clear error message, am I right?
After a few short hours of troubleshooting, I was able to trace the root of the issue to src\internal\packager\injector.go.
For some context, when zarf initializes it spins up an injector pod in the zarf namespace. This pod carries out two stages.
There are two VolumeMounts created for this pod: zarf-stage1, which is a collection of ConfigMaps, and zarf-stage2, which is initialized as an EmptyDir (this distinction is important later on).
init-injector
In stage1 (run in an InitContainer), the cleverly designed Rust binary zarf-injector is mounted into the pod and used to create /zarf-stage2/seed-image.tar.
injector
Ok, stage2, hype city. In this stage the internal registry is populated using the Go binary /zarf-stage2/zarf-registry.
Looking at the logs in injector, this is where the core issue lies.
$ kubectl describe pods/injector -n zarf | Select-String Message:
Message: failed to create containerd task: failed to create shim task:
OCI runtime create failed: runc create failed: unable to start container
process: exec: "/zarf-stage2/zarf-registry": permission denied: unknown
From this message, the problem is a permission issue on /zarf-stage2/zarf-registry.
Swapping the container image to ubuntu and changing injector's startup command to ["ls", "-la"], I got these interesting results:
# stage2's contents when run on mac
drwxrwxrwx 2 root root 4096 Oct 3 22:55 .
drwxr-xr-x 1 root root 4096 Oct 3 22:55 ..
-rw-r--r-- 1 root root 9278976 Oct 3 22:55 seed-image.tar
-rwx------ 1 root root 12582912 Oct 3 22:55 zarf-registry
# stage2's contents when run on windows
drwxrwxrwx 2 root root 4096 Oct 3 22:49 .
drwxr-xr-x 1 root root 4096 Oct 3 22:49 ..
-rw-rw-rw- 1 root root 9951232 Oct 3 22:48 seed-image.tar
-rw-rw-rw- 1 root root 13119488 Oct 3 22:48 zarf-registry
So why the permissions mismatch?
Answer: differences between *nix file systems and the Windows file system.
There is no chmod +x in Windows.
Windows doesn't determine whether a file is executable via a filesystem flag; it determines it via the filename.
Anything on $PATH that ends in .exe is an executable. However, our Rust/Golang binaries compiled for Linux do not end in .exe.
So... during the interim stage, when our zarf-init-amd64.tar.zst is unpacked into a temporary directory... all files lose the ability to be executed.
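To make the lost execute bit concrete, here is a minimal Go sketch (not taken from Zarf) that writes a file with mode 0700 and reports what the filesystem actually stored. The helper names isExecutable and writeAndStat are hypothetical. It relies on the documented behavior that Go's os.Chmod on Windows only honors the owner-writable bit (0200), so the stored mode there should carry no execute bits.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// isExecutable reports whether any owner/group/other execute bit is set.
func isExecutable(mode fs.FileMode) bool {
	return mode.Perm()&0o111 != 0
}

// writeAndStat writes an empty file, forces the requested mode with Chmod
// (so the umask does not interfere), and returns the mode actually stored.
func writeAndStat(path string, mode fs.FileMode) (fs.FileMode, error) {
	if err := os.WriteFile(path, nil, mode); err != nil {
		return 0, err
	}
	if err := os.Chmod(path, mode); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Mode(), nil
}

func main() {
	dir, err := os.MkdirTemp("", "zarf-perm-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// The file name is only illustrative; the file is empty.
	stored, err := writeAndStat(filepath.Join(dir, "zarf-registry"), 0o700)
	if err != nil {
		panic(err)
	}
	fmt.Printf("requested %v, stored %v, executable: %t\n",
		fs.FileMode(0o700), stored.Perm(), isExecutable(stored))
}
```

Running this on a *nix machine and then on Windows should show the same kind of mismatch as the two ls -la listings above.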
This is where the fun begins
EmptyDir doesn't have the ability to chmod itself upon instantiation, but ConfigMaps do.
So let's leverage our already existing hack (zarf-injector) to chmod 777 /zarf-stage2/*.
// src/injector/stage1/src/main.rs
use glob::glob;
use std::fs;
use std::os::unix::fs::PermissionsExt;

// Make a single path readable, writable, and executable by everyone
fn chmod777(path: &str) {
    println!("chmod 777 {}", path);
    fs::set_permissions(path, fs::Permissions::from_mode(0o777)).unwrap();
}

fn main() {
    ...
    // Walk everything under /zarf-stage2 and restore the lost execute bits
    for entry in glob("/zarf-stage2/**/*").unwrap() {
        match entry {
            Ok(path) => chmod777(path.to_str().unwrap()),
            Err(e) => println!("{:?}", e),
        }
    }
}
I used the build-rust-injector.yml workflow to release an alpha version of this new Rust binary, then changed the init package's refs:
# packages/zarf-injector/zarf.yaml
...
components:
  - name: zarf-injector
    only:
      cluster:
        architecture: amd64
    ...
    files:
      # Rust Injector Binary
      - source: sget://defenseunicorns/zarf-injector:amd64-v0.20.0-32-alpha
      ...
  - name: zarf-injector
    only:
      cluster:
        architecture: arm64
    ...
    files:
      # Rust Injector Binary
      - source: sget://defenseunicorns/zarf-injector:arm64-v0.20.0-32-alpha
      ...
We can now run zarf init on Windows.
$ kind delete cluster && kind create cluster
$ make build-cli-windows-amd
$ make init-package
$ cd build
$ .\zarf-windows-amd64.exe init
...
✔ Zarf deployment complete
Application | Username | Password | Connect
Registry | zarf-push | never | zarf connect registry
Logging | zarf-admin | gonna | zarf connect logging
Git | zarf-git-user | give | zarf connect git
Git (read-only) | zarf-git-read-user | you | zarf connect git
Creating CI tests
Running Docker on Windows in GitHub CI is not an option currently. Full stop.
So, only tests matching src/test/e2e/[00-19]_*_test.go (this is not a real regular expression) can be run.
Drawing from previous test workflows, namely test-kind.yml and test-k3d.yml, I drew up a Windows workflow:
# .github/workflows/test-windows.yml
# parts of the configuration have been omitted due to overlap with other test files
# this showcase is only meant to highlight the differences in CI between windows and linux
name: test-windows
...
jobs:
  validate:
    runs-on: windows-latest
    steps:
      - name: "Dependency: Install Golang"
        uses: actions/setup-go@v3
        with:
          go-version: 1.19.x
      - name: "Dependency: Install Scoop+Make+NodeJS"
        shell: pwsh
        run: |
          Set-ExecutionPolicy RemoteSigned -scope CurrentUser
          iex "& {$(irm get.scoop.sh)} -RunAsAdmin"
          Join-Path (Resolve-Path ~).Path "scoop\shims" >> $Env:GITHUB_PATH
          scoop install make@<version> nodejs@<version>
      ...
      # this stuff in the middle is pretty 1:1 w/ the linux workflow
      - name: "Run Tests"
        shell: pwsh
        run: |
          make test-e2e ARCH=amd64 -e RUN_CLUSTER_TESTS=false
          # ^ this environment variable is later used to only run certain tests
      - name: "Cleanup"
        shell: pwsh
        run: |
          Remove-Item -Recurse -Force .\build\
Ok, the CI workflow is done, but what happens when we run make test-e2e? A f*ck ton of errors is what happens. This is the final hurdle, and I am still not 100% done.
To summarize:
- examples/component-scripts was broken because the scripts used touch, which doesn't exist on Windows
- src/cmd/package.go needed some regular expression adjusted to allow \ and : as valid characters in Zarf's cache path
- src/test/e2e/00_use_cli_test.go needed an error message expectation tweak due to OS differences
- src/test/e2e/20+ tests had code added to them to skip if the env var RUN_CLUSTER_TESTS is false
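For the cache path item, the kind of adjustment involved might look like the sketch below. The pattern is a hypothetical allow-list written for this example, not the expression from src/cmd/package.go; it only shows how adding \ and : to a character class lets a Windows path such as C:\Users\... pass validation.

```go
package main

import (
	"fmt"
	"regexp"
)

// cachePathPattern is a hypothetical allow-list, not Zarf's real expression.
// The `\\` and `:` entries are what admit Windows drive letters and separators.
var cachePathPattern = regexp.MustCompile(`^[a-zA-Z0-9_~./\\:-]+$`)

// validCachePath reports whether every character in p is on the allow-list.
func validCachePath(p string) bool {
	return cachePathPattern.MatchString(p)
}

func main() {
	paths := []string{"~/.zarf-cache", `C:\Users\me\.zarf-cache`, "bad path;rm"}
	for _, p := range paths {
		fmt.Printf("%q valid: %t\n", p, validCachePath(p))
	}
}
```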
There is still a lot to do, but Zarf is nearly fully Windows capable.
UPDATE: 2022-10-08
PR #832 has been merged into master and will be included in the next release of Zarf.
Windows support is still in alpha, but it's now possible to run Zarf on Windows.