tl;dr: A quick dive into some issues with making Zarf Windows compatible.

Setup Windows dev environment

  1. Remote desktop into a Windows 11 machine on my network

  2. Install deps w/ scoop (it's like brew for Windows)

    scoop install git curl make go nodejs k9s kubectl kind gpg
    
  3. Start up Docker w/ the WSL2 backend

  4. Provision a fresh cluster w/

    kind create cluster
    
  5. Clone zarf w/ the GitHub command-line tool

    gh repo clone defenseunicorns/zarf
    cd zarf
    
  6. Set up GPG Git signing (see the GitHub docs)

  7. Tell Git where to find GPG:

    # replace <user> with your username
    git config --global gpg.program "C:\Users\<user>\scoop\apps\gpg\current\bin\gpg.exe"
    
  8. Start developing

Adding a new build target to the Makefile

Adding a new build target to the Makefile was simple.

I wasn't looking to make the Makefile Windows compatible, just create a Windows build target.

Once the binary is built, the experience should be 1:1 with *nix, so there is little need to refactor the build system.

...
ZARF_BIN := ./build/zarf
...
ifeq ($(OS),Windows_NT)
    ZARF_BIN := $(addsuffix -windows-amd64.exe,$(ZARF_BIN))
else
...
build-cli-windows-amd: build-injector-registry-amd build-ui
    GOOS=windows GOARCH=amd64 go build -ldflags="$(BUILD_ARGS)" -o build/zarf-windows-amd64.exe main.go
...
build-cli: build-cli-linux-amd build-cli-linux-arm build-cli-mac-intel build-cli-mac-apple build-cli-windows-amd ## Build the CLI
...

At the time of writing, I was only looking to get x86_64 Windows support; ARM is a future ask.

Fixing filepaths

Since Windows uses \ as a path separator, any code doing manual path joins/concatenation needed to be refactored.

Modern Windows is pretty chill about file paths using /, but it's best to be safe and use Go's native OS handling in path/filepath.

Examples:

// src/internal/helm/utils.go
import (
    ...
    "path/filepath"
    ...
)

// StandardName generates a predictable full path for a helm chart for Zarf
func StandardName(destination string, chart types.ZarfChart) string {
    return destination + "/" + chart.Name + "-" + chart.Version
}
// becomes
func StandardName(destination string, chart types.ZarfChart) string {
    return filepath.Join(destination, chart.Name+"-"+chart.Version)
}

// src/internal/packager/validate/validate.go
import (
    ...
    "os"
    "path/filepath"
    "strings"
    ...
)
    ...
    // add a forward slash to end of path if it doesn't have one
    if !strings.HasSuffix(path, "/") {
        path = path + "/"
    }
    ...
    // becomes
    if !strings.HasSuffix(path, string(os.PathSeparator)) {
        path = filepath.Clean(path) + string(os.PathSeparator)
    }
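
To see what the refactor buys, here is a minimal, standalone sketch. The function names and the podinfo chart are made up for illustration and are not part of Zarf. It joins a path the way StandardName now does, then normalizes it with filepath.ToSlash for the cases where a forward-slash form is still wanted (comparisons in tests, for example).

```go
package main

import (
	"fmt"
	"path/filepath"
)

// chartPath mirrors the StandardName refactor: filepath.Join uses the
// separator of the OS the binary was compiled for and cleans the result.
func chartPath(destination, name, version string) string {
	return filepath.Join(destination, name+"-"+version)
}

// portablePath converts an OS-native path to forward slashes so that the
// same string comes out on every platform.
func portablePath(p string) string {
	return filepath.ToSlash(p)
}

func main() {
	p := chartPath("charts", "podinfo", "6.0.0")
	fmt.Println(p)               // separator depends on the OS
	fmt.Println(portablePath(p)) // always uses forward slashes
}
```

The point of the split is that code touching the local disk should stay OS-native, while anything compared against a fixed string should be normalized first.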

Troubleshooting zarf init

At this point compiling works just fine:

$ make build-cli-windows-amd && ls build

    Directory: C:\bin\zarf\build

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----           10/4/2022  4:42 PM                ui
-a---           10/4/2022  4:42 PM       13119488 zarf-registry-amd64
-a---           10/4/2022  4:42 PM      101293056 zarf-windows-amd64.exe

&& works because I am using PowerShell Version 7+

But zarf init fails.

$ cd build
$ kind delete cluster && kind create cluster
$ .\zarf-windows-amd64.exe init -l=trace

# very long output omitted for sanity

panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/defenseunicorns/zarf/src/internal/k8s.(*Tunnel).getAttachablePodForService(0xc00068e3f0)
        C:/bin/zarf/src/internal/k8s/tunnel.go:449 +0x18d
github.com/defenseunicorns/zarf/src/internal/k8s.(*Tunnel).getAttachablePodForResource(0xc00068e3f0)
        C:/bin/zarf/src/internal/k8s/tunnel.go:429 +0xaa
github.com/defenseunicorns/zarf/src/internal/k8s.(*Tunnel).establish(0xc00068e3f0)
        C:/bin/zarf/src/internal/k8s/tunnel.go:326 +0x445
github.com/defenseunicorns/zarf/src/internal/k8s.(*Tunnel).Connect(0xc00068e3f0, {0x7ff68c016fdf, 0x8}, 0x0)
        C:/bin/zarf/src/internal/k8s/tunnel.go:191 +0x39a
github.com/defenseunicorns/zarf/src/internal/packager.hasSeedImages(0xc0005ec1b0)
        C:/bin/zarf/src/internal/packager/injector.go:200 +0xd1
github.com/defenseunicorns/zarf/src/internal/packager.runInjectionMadness({{0xc000a8a390, 0x30}, {0xc00097f300, 0x3e}, {0xc00097f380, 0x3e}, {0xc00097f400, 0x3f}, {0xc00097f480, 0x3b}, ...})
        C:/bin/zarf/src/internal/packager/injector.go:104 +0x73f
github.com/defenseunicorns/zarf/src/internal/packager.deployComponents({{0xc000a8a390, 0x30}, {0xc00097f300, 0x3e}, {0xc00097f380, 0x3e}, {0xc00097f400, 0x3f}, {0xc00097f480, 0x3b}, ...}, ...)
        C:/bin/zarf/src/internal/packager/deploy.go:158 +0x2fb
github.com/defenseunicorns/zarf/src/internal/packager.Deploy()
        C:/bin/zarf/src/internal/packager/deploy.go:112 +0x8d8
github.com/defenseunicorns/zarf/src/cmd.glob..func5(0x7ff68b5dd880?, {0x7ff68c0025d6?, 0x1?, 0x1?})
        C:/bin/zarf/src/cmd/initialize.go:120 +0x725
github.com/spf13/cobra.(*Command).execute(0x7ff68b5dd880, {0xc000bc5a70, 0x1, 0x1})
        C:/Users/razzle/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0x7ff68b5dbd00)
        C:/Users/razzle/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
        C:/Users/razzle/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:918
github.com/defenseunicorns/zarf/src/cmd.Execute()
        C:/bin/zarf/src/cmd/root.go:49 +0x25
main.main()
        C:/bin/zarf/main.go:20 +0x6f

Super clear error message, am I right?
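
The panic itself is just Go complaining that something indexed element 0 of an empty slice. I have not shown tunnel.go here, so treat the following as a sketch under the assumption that the slice is a list of candidate pods; firstPod and errNoPods are hypothetical names. It shows how a length check turns that panic into an ordinary error.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoPods is a hypothetical sentinel error used only in this sketch.
var errNoPods = errors.New("no attachable pods found for service")

// firstPod returns the first pod name, or an error when the list is empty.
// Writing pods[0] without the length check is what produces
// "index out of range [0] with length 0".
func firstPod(pods []string) (string, error) {
	if len(pods) == 0 {
		return "", errNoPods
	}
	return pods[0], nil
}

func main() {
	if _, err := firstPod(nil); err != nil {
		fmt.Println("error:", err) // reported instead of panicking
	}
	name, _ := firstPod([]string{"injector"})
	fmt.Println(name)
}
```

A check like this would not have fixed anything on its own, but it would have pointed at "no pods are ready" rather than at a stack trace.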

After a few short hours of troubleshooting, I was able to trace the root of the issue to src\internal\packager\injector.go.

For some context, when zarf initializes it spins up an injector pod in the zarf namespace. This pod carries out two stages.

There are two VolumeMounts created for this pod: zarf-stage1, which is a collection of ConfigMaps, and zarf-stage2, which is initialized as an EmptyDir (this distinction is important later on).

init-injector

In stage1 (run in an InitContainer), the cleverly designed Rust binary zarf-injector is mounted into the pod and used to create /zarf-stage2/seed-image.tar.

injector

Ok, stage2, hype city. In this stage the internal registry is populated using the Go binary /zarf-stage2/zarf-registry.

The injector pod's status message shows where the core issue lies.

$ kubectl describe pods/injector -n zarf | Select-String Message:

Message:      failed to create containerd task: failed to create shim task: 
OCI runtime create failed: runc create failed: unable to start container 
process: exec: "/zarf-stage2/zarf-registry": permission denied: unknown

From this message, the problem is clear: the container is not permitted to execute /zarf-stage2/zarf-registry.

Swapping the container image to ubuntu, and changing injector's startup command to ["ls", "-la"], I got these interesting results:

# stage2's contents when run on mac
drwxrwxrwx 2 root root 4096 Oct 3 22:55 .
drwxr-xr-x 1 root root 4096 Oct 3 22:55 ..
-rw-r--r-- 1 root root 9278976 Oct 3 22:55 seed-image.tar
-rwx------ 1 root root 12582912 Oct 3 22:55 zarf-registry

# stage2's contents when run on windows
drwxrwxrwx 2 root root 4096 Oct 3 22:49 .
drwxr-xr-x 1 root root 4096 Oct 3 22:49 ..
-rw-rw-rw- 1 root root 9951232 Oct 3 22:48 seed-image.tar
-rw-rw-rw- 1 root root 13119488 Oct 3 22:48 zarf-registry

So why the permissions mismatch?

Answer: differences between *nix file systems and the Windows file system

There is no chmod +x in Windows

Windows doesn't determine whether a file is executable via a filesystem flag; it determines it via the filename.

Anything on $PATH that ends in .exe (or another extension listed in PATHEXT) is an executable. However, our Rust/Golang binaries compiled for Linux have no such extension, and Windows has no executable bit to keep track of for them.

So... during the interim stage, when our zarf-init-amd64.tar.zst is unpacked into a temporary directory, all files lose their ability to be executed.
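
This is easy to reproduce outside of Zarf. The sketch below is not Zarf code, and the helper name and file contents are made up. It writes a file, asks for mode 0700, and reads the mode back. On Windows, Go's os.Chmod only honors the owner-writable bit (0200), so the executable bits requested here are simply not stored.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// isWindows is true when this program was compiled for Windows.
const isWindows = runtime.GOOS == "windows"

// modeAfterWrite writes a placeholder file into a temporary directory,
// requests mode 0700, and returns the permission bits the OS reports back.
func modeAfterWrite() (os.FileMode, error) {
	dir, err := os.MkdirTemp("", "zarf-mode-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	p := filepath.Join(dir, "zarf-registry")
	if err := os.WriteFile(p, []byte("not a real binary"), 0o700); err != nil {
		return 0, err
	}
	// Chmod explicitly so the result does not depend on the process umask.
	if err := os.Chmod(p, 0o700); err != nil {
		return 0, err
	}
	info, err := os.Stat(p)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := modeAfterWrite()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %v (executable: %t)\n", runtime.GOOS, mode, mode&0o111 != 0)
}
```

Run on a Mac or Linux box and on Windows, the two results should line up with the two ls -la listings above.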

This is where the fun begins

EmptyDir volumes can't set file modes when they are created, but ConfigMap volumes can.

So let's leverage our already existing hack (zarf-injector) to chmod 777 everything under /zarf-stage2.

// src/injector/stage1/src/main.rs
use glob::glob;
use std::fs;
use std::os::unix::fs::PermissionsExt;

// Make `path` readable, writable, and executable by everyone.
fn chmod777(path: &str) {
    println!("chmod 777 {}", path);
    fs::set_permissions(path, fs::Permissions::from_mode(0o777)).unwrap();
}

fn main() {
    ...

    // Walk everything under /zarf-stage2 and restore the executable bit.
    for entry in glob("/zarf-stage2/**/*").unwrap() {
        match entry {
            Ok(path) => chmod777(path.to_str().unwrap()),
            Err(e) => println!("{:?}", e),
        }
    }
}
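
For readers who live on the Go side of the repo, the same idea can be sketched in Go. This is not what ships in the injector (that is the Rust above); chmodTree and demoChmodTree are hypothetical names, and the sketch is meant to run on Linux, like the injector container. It walks a directory tree and applies one mode to everything below the root, which is what the /zarf-stage2/**/* glob does.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// chmodTree applies mode to every file and directory below root.
// The root itself is skipped, matching the "root/**/*" glob.
func chmodTree(root string, mode fs.FileMode) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if path == root {
			return nil
		}
		return os.Chmod(path, mode)
	})
}

// demoChmodTree creates a file with mode 0600 in a temporary directory,
// runs chmodTree with 0777, and returns the file's resulting permissions.
func demoChmodTree() (fs.FileMode, error) {
	root, err := os.MkdirTemp("", "zarf-stage2-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)

	file := filepath.Join(root, "zarf-registry")
	if err := os.WriteFile(file, []byte("placeholder"), 0o600); err != nil {
		return 0, err
	}
	if err := chmodTree(root, 0o777); err != nil {
		return 0, err
	}
	info, err := os.Stat(file)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := demoChmodTree()
	if err != nil {
		panic(err)
	}
	fmt.Println("mode after chmodTree:", mode)
}
```

Because the mode is a parameter, a tighter value such as 0o755 could be passed instead of 0o777. Whether that would be enough for the injector is an assumption I have not tested.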

I used the build-rust-injector.yml workflow to release an alpha version of this new Rust binary and changed the init package's refs:

# packages/zarf-injector/zarf.yaml

...

components:
  - name: zarf-injector
    only:
      cluster:
        architecture: amd64
    ...
    files:
      # Rust Injector Binary
      - source: sget://defenseunicorns/zarf-injector:amd64-v0.20.0-32-alpha
      ...

  - name: zarf-injector
    only:
      cluster:
        architecture: arm64
    ...
    files:
      # Rust Injector Binary
      - source: sget://defenseunicorns/zarf-injector:arm64-v0.20.0-32-alpha
      ...

We can now run zarf init on Windows.

$ kind delete cluster && kind create cluster
$ make build-cli-windows-amd
$ make init-package
$ cd build
$ .\zarf-windows-amd64.exe init

...

  ✔  Zarf deployment complete

     Application     | Username           | Password | Connect
     Registry        | zarf-push          | never    | zarf connect registry
     Logging         | zarf-admin         | gonna    | zarf connect logging
     Git             | zarf-git-user      | give     | zarf connect git
     Git (read-only) | zarf-git-read-user | you      | zarf connect git

Creating CI tests

Running Docker on Windows in GitHub CI is not an option currently. Full stop.

So, only tests matching src/test/e2e/[00-19]_*_test.go (this is not a real regular expression) can be run.

Based on previous test workflows, namely test-kind.yml and test-k3d.yml, I drew up a Windows workflow:

# .github/workflows/test-windows.yml
# parts of the configuration have been omitted due to overlap with other test files
# this showcase is only meant to highlight the differences in CI between windows and linux
name: test-windows

...

jobs:
  validate:
    runs-on: windows-latest
    steps:
      - name: "Dependency: Install Golang"
        uses: actions/setup-go@v3
        with:
          go-version: 1.19.x

      - name: "Dependency: Install Scoop+Make+NodeJS"
        shell: pwsh
        run: |
          Set-ExecutionPolicy RemoteSigned -scope CurrentUser
          iex "& {$(irm get.scoop.sh)} -RunAsAdmin"
          Join-Path (Resolve-Path ~).Path "scoop\shims" >> $Env:GITHUB_PATH
          scoop install make@<version> nodejs@<version>

      ...
      # this stuff in the middle is pretty 1:1 w/ the linux workflow

      - name: "Run Tests"
        shell: pwsh
        run: |
          make test-e2e ARCH=amd64 -e RUN_CLUSTER_TESTS=false
        #                           ^ this environment variable is later used to only run certain tests 

      - name: "Cleanup"
        shell: pwsh
        run: |
          Remove-Item -Recurse -Force .\build\

Ok, the CI workflow is done, but what happens when we run make test-e2e? A f*ck ton of errors is what happens. This is the final hurdle, and I am still not 100% done.

To summarize:

  • examples/component-scripts was broken because the scripts used touch, which doesn't exist on Windows
  • src/cmd/package.go needed a regular expression adjusted to allow \ and : as valid characters in Zarf's cache path
  • src/test/e2e/00_use_cli_test.go needed an error message expectation tweak due to OS differences
  • src/test/e2e/20+ tests had code added to them to skip if the env var RUN_CLUSTER_TESTS is false
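
The skip logic from the last bullet can be sketched roughly like this. It is a minimal illustration, not the code from the PR: shouldRun, skipWithoutCluster, and the skipper interface are hypothetical names, and treating only the literal string "false" as "off" is my assumption.

```go
package main

import (
	"fmt"
	"os"
)

// shouldRun interprets the value of RUN_CLUSTER_TESTS. Only the literal
// "false" disables cluster tests; an unset (empty) variable leaves them on.
func shouldRun(value string) bool {
	return value != "false"
}

// skipper is the single method of *testing.T this sketch needs, so the
// example compiles without importing the testing package.
type skipper interface {
	Skip(args ...any)
}

// skipWithoutCluster would be called at the top of each cluster-dependent test.
func skipWithoutCluster(t skipper) {
	if !shouldRun(os.Getenv("RUN_CLUSTER_TESTS")) {
		t.Skip("RUN_CLUSTER_TESTS=false, skipping test that needs a cluster")
	}
}

// printSkipper stands in for *testing.T so the sketch can run as a program.
type printSkipper struct{}

func (printSkipper) Skip(args ...any) {
	fmt.Println(append([]any{"SKIP:"}, args...)...)
}

func main() {
	os.Setenv("RUN_CLUSTER_TESTS", "false")
	skipWithoutCluster(printSkipper{})
}
```

In a real test file the call would simply be skipWithoutCluster(t) with t being the *testing.T, which already satisfies the interface.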

There is still a lot to do, but Zarf is nearly fully Windows capable.

UPDATE: 2022-10-08

PR #832 has been merged into master and will be included in the next release of Zarf.

Windows support is still in alpha, but it's now possible to run Zarf on Windows.