Nix function from string to pseudorandom integer
Or: Load balancing remote builders with pseudorandom numbers.
Background [1]
Nix remote builders are fantastic: a machine reachable by SSH plus a few lines of configuration (and avoiding some footguns), and you've got yourself seamless remote builds. The main issue is the network traffic between the builder and client host, but if that slows things down, disabling remote builders is usually as simple as adding --builders '' or a similar option to the command. Conversely, --max-jobs 0 forces remote builds.
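For reference, the "few lines of configuration" on a NixOS client might look roughly like the sketch below. It is not taken from my actual setup: the host name, user name, key path, system and job count are all hypothetical placeholders, and only the option names are real NixOS options.

```nix
# Minimal sketch of a client-side NixOS module.
# Every value below is a hypothetical placeholder.
{ ... }:
{
  # Allow this machine to hand builds to the machines listed below.
  nix.distributedBuilds = true;

  nix.buildMachines = [
    {
      hostName = "builder1.example.org"; # hypothetical
      sshUser = "nixbuilder"; # hypothetical
      sshKey = "/root/.ssh/id_builder"; # hypothetical path
      system = "x86_64-linux";
      maxJobs = 4; # number of builds the builder may run in parallel
      speedFactor = 1; # higher values make the builder more preferred
    }
  ];
}
```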
Another issue arises when a group of people, each on their own machine, share a pool of builders. Since the clients are not aware of each other, and there’s no central scheduling, it’s easy for a build machine to end up as a choke point, with every client asking it for resources before trying the next host in the list. This can delay build start-up. Even worse, if a builder is configured to run more than one job in parallel, it might end up doing that even when other builders are doing nothing, causing unnecessary resource contention.
Finally, we want to minimise download time. It would be ideal for each build from a single user to always go to the same builder, so that the builder doesn’t have to download anything which is already in the Nix store. But since builders are shared, we can’t always guarantee this. So each user should have a unique list of builder priorities. With a lot of users and a lot of builders, this should result in a decent spread of load across the machines, and minimal downloads for each individual user.
The hack
Builders have an equivalent of a “priority”: the speed factor. The higher the speed factor, the more preferred the host. So we could generate a pseudo-random list of speed factors for each user, and achieve crude load balancing this way. (If we configured each user’s machine centrally we could of course assign each of them a list of speed factors, but this is not the scenario I’m dealing with.)
All we need now is a pseudo-random number generator which is likely to be unique per combination of user and builder. Thanks to Joel McCracken pointing out the relevant building blocks, this should do the trick:
pkgs.lib.fromHexString (
  builtins.substring 0 15 (
    builtins.hashString "md5" "your string"));
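To get a feel for how this spreads users across a pool, here is a small sketch that ranks the same builders for two different users. The builder and user names are made up, and it assumes `<nixpkgs>` is available on the Nix path. I have deliberately not listed any output, since the actual order depends on the hashes, and two given users are not guaranteed to end up with different orders.

```nix
# Sketch: rank a pool of builders for two users.
# Builder and user names are hypothetical.
let
  pkgs = import <nixpkgs> { };
  inherit (pkgs) lib;

  # Same expression as above, wrapped in a function.
  toNumber = s: lib.fromHexString (builtins.substring 0 15 (builtins.hashString "md5" s));

  builders = [
    "builder1"
    "builder2"
    "builder3"
  ];

  # Builders ordered from highest to lowest number for one user.
  rankingFor = user: lib.sort (a: b: toNumber (a + user) > toNumber (b + user)) builders;
in
{
  alice = rankingFor "alice";
  bob = rankingFor "bob";
}
```

Saved to a file, this should be evaluable with something like `nix-instantiate --eval --strict` followed by the file name.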
How it can be used in the build machine configuration [2][3]:
{ config, pkgs, ... }:
{
  config.nix.buildMachines =
    let
      pseudoRandomSpeedFactor =
        content: 2 + pkgs.lib.fromHexString (builtins.substring 0 15 (builtins.hashString "md5" content));
    in
    [
      rec {
        hostName = "[…]";
        speedFactor = pseudoRandomSpeedFactor (hostName + sshUser);
        sshUser = "[…]";
        # Other properties omitted
      }
      # Other builders omitted
    ];
}
To verify the new configuration, we can pull out the mapping from client host to builder host and speed factor with the following command:
nix eval .#nixosConfigurations --apply 'clientHosts: builtins.mapAttrs (_clientHost: props: map (builder: {${builder.hostName} = builder.speedFactor;}) props.config.nix.buildMachines) clientHosts'
For example, with two NixOS host configurations “weak” and “strong”, where
“strong” doesn’t have a remote builder, and “weak” uses “strong” as the builder:
❯ nix eval .#nixosConfigurations --apply 'clientHosts: builtins.mapAttrs (_clientHost: props: map (builder: {${builder.hostName} = builder.speedFactor;}) props.config.nix.buildMachines) clientHosts'
{ strong = [ ]; weak = [ { strong = 290089632656129126; } ]; }
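Since the one-liner is a bit dense, here is the same function passed to --apply, spread over several lines with comments. Nothing is new here except the formatting and the comments.

```nix
# The function passed to --apply, formatted for readability.
clientHosts:
builtins.mapAttrs (
  # For each client host, look at its configured build machines...
  _clientHost: props:
  map (
    # ...and produce a one-attribute set: builder host name -> speed factor.
    builder: { ${builder.hostName} = builder.speedFactor; }
  ) props.config.nix.buildMachines
) clientHosts
```

A client without remote builders has an empty buildMachines list, which is why “strong” maps to an empty list in the output above.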
[1] Caveat: I’m relatively new to remote builders. They are pretty simple, so I haven’t actually read the Nix code to verify all my assertions, and have made assumptions about how they work based on the “the simplest thing which could possibly work” heuristic. Please let me know if you have any factual corrections.
[2] If everyone is using the same SSH user, you could use config.networking.hostName instead of sshUser.
[3] Each of the 16 hex characters (“0” through “9”, “a” through “f”) represents four bits, and fromHexString has a 15 character limit, so the final number is between 0 and 2^(15×4)-1, inclusive. Since a speed factor of 0 would be pointless, and a speed factor of 1 presumably makes the remote builder equally likely to be used as the local host, I’m adding 2 to the result so that it’s between 2 and 2^(15×4)+1, inclusive. Yes, this is a silly thing to worry about, but my brain would not shut up about the corner case.
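The bounds in the last footnote can be sanity-checked with the same building blocks. This is only a sketch, assuming `<nixpkgs>` is on the Nix path; the two hex strings are simply the smallest and largest possible 15-character inputs, not values any real hash is likely to produce.

```nix
# Sketch: smallest and largest speed factors the scheme can produce.
let
  lib = (import <nixpkgs> { }).lib;
in
{
  # Fifteen zeros parse as 0, so the minimum is 0 + 2 = 2.
  min = 2 + lib.fromHexString "000000000000000";
  # Fifteen "f"s parse as 2^60 - 1, so the maximum is 2^60 + 1 = 1152921504606846977.
  max = 2 + lib.fromHexString "fffffffffffffff";
}
```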