Speeding up Debian package builds

So @ehelms managed to nerdsnipe me yesterday when he asked why our Debian builds for Foreman take hours (compared to <10 min for RPM on Koji)… Well, take some :tea: and sit by the :fire:, it’s story time!

When building a (RPM or Debian) package, you roughly do the following steps:

  1. obtain the source (usually a combination of the project source code and some packaging recipe)
  2. obtain all build dependencies (usually defined in the packaging recipe)
  3. execute the actual build step(s)
  4. take the built result and put it into an actual “package” (or multiple), this includes:
    1. copying files into a useful structure
    2. scanning the files for dependencies (like if the file is a bash script, add an automatic dependency on bash)
    3. scanning the files for provides (like if the package contains /bin/bash, allow other packages to depend on that)
    4. creating a list of installed files for later verification
    5. compressing the installed files into a package (or multiple)

RPM and Debian builds do roughly the same “steps”, so why are Debian builds so much slower?

Obtaining the source and the (packaging) build-dependencies isn’t much different, but now the differences start.
For RPM, we package all Ruby gems and NodeJS modules as RPMs, so when they are needed, we can just install them in binary form and they are present almost immediately. For Debian however, we only build-depend on Ruby and NodeJS themselves, and all gems/modules are installed with bundler/npm at build time (step 3 above) from their sources. This takes quite some time (Ruby: ~8min, NodeJS: ~22min) already.
But then, because we ship these gems/modules inside our Foreman package, step 4 also needs to process all these additional files (and there are many in node_modules).
In numbers:

  • dh_install (step 4.1) takes 40(!) minutes
  • dh_shlibdeps (step 4.2) is sometimes slow (the runs I looked at today weren’t, but I’ve seen ~10min in the past)
  • dh_makeshlibs (step 4.3) is sometimes slow (the runs I looked at today weren’t, but I’ve seen ~10min in the past)
  • dh_md5sums (step 4.4) takes 25(!) minutes
  • dh_builddeb (step 4.5) takes 25(!) minutes

This gets amplified by the fact that our Debian build node is running on slower HW than our Koji (no ultra fast NVMe disks).
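
To get a feeling for why step 4.4 scales so badly with the number of files, here is a rough Python sketch of what creating such a checksum list amounts to: open, read and hash every single file in the package tree. This is only an illustration and not how dh_md5sums is actually implemented; the function name `md5sums` is made up for this example.

```python
import hashlib
import os


def md5sums(root):
    """Return (hex digest, relative path) pairs for every regular file under root.

    Illustrative only: every file has to be opened and read in full, so the
    cost grows with the number of files (and node_modules has a lot of them).
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Skip symlinks in this sketch, only regular files are hashed.
                continue
            digest = hashlib.md5()
            with open(path, "rb") as handle:
                # Read in chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            entries.append((digest.hexdigest(), os.path.relpath(path, root)))
    # Sort by path to get a stable listing.
    return sorted(entries, key=lambda entry: entry[1])
```

Pointing this at a tree with a large node_modules directory on a slow disk gives a rough idea of where the minutes go.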

So, uh, what can we do about that?

Obviously, if we didn’t vendor all those Ruby gems and Node modules, we would instantly win, as we wouldn’t need to build them every time or include them in our packages. But packaging them separately is quite a lot of work (also long-term, for updating those packages), and we just don’t have the capacity for that.

The next best thing is to limit the number of vendored packages and make the “put into package” step smarter.

  • Today we install all Node dependencies from package.json, but this also includes quite a few CI/Test/Lint packages, which we don’t need to build our assets. When building RPMs, we exclude these, and we should do the same with Debian. – I tried this and it cuts down the npm install time to 2 minutes (from 22 minutes!), dh_install to 40 seconds (from 40 minutes!), dh_builddeb to 2 minutes (from 25 minutes!)
  • Creating checksums for package contents is optional, and while I think we should still provide checksums for most of our core packages, foreman-assets, which includes all the Node modules needed to build assets, could probably be excluded (this package is used for building plugins, but users hardly ever need it installed on their systems) – I tried this, and it cuts down the dh_md5sums run to 30 seconds (from 25 minutes!)
  • We know that the vendored Ruby/Node packages don’t include any files we want to offer other packages to depend on, nor do they contain anything that we would need to generate dependencies for, so we can exclude these from dh_shlibdeps and dh_makeshlibs
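
The first point boils down to dropping the unneeded entries from package.json before running npm install. A minimal Python sketch of that idea follows. It assumes the packages to ignore are available as a plain set of names; the function name `build_only_package_json` and the package names in the usage example are hypothetical, and the linked PR may well implement this differently.

```python
import json


def build_only_package_json(package_json_text, ignored):
    """Return package.json text without the packages named in `ignored`.

    `ignored` is an assumed, hand-maintained set of package names that are
    not needed to build the assets (CI, test and lint tooling).
    """
    manifest = json.loads(package_json_text)
    for section in ("dependencies", "devDependencies"):
        if section in manifest:
            # Keep every entry that is not on the ignore list.
            manifest[section] = {
                name: version
                for name, version in manifest[section].items()
                if name not in ignored
            }
    return json.dumps(manifest, indent=2)
```

For example, calling `build_only_package_json(text, {"eslint", "jest"})` on a manifest that lists those two (illustrative) packages would return the same manifest without them, so npm never downloads or installs them.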

If you’re curious (you wouldn’t have read until here if you weren’t) you can find my experiments in

https://github.com/theforeman/foreman-packaging/pull/7020

Yepp, that’s 106 minutes saved on every build. Each build takes only ~30 minutes now.

Oh, and this accidentally cuts foreman-assets from ~218MB to ~56MB in size.
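
For those who want to check the 106 minutes against the numbers quoted above, here is the arithmetic as a small Python snippet. The before/after values are the rounded measurements from this post (in minutes), so the result is only approximate.

```python
# (before, after) in minutes, taken from the measurements quoted in this post
timings = {
    "npm install": (22, 2),
    "dh_install": (40, 40 / 60),  # 40 minutes down to 40 seconds
    "dh_md5sums": (25, 0.5),      # 25 minutes down to 30 seconds
    "dh_builddeb": (25, 2),
}

# Total time saved across the four steps
saved = sum(before - after for before, after in timings.values())
print(f"saved per build: {saved:.1f} minutes")
```

That comes out at a bit under 107 minutes, in line with the figure above.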


One very interesting fact (which is worth a reply, vs an edit, no less!) is that all those node_modules excludes for md5sums, shlibs etc can be dropped, and you get the same speedup by only excluding all those nasty Node modules from being installed in the first place!

https://github.com/theforeman/foreman-packaging/pull/7022

I’ve opened Bug #33317: better differentiate between build, develop and test dependencies for JavaScript - Foreman as a general issue for how we can improve this, and https://projects.theforeman.org/issues/33318 (with a PR) for the actual data for now.

Thanks everyone (especially @Ron_Lavi, @amirfefer and @ekohl)!

We have now cleaned out the "double npm install" (Bug #33319: don't run npm-fix-foreman-stories.sh on every npm install - Foreman) issue and have a way to differentiate between build and other development dependencies (Feature #33318: provide a list of JavaScript dependencies that can be ignored during package building - Foreman).

The PR for actually using this knowledge is still open: drop node dependencies which are unused for building by evgeni · Pull Request #7022 · theforeman/foreman-packaging, but I don’t see any problems with it getting in soon.

I’ve left the overall issue Bug #33317: better differentiate between build, develop and test dependencies for JavaScript - Foreman still open, as @Ron_Lavi mentioned it might be possible to better differentiate dependencies in the future (e.g. by putting all build ones into dependencies and all others into devDependencies, as Foreman is not really a Node project, so its dependencies are never used in the real meaning of that field).

And now that we have a new and more powerful build infra (thanks conova!), this is down to ~15 minutes!
