Nightlies assets are now broken

I’d like to see a concrete proposal on how we’d actually implement it. Some questions to consider:

  • Can we still build (Foreman) RPMs without internet access?
  • Can we see which licenses are used?
  • Can we see which source packages + versions are used?
  • Can we track security issues in dependencies?
  • Is the modules blob arch-specific and do we need to generate one for x86 and one for arm? Is it even OS-specific and do we need 1 for Debian (Stretch), 1 for EL7 and 2 for Ubuntu (Xenial, Bionic)?
  • How much disk space will we need to store them?
  • How do we ensure the vendor.js in foreman.rpm won’t break plugin-bundle.js when something updates?

I’m not claiming we solve all of these well with our current solution.

That is indeed what we do now.

What do you mean save a snapshot? Like a tarball or just a list of versions?


Can we still build (Foreman) RPMs without internet access?

  • Definitely, the build results are only static assets (and imho, should be served from a micro-service).

Can we see which licenses are used?

Can we see which source packages + versions are used?

  • Yes. Like Gemfile.lock, today npm and yarn create package-lock.json and yarn.lock.
    Is this the kind of solution we are looking for?
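A minimal sketch of what such a lockfile gives us (the file contents below are made up, and real lockfiles nest their dependencies, where a JSON-aware tool such as jq would be more robust):

```shell
# Toy sketch: a package-lock.json records the exact version of every resolved
# dependency, much like Gemfile.lock does for Ruby gems.
cat > /tmp/package-lock.json <<'EOF'
{
  "name": "foreman",
  "version": "1.19.0",
  "dependencies": {
    "axios": { "version": "0.18.0" },
    "react": { "version": "16.4.1" }
  }
}
EOF

# Crude extraction of name@version pairs from the flat toy file:
sed -n 's/^ *"\([^"]*\)": { "version": "\([^"]*\)".*/\1@\2/p' /tmp/package-lock.json
```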

Can we track security issues in dependencies?

Is the modules blob arch-specific and do we need to generate one for x86 and one for arm? Is it even OS-specific and do we need 1 for Debian (Stretch), 1 for EL7 and 2 for Ubuntu (Xenial, Bionic)?

  • No, afaik. The build results are meant to run in a browser and shouldn’t use any “native extensions”.
    They are OS-specific (not sure at what level) only when you build your code to run on the machine itself (Node.js).

How much disk space will we need to store them?

  • The node_modules folder can be really heavy, especially when you install your dev-dependencies on the machine. On my machine it’s 500M, while the build results are 5.2M.
    Therefore, I changed my mind and I think we should save only the lock file (704K). I think this should be our signature.
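A sketch of that signature idea, assuming we simply checksum the committed lockfile (the file contents below are a stand-in):

```shell
# Sketch: treat a checksum of the lockfile as the build "signature"; any
# dependency change produces a different signature.
printf 'stand-in lockfile contents\n' > /tmp/lockfile-demo.json
sha256sum /tmp/lockfile-demo.json | cut -d' ' -f1
```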

How do we ensure the vendor.js in foreman.rpm won’t break plugin-bundle.js when something updates?

  • That’s a really good question! Afaik, right now we ensure it by building the core and the plugins together, so you can’t really update the core without updating the plugins, or the other way around.
    I think we need to find a solution where we can build core and each plugin separately and independently.

Which I think is good, I don’t see a reason to keep them in production.

I changed my mind, I think keeping the lock file would be the right approach.

This is now fixed.

Can we still build (Foreman) RPMs without internet access?

  • Defiantly, the build results are only static assets. (and imh, should be served from a micro-service)

Reminds me of https://www.reddit.com/user/Defiantly_Not_A_Bot :wink:

I do wonder where we store all the resulting bundles. Do we expect
plugin authors to include them in gem releases? Do we expect packaging
to carry these files for every branch? This sounds like a maintenance
disaster but I could be missing something.

Also note that currently users can install foreman-assets, make a change
in the code and recompile assets to change their local installation. If
we only store the resulting bundles then this is lost. Do we want to
limit our users in that freedom?

Currently we are also safe from leftpad situations because we have
copies of all our sources. This allows us to rebuild all our packages
even if NPM would be unreachable (DDoS, service outage, firewalls).

Can we see which licenses are used?

That page doesn’t load unless I include a ton of javascript sources from
third parties. Can’t say if it’s what I’m looking for so I’ll explain.

Some licenses require you to ship the license or it’s considered a
violation of the license. This is why you see Android and Apple both
having a lot of open source licenses when you go to the legal info page.
Other licenses can be incompatible. We already do a poor job of caring
about this, but we shouldn’t do a worse job either.
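One way to make this auditable is to generate a license manifest from each installed module’s package.json “license” field. A minimal sketch with a toy module tree (real trees nest, and dedicated tools such as the license-checker npm package exist for this):

```shell
# Sketch: build a license manifest by reading the "license" field of every
# installed module's package.json (a toy tree stands in for node_modules).
mkdir -p /tmp/nm-demo/axios /tmp/nm-demo/react
printf '{ "name": "axios", "license": "MIT" }\n' > /tmp/nm-demo/axios/package.json
printf '{ "name": "react", "license": "MIT" }\n' > /tmp/nm-demo/react/package.json

for pkg in /tmp/nm-demo/*/package.json; do
  name=$(sed -n 's/.*"name": "\([^"]*\)".*/\1/p' "$pkg")
  license=$(sed -n 's/.*"license": "\([^"]*\)".*/\1/p' "$pkg")
  echo "$name: $license"
done
```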

Can we see which source packages + versions are used?

  • Yes. Like Gemfile.lock, today npm and yarn create package-lock.json and yarn.lock.
    Is this the kind of solution we are looking for?

So that means I need to go over every repo that ships webpack sources?
It can be a start, but personally I hate lockfiles. IMHO you should
always specify the dependencies you can work with and find the best
match that’s possible.

Can we track security issues in dependencies?

Is there a way to see this for branches? If we can’t see it for releases
it’s not that useful.

Is the modules blob arch-specific and do we need to generate one for x86 and one for arm? Is it even OS-specific and do we need 1 for Debian (Stretch), 1 for EL7 and 2 for Ubuntu (Xenial, Bionic)?

  • No, afaik. The build results are meant to run in a browser and shouldn’t use any “native extensions”.
    They are OS-specific (not sure at what level) only when you build your code to run on the machine itself (Node.js).

I think it was more that I was considering that we want something that
fits the above requirements. Then you probably want something like the
node-modules tarball.

How much disk space will we need to store them?

  • The node_modules folder can be really heavy, especially when you install your dev-dependencies on the machine. On my machine it’s 500M, while the build results are 5.2M.
    Therefore, I changed my mind and I think we should save only the lock file (704K). I think this should be our signature.

Agreed that storing entire module bundles is undesired, but there are
big benefits of having access to the actual sources.

How do we ensure the vendor.js in foreman.rpm won’t break plugin-bundle.js when something updates?

  • That’s a really good question! Afaik, right now we ensure it by building the core and the plugins together, so you can’t really update the core without updating the plugins, or the other way around.
    I think we need to find a solution where we can build core and each plugin separately and independently.

I’ve been arguing that we need a solution for that because they’ve
always been independent. Plugins have been carried to multiple newer
releases and were only rebuilt if needed. If this becomes a requirement
it means we really need to re-engineer our build pipeline for everything
that uses webpack and trigger rebuilds on reverse dependencies. It’s
probably best to automatically create an exact requirement in the RPMs
if they’re so tightly coupled.
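Such an exact requirement could look like this in a plugin’s spec file (a hypothetical fragment; the macro names are assumptions, not the actual packaging code):

```spec
# Hypothetical: pin the plugin bundle to the exact core build whose vendor.js
# it was compiled against, so updating core forces a plugin rebuild.
Requires: foreman = %{foreman_version}-%{foreman_release}
```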

I wonder what the implications of this are for Katello and being able
to release independently. We can probably still revbump on the
packaging side, but I might be missing something.


+1 for making the process simpler and making the dev setup closer to the production one.

Btw how does manageIQ build and distribute webpack assets?

I got webpack --bail working with the rpm-packaged npm modules from our repo today, but I only reproduced the issue and wasn’t able to find out which package is causing it. Webpack runs without any error, but the resulting assets are broken.
One thing I spotted, though, is that without webpack’s ModuleConcatenationPlugin everything gets compiled correctly.

Here’s a diff of the npm packages that we currently build nightlies from vs. the packages I was able to build the assets correctly with:
https://paste.fedoraproject.org/paste/0dp1Gbbia8YniQtSEGHC6Q
There’s probably only a subset that needs to be updated, but I’m afraid the easiest way would be to update all of them.


I’m a little late to the overall discussion, but I’ll answer the questions below based on how another build process handles this for Foreman and what we’ve learned from it.

Yes. The node-modules tarball would be generated alongside a Foreman (or plugin’s) source generation and be just another source for that RPM generated from npm install. For example, rake pkg:generate_source would generate:

  • foreman-1.19.0.develop.tar.bz2
  • foreman-node-modules-1.19.0.develop.tar.bz2
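The generation step might look roughly like this (a sketch only; the directory layout stands in for whatever rake pkg:generate_source actually does):

```shell
# Sketch: after npm install, archive node_modules as an extra source tarball
# next to the main source tarball.
mkdir -p /tmp/tarball-demo/node_modules/react   # pretend "npm install" ran here
cd /tmp/tarball-demo
tar cjf foreman-node-modules-1.19.0.develop.tar.bz2 node_modules
tar tjf foreman-node-modules-1.19.0.develop.tar.bz2
```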

If the npm-installed module comes with a license, then we can see it in the source. To make this more obvious, we can generate a manifest that lists every package and every license for each package, and bundle that into the tarball.

This would all be in the node-modules tarball. Further, the package-lock.json file can be included to make this easier to grok.

About as well as we can today I’d say.

Now this is an interesting question that I think needs to be separated.

As far as I know, Debian already differs from RPM packaging in that it has internet access during builds and uses npm directly?

As for whether the modules are arch-specific, the answer is no, except for the one binary module we carry: node-sass. This module has to be built as a package and carried as such by RPM rules. However, that’s still only one package that has to be built out of 109.

  • npm install followed by a tar-gz of node_modules: 95M per Foreman RPM
  • NODE_ENV=production npm install yields: 50M per Foreman RPM

If we can adjust things so that a production install contains only what’s needed to compile webpack, we can potentially reduce the size of our tarballs further.

I feel like this is a potential existing problem, as we’ve seen in some cases already.

Other projects that do this same type of process:

  • cockpit
  • manageiq

I created a Ruby script that takes the output of npm list --json and runs add_npm_package.sh for each of the packages. I executed it with the list from my working nightly instance and this is the result:
https://github.com/theforeman/foreman-packaging/pull/2620

My hope is that it should produce a working set of asset packages.

Now I’ll let @ekohl tell me how silly this approach is :slight_smile:

Not bad! The biggest concern is that not every node module is needed in production (e.g. testing and linting modules). For the moment, you can use all the nodejs-* packages defined in package_manifest.yml. Ultimately, we need to ensure that:

NODE_ENV=production npm install

installs only the set of NPM modules absolutely required for the webpack compile, and base the listing off of that.

I actually kind of like that approach. There are some details (like @ehelms mentioned), but other than that it’s very much in line with where I was already wanting to go. I just didn’t think of using npm list --json yet.

Another thing that I want to get better at is determining whether something should be bundled or not. Currently it’s a very simple “if deps > 3”, but sometimes we have all those deps already anyway. See for example nodejs-axios in your PR.

You also have a few commits that update a package from x.y.z-1 to x.y.z-1 while updating its bundled dependencies. In that case it should bump to x.y.z-2, which the script currently ignores because I didn’t really think of that use case yet.

Like I said: some details we need to improve, but overall I really like this and I’m definitely going to iterate further on it.

and also --no-optional

Yeah, I hoped that add_npm_package.sh would detect that. When I found out it got re-packaged with the same version, I decided to share what I have first to gather some feedback.

I’ll try to filter out the dev dependencies and I’ll share the script for further iterations.

https://github.com/theforeman/foreman-packaging/blob/rpm/develop/check_npm_version.sh could probably be modified to check for a specific version rather than just being present. rpmdev-vercmp could be used for version comparison.

I added a check for versions so that it doesn’t change packages that don’t need to be updated.

Calling npm list with --production --no-optional should solve limiting the packages to production ones only.

I’m not sure how good npm2rpm is and whether we can accept its output blindly. At the moment the script makes significant changes to spec files that seem to have been created by hand. Anyway, here it is and we can start experimenting with it:
https://github.com/theforeman/foreman-packaging/pull/2627

Awesome! The output of npm2rpm has changed quite a bit because I rewrote it. It skips a few things that are no longer needed. Recent specs should already use this; older ones don’t. Cleaning up the situation and finding the right versions to bump to was on my list before branching, but I hadn’t gotten around to it yet.

FWIW, the original logic was that we split the packages so that dependencies were the modules we used in the code and thus shipped to users, while devDependencies were the modules needed for the build or development process. I guess we could maybe use optionalDependencies for those that aren’t required at build time? Sadly, it doesn’t look like package.json has any way of grouping modules together other than (ab)using the built-in groups.
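For reference, that grouping lives in package.json like this (an illustrative fragment; the version numbers and the build-only helper name are made up):

```json
{
  "dependencies":         { "react": "^16.4.1" },
  "devDependencies":      { "eslint": "^4.19.1" },
  "optionalDependencies": { "example-build-only-helper": "^1.0.0" }
}
```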

I went through and re-aligned package.json so that NODE_ENV=production npm install would work properly and only load what’s needed.

On top of that, to help things stay clean and working, I am proposing an addition to the PR tests to check that asset precompilation works:

Found a way to reproduce the error on forklift centos7-devel. It basically follows the same process we use in our packaging; we want to mimic the same thing packaging does:

  • First, compile assets on a Foreman without Katello. Let’s call this host “Alice”
  • Then on another Foreman with Katello compile assets for Katello. Let’s call this host “Bob”
  • On Alice, enable Katello now, but do NOT compile assets for it.
  • Copy katello/public/webpack/katello from “Bob” to “Alice”.
  • Run the foreman server.

In terms of code

  • Get a centos7-devel instance in forklift. This will be “Alice”
  • rm -r foreman/public/webpack foreman/node_modules This clears everything.
  • Make sure you DON’T have Katello enabled in foreman/bundler.d/Gemfile.local.rb or foreman/bundler.d/katello.local.rb. You can comment the line in the Gemfile for this.
  • npm install (yes, in development, as we want node_modules/.bin/webpack to be available)
  • NODE_ENV=production node_modules/.bin/webpack --bail --config config/webpack.config.js
  • At this point, we have a Foreman installation with webpack assets. You may run RAILS_ENV=production rails server if you want to test it.
  • Get a nightly forklift host, centos7-katello-nightly. This will be “Bob”
  • This host will contain the compiled Katello assets at /usr/share/foreman/public/webpack/katello. Copy them to the devel host. You could do:
# On "bob"
cp -r /usr/share/foreman/public/webpack/katello/ katello
# On your hypervisor, at the forklift dir
scp -r -i .vagrant/machines/centos7-katello-nightly/libvirt/private_key vagrant@centos7-katello-nightly:~/katello  .
scp -r katello vagrant@centos7-devel:~/katello_webpack
  • SSH to “Alice” again.
  • Enable Katello again in the Gemfile, uncommenting the line at bundler.d/katello.local.rb.
  • Now copy the Katello assets from “Bob”: cp -r ~/katello_webpack ~/katello/public/webpack/katello
  • Go to https://centos7-devel.yourdomain/redhat_repositories
  • You’ll see the same error we see on nightlies.

I have not checked if this is the same problem that affects foreman_ansible and foreman_remote_execution.

I also think this problem could be reproduced by creating the Katello webpack assets on any other host; it does not necessarily have to be a nightly host, but I haven’t tried that yet. I think it is very likely. In that case, we will have to work around https://github.com/webpack/webpack/issues/959 and https://github.com/webpack/webpack/issues/1155 - thanks @tbrisker for finding these.

I’ll update the thread with any findings.

Confirmed. I can reproduce the same error we get in nightly by compiling the latest Katello webpack assets on a host different from the Foreman host, then sending them over to my Foreman host (TypeError: e[t] is undefined). In this case I was using Katello master, which includes https://github.com/Katello/katello/pull/7416. I have seen multiple messages implying that merging that PR would fix nightlies, but so far I can’t see that.

At least, right now we have a clear way to quickly reproduce the issue and try out different fixes.

Quick update after two more days of digging into this:
We’ve replaced webpack’s HashedModuleIdsPlugin with NamedModulesPlugin, which lets us see the name of the module that is failing to load instead of its hash.
Now, debugging with the latest nightlies, we see Katello is asking for a module named ../../../../../../../usr/lib/node_modules/react/index.js, while vendor.js lists that module as ../../../../usr/lib/node_modules/react/index.js. Note the three-level difference in the hierarchy - this is why we’re seeing the failures on the Katello webpack pages in nightlies.
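A toy demonstration of why the build directory’s depth changes the module name (the paths below are made up; this only illustrates how relative module ids gain extra ../ segments):

```shell
# Sketch: webpack derives these module ids from the module's path relative to
# the build context, so a deeper build directory yields more "../" segments.
mkdir -p /tmp/modid-demo/usr/lib/node_modules/react
touch /tmp/modid-demo/usr/lib/node_modules/react/index.js
mkdir -p /tmp/modid-demo/usr/share/foreman           # stand-in "core" build dir
mkdir -p /tmp/modid-demo/opt/a/b/usr/share/foreman   # a deeper build dir

# Module id as seen from the shallow build context:
realpath --relative-to=/tmp/modid-demo/usr/share/foreman \
  /tmp/modid-demo/usr/lib/node_modules/react/index.js
# Module id as seen from the deeper build context (more "../" segments):
realpath --relative-to=/tmp/modid-demo/opt/a/b/usr/share/foreman \
  /tmp/modid-demo/usr/lib/node_modules/react/index.js
```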

Next step: identify why the Katello packages are built from a folder three levels deeper than core, leading to the different relative path to the modules. @packaging/@infra - any idea why we’re seeing this?