RFE: Storing files in Foreman using Katello

As part of my work on the foreman_rh_cloud plugin, I need a place to store reports. Currently I use the filesystem directly, which is not a great solution, especially for HA-enabled setups.
While working on this, I have discovered a couple of other use cases that need file storage in the system.
So far I can identify three types of file storage usage:

  1. Files that are uploaded by hosts for further processing, such as forwarding them to the cloud. Good examples are reports generated on the host that need to be forwarded to the Red Hat Cloud services, like insights-client reports, OpenSCAP reports or rh_cloud reports.
  2. Files that are uploaded by users to the Foreman server and will be consumed by the managed hosts. Examples would be Remote Execution actions that have files attached to them, so the files would be consumed by the action running on a specific host.
  3. Files that will be synced from the cloud into Foreman by some automatic process. An example would be using Foreman as a cache service for auto-updatable components like insights-client. This executable tries to download the latest metadata file each time it generates a report. Instead of contacting the cloud each time, we would like to have the files synced into Foreman periodically and then distributed from there.

Currently, we can see two ways to integrate Katello and Pulp as the backend for storing files:

  1. Using native Katello features
  2. Using ActiveStorage as a common interface to storage capabilities and implementing a backend in Katello, similar to the PostgreSQL adapter

Discussion points:

  • Native Katello support enables more integration with Katello workflows
  • ActiveStorage enables decoupling from a specific backend and enables replacing it with any other supported backend
  • ActiveStorage does not have a native UI, meaning each plugin that consumes file storage will need to implement its own way to access the file storage (both for upload and for download).
  • Native Katello will cause dependencies between plugins, something we should try to avoid.
  • ActiveStorage will give more control over the visibility and permissions of the files, since the plugin has to implement the endpoint.
  • ActiveStorage will probably require creating objects directly in Pulp, and those objects would not be visible from the Katello UI.

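The visibility point above can be illustrated with a small plain-Ruby sketch. Because the plugin owns the download endpoint, it can check permissions before any bytes are served. Everything here is hypothetical: `StoredFile`, `User`, `serve_file` and the permission name `view_reports` are invented for illustration and are not Foreman or ActiveStorage APIs.

```ruby
# Hypothetical records; a real plugin would use its own models instead.
StoredFile = Struct.new(:name, :content, :required_permission)

User = Struct.new(:login, :permissions) do
  # True when the user holds the given permission.
  def can?(permission)
    permissions.include?(permission)
  end
end

# Hypothetical endpoint body: returns an HTTP-like status and a payload.
# The permission check runs before the file content is handed out.
def serve_file(user, file)
  return [403, 'Forbidden'] unless user.can?(file.required_permission)

  [200, file.content]
end
```

The trade-off is the one already listed: each plugin has to write and maintain such an endpoint itself, since there is no shared UI doing it for them.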
I would like to start a discussion about ways of implementing a file storage service as part of Foreman.

I’ve struggled with a similar problem in the past when rendering reports in the background, where we needed to store the report until it was retrieved by the user.
I guess it is a very similar problem. What we came up with was storing the files as blobs in the database, since files are hard in HA setups and Foreman does not have an answer for that.

ActiveStorage with Pulp as the backend is a pretty nice solution to the problem, but I guess we would need to introduce Pulp as a Foreman dependency. I would be OK with that, but it has implications.
I’m mentioning the reports case because it could be a good starting point for any such implementation (make reports use this solution).

I would not like a Katello-only solution, as then we would always struggle with Foreman instances that do not have Katello installed.

I think it boils down to:

  1. Bring Pulp in as a dependency of Foreman core
  2. Implement two backends for ActiveStorage
  3. Gate these functionalities in REX and OpenSCAP behind Katello being installed

IMHO 1 or 2 should be the way to go, but 3 is also possible.

ActiveStorage with Pulp as the backend is a pretty nice solution to the problem, but I guess we would need to introduce Pulp as a Foreman dependency.

If we go the ActiveStorage route, we won’t actually have to do that. We will be able to configure Foreman with any backend that we like. For example, there is already an adapter for PostgreSQL blobs, and we could also use any adapter that already ships with ActiveStorage.

Yeah, with ActiveStorage we have more room to decide on a solution for vanilla Foreman.

Thanks @ezr-ondrej for the background with report templates. I think that introducing Pulp as a Foreman dependency would be a huge step, and I’d be worried about an even bigger difference between “Foreman without Katello” and “Foreman with Katello”. I’d also like to keep gating of plugin features on the presence of other plugins to a minimum, because it comes with great complexity.

I think standardizing on ActiveStorage with the PG adapter would be a good start. It does not prevent us from changing gears later.

Later I’d also consider whether it is already time to make more plugins part of the core, including Katello, simply because the number of conditions like “if x is present I can do y” is quite big already. But I don’t want to sidetrack this conversation with such a big topic.

I’ve already suggested ActiveStorage and leveraging multiple backends. This is similar to the Django storage model, which is pluggable. Pulp uses django-storages to optionally provide S3 storage.

This would also be my preference. Longer term you could even look at a model where Katello implements the Pulp storage backend, but Foreman uses PG blob storage.

Another benefit of starting with ActiveStorage using PG blob storage is that you can already start migrating plugins to it, even if the Pulp storage backend isn’t ready: from a plugin perspective the interface shouldn’t change.
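A minimal sketch of why the plugin-facing interface can stay stable, written in plain Ruby rather than as real ActiveStorage code. The class names (`MemoryBlobService`, `DiskBlobService`, `ReportStore`) and the `build_service` helper are hypothetical; they are only loosely modeled on the idea that storage services share `upload`, `download`, `exist?` and `delete` calls and that the active backend is chosen by configuration.

```ruby
require 'stringio'
require 'tmpdir'
require 'fileutils'

# Hypothetical in-memory backend, standing in for e.g. a database blob store.
class MemoryBlobService
  def initialize
    @blobs = {}
  end

  def upload(key, io)
    @blobs[key] = io.read
  end

  def download(key)
    @blobs.fetch(key)
  end

  def exist?(key)
    @blobs.key?(key)
  end

  def delete(key)
    @blobs.delete(key)
  end
end

# Hypothetical disk backend exposing the same four methods.
class DiskBlobService
  def initialize(root)
    @root = root
    FileUtils.mkdir_p(@root)
  end

  def upload(key, io)
    File.binwrite(path_for(key), io.read)
  end

  def download(key)
    File.binread(path_for(key))
  end

  def exist?(key)
    File.exist?(path_for(key))
  end

  def delete(key)
    FileUtils.rm_f(path_for(key))
  end

  private

  # Keys are assumed to be plain file names without path separators.
  def path_for(key)
    File.join(@root, key)
  end
end

# Picks a backend from a settings hash, similar in spirit to a storage config file.
def build_service(config)
  case config.fetch(:service)
  when 'Memory' then MemoryBlobService.new
  when 'Disk'   then DiskBlobService.new(config.fetch(:root))
  else raise ArgumentError, "unknown storage service: #{config[:service]}"
  end
end

# Plugin-side code: it only knows the shared interface, not the backend.
class ReportStore
  def initialize(service)
    @service = service
  end

  def save(name, content)
    @service.upload(name, StringIO.new(content))
  end

  def load(name)
    @service.download(name)
  end
end
```

Switching `ReportStore` from one backend to the other only changes the hash passed to `build_service`, for example `{ service: 'Disk', root: Dir.mktmpdir }` instead of `{ service: 'Memory' }`. That is the property that would let plugins migrate before a Pulp backend exists.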

To conclude the plan of action:
Since I don’t see objections to using ActiveStorage for managing file artifacts in Foreman and plugins, I will introduce the ActiveStorage installation migration and a new dependency on the PostgreSQL blob adapter, active_storage-postgresql.

Stay tuned for a PR link

First version of the PR is out: https://github.com/theforeman/foreman/pull/9339