RFC: REX pull provider

If we did go with the first route (with the job fetched and the results sent over HTTP), the smart proxy deployment would need to be upgraded to Puma.

Now I understand, yes, Smart Proxy itself (the Ruby process) will not scale well and HTTP polling is not an option. That was not what I was suggesting; I was thinking of developing our very own (external) lightweight process that would handle all WebSocket requests. Smart Proxy would talk to that process via IPC (e.g. a REST API over localhost or a UNIX socket).

Well, there are brokers which support both p2p and pub/sub messaging patterns. But let’s not play with words.

As Ewoud mentioned, this cannot be done for security reasons: our users maintain servers for different clients and we simply can’t allow them to easily subscribe to all info. This must be point-to-point.

This smells very much like the only option we have. It would mean that whenever a new host is created, we’d need to update the Mosquitto ACL configuration files on the Foreman server and all Smart Proxies. There is also the possibility of writing an auth plugin that performs this dynamically, for example by querying the Foreman server.

I think we need to consider both security and scalability from the very beginning. I think the ideal solution is that a client has an X509 certificate (puppet, rhsm or the foreman-cert tool) that entitles the client to access its very own messaging endpoint. Whatever this is (p2p or pub/sub) is an implementation detail, but the communication must be point-to-point.

Again, I need to think out loud: wouldn’t it be easier to develop our own simple WebSockets service (preferably not in Ruby) that can be tightly designed and integrated for our own use case? With an HTTP(S) handshake it’s something our users know very well and firewalls allow by default, so it will “just work”. This does not sound like an enterprise bus with tons of message types and endpoints. We need to communicate straightforward messages like “run this script” or “are you alive”.

I didn’t say I’m suggesting this for production deployments. In production deployments this should be forbidden and the point-to-point-ness enforced by ACLs. I did this to lower resource requirements when testing another part of the stack.

Not necessarily: the rules can be generic in the shape “client/$name”, where $name is taken from the certificate which the client presents when connecting to the broker. If it really works like that, we wouldn’t need to update the configuration.
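For what it’s worth, Mosquitto supports this mapping out of the box: with certificate authentication the broker can take the MQTT username from the certificate CN, and ACL patterns can then reference that username. A minimal broker configuration sketch (the file paths are examples, not our actual layout):

# mosquitto.conf sketch with example paths: authenticate clients by
# X509 certificate and derive the MQTT username from the certificate
listener 8883
cafile /etc/mosquitto/ca.crt
certfile /etc/mosquitto/broker.crt
keyfile /etc/mosquitto/broker.key
# reject clients that do not present a valid client certificate
require_certificate true
# use the certificate CN as the username, so ACL patterns
# like rex/%u resolve to rex/<CN>
use_identity_as_username true
acl_file /etc/mosquitto/rex.acl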

How would we enforce that someone from organization A is not connecting with their own certificate to client/name.from.org.b? I only briefly read the Mosquitto configuration documentation, but there doesn’t seem to be much flexibility.

I tested just such an ACL:

# any authenticated client may read from rex/<username>,
# where %u expands to the username (here: the certificate CN)
pattern read rex/%u
# clients may write their results to the shared results topic
topic write rex/results
# the "admin" user (certificate with CN=admin) may write to all
# rex/ topics and read the shared results topic
user admin
topic write rex/#
topic read rex/results

This let any client with an ID of X (where X was the CN of the certificate it connected with) read from the topic ‘rex/X’ and write to ‘rex/results’.

It also let a client with an ID of ‘admin’ (verified by its certificate with a CN of admin) write to all ‘rex/#’ topics and read from ‘rex/results’.
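To illustrate the client side, here is a minimal sketch (assuming the Python paho-mqtt library, 1.x API, with made-up broker hostname and certificate paths) of a host connecting with its certificate and listening only on its own topic:

# host-side MQTT client sketch (paho-mqtt 1.x API); the broker
# hostname and certificate paths below are examples
import paho.mqtt.client as mqtt

HOST_CN = "host1.example.com"  # must match the CN in the client cert

def on_message(client, userdata, msg):
    # the broker ACL guarantees we only ever receive messages
    # published to rex/<our own CN>
    print(f"notification on {msg.topic}: {msg.payload.decode()}")

client = mqtt.Client(client_id=HOST_CN)
client.tls_set(ca_certs="/etc/pki/ca.crt",
               certfile=f"/etc/pki/{HOST_CN}.crt",
               keyfile=f"/etc/pki/{HOST_CN}.key")
client.on_message = on_message
client.connect("broker.example.com", 8883)
client.subscribe(f"rex/{HOST_CN}")
client.loop_forever()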

Lzap, if we’re using cert-based authentication, then org boundaries don’t really matter: a client can’t read from anyone else’s queue without that client’s certificate.

Nice, then a static configuration file should do it, I suppose, including a federated scheme (routed brokers). The only thing to keep in mind is to blocklist the client name “admin”, just in case.

I swear I had replied to this before the holiday, but apparently not!

The current target is the ability to execute a job on 10K hosts per smart proxy over the course of 4 hours. At the same time, a job executing on 2 hosts should not take 4 hours; it should execute within a small amount of time (a few minutes at the most). A polling-based mechanism handles the first situation easily, but the second demands much lower latency: to start a 2-host job within minutes, every host would have to poll every few seconds, and 10K hosts polling at, say, a 10-second interval already means around 1,000 requests per second against the proxy.


I wanted to provide an update to this. The general flow is very similar with a few key differences. As we worked through user stories, it became obvious that creating a new pull provider would result in:

  • Duplicated job templates
  • Needing to enhance the UI to support running the same ‘job’ across multiple providers, potentially with different inputs

Talking through it, we realized that, to a large extent, the templates used to generate the instructions to run on a host should not be fully tied to the technology used to execute them on the host.
This is similar to what evgeni brought up here: Relationship between Ansible and REX proxy plugins - #9 by evgeni

While this could, in the long term, lead to a split between the templating (scripts & ansible playbooks) and the execution technology (ssh & ansible), for the time being it has some implications for the ‘pull provider’ work.

We came away with:

  • There should not be a dedicated pull provider
  • The ‘SSH’ provider should be renamed the ‘Script’ provider
  • The ‘Script’ provider should support executing via SSH or Pull on a given smart proxy
  • The ‘Script’ provider should support an optional MQTT notification if configured to use Pull
  • The sys admin installing & configuring the Smart Proxy would decide whether the Script provider should use SSH or Pull (see the settings sketch after this list)
  • The Foreman application does not need to know which host will use Pull and which will use SSH; it will simply use a smart proxy for execution, and that smart proxy will use the method it is configured for
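As a purely hypothetical illustration of the last two points (the file path and every key name below are assumptions, not a final design), the per-proxy choice could be a simple mode switch in the plugin’s settings file:

# hypothetical /etc/foreman-proxy/settings.d/remote_execution_ssh.yml;
# the keys below are illustrative, not a final design
:enabled: true
:mode: pull-mqtt        # or: ssh
:mqtt_broker: broker.example.com
:mqtt_port: 8883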

I’ve updated our Google Doc with user stories and personas that I invite you to review: RHC inspired MQTT Pull Provider - Google Docs

This could have been a webhook event, so users could actually implement anything they want. It looks like a good fit: webhooks are fire-and-forget events which can be configured to be delivered to remote services (HTTP) or via the Smart Proxy Shellhook plugin. We could provide an example script for MQTT.
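Something like the following sketch, for instance (assuming the Python paho-mqtt library; the topic layout, broker hostname and certificate paths are made up for illustration):

#!/usr/bin/env python3
# sketch of a shellhook-style script publishing a "job pending"
# notification over MQTT; all names and paths are illustrative
import sys
import paho.mqtt.publish as publish

host_fqdn = sys.argv[1]  # shellhook argument: which host to notify

publish.single(
    topic=f"rex/{host_fqdn}",
    payload="job pending, fetch it from your smart proxy",
    hostname="broker.example.com",
    port=8883,
    tls={"ca_certs": "/etc/pki/ca.crt",
         "certfile": "/etc/pki/proxy.crt",
         "keyfile": "/etc/pki/proxy.key"},
)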

The flow (or at least in my head) is roughly like this (a host-side sketch follows the list):

  1. User triggers a job for a host
  2. Foreman selects a smart proxy for the host and delegates the job there
  3. Smart proxy optionally notifies the host over mqtt
  4. Host grabs the thing to run from the smart proxy and runs it, reporting results back to the smart proxy

Although I’d be glad if we could reuse webhooks for this, I don’t really see how we could do that without radically altering the flow.