RFC: Goodbye Qpid, Hello Artemis: Evolving Messaging in Katello

Jonathon_Turel · January 17, 2020, 4:45pm

This RFC is predicated on these facts regarding Katello’s future:

Pulp2 will be replaced by Pulp3, which does not use Qpid (no more katello-agent).
The only remaining usage of Qpid would be delivering messages from Candlepin to Katello.

Several months back, folks from Katello, Candlepin, and others met to determine a path forward with respect to the above. We concluded that by connecting Katello to Candlepin’s internal broker, ActiveMQ Artemis, Qpid could be removed from the architecture completely with the switchover to Pulp3.

Since this work doesn’t depend on Pulp3, we can reduce the amount of change in that cutover release by including the Artemis migration in an earlier one, such as 3.16.

Two deployment options:

Embedded Artemis

This is the initial recommendation. The advantage is that making this happen really only involves exposing a listener in Candlepin’s configuration for external connections and creating some new SSL certs for client and server.

The disadvantage is that, to my knowledge, there is no way to get any information about the broker: how many connections are open, how deep the queues are, etc. This is something useful that Qpid gives us through qpid-stat.

Standalone Artemis

The advantage here is that we can scale better in future architectures. The obvious example being multiple Candlepin instances all sharing a single Artemis broker. In fact, Red Hat’s internal Candlepin deployment of 10+ nodes is moving toward this configuration. There is also an excellent management interface for seeing all kinds of information including messages within queues.

The disadvantage is that more work will be required to achieve this setup relative to the embedded option: packaging, installer, etc.

Future Work

Katello has its own internal event queue which is backed by a table in the database. This is a candidate for refactoring to take advantage of Artemis. Even in the embedded setup we have full control of the broker in terms of queues, connections, and so forth to make this possible.

Proof of Concept
I have a PR open against Katello which shows how this works via the STOMP protocol. The PR has notes on how to configure your environment if you’d like to give it a try.

The PR also demonstrates Katello’s internal event system using Artemis. While it does work well, I’m not necessarily advocating for its inclusion in the initial migration of consuming events from Candlepin.
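For readers unfamiliar with STOMP, here is a minimal sketch of the frame format a client would use to connect and subscribe. It is not taken from the PR: the queue name `katello.candlepin` is a hypothetical placeholder, the helper names are made up for illustration, and header escaping from the STOMP 1.2 spec is left out for brevity.

```python
# Sketch of STOMP 1.2 framing: COMMAND, header lines, blank line, body, NUL byte.
# Helper names and the destination below are illustrative assumptions only.

def build_frame(command, headers, body=""):
    """Serialize a STOMP frame to bytes (no header escaping in this sketch)."""
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")


def parse_frame(raw):
    """Split a single raw frame into (command, headers, body)."""
    text = raw.decode("utf-8").rstrip("\x00")
    head, _, body = text.partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        # STOMP 1.2: if a header repeats, the first occurrence wins.
        headers.setdefault(key, value)
    return command, headers, body


# CONNECT must carry accept-version and host in STOMP 1.2.
connect = build_frame("CONNECT", {"accept-version": "1.2", "host": "localhost"})

# SUBSCRIBE needs a destination and a subscription id; "client-individual"
# means each message has to be acknowledged explicitly by the consumer.
subscribe = build_frame(
    "SUBSCRIBE",
    {"id": "0", "destination": "katello.candlepin", "ack": "client-individual"},
)
```

The frames would be written to a TLS socket connected to the listener Candlepin exposes; that part depends on the certificates and port chosen for the deployment and is omitted here.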

Now it’s time for you to let me know your questions / thoughts / concerns!

ehelms · January 17, 2020, 5:28pm

After a quick look around, I did not find any native EL7 packages for Artemis. This is something we’d package and carry ourselves in this deployment model?

Is there a way, from Katello’s connection to Artemis, to gather this information? Or Candlepin’s?

ehelms · January 17, 2020, 5:29pm

Would it be possible for you to split this part of the PR out into a separate commit included with the PR? This would help folks understand the changes to consume Artemis events vs. the changes for Katello itself to use Artemis.

ekohl · January 17, 2020, 5:54pm

How easy is it to switch over from the built-in broker to a standalone one?

Jonathon_Turel · January 21, 2020, 2:56pm

I also found nothing. I can’t think of an alternative to packaging an RPM ourselves.

I was able to drop the full installation tarball on my system and use the CLI to get info on the queues of the embedded instance. There is also a REST API but we’d need to land jars, configuration etc. That sounds like a headache.

There is also a management queue that we can send messages to but it’s not very well documented, so I’ll try to get a hold of the Artemis devs and get some information on it. If we need this capability then I think this would be the best way, if we can get it working.

Jonathon_Turel · January 21, 2020, 2:57pm

It’s possible, but for my own sanity I’d like to keep it together. I can add notes on the PR pointing out where to focus with respect to the Candlepin side.

Jonathon_Turel · January 21, 2020, 2:59pm

Assuming you’re talking about switching Candlepin over, it’s only a matter of setting values like this in candlepin.conf:

candlepin.audit.hornetq.embedded=false
candlepin.audit.hornetq.broker_url=tcp://localhost:61617

ekohl · January 21, 2020, 3:07pm

But is there any data to migrate?

The use case I’m thinking about is that we start with an embedded instance but if we need to scale up/out it’s possible to deploy a standalone instance.

Jonathon_Turel · January 21, 2020, 3:46pm

It might be possible (though I don’t see any documentation mentioning it) to move the journal files from one installation to the other. That would need testing. My preference is to allow the queues to drain during upgrade. They should never be that deep; I’m guessing the queues are already empty during upgrades.

ekohl · January 21, 2020, 4:03pm

That also sounds like a good plan. You do need tools to verify the queue is really empty and ensure nothing is producing more messages, otherwise you’ll end up in a possibly infinite waiting state.
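As a rough sketch of that check: poll the queue depth, require several consecutive empty readings (one empty reading could be a gap between producers), and give up after a timeout instead of waiting forever. The `get_depth` callable is an assumption; it stands in for whatever tool ends up reporting queue depth.

```python
import time


def wait_for_drain(get_depth, timeout=300.0, interval=5.0, stable_checks=3,
                   sleep=time.sleep, clock=time.monotonic):
    """Return True once get_depth() reads 0 for stable_checks polls in a row.

    Returns False if the timeout expires first, so an upgrade script can
    abort with an error rather than hang while something keeps producing.
    get_depth is a hypothetical callable returning the current queue depth.
    """
    deadline = clock() + timeout
    empty_streak = 0
    while True:
        if get_depth() == 0:
            empty_streak += 1
            if empty_streak >= stable_checks:
                return True
        else:
            # A non-empty reading means a producer is still active.
            empty_streak = 0
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.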

lzap · January 24, 2020, 9:38am

My very first thought: as an upstream user, I would like to see, at some point in the future, a deployment without Candlepin but with Pulp. Subscriptions are less relevant upstream. Granted, there are some users (likely RH customers) who use them (e.g. for testing what’s next for Satellite), but such tight coupling could easily kill that option. So my initial feeling: let’s keep this as loosely coupled as possible.

As an engineer I would like to avoid running another Java process, but as an operator I strongly believe we should avoid embedding the broker. How can I restart Candlepin without restarting the broker for other services?

Monitoring of both Pulp and Candlepin is long overdue; the Bugzillas I filed years ago should have been prioritized. If folks expose those metrics, then this should not be a problem.

From the operational perspective, this is the best way to go. We should pay as much attention to resource limits of that service and set it up accordingly.

Can you please create a messaging adapter in Katello so it does not rely on STOMP internally? I briefly read the spec and it seems quite trivial compared to more complex APIs like JMS. I believe your use case is pretty simple and can be generalized to the degree that a new adapter could easily be written in case something goes wrong. Now that we have Redis available, I see no reason why it could not be an option. Actually, if Redis (in durable mode) were part of the PoC, that would make me more comfortable that we have tried this super-lightweight option too, because from what I’ve heard Redis should be used for Dynflow/Tasks too.
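To make the adapter idea concrete, here is a minimal sketch in Python (Katello itself is Ruby; this only illustrates the shape). All class and function names are hypothetical. The point is that event handling depends only on a small fetch/ack interface, so a STOMP-backed or Redis-backed source could be swapped in behind it.

```python
from abc import ABC, abstractmethod
from collections import deque


class MessageSource(ABC):
    """Hypothetical broker-agnostic interface that event handling depends on."""

    @abstractmethod
    def fetch(self):
        """Return (message_id, body), or None when nothing is waiting."""

    @abstractmethod
    def ack(self, message_id):
        """Confirm that the message was processed."""


class InMemorySource(MessageSource):
    """Stand-in adapter; a STOMP or Redis adapter would implement the same API."""

    def __init__(self, messages):
        self._pending = deque(messages)
        self.unacked = {}

    def fetch(self):
        if not self._pending:
            return None
        message_id, body = self._pending.popleft()
        self.unacked[message_id] = body
        return message_id, body

    def ack(self, message_id):
        self.unacked.pop(message_id, None)


def process_all(source, handler):
    """Handle every waiting message, acknowledging only after the handler ran."""
    handled = 0
    while (message := source.fetch()) is not None:
        message_id, body = message
        handler(body)
        source.ack(message_id)
        handled += 1
    return handled
```

Nothing in `process_all` knows which broker is behind the source, which is the property the request above is after.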

We should absolutely try to avoid having our users maintain two brokers. Even if this makes engineering less comfortable, less confident, or slower, history (MongoDB) has shown that it is not worth it.

barnabycourt · January 27, 2020, 7:47pm

There are existing APIs in Candlepin to view the queue depth for the Candlepin queues (GET /candlepin/admin/queues).

lzap · January 28, 2020, 8:38am

Cool, do you have documentation describing all the metrics that can be downloaded from the app?

barnabycourt · January 28, 2020, 2:41pm

Details on the API that Candlepin provides are at https://www.candlepinproject.org/swagger/?url=candlepin/swagger-2.9.17.json#!/admin/getQueueStats

More generally, if broker.xml is updated, you can enable the JMX connection to Artemis, and then you have access to everything described in https://activemq.apache.org/components/artemis/documentation/latest/management.html

Jonathon_Turel · January 31, 2020, 6:25pm

In the future state, Katello and Candlepin are the only two that care about the Artemis broker. In the first iteration Katello only subscribes to the broker to read messages, so it will simply stop processing messages until the broker recovers, just as it would if the broker were external to Candlepin (i.e. Qpid). If Katello’s own event system eventually uses Artemis, then we should absolutely externalize the broker.

While I don’t have an adapter pattern as you’ve mentioned, I’ve taken care to limit STOMP’s exposure to a single class and not bleed its API elsewhere in the code, so it’s a step in that direction. That’s probably good enough, in my opinion. Let’s chat on the PR if you want to dive into the code.

It is an option. However, pub/sub is only one facet of what Redis offers. I’d like to lean toward a purpose-built broker, like we had with Qpid before and like Candlepin has with Artemis. We will have a robust, feature-rich broker to rely on if we want to expand in that area. Also, Candlepin would have to be taught to publish messages to Redis, like it currently does for Qpid. That is a burden we would impose on Candlepin.

Bernhard_Suttner · February 4, 2020, 9:27am

Can you explain why you chose Artemis? AFAIK it’s (another) Java component. Isn’t there a Python / Ruby implementation which can be used?

Qpid is used between the Katello server and Katello hosts. Is the plan to use Artemis for this communication, too? Or is there no katello-agent at all with Pulp3? I thought this channel was used to receive some information from connected hosts and push it to the Katello server, like periodic updates of “Reboot required, New Package List, …”

ehelms · February 4, 2020, 12:28pm

Candlepin uses Artemis to handle task processing and to emit messages for consumers such as Katello. In a standard deployment this Artemis instance is embedded inside the running Candlepin instance, making it transparent to users. For Katello to consume messages today, Candlepin publishes to Artemis, and Artemis forwards to Qpid. This creates a middleman that also has a performance impact from serializing and de-serializing those messages. By removing Qpid, Katello can consume the messages directly from the internal Artemis. That internal Artemis can also be externalized for re-use or for scaling Candlepin. The Candlepin team has stated they could explore the use of Redis, but this would be a not insignificant change.

Pulp 3 moves Pulp out of the area of tracking host data and performing host actions. Therefore, katello-agent, Qpid and Qpid router are no longer supported components. This, combined with the work proposed here, drops Qpid and all its components from our stack. At this time, there is no replacement for the pull functionality katello-agent provides, and users will be encouraged to use remote execution. Feedback is welcome in this area to help understand user needs.

The katello-agent does not send any direct information back. Rather, the agent listens for messages and performs actions if there is a message for that host. Results are reported back. Information like package profiles is sent to the server via standard API calls that are triggered by yum plugins.

Jonathon_Turel · February 19, 2020, 2:57pm

Thanks to all for the feedback in this thread. My final PR is opened against Katello to start pulling Candlepin events from Artemis. Take a look if interested: https://github.com/Katello/katello/pull/8563

lzap · February 21, 2020, 8:53am

What’s the resolution then? Standalone or embedded?

Jonathon_Turel · February 21, 2020, 8:04pm

Embedded. So it’s what we are already using but eliminating the Qpid middleman by connecting directly.