Dynamic data like database queries in apipie

This is to gather some thoughts around this topic.

The background is that we use apipie to describe our REST API. It adds descriptions and (optional) validation.

In many places we use the EnumValidator. IMHO that validator is great because it also generates a nice description of the valid values for the user. One common problem is that the data is actually dynamic, while the EnumValidator stores a copy of the concrete list.
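To make the snapshot problem concrete, here is a minimal plain-Ruby sketch. `SnapshotEnumValidator` is a hypothetical stand-in, not apipie's actual EnumValidator class, and the description format is made up. It only models the behaviour described above: the list is copied once, when the validator is created, and later changes to the data are never seen.

```ruby
# Hypothetical stand-in for an enum validator that copies its list
# when it is constructed (e.g. during class loading).
class SnapshotEnumValidator
  def initialize(values)
    @values = values.dup.freeze # concrete copy taken at definition time
  end

  def validate(value)
    @values.include?(value)
  end

  def description
    "Must be one of: #{@values.join(', ')}."
  end
end

# Simulated dynamic data (example values), e.g. rows in a database table.
content_types = %w[rpm deb]
validator = SnapshotEnumValidator.new(content_types)

content_types << "ostree" # the data changes after start up

puts validator.description        # Must be one of: rpm, deb.
puts validator.validate("ostree") # false, even though the value is now valid
```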

In practice this means the data is loaded while the application starts up. Concrete examples that recently popped up:

Here for the resource_type parameter it calls Permission.resources, which is this method:

We saw it fail there when the test database wasn’t migrated (since fixed), but it does mean that the database is already queried during class loading, even when the process may not need the controllers at all (such as a rake task). That can slow it down.
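The reason the database is hit during class loading is that the arguments of a class-level DSL call are evaluated while the class body runs. A sketch, assuming a hypothetical `param` macro and a fake `FakePermission.resources` data source with a call counter (neither is apipie's or Foreman's code):

```ruby
# Fake data source standing in for a database query.
module FakePermission
  @calls = 0

  class << self
    attr_reader :calls

    def resources
      @calls += 1 # count how often the "database" is queried
      %w[Host Domain Subnet]
    end
  end
end

# Hypothetical minimal DSL: records parameter definitions on the class.
module ParamDSL
  def params
    @params ||= {}
  end

  def param(name, values)
    params[name] = values
  end
end

class FakePermissionsController
  extend ParamDSL

  # The argument is evaluated right here, while the class body executes,
  # regardless of whether any request is ever served.
  param :resource_type, FakePermission.resources
end

puts FakePermission.calls # 1: the query already ran during class loading
```

Any process that merely loads the controller class, such as a rake task, pays for that query.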

In this particular case the data is cached anyway, so changing the DB entries wouldn’t make a difference, but there’s another case where it should be dynamic:

Here we can actually see multiple issues. First of all, it’s marked for translation (N_()) but with a dynamic string. That can never work; it should be N_("Possible values: %s") % Katello::RepositoryTypeManager.generic_content_types.join(", ") for it to have a chance, but I don’t think we can really make it translatable.
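Why interpolating first cannot work: gettext-style extraction records the literal string from the source, and the runtime lookup must match that exact string (N_() itself only marks a string for extraction). The toy sketch below uses a hypothetical in-memory `CATALOG` and `translate` helper in place of real gettext, with made-up content types, just to show the lookup difference.

```ruby
# Hypothetical translation catalog: msgid => translated string.
CATALOG = { "Possible values: %s" => "Valeurs possibles : %s" }.freeze

# Toy stand-in for a gettext lookup: falls back to the msgid when missing.
def translate(msgid)
  CATALOG.fetch(msgid, msgid)
end

types = %w[file docker]

# Interpolating first produces a msgid that no catalog can contain.
dynamic = translate("Possible values: #{types.join(', ')}")

# Looking up the constant template and formatting afterwards can match.
template = translate("Possible values: %s") % types.join(", ")

puts dynamic  # Possible values: file, docker (untranslated)
puts template # Valeurs possibles : file, docker
```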

The more interesting part is that it displays the values, but those are loaded on application start up. It doesn’t perform any validation, so if the content types change, it’ll still display the old value(s) while accepting the new one(s). That makes the usefulness of the help text questionable.

That it runs on start up can actually trigger interesting failure modes. Bug #37977 (Katello prevents foreman server startup when Pulp is not reachable) observes that the method call can make the application reach out to Pulp, so Foreman refuses to start up when Pulp is unavailable.
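The startup failure follows from the same eager evaluation: if the data source raises while the class body runs, loading the class fails. A sketch with a hypothetical `FakePulp` client that is always unreachable; the lazy variant only stores a lambda, so the error surfaces when the value is actually needed instead of at load time.

```ruby
# Hypothetical client for an external service that is currently unreachable.
module FakePulp
  class Unreachable < StandardError; end

  def self.content_types
    raise Unreachable, "connection refused"
  end
end

# Eager: the value is fetched while the class body runs, so the error
# escapes from class loading itself.
eager_error = nil
begin
  class EagerController
    CONTENT_TYPES = FakePulp.content_types
  end
rescue FakePulp::Unreachable => e
  eager_error = e
end

# Lazy: only a callable is stored; nothing is fetched at load time.
class LazyController
  CONTENT_TYPES = -> { FakePulp.content_types }
end

puts eager_error.class                   # FakePulp::Unreachable
puts LazyController::CONTENT_TYPES.class # Proc
```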

So we can observe that reliability, correctness and performance aren’t what they should be. The question is: what can we do?

I think that today EnumValidator should only be used with truly static lists, but perhaps we can enhance it to support dynamic ones. I opened Pull Request #946 on Apipie/apipie-rails (Support passing a callable to EnumValidator) to do this, but failed to realize that it then overlaps with ProcValidator.
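A minimal sketch of the general idea, not the PR's actual code: a hypothetical `LazyEnumValidator` accepts either an Array or a callable and resolves the list every time it validates or renders its description.

```ruby
# Hypothetical enum validator that accepts a static Array or a callable.
class LazyEnumValidator
  def initialize(source)
    @source = source
  end

  # Resolve the current list; callables are invoked on every use.
  def values
    @source.respond_to?(:call) ? @source.call : @source
  end

  def validate(value)
    values.include?(value)
  end

  def description
    "Must be one of: #{values.join(', ')}."
  end
end

content_types = %w[rpm deb]
validator = LazyEnumValidator.new(-> { content_types })

content_types << "ostree" # the data changes after the validator is built

puts validator.validate("ostree") # true
puts validator.description        # Must be one of: rpm, deb, ostree.
```

Once the list comes from a callable, the validator is essentially "a proc plus a generated description", which is presumably where the overlap with ProcValidator shows up.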

So now I’m opening the topic to a wider audience: what should we do here?

I agree that access to external systems during class loading is a bad practice and we should avoid it.

I think we have two different use cases here:

  1. Validate static metadata that is hard to obtain, e.g. because it comes from the DB or Pulp
  2. Validate a dynamic list of values

In both cases we can’t obtain the list during class loading, so we need to postpone it. I would suggest postponing it to the latest possible stage: the actual request for the apipie data. I think we can achieve both if we introduce a concept of examples to the ProcValidator: on demand, a method is called that returns a list of items that will pass the validation. Since the method is called on demand, this happens quite late in the process, which is good for our cause. And since these are only examples, the system does not commit to this specific list and can change it depending on the actual validation logic.
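The proposal could look roughly like the sketch below. `ExampleProcValidator` and its `examples:` keyword are hypothetical names for illustration, not existing apipie API. Validation stays in the proc, while the examples callable is only invoked when the documentation is requested, and it does not have to list every accepted value.

```ruby
# Hypothetical validator: a proc decides validity, and an optional
# callable supplies example values only when documentation is requested.
class ExampleProcValidator
  def initialize(check, examples: nil)
    @check = check
    @examples = examples
  end

  def validate(value)
    @check.call(value) == true
  end

  # Called on demand (e.g. when the docs are rendered), not at load time.
  def description
    return "Must pass a custom check." unless @examples

    "Examples: #{@examples.call.join(', ')}."
  end
end

lookups = 0
known = %w[rpm deb]
validator = ExampleProcValidator.new(
  ->(value) { known.include?(value) },
  examples: -> { lookups += 1; known.first(2) }
)

puts lookups               # 0: nothing fetched while "loading"
puts validator.description # Examples: rpm, deb.
puts lookups               # 1

known << "ostree"
puts validator.validate("ostree") # true: examples are not the full list
```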

So for use case 1, we would get the list of possible values once, on the first request, and cache it forever.
And for use case 2, we can retrieve a list that is correct at that specific point in time, but we will need to retrieve it again for the next request.
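The two use cases then differ only in caching. A sketch with a hypothetical `ExampleSource` wrapper: with `cache: true` the callable runs once, on the first request, and the result is kept; with `cache: false` it runs on every request. The mutex is an assumption for multi-threaded servers, so two concurrent first requests don't both fetch.

```ruby
# Hypothetical wrapper around an examples callable with optional caching.
class ExampleSource
  def initialize(fetch, cache:)
    @fetch = fetch
    @cache = cache
    @mutex = Mutex.new
  end

  def call
    return @fetch.call unless @cache # use case 2: always fresh

    # Use case 1: fetch once on the first request, then keep it forever.
    @mutex.synchronize do
      @value = @fetch.call unless defined?(@value)
      @value
    end
  end
end

fetches = 0
fetch = -> { fetches += 1; %w[a b] }

cached = ExampleSource.new(fetch, cache: true)
3.times { cached.call }
puts fetches # 1

fetches = 0
fresh = ExampleSource.new(fetch, cache: false)
3.times { fresh.call }
puts fetches # 3
```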

How does this sound?

I’m trying to think about the performance. If the values are expensive to retrieve, today that isn’t an issue because the cost is paid once on start up. On the other hand, that slows down start up and also means the data can end up being invalid.

I wonder if we could somehow figure out how much time is spent in apipie loading, and whether we can easily identify the slow or dynamic parts. That would show how big of an issue this actually is.
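One low-tech way to find the slow parts, sketched here as an assumption rather than an existing apipie feature: wrap each dynamic data source in a timer that records elapsed time under a label, then list the slowest entries once loading is done. `LoadTimer` and the labels are hypothetical, and the `sleep` only simulates a slow query.

```ruby
# Hypothetical helper that records how long each labelled block takes.
module LoadTimer
  TIMINGS = Hash.new(0.0)

  def self.measure(label)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    TIMINGS[label] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end

  # Labels sorted from slowest to fastest.
  def self.report
    TIMINGS.sort_by { |_label, seconds| -seconds }
  end
end

resources = LoadTimer.measure("Permission.resources") { sleep 0.02; %w[Host Domain] }
content_types = LoadTimer.measure("generic_content_types") { %w[file] }

LoadTimer.report.each do |label, seconds|
  puts format("%-25s %.3fs", label, seconds)
end
```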