I don’t think we have a best practices guide apart from the tuning guide, but you should be able to scale the backend services independently of the rest.
I have no data to back this claim up, but many smaller jobs should perform better than a single gigantic job because of the way it’s implemented. For details I’ll shamelessly point you to one of my older posts, "Help to find the bottleneck in foreman-task / remote execution" (#2 by aruzicka), in the hope you haven’t seen that one yet.
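To illustrate the "many smaller jobs" idea, here is a minimal sketch of splitting a host list into fixed-size batches, each of which could then be submitted as its own job. The helper name `split_into_batches`, the batch size of 500 and the host names are all made up for illustration; nothing here calls Foreman, and the right batch size for a given deployment is something you would have to find by experiment.

```python
from typing import Iterator, List, Sequence


def split_into_batches(hosts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `hosts`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(hosts), batch_size):
        yield list(hosts[start:start + batch_size])


# Hypothetical host names, for illustration only.
hosts = [f"host{i:04d}.example.com" for i in range(8000)]

# Assumed batch size; tune it for your own deployment.
batches = list(split_into_batches(hosts, batch_size=500))

# Each batch would then be submitted as a separate, smaller job.
print(f"{len(batches)} batches, first has {len(batches[0])} hosts")
```

How you actually submit each batch (web UI, API, CLI) is up to you; the sketch only covers the splitting step.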
That is extremely hard to quantify, as not all jobs/tasks are equal and it depends on a lot of factors. I would expect a standard deployment to be able to handle a job on ~8000 hosts (as in a single job with 8k hosts in it) within ~15 minutes. Of course, this is an educated guess more than anything else, so YMMV.
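Purely as back-of-the-envelope arithmetic, the guess above works out to a bit under 9 hosts per second. The sketch below just does that division; the function names are made up, and the linear extrapolation in `estimated_minutes` is an assumption for rough sizing only, not something I have measured.

```python
def hosts_per_second(hosts: int, minutes: float) -> float:
    """Average throughput implied by finishing `hosts` hosts in `minutes` minutes."""
    return hosts / (minutes * 60)


def estimated_minutes(hosts: int, rate: float) -> float:
    """Rough duration for `hosts` hosts, ASSUMING throughput stays constant."""
    return hosts / rate / 60


# Figures taken from the educated guess above: ~8000 hosts in ~15 minutes.
rate = hosts_per_second(8000, 15)
print(f"{rate:.1f} hosts/s")

# Hypothetical smaller job, extrapolated linearly (an assumption, not a measurement).
print(f"{estimated_minutes(2000, rate):.2f} minutes for 2000 hosts")
```

Real jobs will not scale this neatly, so treat the output as an order-of-magnitude figure at best.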