I have Ansible roles running on a couple thousand systems regularly. In this scenario it’s not uncommon for a small handful of hosts in the run to fail for some transient issue. Is there a way to have the job automatically rerun on failed hosts?
Not sure if this is something easily achievable but figure it can’t hurt to ask, Thanks
Hi,
sadly no, currently there’s nothing like that. As a workaround, you could probably roll your own thing with webhooks set up to fire when a ansible role applying job fails, just make sure you don’t end up in an infinite loop.