Foreman-maintain functionality report for foremanctl port

Hello community,

Lately I’ve been working out how foreman-maintain will look in containerized Foreman, which is deployed via the new foremanctl (theforeman/foremanctl on GitHub).

I started by getting feedback from core maintainers on an initial functionality review, and I’d like to share the results below. I’m still looking for feedback, so let me know what you think. There are some open questions left as well which we’d appreciate comments on.

Note that the proposal is targeting an MVP which provides the most important functionality from foreman-maintain in foremanctl. Some items were left out to keep the design simpler to start.

Generally, the use of Ansible in foremanctl makes porting foreman-maintain tasks a lot easier than rewriting in, say, a different language. We can make use of roles that exist today to perform the same tasks we had to implement in code with foreman-maintain.

foreman-maintain → foremanctl: Migration Summary

Please note that the following are currently tracked in SAT Jiras, since that’s where I found most of the information gathered in one place. They’re all public. We could probably clone a number of these into the foremanctl repository issue tracker.

Commands

The following commands exist today in foreman-maintain. The recommendations below prescribe which to keep, and how to track implementation of each.

| Command | Recommendation | Tracked | Notes | Docs |
|---|---|---|---|---|
| upgrade | Keep | SAT-39696 | In progress. | Upgrading Satellite |
| update | Keep | SAT-39697 | In progress. | Updating Satellite |
| health | Keep | SAT-44798 | New foremanctl health command for runtime health checks. | Retrieving service status |
| service | Drop | N/A | With foreman.target, this becomes less necessary. Introduce only as necessary. | Retrieving service status |
| backup | Keep | SAT-44838 | Largest untracked area. What to back up may change significantly. | Backing up Foreman |
| restore | Keep | SAT-44838 | To be implemented in the backup epic. | Restoring Foreman |
| maintenance-mode | Keep | SAT-44796 | Blocks port 443, stops timers, disables sync plans. | Backing up Foreman, Updating Foreman Server |
| report | Move | SAT-44804 | SatStats reporting should ideally move to another tool since it’s unrelated to configuring Foreman. This way it could remain Ruby too. See also SAT-44834 for sosreport. | |
| packages | Drop | N/A | Very few host RPMs in containerized model. Can users just manage RPMs with dnf? Or do we still need gating with dnf filtering? | Creating a Custom File Type Repository |
| self-upgrade | Rethink | SAT-44795 | Enables newer maintenance repository and updates foreman-maintain today. The upgrade process will define if this is still necessary. | |
| advanced | Drop | N/A | Developers can run Ansible roles/playbooks directly. Don’t build unless a need (e.g. support?) is identified. | White Listing and Skipping Steps |
| plugin purge-puppet | Reworked | SAT-40445 | Rework is in-progress. plugin can likely go away, but removing Puppet is to-be-determined. | Disabling Puppet integration |

Orchestration

foreman-maintain scenarios run a number of tasks sequentially. If a task fails, users can skip it via --whitelist. With foremanctl, this functionality may be missed. If it is, we can consider implementing skips via Ansible --skip-tags.
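
As a rough sketch of how skips could map onto tags, assuming each check or step carries an Ansible tag named after it: the helper below only builds the ansible-playbook command line. The function name, playbook path, and tag names are hypothetical, and this is not how foremanctl assembles its commands today.

```python
import shlex


def build_playbook_command(playbook, skip_tags=()):
    """Build an ansible-playbook argv, leaving out any tagged steps the user listed."""
    command = ["ansible-playbook", playbook]
    if skip_tags:
        # --skip-tags takes a comma-separated list of tags to exclude from the run.
        command += ["--skip-tags", ",".join(skip_tags)]
    return command


# Hypothetical playbook path and tag names, for illustration only.
argv = build_playbook_command("upgrade.yaml", ["check_tmout", "disk_performance"])
print(shlex.join(argv))
```

A wrapper like this would let a foremanctl option similar to --whitelist translate directly into tags, provided the roles are tagged consistently.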

Checks

The following is a list of checks, mainly from foreman-maintain, but foremanctl checks are included for completeness. Many of these checks are still relevant to containerized Foreman and require re-implementation in some manner. In foremanctl, these checks could run as part of the checks role and have filtering based on flavor, features, and infrastructure.
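
To make the filtering idea concrete, here is a minimal sketch of selecting checks by flavor, features, and infrastructure. It is written in Python only for illustration, since in foremanctl this would more likely be Ansible conditionals. The metadata fields and the flavor and feature names are assumptions, not existing foremanctl data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    flavors: frozenset = frozenset()   # empty means "applies to every flavor"
    features: frozenset = frozenset()  # empty means "no feature required"
    external_db_only: bool = False     # infrastructure condition


def applicable_checks(checks, flavor, enabled_features, external_db):
    """Return the names of the checks that apply to this deployment."""
    selected = []
    for check in checks:
        if check.flavors and flavor not in check.flavors:
            continue
        if check.features and not check.features <= set(enabled_features):
            continue
        if check.external_db_only and not external_db:
            continue
        selected.append(check.name)
    return selected


# Hypothetical metadata; flavor and feature names are placeholders.
CHECKS = [
    Check("check_tmout"),
    Check("validate_external_db_version", external_db_only=True),
    Check("foreman_openscap/invalid_report_associations", features=frozenset({"openscap"})),
    Check("system_registration", flavors=frozenset({"satellite"})),
]

print(applicable_checks(CHECKS, flavor="foreman", enabled_features={"openscap"}, external_db=False))
```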

Jira: SAT-44858

| Check | Decision | Category | Notes |
|---|---|---|---|
| check_features | Exists in foremanctl | Preflight | Validates requested features exist in features.yaml. |
| check_hostname | Exists in foremanctl | Preflight | Validates FQDN format. |
| check_database_connection | Exists in foremanctl | Database | Pings databases (external DB mode only). Extend to cover local (containerized) DB. |
| check_system_requirements | Exists in foremanctl | System | Validates CPU/RAM against tuning profile thresholds. |
| check_subuid_subgid | Exists in foremanctl | System | Role exists but is not used. |
| certificate_checks | Exists in foremanctl | Certificates | Runs during deploy, not in checks playbook. Centralize to checks playbook? |
| check_tmout | Keep | System | TMOUT env var can kill long-running operations. Simple assert. |
| env_proxy | Keep | System | HTTP_PROXY/HTTPS_PROXY affect podman pulls, container networking, Ansible. |
| check_ipv6_disable | Needs more discussion | System | ipv6.disable=1 in kernel boot params affects container networking. Check /proc/cmdline. See foremanctl#208. |
| disk/available_space | Keep | Disk | Root partition >=4GB free. Also check volume mount points. |
| disk/performance | Keep | Disk | fio benchmarks on Pulp and Foreman DB data. |
| disk/available_space_candlepin | Keep | Disk | No /var/lib/candlepin on host in containerized model. Mount CP data to /var/lib? (foremanctl#478) |
| disk/postgresql_mountpoint | Rethink | Disk | /var/lib/pgsql/data seems to be outside of the container, where /var/lib/pgsql/<version>/ is only within the container. |
| foreman/db_up | Keep | Database | Extend existing check_database_connection to cover local (containerized) DB. |
| candlepin/db_up | Keep | Database | Same as above. |
| pulpcore/db_up | Keep | Database | Same as above. |
| foreman/db_index | Keep | Database | PostgreSQL amcheck on Foreman DB indexes. Run via podman exec or direct connection. Parameterize as one role across all DBs. |
| candlepin/db_index | Keep | Database | Same as above. |
| pulpcore/db_index | Keep | Database | Same as above. |
| validate_external_db_version | Keep | Database | External PostgreSQL version >=13. |
| check_external_db_evr_permissions | Drop | Database | Was only needed during a past upgrade. |
| foreman/facts_names | Keep | Application | Warns if any host has >10,000 fact values. DB query. |
| foreman/check_corrupted_roles | Rethink | Application | Is this check still necessary? |
| foreman/check_duplicate_permissions | Rethink | Application | Is this check still necessary? |
| server_ping | Keep | Application | /api/v2/ping end-to-end health check. Extract from deploy into reusable role. |
| services_up | Keep | Application | Rethink for containers: check systemd service status for all quadlet containers and foreman.target. |
| foreman_tasks/not_paused | Keep | Tasks | Checks for paused Foreman tasks. Investigate Foreman Ansible Modules for queries. |
| foreman_tasks/not_running | Keep | Tasks | Checks for running tasks before upgrade. Can wait for completion. Critical pre-upgrade check. |
| foreman_tasks/invalid/check_old | Keep | Tasks | Tasks >30 days old in paused/stopped state. |
| foreman_tasks/invalid/check_pending_state | Keep | Tasks | Tasks stuck in pending state. |
| foreman_tasks/invalid/check_planning_state | Keep | Tasks | Tasks stuck in planning state. |
| pulpcore/no_running_tasks | Keep | Tasks | Active Pulpcore tasks. Query via Pulp API. Critical pre-upgrade check. |
| container/podman_login | Keep | Container | Checks podman is logged into registry.redhat.io. Directly relevant for image pulls. |
| foreman_openscap/invalid_report_associations | Rethink | Plugin | OpenSCAP reports with broken associations. DB query. Conditional on OpenSCAP feature. Is this still necessary? |
| foreman_proxy/check_tftp_storage | Needs more discussion | Plugin | Cleans old kernel/initramfs from TFTP boot dir. Implementation depends on host vs container. |
| foreman_proxy/verify_dhcp_config_syntax | Keep | Plugin | Validates ISC DHCP config. Implementation depends on config location. |
| puppet/verify_no_empty_cacert_requests | Keep | Plugin | Empty Puppet CA cert request files. Conditional on BYOP puppet integration. |
| foreman/check_puppet_capsules | Keep | Plugin | Smart Proxies with Puppet feature. Conditional on BYOP puppet integration. |
| maintenance_mode/check_consistency | Rethink | Maintenance | Verifies all maintenance mode components are consistent. Depends on maintenance-mode command decision. |
| restore/validate_hostname | Keep | Restore | Backup hostname matches current system. |
| restore/validate_interfaces | Keep | Restore | Network interfaces match backup expectations. |
| check_sha1_certificate_authority | Drop | Certificates | sha1 should likely no longer exist in certificates after the upgrade to RHEL 9. |
| check_hotfix_installed | Rethink | Packages | Current RPM-scanning implementation doesn’t apply to containers. Blocked on hotfix delivery design for containerized Foreman. |
| backup/certs_tar_exist | Rethink | Backup | Certificate storage changes with containers (podman secrets). Part of backup Epic design. |
| restore/validate_backup | Rethink | Restore | Backup format will be different for containers. Part of restore Epic design. |
| restore/validate_postgresql_dump_permissions | Rethink | Restore | DB restoration via containers changes the permission model. Part of restore Epic design. |
| check_subscription_manager_release | Drop (Satellite) | Repositories | Checks if RHEL repos are pinned to a Y version. With how few RPMs foremanctl requires, let’s aim first for maximum flexibility. |
| system_registration | Keep (Satellite) | System | Checks self-registered to own Satellite. Still problematic. |
| iop_*/db_up (x5) | Keep (Satellite) | Database | IoP database pings. Parameterize as one role, not 5 copies. |
| repositories/check_non_rh_repository | Drop (Satellite) | Repositories | With how few RPMs foremanctl requires, we can be more flexible. |
| repositories/check_upstream_repository | Rethink (Satellite) | Repositories | Upstream Foreman repos on Satellite. Would cause version conflicts. |
| repositories/validate | Rethink | Repositories | Required RHSM repos available. Useful for foremanctl/hammer RPM updates. Make this work for upstream and Satellite? |
| non_rh_packages | Drop (Satellite) | Packages | Lists non-Red Hat RPMs. Reduced importance. |
| root_user | Drop | System | Ansible handles privilege escalation via become. |
| validate_dnf_config | Drop | Packages | Extremely low risk with so few host packages. |
| recurring_timers | New | Container | Check systemd timers for recurring Foreman tasks are active and enabled. |
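
To illustrate how small some of the System checks in the table are, here is a hedged sketch of check_tmout, env_proxy, and check_ipv6_disable as plain functions. It is Python for illustration only (the real checks would presumably be Ansible tasks), the function names are made up, and the lowercase proxy variable names are an addition of mine rather than something the table lists.

```python
import os
from pathlib import Path

# Lowercase variants are an assumption; the table only names the uppercase ones.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")


def tmout_is_set(environ):
    """check_tmout: a non-empty TMOUT can end a long-running shell session.

    This only sees TMOUT if the shell exported it into the environment.
    """
    return bool(environ.get("TMOUT"))


def proxy_vars_set(environ):
    """env_proxy: list the proxy variables that are set in the environment."""
    return [name for name in PROXY_VARS if environ.get(name)]


def ipv6_disabled(cmdline):
    """check_ipv6_disable: look for ipv6.disable=1 among the kernel boot parameters."""
    return "ipv6.disable=1" in cmdline.split()


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    cmdline = cmdline_path.read_text() if cmdline_path.exists() else ""
    print("TMOUT set:", tmout_is_set(os.environ))
    print("proxy variables:", proxy_vars_set(os.environ))
    print("ipv6 disabled at boot:", ipv6_disabled(cmdline))
```

Each function takes its input as an argument rather than reading it directly, which keeps the logic easy to test without touching the host.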

So this check came up in a different context today and when looking at it I was very puzzled what exactly this check is trying to verify. Do you know any more context and why it should be kept?

I saw it was disk space related, so I kept it as a health measure. If you don’t think that brings value, we can remove it. I suppose an upgrade shouldn’t be attempted if the disk is going to fill up and crash halfway through.

Well, let me phrase it like this: It’s a good idea to have sanity checks whether a system is in a state where the update is potentially possible. And yes, ensuring there is enough storage for both the end-state of the update but also the intermediate caches etc is a good idea. Verifying whether / has X gigabytes free, where X was determined by a fair dice in 2018, is not that check.
