RFC: Tool for reporting invalid records in database

In Foreman we constantly change the model’s code, we add / edit / delete validations and not always migrations correctly catch these changes.
Many objects will become invalid during their lifetime, and in some places we even skip validations when creating the object.

Having tool that would report all invalid records in database would be benefit for users in multiple cases:

  • Health status of database records
  • Pre-upgrade check
  • Post-upgrade check

Implementation

I have to ideas how to implement such a feature:

  • Rake task
  • Foreman Feature (page)

Rake task

Simple rake tasks going over all the records and printing output to console or saving it to the .json file, so users can use it in their CI.

From the first version it should have parameter for filtering model classes, so users can specify which classes they want to check.

This would be fairly easy to do, so it could be POC for upcoming improvements. If users will use it, and like it, then we can extend the feature and make it “bigger & better” - UI, API, recurring logic and so on …

Foreman Feature (page)

New page under Monitor section, similar to the Foreman Tasks page, with a general overview and filtering of results. There would be buttons for actions like running the health check, cleaning all the results and so on.

Table with results (found invalid records) would have information about the model (class, id, name), validation messages and links for editing these objects.

In the future we can extend this page and add other health checks, like disk space, RAM monitoring, CPU usage and so on …

Things to consider

  • Performance on big databases
  • Filtering models - We probably don’t want to check all objects.
  • Foreman Tasks and Dynflow should be involved (for performance reasons)
  • … and a lot of other things I didn’t think of yet

Community

What I would like to hear from community:

  • Is this something that you would use? If not please tell why
  • Would you rather have a Rake task or a Foreman page (with API and everything)
  • Is there something to consider, some catches and stuff to handle I didn’t mention above?
3 Likes

From my experience if it is a rake task that is not automatically running, most users will not recognize the feature. If it is an automatic job, it should not only report but do cleanup automatically, so it would need to be very solid.

A new page would make it more easily accessible, but not sure if people look into the monitoring part if no problem is obvious and if it is a problem how likely is it that you will see something preventing Foreman from running instead of the page that could help with cleanup.

So do not get me wrong, I like the idea of cleaning up here, but I would say it should be nothing a user should take care of instead of the system doing this itself.

Could this be addressed by including it in the upgrade manual as an optional step?

I’m leaning to a rake task for simplicity but @Dirk’s comments are valid. I would also wonder how actionable the report would be. How would a user know how to proceed once an invalid entry is reported?

Yes, most people follow the upgrade steps very closely especially since the foreman-installer step is no longer optional. But I am not sure if only optional would be good as then people tend to skip it. And if the frequency is high enough with the additional error sources mentioned by @lstejska as it would catch incorrect migrations when they happen, but not invalid during lifetime and skipped validations which will then have to wait for the next major upgrade. Also if for some reasons a problem would be caused by this, during an upgrade so many more things happen that it would make identify the root cause of the problem harder.

The question is what do you mean by “cleanup”? We can’t delete the records, that could cause many problems and make users pretty angry. And updating objects based on their error message would be also difficult, nearly impossible to do.

I would also wonder how actionable the report would be. How would a user know how to proceed once an invalid entry is reported?

In report would be object model class, id, name or title (if model responds to them), link to show/edit action and displayed validation error message.

How would this then fix the record? By recreating the object as a new entry? I just want to understand what the possible solution could be, so we can perhaps come up with an idea of a solution which involves the user as little as possible, because I fear one will not be interested as long as everything still works and others will try to fix everything without understanding what they do. :wink:

I wouldn’t, that in my opinion should be left on the user. This tool would just inform user what and where is wrong.

I fear one will not be interested as long as everything still works and others will try to fix everything without understanding what they do.

Yeah, that is reasonable fear I’m afraid.

I like the idea, but whatever you do, make sure there is an option to ignore model(s). And by default be sure to ignore huge tables like audits, logs, messages, sources, factvalues, factnames. These are out of my head, there might be more.

Could you maybe give an example of some such invalid records this tool would handle?

Ideally, if we know some objects may be invalid and can reasonably fix them, that should be done in a migration without requiring an additional step from the user.
We’ve had many such cases in the past. For example, when uniqueness for some model wasn’t enforced in the DB and we ended up with duplicate records due to race conditions. In those cases we either deleted all but one of the records (example), or renamed all but one (example) before adding a db constraint to prevent recurrence. In some rare cases where we couldn’t figure out the correct resolution we error out in the migration (example), which isn’t ideal as it leaves the user with a half-migrated db they need to manually clean up.

For more complex checks that may require user action to resolve, we actually already have a tool that does exactly that - foreman_maintain. It has checks that can be run both pre- and post- upgrade or against a running system, and verify there are no invalid records of a certain type in the db (example).
Unfortunately, it is currently not yet supported on Debian and is rather more Satellite-oriented.
Rather than creating yet another tool and another upgrade step, I would much rather we invest in improving the existing tool, making it support all of the Foreman-supported platforms and scenarios, and adding more checks to it.

1 Like