Hi,
As you probably heard (and saw in the last demo), we have a new notification
system. Now it's time to start using and improving it.
Here are some of the topics I think make sense to discuss; comments etc.
are welcome:
First, I would like to group notifications by topic, for example
provisioning, infrastructure errors, plugin specific (e.g. content /
subscriptions) etc.
Provisioning:
Notification when a host failed to provision - for example we were unable
to render a template, or a proxy was down etc. This is usually hidden from
the user, and the only way to figure out what's going on is via the logs or
the machine console.
An action could be to take the user to the rebuild host modal, which checks
the templates and proxies. Another action could be to go to the host edit
page, but we must relay the actual error so it's clear what the issue is.
I would suggest keeping the notification for 3 days, or until the host gets
reprovisioned - then we could remove / mark as read the old notification.
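
To make the retention rule above concrete, here is a rough sketch in Python
(not Foreman code). The ProvisionFailure record, the should_clear helper and
the FAILURE_RETENTION constant are names I made up for illustration; only the
"3 days or reprovisioned" rule comes from the suggestion itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Retention window taken from the "3 days" suggestion above.
FAILURE_RETENTION = timedelta(days=3)


@dataclass
class ProvisionFailure:
    """Illustrative record only, not an actual Foreman model."""
    host: str
    error: str
    created_at: datetime


def should_clear(note: ProvisionFailure, now: datetime,
                 reprovisioned_at: Optional[datetime] = None) -> bool:
    """Clear the notification when the host was reprovisioned after the
    failure was recorded, or when the retention window has passed."""
    if reprovisioned_at is not None and reprovisioned_at > note.created_at:
        return True
    return now - note.created_at >= FAILURE_RETENTION
```

Whether "clear" means delete or mark as read is left open here, as it is in
the proposal.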
Successful provisioning - as we already send an email every time a host has
finished provisioning, I think it makes sense to highlight that in the UI as
well. I would default to 24 hours or even less for such events. We could
further think about limiting this to either the top xx hosts that got
provisioned, or just having a message saying hosts were provisioned, with a
link back to the host list filtered by installed_at > time and
owner = current.user.
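
As a rough illustration of the "one message for many hosts" idea, a sketch
in Python follows. TOP_N stands in for the undecided "xx", summarize_provisioned
is a made-up helper, and the filter string only mirrors the wording above; the
real host list search syntax may well differ.

```python
from datetime import datetime
from typing import List

TOP_N = 3  # stand-in for the "top xx" limit; the real value is undecided


def summarize_provisioned(hosts: List[str], since: datetime, owner: str) -> dict:
    """Collapse many 'host provisioned' events into a single notification."""
    shown = hosts[:TOP_N]
    rest = len(hosts) - len(shown)
    message = "Provisioned: " + ", ".join(shown)
    if rest > 0:
        message += f" and {rest} more"
    # Illustrative filter only; it is not checked against the real search syntax.
    link_filter = f'installed_at > "{since:%Y-%m-%d %H:%M}" and owner = {owner}'
    return {"message": message, "filter": link_filter}
```

For five hosts a, b, c, d, e this gives the message "Provisioned: a, b, c and
2 more", plus a filter the notification link could point at.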
Inventory
Fact / report importing
- When a new host gets imported? Does it make sense? Currently, if we import
a host from facts, there is no place where we tell the user it happened. The
concern here is that an initial import can bring in thousands of hosts, so
clearly we can't have one notification per host.
- When there is a failure importing facts (IMHO reports are already visible
in the host status).
I personally had problems where I was able to create a host record, but the
NIC record failed for some reason. As a user, if I did not watch the logs, I
would be clueless about that failure ever happening.
My concern here is that while I can let the user know a failure happened, I
have no action to take them to. For example, I see this in my logs:
2017-01-23T15:31:08 26265d09 [app] [I] Import facts for 'host' completed. Added: 0, Updated: 4, Deleted 0 facts
2017-01-23T15:31:09 26265d09 [sql] [W] Saving br182 NIC for host host failed, skipping because:
2017-01-23T15:31:09 26265d09 [sql] [W] IP address has already been taken
2017-01-23T15:31:09 26265d09 [sql] [W] Saving em1 NIC for host host failed, skipping because:
2017-01-23T15:31:09 26265d09 [sql] [W] IP address can't be blank
2017-01-23T15:31:09 26265d09 [sql] [W] Ip6 can't be blank
2017-01-23T15:31:09 26265d09 [sql] [W] Saving ipmi NIC for host host failed, skipping because:
2017-01-23T15:31:09 26265d09 [sql] [W] Identifier has already been taken
2017-01-23T15:31:09 26265d09 [app] [I] Completed 201 Created in 569ms (Views: 4.3ms | ActiveRecord: 149.4ms)
Yet we have no place where this information is visible in the UI. I think
we would need to either capture these errors in the DB, or have a wizard /
troubleshooting flow to fix this kind of issue. Maybe just notifying the
user as a first step is better than nothing.
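
Just to show what "capturing these errors" might look like as data, here is a
small Python sketch that groups the warnings from a log like the one above by
NIC. It assumes one log entry per line in the actual log file, and the two
patterns are modelled only on that excerpt, so other versions may word the
messages differently. collect_nic_failures is a made-up name, and parsing
logs is only a stand-in here; capturing the errors at the point where the
save fails would be the cleaner route.

```python
import re
from typing import Dict, List

# Patterns modelled on the log excerpt above only (an assumption).
LINE = re.compile(r"^\S+ \S+ \[(\w+)\] \[(\w)\] (.*)$")
NIC_FAILED = re.compile(
    r"^Saving (\S+) NIC for host (\S+) failed, skipping because:$")


def collect_nic_failures(lines: List[str]) -> Dict[str, List[str]]:
    """Group the warning lines that follow each 'Saving <nic> NIC' header."""
    failures: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            current = None  # unparseable line ends the current group
            continue
        _logger, level, text = m.groups()
        header = NIC_FAILED.match(text)
        if header:
            current = header.group(1)
            failures[current] = []
        elif level == "W" and current is not None:
            failures[current].append(text)
        else:
            current = None  # any other entry ends the current group
    return failures
```

The result maps each NIC identifier to its list of reasons, which is roughly
what a notification (or a troubleshooting page) would need to show.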
ENC Failures
- If we were unable to create ENC output for Puppet, I would like to see a
notification. Again, I would limit it to a certain number of events (e.g.
maybe up to 5?) and would clear the notification once the issue is fixed.
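
A minimal sketch of the "cap the events, clear once fixed" behaviour, in
Python and purely for illustration. The EncFailureNotifications class and its
method names are made up, and keying by host name is my assumption; only the
limit of 5 and the clear-on-fix rule come from the text above.

```python
MAX_ENC_NOTIFICATIONS = 5  # the "maybe up to 5?" figure from above


class EncFailureNotifications:
    """Illustrative in-memory store keyed by host name; not Foreman code."""

    def __init__(self, limit: int = MAX_ENC_NOTIFICATIONS) -> None:
        self.limit = limit
        self.active = {}  # host name -> latest error message

    def record_failure(self, host: str, error: str) -> bool:
        """Store or refresh a failure; return False when the cap is hit."""
        if host in self.active:
            self.active[host] = error  # refresh, does not count twice
            return True
        if len(self.active) >= self.limit:
            return False  # cap reached, do not add more noise
        self.active[host] = error
        return True

    def record_success(self, host: str) -> None:
        """ENC output rendered again, so drop the notification if any."""
        self.active.pop(host, None)
```

Dropping events past the cap is one option; replacing them with a single
"and N more hosts" summary would be another.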
Users
- Does it make sense to notify admins when new users start using the system?
- Anything else? (maybe when an admin changes your permissions?)
Org/Loc
- Hosts in mismatch mode?
- Hosts that do not belong to any org/loc (probably because they were
imported?)
Trends
- When trend counter is not running?
System status
- When proxies are down / have failures
- When dynflow, Pulp, Candlepin etc. are down
- When compute resources are down?
- When LDAP servers are down?
(All of these errors should clear or change their severity once they are
resolved.)
- When there is a newer version of Foreman or a plugin available upstream?
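
The "clear or change severity once resolved" behaviour for the status checks
above could be sketched roughly like this in Python. The three states, the
severity names and the next_action helper are all assumptions for the sake of
the example, not something the notification system defines today.

```python
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    OK = "ok"
    DEGRADED = "degraded"  # e.g. responding, but with failures
    DOWN = "down"


# Hypothetical mapping from service status to notification severity.
SEVERITY = {Status.DEGRADED: "warning", Status.DOWN: "error"}


def next_action(previous: Status, current: Status) -> Optional[Tuple[str, ...]]:
    """Decide what to do with a service's notification after a check."""
    if previous == current:
        return None  # nothing changed, stay quiet
    if current is Status.OK:
        return ("clear",)  # issue resolved
    if previous is Status.OK:
        return ("notify", SEVERITY[current])  # new problem
    return ("update", SEVERITY[current])  # still failing, severity changed
```

Acting only on state changes keeps a flapping or long-running outage from
producing a new notification on every check.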
Community Templates
- When there is a new version of the templates available on GitHub?
Discovery
- When at least one new host has been discovered in the last xx minutes
- When a discovered host fails?
CA Expiry (Puppet and others)
When your certificates are about to expire - I know the Puppet CA defaults
to 5 years, and sadly client certs are usually less interesting, because if
your CA expires (which happens first, as the 5 year clock ticks from your
Puppet master install time) all of the client certs effectively expire with
it.
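
For the expiry check itself, a minimal Python sketch could look like the
following. It assumes the certificate's "not after" date has already been read
from the certificate (how to do that is outside this sketch), and the 30 day
WARN_BEFORE lead time and the expiry_warning helper are made up for
illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

WARN_BEFORE = timedelta(days=30)  # hypothetical lead time, not from this thread


def expiry_warning(name: str, not_after: datetime,
                   now: datetime) -> Optional[str]:
    """Return a message when the certificate is expired or close to it."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return f"{name} certificate has expired"
    if remaining <= WARN_BEFORE:
        return f"{name} certificate expires in {remaining.days} days"
    return None  # far enough away, no notification
```

The same helper would work for the CA and for other certificates, so the CA
can simply be checked first since it is the one that matters most.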
Other areas:
Tasks (I would probably prefer not to create a task specific notification,
but rather a notification for whatever is using the task in the first
place).
Sync failures
Subscription expiry
Virt-who reporting failures
SCAP failures?
Thanks,
Ohad