Foreman does have a nice audit trail for many operations; what's
missing is the ability to find out how a template (e.g. a kickstart)
was rendered. Storing the whole template text in the audit table is
probably not the best thing to do, and production.log is not a good
fit either, so here is a proposal.
I want to create a small API called File audit (*) with the ability to
store arbitrary files under
/var/log/foreman/file_audit/id/text-timestamp, where "id" is the
record id (or multiple ids concatenated with dots), "text" is
arbitrary alphanumeric text and "timestamp" is a Unix epoch number.
That API will be used to store the contents of all rendered templates,
so users can easily go "back in time" and see how templates were
rendered. The directory would be readable by root only, and files
would be created with restricted permissions (read/write for the
foreman user only). On systems with SELinux, security would be
tightened further, allowing writes only from the foreman_t domain and
reads by nobody else. Take this example path:
1.33.7/my.host.lan-pxelinux-1512492603
1 - file audit type (static list, 1 stands for "template audit entry")
33 - host id
7 - template id
my.host.lan-pxelinux - extra data so users can work and search from command line
1512492603 - UNIX epoch timestamp
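A minimal Ruby sketch of what such a File audit API could look like (the module and method names are hypothetical, and the configurable base directory is an assumption to accommodate developer setups):

```ruby
require 'fileutils'

# Hypothetical sketch of the proposed File audit API; all names illustrative.
module FileAudit
  BASE_DIR = '/var/log/foreman/file_audit'.freeze
  TEMPLATE_AUDIT = 1 # static list of audit types; 1 = "template audit entry"

  # ids e.g. [TEMPLATE_AUDIT, host_id, template_id] -> directory "1.33.7"
  def self.store(ids, text, content, base_dir: BASE_DIR, time: Time.now)
    dir = File.join(base_dir, ids.join('.'))
    FileUtils.mkdir_p(dir, mode: 0o700)               # foreman user only
    path = File.join(dir, "#{text}-#{time.to_i}")     # "text-timestamp"
    File.open(path, 'w', 0o600) { |f| f.write(content) } # rw for owner only
    path
  end
end
```

With this sketch, `FileAudit.store([1, 33, 7], 'my.host.lan-pxelinux', rendered)` would produce exactly the example path above.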
Every time a new record is added, a log entry containing the file path
is written to production.log. By default, a cron job will delete all
files older than one month. In the documentation, we will ask users to
rsync the directory to a different location if long-term archival is
needed.
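The retention sweep that cron job would run could be as small as this sketch (the 30-day cutoff and function name are assumptions; a `find ... -mtime +30 -delete` one-liner would do the same):

```ruby
# Hypothetical retention sweep for the audit directory; a daily cron job
# could invoke it with the configured base directory.
def prune_file_audit(dir, max_age_seconds = 30 * 24 * 60 * 60)
  Dir.glob(File.join(dir, '**', '*')).each do |path|
    next unless File.file?(path)
    # Delete anything whose last modification is older than the cutoff.
    File.delete(path) if Time.now - File.mtime(path) > max_age_seconds
  end
end
```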
This API could be used for other audit logging as well, for example
when a user uploads a manifest ZIP file in Katello or a new version of
an RPM/Puppet file. This will be the first step towards better
auditing around templates; later on we can create a plugin showing the
data on the audit/host pages if needed. But in the first phase,
administrators could easily search/grep/diff those files when
necessary.
(*) if you have a better name, please do propose
I don't think this is a good idea. Storing files on the local
filesystem is generally considered an anti-pattern imho, both for web
applications and especially in a container world.
In a clustered setup, you'd have to attach some kind of shared storage
to the Foreman hosts to make this work, possibly via NFS or something.
That is bound to cause trouble. I generally advise not to keep any
state on an application server.
I think this can and should be done with a database. I agree that the
audit model might not be the best place for this, but a database
definitely is.
Just my 2 cents.
- Timo
···
Am 05.12.17 um 18:01 schrieb Lukas Zapletal:
I agree the rendered template should not be stored alongside the audit
records we have today; it does not represent any data change. It would
also be a lot of data in our db and would make searching other audit
records hard. Having long kickstart files in log files is not a good
idea either.
One small comment: Foreman must remain operational if the directory
can't be written to, and the directory must be configurable (for
developer setups). The only downside is related to HA Foreman setups:
users must share the directory among all Foreman instances.
In terms of template rendering, we should probably limit this to
provisioning templates and partition tables only. Job templates can
grow rapidly, with thousands of files per invocation, and the job
template result is different for each host.
Btw, this would also be great for host classification auditing. It
answers the question of which parameter values were available for the
host when the ENC/info method was called during provisioning.
It'd also be great if foreman-debug gathered this, but I suppose
that's already on your radar.
···
--
Marek
On December 5, 2017 6:02:14 PM Lukas Zapletal <lzap@redhat.com> wrote:
You are right that in general this is an anti-pattern, but the data we
will be sending is write-only logging data. It's essentially a log
sink, but for arbitrary data (blobs, files, you name it). In a
clustered setup, you simply keep the logs on the nodes along with
other logging data. On the other hand, we aim to use syslog for
everything in the future, so this is an additional thing to keep in
mind.
We also discussed storing this in the database schema in a formal way
(a new rendered_template table with references to host/template_kind),
but the arguments against are that this is already a quite complex
part of our schema, that keeping history could easily grow the
database, and that performance could be a problem for Remote Execution
when performing large-scale script runs. We ended up discussing
actually removing audit trail data from the database, but that turned
out to be a problem, as we need to keep references to taxonomies,
users and records.
This brings me to an alternative proposal:
I plan to work on full syslog support for Rails and the Smart Proxy in
January next year. The goal is to achieve structured logging using
some standard format (CEE/CEF). We would send those rendered templates
to the Rails log (syslog), and there could be an optional syslog
module to extract those log entries and put them into a similar
directory structure. This could be the default setup for small
deployments, but for large-scale deployments we will recommend sending
logs over the network to a central logging point anyway, so the module
could be installed there, or depending on the setup this could be
loaded into an ELK stack. (*)
In short, let's send the rendered templates to syslog; it deals with
multiple lines just fine, and we can simply filter these out of
production.log and put them into a separate file or even a directory
structure.
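A sketch of that idea using Ruby's stdlib syslog logger (the tag format is an assumption, chosen so a downstream syslog filter could match these entries and split them out):

```ruby
require 'syslog/logger'

# Illustrative only: tag each rendered template so a syslog filter can route
# it into a separate file or per-host directory structure.
def template_log_line(host, kind, rendered)
  "template_render host=#{host} kind=#{kind}\n#{rendered}"
end

log = Syslog::Logger.new('foreman')   # writes via the local syslog socket
log.info(template_log_line('my.host.lan', 'pxelinux', "#install\n..."))
```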
···
On Tue, Dec 5, 2017 at 6:34 PM, Timo Goebel <mail@timogoebel.name> wrote:
Sorry for the late reply.
I like the solution of treating this as any other log file. However, I'm not sure if syslog is the way to go. The 12-factor app just logs to STDOUT, which also helps with viewing the logs via journald.
We have quite a large ELK setup for our logs. The best solution has always been to output the logs directly as JSON, since you don't need to transform the data (which can be quite expensive, depending on the number of log lines you need to process).
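A minimal Ruby sketch of this approach (the field names are assumptions): one JSON document per line on STDOUT, ready for ELK without a transform step.

```ruby
require 'json'
require 'logger'
require 'time'

# Emit each log record as a single JSON line on STDOUT (12-factor style).
logger = Logger.new($stdout)
logger.formatter = proc do |severity, time, _progname, msg|
  JSON.generate(level: severity, time: time.utc.iso8601, message: msg) + "\n"
end
logger.info('template rendered for my.host.lan')
```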
I haven’t fully started on the design, but so far it looks like that syslog
is the only way that works for all backend systems (pulp and candlepin). I
mean local or UDP syslog endpoints.
That does not mean that Foreman must use it; on RHEL 7 both syslog and
journald actually run, and the latter has only a small volatile
buffer, forwarding everything to syslog. The key thing is that both
services are usually present, so we can use either, as they can be
configured both ways (syslog->journald or journald->syslog). I
therefore expect that we can leverage journald structured logging in
the Foreman core app and have the backend systems log to syslog
(unstructured) in the first phase. The correlation id, which Candlepin
already passes into its logs today (!), can be extracted at a later
stage.
Can you elaborate on the ELK stack you have? What do you mean by JSON
logging? I don't exactly understand how JSON can be used for that. Can
you share your vision of how common structured logging should be
implemented in Foreman?
It seems to me it might be more useful to track these templates in a git repository. Would there be an easy way to wrap git for this? That would give you version control, history and auditing without having to do too much work.
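Wrapping git could be as small as this sketch (the repository path and helper name are hypothetical; the commit identity is pinned via `-c` so commits work for a system user with no git config):

```ruby
require 'fileutils'

# Hypothetical wrapper: commit each rendered template into a dedicated git
# repository so `git log` / `git diff` provide history and auditing.
def audit_to_git(repo, relative_path, content, message)
  system('git', 'init', '-q', repo) unless File.directory?(File.join(repo, '.git'))
  file = File.join(repo, relative_path)
  FileUtils.mkdir_p(File.dirname(file))
  File.write(file, content)
  system('git', '-C', repo, 'add', relative_path)
  system('git', '-C', repo,
         '-c', 'user.name=foreman-audit', '-c', 'user.email=audit@localhost',
         'commit', '-q', '-m', message)
end
```

Each render then becomes one commit, and `git log --follow` on a template path gives the full rendering history for a host.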
For the record: as part of structured logging, we have added a new logger called "blob", which is turned off by default. Every template logs its whole contents into this logger after successful rendering, so the render can appear in production.log or syslog/journald and can be further integrated with third-party logging solutions. Available fields: template_digest, template_name, template_context, template_host_name and template_host_id.
From there, the data can be extracted via a script, or simply kept in an ELK stack for record-keeping purposes.
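Illustrative only (not the actual Foreman log format): once a blob entry carrying the fields above is exported as structured data, a small script can slice the stream by any field, e.g.:

```ruby
require 'json'

# A "blob" entry with the fields mentioned above, as it might look after
# export to JSON (values are made up for illustration).
entry = JSON.parse(<<~JSON)
  {
    "template_digest":    "e3b0c442",
    "template_name":      "Kickstart default PXELinux",
    "template_context":   "provisioning",
    "template_host_name": "my.host.lan",
    "template_host_id":   33,
    "message":            "#install ..."
  }
JSON

# Pick out all renders for one host from a stream of such entries.
renders = [entry].select { |e| e['template_host_name'] == 'my.host.lan' }
```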