I'd like your feedback on another topic relating to the schema, this
one at a higher level. Basically, the question is whether you think we
should adopt a flat schema, or a hierarchical schema. To explain:

- With a flat schema, we assign some number of "buckets" to hold event
data, give a name to each bucket, and then for any event that comes in
we extract the relevant data and drop it into the matching bucket.
- With a hierarchical schema, we would have a set of "objects" with
attributes that we could attach to events, and then for any given event
that we receive we extract the relevant data and set the various
attributes of the various objects.

You may have already noted that to some extent we already have a
hierarchical schema, in that we define four top-level containers - the
Initiator, Action, Target, and Observer - and then we have some
pseudo-objects under those like User, Host, etc. Right now we sort of
"fake" an object-based schema by using CamelCase, e.g. InitUserName,
InitUserDomain, etc. With a true hierarchical schema, we'd actually have
an object called Initiator, which might have a child object called
User, which might have attributes Name, Domain, ID.
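
To make the contrast concrete, here is a rough Python sketch of the same
initiator data in both styles. The class layout and the domain value
"corp" are assumptions for illustration only, not a proposed design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    # Attributes named in the post: Name, Domain, ID.
    name: Optional[str] = None
    domain: Optional[str] = None
    id: Optional[str] = None


@dataclass
class Initiator:
    user: Optional[User] = None


# Flat style: the "object" exists only as a prefix in the field name.
# The domain value "corp" is a made-up placeholder.
flat_event = {"InitUserName": "user2", "InitUserDomain": "corp"}

# Hierarchical style: Initiator really contains a User object.
initiator = Initiator(user=User(name="user2", domain="corp"))

# Same data, reached by key in one case and by navigation in the other.
same_name = flat_event["InitUserName"] == initiator.user.name
```

In the flat form a consumer has to know the naming convention to find
related fields; in the object form the relationship is part of the
structure.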

Please again ignore details about how this would actually be
implemented; you could always convert from one to the other for internal
storage if need be. Instead focus on whether it would be easier to
access the data you want to see by using a hierarchical model or a flat
one.

These are the pros and cons as I see it:

- The flat schema is a little easier to display in a table and easier
to read in a single-line format, but on the other hand the object schema
yields much more interesting, interactive displays (the SLM event
display, again, sort of "fakes" an object schema by putting the Init*
fields at left and the Target* fields at right).

- If we go to an object schema, we could actually reference almost any
type of object we wanted by re-using something like DMTF's Common
Information Model, which describes virtually any manageable IT resource.
Right now if we want to include, say, a MAC address and we hadn't
thought about that before, we have to completely revise our flat schema
and define a new field. The potential downside is that events would no
longer share even a standard set of fields, since any field whose value
was null would simply be absent from the event.

- The other downside of course is how we migrate from one schema to
another if we fundamentally change how this works. I think this is
do-able, however, if we come up with some migration plans and perhaps
support both models in some way for a while (the flat schema, for
example, is just a representation of some subset of object attributes).
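
As a rough sketch of that last point, a flat view could be derived from
the object form mechanically. The function name, the prefix table, and
the list of flat fields below are all hypothetical; only the "Init"
abbreviation is implied by the existing field names.

```python
# Hypothetical abbreviation table: "Initiator" -> "Init" matches the
# existing InitUserName-style fields; everything else keeps its name.
PREFIXES = {"Initiator": "Init"}


def flatten(obj, prefix=""):
    """Collapse nested objects into CamelCase flat field names."""
    flat = {}
    for key, value in obj.items():
        name = prefix + PREFIXES.get(key, key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


event = {
    "Initiator": {"User": {"Name": "user2", "ID": "104"}},
    "Target": {"Host": {"Name": "dc01"}},
}
flat = flatten(event)

# An assumed fixed set of flat fields; attributes the event does not
# carry show up as None, which is the "null fields" trade-off above.
FLAT_FIELDS = ["InitUserName", "InitUserID", "InitHostName", "TargetHostName"]
row = {field: flat.get(field) for field in FLAT_FIELDS}
```

Here `row` always has the same columns for table display, while the
object form only carries the attributes the event actually has.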

So let's give some examples. Let's say that we have a user opening a
file - simple enough. In a flat schema, this might look like:
{ "InitUserName": "user2", "InitUserID": "104", "InitHostName": "dc01",
"TargetDataName": "syslog-ng.conf", "TargetDataContainer":
"/etc/syslog", "TargetHostName": "dc01" }
Note that it's not obvious from this that the username represents an
account on host 'dc01'.

But in an object schema, this might look like:
{ "Initiator": { "Account": { "Name": "user2", "UserID": "104", "Host":
"dc01" }}, "Target": { "File": { "Name": "syslog-ng.conf", "Container":
"/etc/syslog", "Host": "dc01" }}}
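
For a feel of what access might look like, here is a small Python sketch
that reads attributes out of the object form by dotted path. The helper
`get_path` is hypothetical, and the initiator's Host value is assumed to
be "dc01" to match InitHostName in the flat example.

```python
def get_path(event, path, default=None):
    """Walk a dotted path such as "Target.File.Name" through nested dicts."""
    node = event
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default  # attribute not present on this event
        node = node[part]
    return node


# Object form of the "user opens a file" event.
# Assumption: the account's Host is "dc01", as in the flat example.
event = {
    "Initiator": {"Account": {"Name": "user2", "UserID": "104", "Host": "dc01"}},
    "Target": {"File": {"Name": "syslog-ng.conf",
                        "Container": "/etc/syslog",
                        "Host": "dc01"}},
}

account_host = get_path(event, "Initiator.Account.Host")
file_name = get_path(event, "Target.File.Name")
missing = get_path(event, "Initiator.Account.MAC")  # not set on this event
```

The host is now unambiguously an attribute of the account, and an
attribute the event does not carry comes back as the default instead of
requiring a schema change.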

OK, so what do you all think?

DCorlette's Profile: