Data Architecture – A comparative view

19 February 2013 / Shaun Goodman

Data is one of the most important assets of an Organization.  However, simply collecting data does not mean an Organization is “enabled” to make quick, informed and smart business decisions. There is always a multitude of data available to an Organization, sourced from multiple systems, with each system providing a piece of the data puzzle. So it is no wonder that Organizations can sometimes feel paralyzed when trying to make sense of it all. How, then, can we apply meaning to an Organization’s data to “Enable” it?

Data Modeling

Data Modeling is the systematic approach of applying structure to the data domains of an Organization so that they are understandable, reusable and extensible to the Organization’s ever-changing business needs.  First and foremost, the aim of Data Modeling is to capture the core data domains required by an Organization in order to fulfill its desired business functions.  These data domains need not be owned or managed by the Organization itself; they can also be supplemented through data purchases or external data source agreements (e.g. ANZIC codes).

The Data Model – the output of Data Modeling – is the integral component that stitches the “Data” domains of an Organization together in a meaningful way, enabling “Information” to be retrieved and exchanged between End Users and other Internal/External systems.

You may ask, why is this important?  Imagine an Organization that has multiple CRM systems and multiple billing systems, and you need to answer questions such as: How many customers are currently active?  What is the billed revenue for the last quarter?  You now need to obtain access to, and integrate, data from multiple source systems, potentially sitting on different technology stacks, with different data concepts, formats, meanings, etc.
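
To make the problem concrete, the sketch below (in Python, with hypothetical source systems and field names – not taken from any particular product) shows how much mapping is needed before even a simple question like “how many customers are currently active?” can be answered across two differently shaped CRM extracts.

```python
# A minimal sketch of the integration problem: two hypothetical source
# systems hold "customer" data in different shapes, and answering even a
# simple question requires mapping both into one common representation.

# Hypothetical extract from CRM system A
crm_a_records = [
    {"cust_id": "A-001", "status": "ACT", "full_name": "Jane Citizen"},
    {"cust_id": "A-002", "status": "CLS", "full_name": "John Smith"},
]

# Hypothetical extract from CRM system B -- same concept, different format
crm_b_records = [
    {"customerNumber": 9001, "isActive": True, "firstName": "Mei", "lastName": "Chen"},
]

def to_common_customer(record: dict, source: str) -> dict:
    """Map a source-specific record onto a common customer definition."""
    if source == "CRM_A":
        return {
            "source": source,
            "source_key": record["cust_id"],
            "name": record["full_name"],
            "is_active": record["status"] == "ACT",
        }
    if source == "CRM_B":
        return {
            "source": source,
            "source_key": str(record["customerNumber"]),
            "name": f'{record["firstName"]} {record["lastName"]}',
            "is_active": record["isActive"],
        }
    raise ValueError(f"Unknown source system: {source}")

customers = [to_common_customer(r, "CRM_A") for r in crm_a_records] + \
            [to_common_customer(r, "CRM_B") for r in crm_b_records]

active_count = sum(1 for c in customers if c["is_active"])
print(f"Active customers across both systems: {active_count}")
```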

This is a very common situation that arises in medium to large organizations, and the ability to obtain a Single Customer view becomes extraordinarily complex.  In my experience, the most common solution to such a problem is the introduction of an Enterprise Data Warehouse (EDW).  In such situations, the EDW is used as the hub for an Organization’s data.  It typically comprises an Enterprise-Wide data model that represents all disparate sources of data in a common format, language and meaning, with the required structure and relationships to aid usability and understandability.  The style of data model most commonly implemented in such situations is an “Inmon” normalized data model (typically normalized to 3rd Normal Form).  The centralization and integration of this data not only assists in answering the common questions an Organization has, but also provides the platform for Organizations to garner new insights through correlations and statistical modeling, further solidifying the value proposition a well thought-out, integrated data model can provide to an Organization.  Such data hubs can directly or indirectly service the needs of management reporting, operational functions, operational systems and marketing alike.
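
As a rough illustration of what such a hub looks like, here is a minimal sketch of a normalized (3NF-style) core using an in-memory SQLite database.  The table and column names are hypothetical, and a real Enterprise-Wide model would span many more domains, but it shows how the earlier questions become simple queries once the data has been integrated into one model.

```python
# A minimal sketch of an "Inmon"-style normalized (3NF) EDW core, using an
# in-memory SQLite database purely for illustration. All names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    status_code   TEXT NOT NULL           -- e.g. 'ACTIVE', 'CLOSED'
);

CREATE TABLE account (
    account_id    INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
    source_system TEXT NOT NULL           -- which billing/CRM system it came from
);

CREATE TABLE invoice (
    invoice_id   INTEGER PRIMARY KEY,
    account_id   INTEGER NOT NULL REFERENCES account(account_id),
    invoice_date TEXT NOT NULL,           -- ISO date
    amount       NUMERIC NOT NULL
);
""")

conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Jane Citizen", "ACTIVE"), (2, "John Smith", "CLOSED")])
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [(10, 1, "BILLING_A"), (11, 1, "BILLING_B"), (12, 2, "BILLING_A")])
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?, ?)",
                 [(100, 10, "2012-11-15", 120.00), (101, 11, "2012-12-01", 80.00)])

# The questions from earlier, now answerable from a single integrated model.
print(conn.execute(
    "SELECT COUNT(*) FROM customer WHERE status_code = 'ACTIVE'").fetchone()[0])
print(conn.execute("""
    SELECT SUM(i.amount)
    FROM invoice i
    JOIN account a ON a.account_id = i.account_id
    WHERE i.invoice_date BETWEEN '2012-10-01' AND '2012-12-31'
""").fetchone()[0])
```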

It is, however, common for “Inmon”-styled data models to have deeply normalized structures that, although they satisfy the fundamentals of ER modeling, often require complex join patterns to obtain the required data.  Inmon himself recognized this deficiency, and proposed that Datamarts be established to meet the needs of consumers.   Datamarts themselves have an underlying data model, and it need not follow the same modeling approach as the data warehouse layer.   Indeed, it is often the case that Dimensional Modeling is used for consumers who access their data through reporting tools.  Given that the focus of Datamarts is more on how the data is consumed, and that they are usually tied to a specific subject area (e.g. Sales, Revenue, etc.), the types of information to be retrieved and how that information is typically viewed together are generally known, which further lends itself to Dimensional Modeling.
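
The following sketch shows what such a Dimensional (star schema) Datamart might look like for a single subject area such as billed revenue.  Again, the tables and data are hypothetical; the point is that the consumer-facing query follows one simple, predictable join pattern from a fact table to a handful of dimensions, rather than traversing a deeply normalized structure.

```python
# A minimal sketch of a dimensionally modelled (star schema) Datamart for a
# single subject area. All table, column and sample values are hypothetical.
import sqlite3

mart = sqlite3.connect(":memory:")
mart.executescript("""
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,   -- e.g. 20121115
    calendar_date TEXT NOT NULL,
    quarter       TEXT NOT NULL
);

CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    segment       TEXT NOT NULL
);

CREATE TABLE fact_billed_revenue (
    date_key      INTEGER NOT NULL REFERENCES dim_date(date_key),
    customer_key  INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    billed_amount NUMERIC NOT NULL
);
""")

mart.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20121115, "2012-11-15", "2012-Q4"), (20121201, "2012-12-01", "2012-Q4")])
mart.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Jane Citizen", "Consumer"), (2, "John Smith", "Business")])
mart.executemany("INSERT INTO fact_billed_revenue VALUES (?, ?, ?)",
                 [(20121115, 1, 120.00), (20121201, 1, 80.00), (20121201, 2, 45.00)])

# Billed revenue for the quarter, by customer segment -- a single, predictable
# join pattern that reporting tools handle well.
for row in mart.execute("""
    SELECT c.segment, SUM(f.billed_amount)
    FROM fact_billed_revenue f
    JOIN dim_date d     ON d.date_key = f.date_key
    JOIN dim_customer c ON c.customer_key = f.customer_key
    WHERE d.quarter = '2012-Q4'
    GROUP BY c.segment
"""):
    print(row)
```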

Various solutions exist to extract data from such normalized schemas – Semantic Views forming Logical Datamarts, Materialized Views, physically separate Datamarts, or a combination thereof.  Regardless of the option used, the fundamental of what is delivered is the same – it must be consumable by the end user from both a usability and a performance perspective.  I’ve seen all of these approaches used with varying degrees of success.  There are some obvious benefits associated with logical semantic views – no duplication of data, reduced data delivery time, reduced ETL logic to maintain, etc.   That said, the trade-off is a heavy reliance on tuning the database (within the EDW normalized schema) to support the query access paths of the Semantic Views; in some cases tuning alone simply cannot meet performance requirements, or cannot support slowly changing dimension (SCD Type 2) views of data.  I have seen this approach work best when applied against small to medium sized data sets where point-in-time history of dimensions is not required.  When moving into the realm of deeply normalized models, or where point-in-time dimensional views of data are needed, this is better satisfied by the more commonly used physical Datamart approach.  Although this results in more database tables and increased ETL logic, it provides the versatility to deal with the performance and SCD-related aspects of a given solution.  It is not uncommon to see a hybrid of these approaches used, as each Organization juggles the trade-off between increased ETL objects, delivery SLAs and performance requirements.
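
To illustrate the SCD Type 2 point, here is a minimal sketch (hypothetical tables and data) of a physically materialized Type 2 dimension: each change to a customer attribute becomes a new row with its own validity window, so facts can be joined to the version of the dimension that applied at the time – something a purely logical view over the normalized schema often struggles to deliver with acceptable performance.

```python
# A minimal sketch of a slowly changing dimension (SCD Type 2) as it is
# typically physicalised in a Datamart. All names and data are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_customer_scd2 (
    customer_key   INTEGER PRIMARY KEY,  -- surrogate key, one per version
    customer_id    INTEGER NOT NULL,     -- natural/business key
    segment        TEXT NOT NULL,
    effective_from TEXT NOT NULL,
    effective_to   TEXT NOT NULL,        -- '9999-12-31' for the current row
    is_current     INTEGER NOT NULL
);
""")

db.executemany("INSERT INTO dim_customer_scd2 VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 42, "Consumer", "2011-01-01", "2012-06-30", 0),
    (2, 42, "Business", "2012-07-01", "9999-12-31", 1),
])

# Point-in-time lookup: which version of customer 42 applied on 2012-03-15?
print(db.execute("""
    SELECT segment FROM dim_customer_scd2
    WHERE customer_id = 42
      AND '2012-03-15' BETWEEN effective_from AND effective_to
""").fetchone()[0])   # -> 'Consumer'
```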

For smaller organizations, it is typical for their operating environments to contain smaller data sets and fewer systems performing their day-to-day business operations.  This reduced complexity lends itself to Dimensional Modeling.  In such an environment, the structures required to support end consumers can be built without an intermediate EDW layer, as the focus is more on providing a view into the data and less on creating a single customer view or supplying data to other system interfaces.  Furthermore, many reporting tools (e.g. Cognos, Business Objects, etc.) prefer to consume data represented in a Dimensional Model, as it is more geared to OLAP-style analysis and simplifies the reporting build effort by reducing the level of transformation required to abstract the data into Reporting objects.
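
A minimal sketch of that “straight to dimensional” approach is shown below, assuming a single hypothetical sales extract: the source rows are shaped directly into dimension and fact structures with no intermediate EDW layer, which is where the reduced ETL logic described next comes from.

```python
# A minimal sketch of loading a dimensional structure directly from a single
# source extract, with no intermediate EDW layer. Field names are hypothetical.

source_sales = [
    {"order_id": 1, "customer": "Jane Citizen", "sale_date": "2013-01-10", "amount": 250.0},
    {"order_id": 2, "customer": "Jane Citizen", "sale_date": "2013-02-02", "amount": 125.0},
    {"order_id": 3, "customer": "John Smith",   "sale_date": "2013-02-05", "amount": 300.0},
]

dim_customer = {}   # customer name -> surrogate key
fact_sales = []     # (customer_key, sale_date, amount)

for row in source_sales:
    key = dim_customer.setdefault(row["customer"], len(dim_customer) + 1)
    fact_sales.append((key, row["sale_date"], row["amount"]))

# Reporting tools then consume the star directly, e.g. revenue per customer.
revenue_by_customer = {}
for customer_key, _, amount in fact_sales:
    revenue_by_customer[customer_key] = revenue_by_customer.get(customer_key, 0.0) + amount

for name, key in dim_customer.items():
    print(name, revenue_by_customer[key])
```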

Using this approach provides benefits such as reduced ETL logic, increased agility to change (as compared to a Data Warehouse plus Datamart solution), and faster ROI. Any Organization – large or small – can benefit from the application of an appropriate data model.  Not only does it provide the springboard for propelling the reporting and analytics capabilities of an Organization, it also supplies a valuable asset and a common language through which data throughout an Organization can be communicated.  There are a vast number of Data Modeling tools available in the marketplace to aid the Data Modeling process (e.g. ERWIN, IBM Data Architect, etc.).  Likewise, there are a number of Information Management specialist companies that wield these tools and apply these data modeling approaches day in and day out (e.g. InfoReady) to “Enable” the data assets of Organizations.

It is prudent for any Organization to invest in its information assets.  Knowing where you’ve been, where you are and where you’re heading is the only way to actively participate and get ahead in the ever-changing and competitive landscape that Organizations face today.