I have spent a fair amount of time building data archival systems. This entails mapping and migrating data from older legacy systems into a more modern and lower cost DBMS like Oracle or SQL Server. Since these are archival systems the data comes over and the integrity and makeup of the data structures largely get preserved as they were on the legacy source system. De-normalized data is rather common in older systems, especially those that were done in Cobol. For good or bad it is what it is.
Typically, my team and I also have to build front end user applications to access this data. If you have every tried to map de-normalized data into a grid control for a user interface you quickly come to realize the processing fun and games you have to contend with. This little ditty is about how to normalize, de-normalized data without having a) to physically modify the table data itself, b) end up performing complex processing in your code to make it grid ready. Keep in mind, this is just one technique. There are many.
A simple example is in order. I have hardcoded the where clause for simplification. One certainly would use this technique using variables. Even though I have demonstrated the technique using a vanilla select statement, one could also accomplish this using a stored procedure.
JTS Acct Fac Date1 Descr1 Amt1 Date2 Descr2 Amt2 ------------------------------------------------------------------------------------------- 211381428873996900 95137 PMH 2010-10-05 DED APPL 321.00 2010-10-05 CRED APPL -21.37 211381428874000815 95137 PMH 2010-11-14 DED APPL 61.00 2010-11-14 OFFSET -61.00 211381428874877321 95137 PMH 2011-08-12 BAL APPL 51.60 2011-08-10 BAL ADJ 00.60
As you can see there are essentially two discrete rows of data for each row. The following is our query that produced the recordset.
SELECT jts as 'JTS', acctnum as 'Acct', facility as 'Fac', convert(varchar(10),entrydate1,20) as 'Date1', descr1 as 'Descr1', amt1 as 'Amt1', date2 as 'Date2', descr2 as 'Descr2', amt2 as 'Amt2' FROM CHARGES WHERE acctnum = 95137
We desire for our output is something that looks like this. The recordset below conforms to a grid control much better than the previous recordset depicted.
Acct Fac Date Descr Amt ----------- ---------------------------------- 95137 PMH 2010-10-05 DED APPL 321.00 95137 PMH 2010-10-05 CRED APPL -21.37 95137 PMH 2010-11-14 DED APPL 61.00 95137 PMH 2010-11-14 OFFSET -61.00 95137 PMH 2011-08-12 BAL APPL 51.60 95137 PMH 2011-08-10 BAL ADJ 00.60
But, how do we achieve this type of output?
Well, if we analyze the first query and output we can quickly determine that the Date, Description, and Amount columns are the same and are essentially being replicated. One does not want to assume by the data and column names alone that they are identical. You also need to take a look at the table schema to make sure the data type are the same as well. Never assume anything when working with legacy data.
In this particular example (which by the way is a scaled down, much simplified example of a real problem I had to deal with on a recent project), the best means for dealing with this pattern of de-normalization is through creating two views. Let’s jump straight to the code.
DROP VIEW chg_amt1 GO CREATE VIEW chg_amt1 as SELECT jts as 'JTS' acctnum as 'Acct', facility as 'Fac', convert(varchar(10),entrydate1,20) as 'Date', descr1 as 'Descr', amt1 as 'Amt', FROM CHARGES GO DROP VIEW chg_amt1 GO USE VIEW chg_amt2 as SELECT jts as 'JTS' acctnum as 'Acct', facility as 'Fac', convert(varchar(10),entrydate2,20) as 'Date', descr2 as 'Descr', amt2 as 'Amt', FROM CHARGES GO
First, I like to add DROP VIEW code at the top of my view. I seldom get my view dialed in on the first try. Since dropping a view has no adverse effect on the data itself it makes my life easier to make and apply changes.
Next, if you analyze the two views you will notice that I have broken out the de-normalized data columns to their respective view. Also notice that for these columns I have used the exact same alias names. This is very important.
Finally, you will note that neither view offers no row restriction. One would not want to run these views by themselves against a large table. Essentially, the two views are the same except for the three referenced de-normalized columns. Also note that both views have the same number of columns and that the column order is the same between the views.
What next? ...
Now that we have created our two views all that remains is to create an SQL query using a UNION that will allow us to normalize our de-normalized data.
SELECT * from chg_amt1 where chg_amt1.acctnum = 95137 UNION SELECT * from chg_amt2 WHERE chg_amt2.acctnum = 95137 ORDER BY jts
As you can see in the code all we have are two simple select statements joined together by a SQL UNION. This simple technique will return the desired recordset.
Acct Fac Date Descr Amt ----------- ---------------------------------- 95137 PMH 2010-10-05 DED APPL 321.00 95137 PMH 2010-10-05 CRED APPL -21.37 95137 PMH 2010-11-14 DED APPL 61.00 95137 PMH 2010-11-14 OFFSET -61.00 95137 PMH 2011-08-12 BAL APPL 51.60 95137 PMH 2011-08-10 BAL ADJ 00.60
It should be noted that the ORDER BY clause is required at the end of the second SELECT. Also important to note, you must use * after the select. While it is possible to choose specific columns when referencing a view it is not possible to do so using the UNION. What happens under the hood (at 50,000 feet) is that the query executor runs each query pretty much the same as if you were to run them separately yourself. It then collates the two recordsets into a single recordset based upon the ORDER BY criteria. This is why it is important that when doing a union the two queries have the same number of columns, the alias names are the same, and ordinal between the two queries the data types line up (if the physically do not, it is possible to CAST or CONVERT the column to get it to conform.
As I said above, this is only one way to deal with de-normalized data. The example presented is appropriate for the technique I just used. If you have a similar problem you now have a solution. There are many scenarios for dealing with de-normalized data and many different approaches and solutions. This technique will serve you if you need to deal with…
This is not the be-all end all example. Just as there are numerous types of saw, so there are numerous means for dealing with de-normalized data.
