20091022

Demystifying the SSIS PIVOT and UNPIVOT Data Flow Tasks

From a high-level, general point of view:

PIVOT:  turns ROWS into COLUMNS
UNPIVOT:  turns COLUMNS into ROWS

From a graphical point of view:
PIVOT:

UNPIVOT:


From another graphical point of view:
PIVOT:


UNPIVOT:


From a logical point of view:
PIVOT:  If you need to summarize or group your row data, most likely you will need to use the PIVOT task

UNPIVOT:  If you have a set of horizontal columns like [Ham, Soda, Milk, Beer, Chips, etc.] or [Period 1, Period 2, Period 3, etc.] or [Joe, Mike, Kelly, Pat, etc.] or [New York, Atlanta, Los Angeles, San Francisco, etc.] and you would like to list these values vertically in rows, UNPIVOT is probably the task for you.

ONE IMPORTANT NOTE:  The Pivot Data Flow Item does NOT aggregate data.  Therefore, to accomplish the first pivot operation above, you will need to use the Aggregate Data Flow Item to aggregate the data before sending the dataset to the Pivot Data Flow Item.


SSIS Example (with actual data shown):

PIVOT operation:


UNPIVOT operation:


Same SSIS Example (with properties shown):


PIVOT operation (2 pics):

Photobucket


UNPIVOT operation:



The trickiest part in setting this up is setting up the properties of the Pivot Data Flow Task:



For the input columns, the most important property is PivotUsage.  Here are the possible values and what they mean:

0 - This column is passed straight through.  (It will still be a column in the output, none in this example)
1 - "Set Key".  The distinct values in this column(s) will become the rows in the output. (in this example, "Name" is the Set Key)
2 - "Pivot Column" The VALUES in this column will become columns in the output as determined by you in the Output Columns.
3 - "Values"  The values in this column will be at the intersection of the rows and columns that you just set up.

Microsoft reference: http://msdn.microsoft.com/en-us/library/ms140308.aspx

If you have six columns that you want displayed in your output, then you'd create six columns here.  In this example, we only want 4 ("Name", "Category X", "Category Y", "Category Z"):





You must map each output column to an input using the input's LineageID.  You put the LineageID into the output column's SourceColumn.  The PivotKeyValue identifies which distinct value in the input's "Pivot Column (PivotUsage = 2)" that will serve as the "key" for this column (in this case it's "x" for the "Category X" column).


Clear as mud?

No comments:

Post a Comment