One way to standardize your SSIS packages (with dynamic SQL)

This time I thought we can have a look at one way of making your SSIS packages a bit more copy and paste friendly. The assumption in this case is that your source is a SQL Server database and that you don’t do too much (any) data conversion on the way.

The idea

The idea, or logic if you want it, is that you don’t want to hard code what data you take or that you don’t have the energy for it. Also, that you want to be able to delta load all packages with minimal effort. One key element in the way I solved this is the package name (has to include the name of the target table).

The control flow

This is the complete control flow, first part truncates, second part get’s a dynamic SELECT statement and the third part inserts the data into the target table

The variables

The following variables are needed, the first one saves the SELECT statement returned by the procedure (later), the second one is static with the schema of our source, the third is also static with the name of the source table and the fourth is an expression that creates a TRUNCATE TABLE schema.TableName based on the name of the package

The procedure (used in the second part of the control flow)

-------------------------------------------------------------------------------------
-- Description:  This procedure generates a dynamic select statement based on the input
--               table and schema. Ideally it can be used by a SSIS statement to handle
--               the retrieval of the source data.
-- Created:      2018-04-03
-- Author:       mattssok.com
-- Execute:      EXEC [usp_SelectFromSource] @Table = 'YourTable', @Schema = 'dbo'
-------------------------------------------------------------------------------------

CREATE PROCEDURE [dbo].[usp_SelectFromSource] 
(
  @Schema nvarchar(100),
  @Table  nvarchar(100),
  @Query  nvarchar(4000) OUTPUT 
)
AS
BEGIN
  SET NOCOUNT ON;

  DECLARE @sqlQ   nvarchar(max) = 
    'SELECT ' + 
	  STUFF
      (
        (
          -- Compile the columns that should be in the select statement
          SELECT
            ', ' + QUOTENAME(a.name) 
          FROM        
            sys.columns a
          INNER JOIN 
            sys.tables  b ON (a.object_id = b.object_id)
          INNER JOIN
            sys.schemas c ON (b.schema_id = c.schema_id)
          WHERE
            b.name     = @Table
            AND c.name = @Schema
          ORDER BY    
            a.column_id
          FOR XML PATH(''), TYPE
        )
        .value('.', 'nvarchar(max)'), 1, 2, ''
      ) +
    ' FROM ' + @Schema + '.' + @Table + 
    -- This part is to add a where clause that 
    -- use, if you have one, a column with the keyword 'Updated' as a where filter
    -- the reasoning behind it is that you could start 
    -- delta loading from your source by only modifying this clause
    ' WHERE ' + QUOTENAME(SUBSTRING(@Table, 1, CHARINDEX('_', @Table)) + 
    'UPDATED') + ' >= ''' + CAST(GETDATE() - 7500 AS nvarchar(25)) + ''''

  SET @Query = @sqlQ
  SELECT @Query
END
GO

The output

SELECT [Col1], [Col2], [COL3], [ColN] FROM [schema].[YourTable] WHERE [UPDATED] >= 'Jun 9 1998 1:47PM' (in this case we load a lot of history but just change GETDATE() – 7500 if you want another limit)

The data flow

The first step here gets the source data and you want to do that via the variable ‘vSelectQuery’ that has the statement returned from the procedure, second step adds the SystemExecutionId for controlling purpose and the third step inserts the data

The result

By designing the package like this you get a dynamic package that is easy to replicate and copy. Everything you need to do is to:

  1. Make a copy of the existing package
  2. Make sure that you have a table for inserting the data into
  3. Update the vSourceTable variable
  4. Paste a SELECT x, y, n FROM TableName into the vSelectQuery variable (the purpose here is to refresh the metadata, the actual query will be overwritten at run time)
  5. Click on the ‘Get Source data’ in the data flow to refresh the metadata
  6. Resolve the unmapped references
  7. Map your desired input into the new table (it should be quick since the procedure returns the column in the same order as the table)

The list above might sound like much but after you’ve made one to three packages you will have it down to a couple of minutes, top!

//mattssok

mattssok

Leave a Reply

Your email address will not be published. Required fields are marked *