EXPERT   SQL Service


Simplify Data Transformations

SQL Server 2000 offers powerful and flexible new features to perform data transformations.

by Mary V. Hooke

  SQL Server 2000 introduces several new Data Transformation Services (DTS) features that give you considerably more flexibility in transforming data. These new features include package logs, disconnected editing, and a multiphase data pump.

What you need:
SQL Server 2000
I'll show you how to take advantage of each of these features in a sample application that performs a simple data transformation in two steps, reducing the time and effort required previously to perform this task. The first step executes a Dynamic Properties task, and the second step exports the contents of the Publisher's table in the Access Biblio database to a text file. The DTS package used in the sample application creates a separate record log file that contains the PubId for all the records that transport successfully to the destination file. (Be sure to read the readme file for important configuration tips before you attempt to run the sample application.)

SQL Server 2000's new customizable multiphase data pump is an OLE DB service provider responsible for importing, exporting, and transforming data between data sources. Whenever you execute a Transform Data task, Data Driven Query task, or Parallel Data Pump task, you invoke the data pump. In the previous version of DTS, you were able to customize one phase of the data pump by creating an ActiveX script that would execute during a transformation. If you wrote a transformation script in the earlier version of DTS, it would execute for each row of data processed after the data had been fetched from the source, but before it was inserted into the destination. In the latest version of DTS, you can customize several phases of the data pump that have been exposed (see Table 1). This enhancement provides more flexibility in the areas of error handling and customization.

You must set the property that makes the multiple phases of the data pump accessible though DTS Designer before you can write scripts that customize them. Set this property by right-clicking on the Data Transformation Services node of the Enterprise Manager, then select the Properties menu to invoke the generic Package Properties dialog box. By default, the Show Multiphase Pump in DTS Designer's checkbox isn't checked. Check this box to enable multiphase script writing in DTS Designer. Multiphase scripting is available for all packages you edit from DTS Designer once you set this property.

You can write ActiveX scripts to execute from various phases of the data pump by right-clicking on a transformation and selecting the Properties menu. The Transformation tab contains a dropdown listbox called Phases Filter. Select the phase you want to customize from this list, click on the New command button, and select ActiveX script as the transformation type to bring up the Transformation Options dialog box. Then, click on the Properties window and enter your custom script in the ActiveX Script Transformation Properties window.

The first scripting phase is the Pre Source Data phase. Generally, the DTS data pump executes scripts entered for this phase once, prior to the first fetch from the data source. In the sample application, the Pre Source Data script creates the record log file that contains the list of IDs for the records that the DTS package imported successfully. Typically, scripts you write during this phase initialize data or write out header rows to files.

The second scripting phase is the Row Transformation phase. The previous version of DTS provided only this phase for scripting. Scripts written for this phase execute each time the DTS data pump retrieves a row of data from the source, prior to inserting the row into the destination.

The third phase is the Post Row Transform phase, which includes the On Transform Failure, On Insert Success, and On Insert Failure subphases. The DTS data pump executes scripts written in this phase after each row of data passes through the Transformation phase.

Handle Errors More Easily
The DTS data pump executes the On Transform Error scripts when an error occurs during a transformation. In this ideal location for error-handling routines, the On Transform Error script executes prior to either the On Insert Success or On Insert Failure routines, allowing you to see why an error was raised, fix the problem, and control further execution of the transformation. Scripts written in the On Insert Success phase execute after the row inserts successfully into the destination. Within the On Insert Success subphase, the sample application contains the script to append the ID for a record inserted successfully into the record log file. On Insert Failure scripts execute if a row can't be inserted successfully into the destination.

 
Figure 1 Set the Properties.
The DTS data pump executes the On Batch Complete script after it processes a batch of rows. By default, the DTS data pump transfers all records in one batch. You can change the batch size by right-clicking on the transformation and selecting the Properties menu. The Options tab on the Transform Data Task Properties dialog box contains the information about batch sizes. The DTS data pump executes the On Pump Complete phase after it processes all rows. The Post Source Data phase follows this phase when the task completes. The Post Source Data phase lets you access data in the destination object. The sample application contains a Post Source Data script to append the final footer row to the record log file.

Debugging a DTS package has never been easy, but the addition of DTS package logs is a step in the right direction. SQL Server 7.0 provides the ability to create DTS exception logs, which you can use with Data Driven Query tasks and Transform Data tasks to record information on specific rows of data causing errors during the transformation process. In addition to DTS exception logs, DTS includes the ability to create a package log file. You can generate a package log every time the package is run to provide troubleshooting information. The package log provides an overview of the steps in a package. DTS records each step in the log, along with information as to whether the step executes successfully or if it encounters any errors. This can be helpful in determining where and why a package error occurred.

Add a package log to a package by selecting Properties from the Package menu; the Package Properties dialog box appears and you can find the information for the package log on the Logging tab. You create a log file to be stored within SQL Server by checking the Log Package Execution to SQL Server checkbox. You can view the package log in the Enterprise Manager by opening the Data Transformation Services node in the treeview and selecting Local Packages. Right-click on a package and select the Package Logs menu option. The sample app uses a separate error file (see Listing 1). Package log files are cumulative, so every time a user runs a package, the program appends information for that specific execute to the log file without overwriting the log information for previous package executions.

Modify Properties While Disconnected
The previous version of DTS requires you to establish a connection to a data source and destination before you modify properties for connection objects and data transformations. This ensures the properties you set for the connection and transformation are valid. This is a useful safety precaution, but sometimes it makes it difficult to run a package on a machine different from the one it was created on.

The latest version of DTS lets you configure properties for your DTS package using the Disconnected Edit window. This window allows you to modify properties without establishing a live connection to the data sources and destinations. This capability proves useful in situations where the machine the DTS runs on is different from the machine it was created on.

However, a significant drawback exists to modifying connection and transformation properties without establishing a connection: DTS is unable to ensure that you're making valid changes to the properties. Worse, if you make an invalid property assignment, you might not notice it until you receive an error when attempting to run the package. Another caveat: When you change a task's name using Disconnected Edit, the task's name does not update automatically in the Task Name property for the step the task is associated with. This means that after changing the task's name, you have to use Disconnected Edit to enter the new name manually in the Task Name property for any associated steps. Access the Disconnected Edit window by selecting Disconnected Edit from the Package menu in DTS Designer (Figure 1).

Disconnected Edit also allows you to edit some properties that aren't available through DTS Designer. For example, if you right-click on a connection object in DTS Designer and select the Properties menu, you see there's no way to edit the connection's name. However, you can edit a connection name in the Disconnected Edit window by dropping down the Connections node of the treeview and selecting the desired connection. The properties for the selected connection appear on the right side of the window; double-click on the Name property to edit it. Other properties not exposed elsewhere in DTS Designer, but which you can still edit, include task names and step names.

You should be aware that there are some complications with the version of Disconnected Edit that shipped with the second beta release of SQL Server 2000. After changing the name of some objects, you are unable to modify that object in DTS Designer, because the designer is still trying to refer to the object by its old name. Hopefully, the final release of the product will resolve these complications.

Produce Project Templates
Your ability to create packages without establishing a connection to the data sources allows for the creation of project templates. In addition to any templates that Microsoft might include with SQL Server, you can create your own DTS template files by saving a DTS package with a DTT extension.

Templates are read-only DTS packages that can simplify the creation of packages performing standard tasks. To open a template, right-click on the DTS node in the Enterprise Manager, select All Tasks, then Open Template. After you open a template, you need to use the Disconnected Edit window to set the connection properties for your data sources, as well as any other applicable properties. It's a worthwhile idea to include text annotations in your templates to explain what properties the user will need to set. After you set the necessary properties to allow a template to run on your machine, select Save As from the Package menu and save the template within SQL Server or as a new file with a DTS extension.

Another new feature is the addition of the Dynamic Properties task to the toolbar. The Dynamic Properties task retrieves values from locations outside of DTS and uses them to assign property values within the package. In the sample application, the Data Source property for both the Biblio database and the text file destination is null. Before the data migration takes place, a Dynamic Properties Task is run to read the file path for both of these objects from an INI file and populate them to the Data Source property for the Access and text file connections. To make a Dynamic Properties task that functions in this way, first add a Dynamic Properties task from the toolbar. Right-click on the task and select Properties. Click on the Add button to add a property to be set. Next, select the Data Source property for the Biblio connection and click on the Set button. In the Add/Edit Assignment dialog box that appears, enter the information for the INI file to be read and click on the OK button. You must treat the Data Source property of a text file in the same fashion. In addition to reading from INI files, the Dynamic Properties task can also retrieve values from query results, data files, global variables, constants, and environment variables. You determine the source of the data by your selection in the Source dropdown listbox on the Add/Edit Assignment dialog box.

Besides the Dynamic Properties, the latest version of DTS has new built-in tasks displayed on the task toolbar. Among these tasks are the File Transfer Protocol task, which you use to download files or data from remote locations using FTP, and the Message Queue task, which you use for sending information between packages using message queuing. It's worth noting you can now use DTS to transfer table information such as keys and constraints, not just for transferring data. Take some time to get familiar with the new version of DTS. It's still a useful tool for both simple and complex data transformation tasks, but its new features are definitely a step in the right direction when it comes to improving the flexibility and ease of use of DTS.


Mary V. Hooke is a contributing editor for Visual Basic Programmer's Journal. In addition to writing for VBPJ, Mary's the president of Distinctive Development Ltd., a firm that develops custom Internet and intranet applications in the New York City area. Reach Mary by e-mail at mhooke@distinctivedevelopment.com.