QL Server 2000
introduces several new Data Transformation Services (DTS) features
that give you considerably more flexibility in transforming data.
These new features include package logs, disconnected editing, and a
multiphase data pump.
What you need: SQL Server 2000
| |
I'll
show you how to take advantage of each of these features in a sample
application that performs a simple data transformation in two steps,
reducing the time and effort required previously to perform this
task. The first step executes a Dynamic Properties task, and the
second step exports the contents of the Publisher's table in the
Access Biblio database to a text file. The DTS package used in the
sample application creates a separate record log file that contains
the PubId for all the records that transport successfully to the
destination file. (Be sure to read the readme file for important
configuration tips before you attempt to run the sample
application.)
SQL Server 2000's new customizable multiphase data pump is an OLE
DB service provider responsible for importing, exporting, and
transforming data between data sources. Whenever you execute a
Transform Data task, Data Driven Query task, or Parallel Data Pump
task, you invoke the data pump. In the previous version of DTS, you
were able to customize one phase of the data pump by creating an
ActiveX script that would execute during a transformation. If you
wrote a transformation script in the earlier version of DTS, it
would execute for each row of data processed after the data had been
fetched from the source, but before it was inserted into the
destination. In the latest version of DTS, you can customize several
phases of the data pump that have been exposed (see Table 1). This enhancement
provides more flexibility in the areas of error handling and
customization.
You must set the property that makes the multiple phases of the
data pump accessible though DTS Designer before you can write
scripts that customize them. Set this property by right-clicking on
the Data Transformation Services node of the Enterprise Manager,
then select the Properties menu to invoke the generic Package
Properties dialog box. By default, the Show Multiphase Pump in DTS
Designer's checkbox isn't checked. Check this box to enable
multiphase script writing in DTS Designer. Multiphase scripting is
available for all packages you edit from DTS Designer once you set
this property.
You can write ActiveX scripts to execute from various phases of
the data pump by right-clicking on a transformation and selecting
the Properties menu. The Transformation tab contains a dropdown
listbox called Phases Filter. Select the phase you want to customize
from this list, click on the New command button, and select ActiveX
script as the transformation type to bring up the Transformation
Options dialog box. Then, click on the Properties window and enter
your custom script in the ActiveX Script Transformation Properties
window.
The first scripting phase is the Pre Source Data phase.
Generally, the DTS data pump executes scripts entered for this phase
once, prior to the first fetch from the data source. In the sample
application, the Pre Source Data script creates the record log file
that contains the list of IDs for the records that the DTS package
imported successfully. Typically, scripts you write during this
phase initialize data or write out header rows to files.
The second scripting phase is the Row Transformation phase. The
previous version of DTS provided only this phase for scripting.
Scripts written for this phase execute each time the DTS data pump
retrieves a row of data from the source, prior to inserting the row
into the destination.
The third phase is the Post Row Transform phase, which includes
the On Transform Failure, On Insert Success, and On Insert Failure
subphases. The DTS data pump executes scripts written in this phase
after each row of data passes through the Transformation phase.
Handle Errors More Easily
The DTS data pump executes
the On Transform Error scripts when an error occurs during a
transformation. In this ideal location for error-handling routines,
the On Transform Error script executes prior to either the On Insert
Success or On Insert Failure routines, allowing you to see why an
error was raised, fix the problem, and control further execution of
the transformation. Scripts written in the On Insert Success phase
execute after the row inserts successfully into the destination.
Within the On Insert Success subphase, the sample application
contains the script to append the ID for a record inserted
successfully into the record log file. On Insert Failure scripts
execute if a row can't be inserted successfully into the
destination.
| |
 |
Figure
1 Set the Properties.
|
The DTS data pump
executes the On Batch Complete script after it processes a batch of
rows. By default, the DTS data pump transfers all records in one
batch. You can change the batch size by right-clicking on the
transformation and selecting the Properties menu. The Options tab on
the Transform Data Task Properties dialog box contains the
information about batch sizes. The DTS data pump executes the On
Pump Complete phase after it processes all rows. The Post Source
Data phase follows this phase when the task completes. The Post
Source Data phase lets you access data in the destination object.
The sample application contains a Post Source Data script to append
the final footer row to the record log file.
Debugging a DTS package has never been easy, but the addition of
DTS package logs is a step in the right direction. SQL Server 7.0
provides the ability to create DTS exception logs, which you can use
with Data Driven Query tasks and Transform Data tasks to record
information on specific rows of data causing errors during the
transformation process. In addition to DTS exception logs, DTS
includes the ability to create a package log file. You can generate
a package log every time the package is run to provide
troubleshooting information. The package log provides an overview of
the steps in a package. DTS records each step in the log, along with
information as to whether the step executes successfully or if it
encounters any errors. This can be helpful in determining where and
why a package error occurred.
Add a package log to a package by selecting Properties from the
Package menu; the Package Properties dialog box appears and you can
find the information for the package log on the Logging tab. You
create a log file to be stored within SQL Server by checking the Log
Package Execution to SQL Server checkbox. You can view the package
log in the Enterprise Manager by opening the Data Transformation
Services node in the treeview and selecting Local Packages.
Right-click on a package and select the Package Logs menu option.
The sample app uses a separate error file (see Listing 1). Package log
files are cumulative, so every time a user runs a package, the
program appends information for that specific execute to the log
file without overwriting the log information for previous package
executions.
Modify Properties While Disconnected
The previous
version of DTS requires you to establish a connection to a data
source and destination before you modify properties for connection
objects and data transformations. This ensures the properties you
set for the connection and transformation are valid. This is a
useful safety precaution, but sometimes it makes it difficult to run
a package on a machine different from the one it was created on.
The latest version of DTS lets you configure properties for your
DTS package using the Disconnected Edit window. This window allows
you to modify properties without establishing a live connection to
the data sources and destinations. This capability proves useful in
situations where the machine the DTS runs on is different from the
machine it was created on.
However, a significant drawback exists to modifying connection
and transformation properties without establishing a connection: DTS
is unable to ensure that you're making valid changes to the
properties. Worse, if you make an invalid property assignment, you
might not notice it until you receive an error when attempting to
run the package. Another caveat: When you change a task's name using
Disconnected Edit, the task's name does not update automatically in
the Task Name property for the step the task is associated with.
This means that after changing the task's name, you have to use
Disconnected Edit to enter the new name manually in the Task Name
property for any associated steps. Access the Disconnected Edit
window by selecting Disconnected Edit from the Package menu in DTS
Designer (Figure 1).
Disconnected Edit also allows you to edit some properties that
aren't available through DTS Designer. For example, if you
right-click on a connection object in DTS Designer and select the
Properties menu, you see there's no way to edit the connection's
name. However, you can edit a connection name in the Disconnected
Edit window by dropping down the Connections node of the treeview
and selecting the desired connection. The properties for the
selected connection appear on the right side of the window;
double-click on the Name property to edit it. Other properties not
exposed elsewhere in DTS Designer, but which you can still edit,
include task names and step names.
You should be aware that there are some complications with the
version of Disconnected Edit that shipped with the second beta
release of SQL Server 2000. After changing the name of some objects,
you are unable to modify that object in DTS Designer, because the
designer is still trying to refer to the object by its old name.
Hopefully, the final release of the product will resolve these
complications.
Produce Project Templates
Your ability to create
packages without establishing a connection to the data sources
allows for the creation of project templates. In addition to any
templates that Microsoft might include with SQL Server, you can
create your own DTS template files by saving a DTS package with a
DTT extension.
Templates are read-only DTS packages that can simplify the
creation of packages performing standard tasks. To open a template,
right-click on the DTS node in the Enterprise Manager, select All
Tasks, then Open Template. After you open a template, you need to
use the Disconnected Edit window to set the connection properties
for your data sources, as well as any other applicable properties.
It's a worthwhile idea to include text annotations in your templates
to explain what properties the user will need to set. After you set
the necessary properties to allow a template to run on your machine,
select Save As from the Package menu and save the template within
SQL Server or as a new file with a DTS extension.
Another new feature is the addition of the Dynamic Properties
task to the toolbar. The Dynamic Properties task retrieves values
from locations outside of DTS and uses them to assign property
values within the package. In the sample application, the Data
Source property for both the Biblio database and the text file
destination is null. Before the data migration takes place, a
Dynamic Properties Task is run to read the file path for both of
these objects from an INI file and populate them to the Data Source
property for the Access and text file connections. To make a Dynamic
Properties task that functions in this way, first add a Dynamic
Properties task from the toolbar. Right-click on the task and select
Properties. Click on the Add button to add a property to be set.
Next, select the Data Source property for the Biblio connection and
click on the Set button. In the Add/Edit Assignment dialog box that
appears, enter the information for the INI file to be read and click
on the OK button. You must treat the Data Source property of a text
file in the same fashion. In addition to reading from INI files, the
Dynamic Properties task can also retrieve values from query results,
data files, global variables, constants, and environment variables.
You determine the source of the data by your selection in the Source
dropdown listbox on the Add/Edit Assignment dialog box.
Besides the Dynamic Properties, the latest version of DTS has new
built-in tasks displayed on the task toolbar. Among these tasks are
the File Transfer Protocol task, which you use to download files or
data from remote locations using FTP, and the Message Queue task,
which you use for sending information between packages using message
queuing. It's worth noting you can now use DTS to transfer table
information such as keys and constraints, not just for transferring
data. Take some time to get familiar with the new version of DTS.
It's still a useful tool for both simple and complex data
transformation tasks, but its new features are definitely a step in
the right direction when it comes to improving the flexibility and
ease of use of DTS.
Mary V. Hooke is a
contributing editor for Visual Basic Programmer's Journal. In
addition to writing for VBPJ, Mary's the president of
Distinctive Development Ltd., a firm that develops custom Internet
and intranet applications in the New York City area. Reach Mary by
e-mail at mhooke@distinctivedevelopment.com.