EXPERT  SQL Service


Identify Rows Uniquely

Identity columns can help you manage your database, but use them with caution.

by Mary V. Hooke

  The ability to identify records uniquely in a table is critical in most databases. Sometimes your data's attributes make it easy for you, providing a unique identifier for each record automatically. For example, an Employees table might include a Social Security number.

What you need:
SQL Server 7.0
Sometimes your data's attributes aren't quite so accommodating. In such cases, you might use SQL Server's identity column, a field that contains a system-generated value that identifies rows in a table uniquely. In this column, I'll explain how to create and work with identity columns in SQL Server, as well as provide a rundown on the pros and cons of using this feature.

Identity columns in SQL Server work much as the AutoNumber datatype works in Microsoft Access. SQL Server automatically populates a column you mark as an identity column with a unique sequential value whenever a new record is inserted into the table. Developers typically use the identity column as the primary key for a table, an approach that can serve as an easy, low-effort way of ensuring the integrity of your data.

You need to make sure a field meets certain criteria before you decide to use it as an identity column. For example, you can use only columns that contain numeric values as identity columns. Specifically, a column must be one of five datatypes: int, decimal, smallint, tinyint, or numeric. You also need to make sure you declare the identity column as a field that doesn't allow Null values or have default definitions.

Marking a column as an identity column using SQL Server 7.0's Enterprise Manager requires several steps. First, open the Tables node of the treeview for your database, then right-click on the table and select the Design Table menu. In the table designer, place a checkbox in the identity column next to the desired field. Identity columns have two important attributes: the identity seed value and the identity increment value. The seed value is the starting number SQL Server uses for the first row inserted into the table. The increment is the value SQL Server adds to the previous record's identity value to arrive at the next identity value to be used. Both the identity seed and identity increment properties are set to 1 by default; you can modify these settings by changing the values in the appropriate column of the table designer.

SQL Server 7.0 introduces the ability to add, change, or remove the identity column through the Enterprise Manager after the table has been created. However, it's interesting to note that—technically—you still can't add, remove, or modify identity columns on existing tables. Enterprise Manager creates a new, temporary table with the new identity column properties when you use the Enterprise Manager in SQL Server 7.0 to modify the identity column of an existing table. Next, it imports the data from your existing table into the new table. Finally, it drops the existing table and renames the new, temporary table to the name of the original table.

Note that you must exercise caution if you modify a column in an existing table for use as an identity column. SQL Server won't modify the existing data if the column already contains data. It's possible your existing data might not contain unique values, in which case the values of your identity column can only be guaranteed to be unique for new records added to the table.

Work With Identity Columns
Transact SQL (T-SQL) includes several commands, keywords, and built-in database functions that can help you when you work with a table that has an identity column (see Table 1). For example, executing the DBCC CheckIdent command in the Query Analyzer not only lets you return the current identity value of a given table, it helps you reset the identity column value if it somehow becomes corrupted and contains an incorrect value. Another useful tool is the @@identity system function, which returns the last identity value used.

This proves especially useful in situations where an application needs immediate access to a newly inserted record's value. For example, you might create a stored procedure that inserts a record into the Employees table of the Northwind database (see Listing 1). You declare the parameter containing the employee id (@EmployeeId) as an output parameter; the @EmployeeId parameter receives a value of NULL when the calling application invokes this stored procedure. The @EmployeeId parameter is populated with the value retrieved by @@identity after a new record is created. The newly saved record's identity value is now available to the calling application. Microsoft has reported a bug with @@identity in SQL Server 7.0 that seems to occur only after an insert into a table prompts a trigger to insert data into another table that doesn't have an identity column. SQL Service Pack 1 fixed this bug; see Knowledge Base Article ID Q199688 for more information about this bug.

You should also be aware of the Identity_Insert property when you work with tables that have identity columns. You cannot specify an identity column's value manually when inserting new records if a table's Identity_Insert property is set to OFF. In this case, you must rely on SQL Server to generate the value of the identity column for you automatically. However, you can specify the value of an identity column manually when inserting a new record if a table's Identity_Insert property is set to ON. This is especially useful when you import data into a table with an identity column, and you want to preserve the original values of your data because the column is used as a foreign key in another table.

If you use Data Transformation Services (DTS) to transfer your data to a table with an identity column, DTS sets the Identity_Insert property to ON by default to preserve the column's original values. You can overwrite this default by changing the Enable Identity Insert property in the Column Mappings and Data Transformations window. Use this T-SQL command in the Query Analyzer to modify the Identity_Insert property outside DTS, where TableName specifies the name of the table you wish to modify the Identity_Insert property for:

Set Identity_Insert TableName ON/OFF

It's important to keep in mind when you modify the Identity_Insert property that its value is valid only for a given connection. The insert fails if you use the Query Analyzer to set the Identity_Insert property for a given table to ON, then open a new query window and attempt to execute an insert into the table by specifying the value for the identity column manually. You must perform table inserts on the same connection on which you modify the Identity_Insert property. Note that only one table in a database can have the Identity_Insert property set to ON at a given time on a given connection. You receive an error message if you attempt to set a table's Identity_Insert property to ON when another table in the same database already has the Identity_Insert property set to ON on the same connection.

When to Use an Identity Column
The identity column can be a helpful tool, but there are times when it isn't the right choice. For example, using an identity column probably isn't your best bet if you have a column that always contains sequential numbers, without any gaps. This is because identity columns don't reuse numbers. If you have a table that has an identity column from which you must delete records frequently, you end up with gaps in your identity values.

You should also be aware that identity columns produce numbers that are unique for a given table only. This isn't a problem in most cases, but using an identity column proves insufficient if your identity column's values must be unique throughout the entire database or throughout other databases on a network. Databases set up to use replication are likely to fall into this category. You need to look into using a UniqueIdentifier datatype if you intend to use a globally unique identifier. You can find more information about the UniqueIdentifier datatype in SQL Server Books Online.

I haven't heard about any problems when using the identity column in SQL Server 7.0, but I have observed several problems when using the identity column in earlier versions of SQL Server. A known bug exists in SQL Server 6.5 where the value of the identity column can get out of sync if the database server terminates abnormally and shuts down without being allowed to run a Checkpoint (see Resources). Microsoft claims that running the DBCC CheckIdent command corrects this problem if it occurs in your database, but I've seen multiple cases firsthand where identity columns in SQL Server 6.5 databases got out of sync for reasons that couldn't be identified. Worse, these problems couldn't always be fixed by running DBCC CheckIdent. Although these problems were seemingly random and could not be explained or reproduced by Microsoft, I know I'm not the only one who has encountered these kinds of issues (see the reference for "Avoid Losing Your Identity" in Resources). I recommend using extreme caution when deciding whether to use identity columns in your database. They can prove an effective tool in database management and design, but only if you're cautious of their shortcomings.

The alternative to using an identity column is to generate a new, unique value manually when inserting new records into a table. This isn't hard to do. For example, you might create a stored procedure that inserts a new record into the Region table of the Northwind database (see Listing 2). You retrieve the next available unique value for the RegionID field using a SELECT Max(RegionID) + 1 query. Note that the SELECT statement performs the retrieval with the TABLOCKX option. This places an exclusive lock on the table, ensuring that additional records cannot be inserted into the table until the end of the transaction.


Mary Hooke is the President of Distinctive Development, a software consulting firm located in New York City.  Distinctive Development specializes in the design and development of interactive web applications. In addition, Mary is a regular author for Visual Basic Programmer's Journal. Contact Mary at mhooke@DistinctiveDevelopment.com.

 
Resources
• "Avoid Losing Your Identity" by William Storage, a sidebar from "Design Your Database Efficiently," by Dianne Siebold [VBPJ September 1999]

• Knowledge Base Article ID Q199688: "FIX: @@IDENTITY Returns NULL If Trigger Inserts into Table Without IDENTITY Column"

• Knowledge Base Article ID Q198572: "BUG: IDENTITY Out of Sync If Server Is Shut Down w/o Checkpoint"
Get the Code
  Get the code for this article here.