EXPERT  Database Design


Maximize Performance With Indexes

Get the most out of your database by mastering the art of implementing indexes properly.

by Mary V. Hooke

  Database design and configuration become increasingly important as a database's size and number of concurrent users grows. An app that begins small might grow significantly over time, and its database must grow as well. Sometimes the app can grow so large that you need to migrate it from an existing database such as Microsoft Access to a more robust database engine such as Microsoft SQL Server. In other cases, you might need to reanalyze your database design and configuration to resolve emerging performance problems that might not have been apparent when your database had less data and fewer concurrent users.

What you need:
Microsoft SQL Servera 7.0
Using indexes properly can help resolve these performance problems. Indexes are database objects that optimize searching and sorting data in tables. They aren't required, but the task of processing data becomes much slower without them. Deciding when it is appropriate to use an index and determining what kind of index to use can be confusing. In this article, I'll provide an overview of indexes, suggest some general guidelines to help you decide when and where to use them, summarize the various indexing types and options, and explain how to create indexes using Microsoft SQL Server 7.0.

Index Properly
You create indexes on columns within database tables. The first step in indexing a database properly is understanding which table columns need to be indexed (see Table 1). This is the most difficult step, because the columns that should be indexed vary on a case-by-case basis. In other words, if you have two databases identical in structure (containing the same tables) but being used for two different purposes, the indexing needs for the two databases would probably be different. You solve this problem by determining the user's processing requirements. To do this, you must understand the characteristics of the data being stored in each table and how the database's user will utilize this data in your application.

You can categorize user interaction with database information into three primary groups: inserting or updating data, deleting data, and selecting data. You should examine each table in your database with these three categories in mind. When thinking about the first two categories, ask yourself which column or columns the user will refer to most often in the WHERE clause of UPDATE and DELETE SQL statements. In this example, the user references the au_id column in the Pubs database to indicate the row to update:

Update authors set contract = 1 where 
au_id = '123-34-4565'

The columns that appear most often within the WHERE clause of update or delete actions are primary candidates for indexes. As I mentioned, an index allows the database engine to locate table rows quickly, without necessarily having to search through the table itself.

When thinking about the third category of user interaction with the database—selecting data from a table—you must consider several issues. First, try to determine what columns a user is likely to retrieve in a sorted order. For example, imagine you have an application that runs off the Pubs database, and your user's business requirements dictate records in the authors table display in alphabetical order by the author's last name. You might see this query:

Select * from authors order by au_lname

In this example, the au_lname column proves a good candidate for an index because it's referenced by the ORDER BY clause of a SELECT query (see Table 1).

Now determine which columns will be regularly used in table joins. Often, these columns will be either primary or foreign keys. When you designate a column or columns as the primary key, SQL Server creates a unique index on that column or columns automatically to enforce the referential integrity of the primary key. Foreign key columns are generally good candidates for additional indexes.

Next, determine which columns the user might search to find a range of values. To use an example from the scenario used earlier, the ytd_sales column would be a good index candidate if your app's users often run a query like this:

Select * from titles where ytd_sales between 4000 and 6000

An index on the ytd_sales and au_lname columns allows the system to locate the needed records and present them in the proper order more easily.

Finally, consider whether your user will use the LIKE operator to access columns. If your user's business rules dictate that the LIKE operator often access a column using a comparison string that starts with specific characters, an index on the column will increase the query's performance:

Select * from titles where title like 'abc%'

However, if your user's business rules dictate that the LIKE operator often access a column using a comparison string starting with a wild-card pattern, an index on the column will do little to improve the query's performance:

Select * from titles where title like '%abc'

When you analyze your database tables for index candidates, you might find two or more columns often used and accessed the same way. For example, the user might need to search and sort on the author's first and last names in the Pubs database's Authors table. In this scenario, you can create separate indexes on both the au_lname and au_fname columns. However, it might be a better idea to create a composite index on both of the columns together. In SQL Server 7.0, a composite index can contain up to 16 columns or 900 bytes. In the proper situation, you should create a composite index rather than two or more separate indexes because it reduces the total number of indexes on a table. You use less disk space, have fewer indexes to maintain, and reduce the impact on the table's data modifications caused by all indexes.

Determining which columns should not contain indexes proves just as important as deciding which ones should. Indexes take up disk space and can slow down modifications to table data because they must be updated when you insert or modify rows. If an index isn't improving performance, the cost of maintaining the index could be counterproductive. Columns that can contain only three or fewer unique values—such as Boolean information or an individual's gender—are bad candidates for indexes because at least 33 percent of the records might share the same value for this column. An index can't do much to help order or distinguish between records when a column contains so many duplicate values.

Large columns are also bad candidates for indexes: Such indexes take up disk space and force the system to spend precious time scanning the large values. SQL Server doesn't let you create an index on a Text datatype column, but any column approximately 20-25 bytes or larger could be too large to make an effective index.

 
Figure 1 Clustered Index. Click here.

Types of Indexes
Once you determine which columns should be indexed, choose the type of index to use on each column. SQL Server supports two primary types of indexes: clustered and nonclustered (see the sidebar, "Climbing B-Trees"). A clustered index sorts the table's actual data rows based on the value of the clustered key (see Figure 1). A nonclustered index features a structure separate from the table's data rows (see Figure 2).

When implemented properly, clustered indexes usually have the greatest impact on database performance because they result in the physical reordering of the table's data. But they are limited because there can only be one per table. A column that contains a high level of duplicate values and is frequently accessed in a sequence or used in joins proves a good candidate for a clustered index; records that contain similar or duplicate clustered key values are stored close together physically, so the database knows exactly where to look to find the needed values. Foreign key columns often meet these criteria. Another good candidate for a clustered index is a column frequently updated or often searched for by a range of values, such as a column queried using these operators: BETWEEN, >, <, >=, or <=.

 
Figure 2 Nonclustered Index. Click here.

Because a nonclustered index's structure is separate from the table's data rows, you can create multiple nonclustered indexes on a table if necessary to maximize performance. Nonclustered indexes are structured with slight differences depending on whether the index is built on a table where the data pages are clustered (a table that contains a clustered index), or a table where the data pages are stored in a heap (a table that doesn't contain a clustered index). If you decide that one of the columns in a table will be in a clustered index, then any other columns that need an index within the same table should probably contain nonclustered indexes.

After you decide whether an index will be clustered or nonclustered, you must determine one more important characteristic. Both clustered and nonclustered indexes can be either unique or nonunique. When you create a unique index on a column, no two records in the table can contain the same value within that column. Unique indexes enforce the integrity of a table's data. When you create a composite index as a unique index, any one column in the index can contain duplicate values, but together, all columns in the index must be unique.

Create and Maintain Indexes
After you decide where to put your indexes and what kind to use, you're ready to create them. Make sure you know what indexes the table already has before you create new ones. You might have to delete or modify existing indexes in order to establish your desired index configuration. Execute this command in the SQL Server Query Analyzer to retrieve a list of the names, properties, and columns for all the table's indexes specified by the table_name parameter:

sp_helpindex table_name

 
Figure 3 Index Management Made Easy. Click here.

Using the Manage Indexes window proves the easiest way to create a new index (see Figure 3). To invoke the window, select the Tables node in Enterprise Manager's treeview pane for the selected database. Right-click on a table and select the Manage Index option from the All Tasks menu. Click on the New button and enter the information for the new index. You can also use the CREATE INDEX SQL statement to create indexes; this topic in the Transact-SQL Help file (from SQL Server Query Analyzer's Help menu) describes your options. Familiarize yourself with all the options before creating or modifying any indexes.

You should see an immediate improvement in performance after creating your indexes. If not, you might want to start over and reanalyze your business rules. SQL Server 7.0 provides other tools to help you do this. The Index Tuning Wizard, which you access from Enterprise Manager's Tools menu, helps you find the optimal locations for indexes. You can use the Query Analyzer, also accessible through Enterprise Manager's Tools menu, to analyze specific queries and determine whether the use of table indexes has been successful. Use the Index Analyzer, accessible through the Query Analyzer, to recommend specific changes to your index configuration, optimizing your query.

Understanding how and when to use indexes is a complicated process that usually takes some trial and error to master. Taking the time familiarize yourself with all the indexing options, however, will definitely prove worthwhile. When implemented properly, indexes can make the difference between a successful and an unsuccessful database application.


Mary Hooke is the President of Distinctive Development, a software consulting firm located in New York City.  Distinctive Development specializes in the design and development of interactive web applications. In addition, Mary is a regular author for Visual Basic Programmer's Journal. Contact Mary at mhooke@DistinctiveDevelopment.com.