Friday, 30 March 2012

Alternatives To SQL Server Cursors


Introduction

Anyone with more than a little exposure to SQL Server will soon come to the conclusion that the use of cursors is best avoided. SQL Server is a Relational Database Management System, and as such is very proficient at processing sets of data. The use of a cursor effectively repeats an operation once for each row being processed, thereby consuming far more server resources than would be necessitated by a set-based operation. In addition, since SQL cursors are processed much more slowly than set-based operations, any locks generated by the use of said cursor are held longer than would otherwise be necessary. In this way, the use of a cursor can affect not only the performance of a given batch of work, but also the performance of any other batches which need to make use of the same data resources.

In order to effectively avoid using a SQL cursor, one must know why a cursor is being used in the first place. The list below details the most common reasons why people use cursors, and some possible solutions for each problem.


Problem: The Developer or Designer is new to SQL Server
Problem: Need to update a table with incrementing column values for records already in the table
Problem: Need to use a custom identity function to perform an insert
Problem: Need to compare a row to adjacent rows in a table


Problem: The Developer or Designer is New to SQL Server
By far, this is the most common reason for cursor usage. If a programmer is already well versed in an event-driven programming language such as Visual Basic, they will possess a great wealth of experience in dealing with data items one at a time. As such, their natural inclination when moving into SQL is to do the same. Unfortunately, what is appropriate in one language may be (and in this case is) entirely inappropriate in another.

Resolution
The primary weapon against cursor usage in this scenario is to develop a strong understanding of the basics of Transact-SQL. There are countless courses, books, and websites dedicated to the topic. In addition, you can review the examples below, which present set-based solutions to problems which may otherwise necessitate a cursor.
If you’re still struggling with finding a set-based solution, post a question in the MSDN Transact-SQL Forum

Problem: Need to update a table with incrementing column values for records already in the table
This problem is most often encountered when importing data from an external data source. The external data source may not have a key field, for instance, and for one of various reasons, an identity column cannot be used while importing. 

Resolution
A number of methods may be employed to resolve this issue. The most efficient and straightforward method, available in SQL 2005 and greater, is to use theROW_NUMBER() function. 

Using the AdventureWorks database:

SELECT LoginID, ROW_NUMBER() OVER (ORDER BY LoginID ASC) AS RowKey
FROM  HumanResources.EmployeeLogins
WITH RowKeys (LoginID, RowKey) AS 
  (SELECT LoginID, 
    ROW_NUMBER() OVER (ORDER BY LoginID ASC) AS RowKey
  FROM HumanResources.EmployeeLogins)
UPDATE HumanResources.EmployeeLogins
SET  RowKey = rk.RowKey
FROM  HumanResources.EmployeeLogins el
JOIN  RowKeys rk ON el.LoginID = rk.LoginID


ROW_NUMBER() can also be used to number rows based upon logical groupings, or partitions in data, although this is less applicable to cursor replacement. For instance, the test data contains logins from multiple domains, and you may want to number the rows in the table based upon their position within a grouping of rows from the same domain.

--Use ROW_NUMBER in conjunction with PARTITION to create row numbers grouped by domain
SELECT LoginID, 
  ROW_NUMBER() OVER 
   (PARTITION BY LEFT(LoginID, CHARINDEX('\', LoginID) - 1) 
   ORDER BY LoginID ASC) AS RowKey
FROM  HumanResources.EmployeeLogins
Also worth noting is that the sort order of the OVER clause does not have to match the sort order of the outermost query. 
In the event that the numbering has to start at a number other than zero, that number can be added to ROW_NUMBER() in order to obtain the desired row numbers. 
For instance:

SELECT LoginID, 
  100 + ROW_NUMBER() OVER 
   (PARTITION BY LEFT(LoginID, CHARINDEX('\', LoginID) - 1) 
   ORDER BY LoginID ASC) AS RowKey
FROM  HumanResources.EmployeeLogins
The same function can be performed, albeit far less efficiently, in SQL 2000. The following solution uses an inline subquery in order to COUNT the number of rows on or before a given LoginID:

SELECT el1.LoginID,
  (SELECT COUNT(LoginID)
  FROM HumanResources.EmployeeLogins el2
  WHERE el2.LoginID <= el1.LoginID) AS RowKey
FROM HumanResources.EmployeeLogins el1
This solution is subject to error if duplicate values are present in the columns being used for comparison. If there were two identical LoginIDs, they would both be granted the same RowKey. When dealing with duplicate fields such as this, a local temporary table can be used in order to ensure unique row numbering:

CREATE TABLE #EmployeeLoginsTemp
 (RowKey int IDENTITY(1,1) NOT NULL,
 LoginID nvarchar(256))
 
INSERT #EmployeeLoginsTemp (LoginID)
SELECT LoginID FROM HumanResources.EmployeeLogins
 
SELECT * FROM #EmployeeLoginsTemp
Problem: Need to use a custom identity function to perform an insert
Many databases employ custom identity generation functions. There are VERY FEW good reasons for using custom identity functions in SQL Server, given the presence of IDENTITY columns and the UniqueIdentifier data type. That is not to say that you will never encounter such a situation - just that all possible alternatives should be vetted prior to deciding upon a custom identity solution.
A custom identity function commonly resembles the following:

CREATE TABLE Identities (TableName sysname PRIMARY KEY CLUSTERED,
      NextKey int NOT NULL)
 
CREATE PROC GetNextKey( @TableName sysname,
      @NextKey int OUTPUT)
AS
BEGIN
 --Fetch the NextID, if there is a row for the passed-in TableName in the Identities.
 SELECT @NextKey = NextKey
 FROM Identities
 WHERE TableName = @TableName
 
 --If there wasn't a row, @NextID will be NULL.
 IF @NextKey IS NULL
 BEGIN
  INSERT Identities (TableName, NextKey)
  SELECT @TableName, 1
 
  SELECT @NextKey = NextKey
  FROM Identities
  WHERE TableName = @TableName
 END
 
 --Advance the NextID value in Identities
 UPDATE Identities
 SET  NextKey = NextKey + 1
 WHERE TableName = @TableName
END

Resolution
The biggest problem with custom identity functions is that they serialize inserts. While the following solution (for SQL 2000 and greater) does not use a cursor, it does use a WHILE loop to call the GetNextKey proc once for each row in HumanResources.EmployeeLogins:

WHILE EXISTS (SELECT 1 FROM HumanResources.EmployeeLogins WHERE RowKey IS NULL)
BEGIN
 DECLARE @NextKey int
 EXEC dbo.GetNextKey 'HumanResources.EmployeeLogins', @NextKey OUTPUT
 
 UPDATE HumanResources.EmployeeLogins
 SET  RowKey = @NextKey
 WHERE LoginID = (SELECT TOP 1 LoginID FROM HumanResources.EmployeeLogins WHERE RowKey IS NULL)
END


Problem: Need to compare a row to adjacent rows in a table
A typical example of this type of problem is “I need to know by how much the latest price increase changed the price of each item”. The programmer often resorts to a cursor because they cannot understand how to compare each row with the previous row for the same item, where the previous row’s price was not equal to the current row’s price.

Resolution
In this situation, an inline query can be very useful. The following solution works for SQL 2000 and higher, and uses a different AdventureWorks table for illustration. In AdventureWorks, the Production.ProductListPriceHistory table stores pricing history for each item. The current row is identified by a NULL EndDate, so we must compare the current row to the most recent row for the same item.

SELECT plph1.ProductID,  plph1.ListPrice - 
  COALESCE((SELECT  TOP 1 ListPrice
    FROM  Production.ProductListPriceHistory plph2
    WHERE  EndDate IS NOT NULL
    AND   plph2.ProductID = plph1.ProductID
    ORDER BY EndDate DESC), 
    plph1.ListPrice) AS LastPriceChangeAmount
FROM Production.ProductListPriceHistory plph1
WHERE plph1.EndDate IS NULL




No comments:

Post a Comment