- Loading a Table and Adding a Surrogate Primary Key. The goal of this section is to configure the Surrogate Key Generator transformation. In this sample job, the surrogate key is generated using the default settings. By default, the transformation generates key values based on the largest value in the key.
- The added key column is known as a surrogate key. The surrogate key in the target replaces the primary key that is loaded into the target from the source. The surrogate key is required because the target contains multiple instances of the primary key in the source.
- The PROC SQL query selects the current max value of the surrogate key field in the table into a macro variable (&highValue in this case). The expression above creates a unique surrogate key value during each iteration of the Table Loader read loop.
- Jun 24, 2012 A surrogate key is an auto-generated value, usually an integer, in the dimension table. It is made the primary key of the table and is used to join a dimension to a fact table. Among other benefits, surrogate keys allow you to maintain history in a…
- Now you need to create the FKs. Alter the table to add the FK column. Again, it must be nullable. Write a small program to set the FK column. This is a SELECT (to get the PK row based on non-surrogate keys) and an UPDATE to the referencing table. If necessary, alter the table to make the FK non-null.
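The two procedures above (numbering new rows from the current maximum key, then backfilling a nullable FK with a SELECT followed by an UPDATE) can be sketched in Python with the standard sqlite3 module. This is not SAS or SAS DI Studio code. The tables dim_customer and sales, and the columns customer_sk and customer_code, are hypothetical names chosen for illustration.

```python
import sqlite3


def load_with_surrogate_keys(conn, rows):
    """Append (customer_code, name) rows, numbering keys from the current max."""
    # Like SELECT MAX(...) INTO :highValue: read the high-water mark once.
    (high_value,) = conn.execute(
        "SELECT COALESCE(MAX(customer_sk), 0) FROM dim_customer"
    ).fetchone()
    # Each iteration of the load loop adds a counter to the high-water mark.
    for counter, (customer_code, name) in enumerate(rows, start=1):
        conn.execute(
            "INSERT INTO dim_customer (customer_sk, customer_code, name) VALUES (?, ?, ?)",
            (high_value + counter, customer_code, name),
        )


def backfill_foreign_key(conn):
    """Add a nullable FK column to sales, then SELECT + UPDATE it row by row."""
    conn.execute("ALTER TABLE sales ADD COLUMN customer_sk INTEGER")  # nullable
    for sales_rowid, customer_code in conn.execute(
        "SELECT rowid, customer_code FROM sales"
    ).fetchall():
        # SELECT the parent's surrogate key using the natural (non-surrogate) key.
        parent = conn.execute(
            "SELECT customer_sk FROM dim_customer WHERE customer_code = ?",
            (customer_code,),
        ).fetchone()
        if parent is not None:  # unmatched rows keep a NULL FK
            conn.execute(
                "UPDATE sales SET customer_sk = ? WHERE rowid = ?",
                (parent[0], sales_rowid),
            )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_sk INTEGER PRIMARY KEY, customer_code TEXT, name TEXT)"
)
conn.execute("CREATE TABLE sales (customer_code TEXT, amount REAL)")
load_with_surrogate_keys(conn, [("C01", "Ann"), ("C02", "Bo")])
load_with_surrogate_keys(conn, [("C03", "Cy")])  # continues numbering at 3
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("C02", 10.0), ("C03", 5.0), ("C99", 1.0)]
)
backfill_foreign_key(conn)
print(conn.execute("SELECT customer_code, customer_sk FROM sales ORDER BY customer_code").fetchall())
# [('C02', 2), ('C03', 3), ('C99', None)]
```

The C99 sale has no matching customer, which is why the FK column has to stay nullable until every row is resolved. SQLite cannot change a column's nullability in place, so the final "make the FK non-null" step would need a table rebuild there.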
A surrogate key (or synthetic key, entity identifier, system-generated key, database sequence number, factless key, technical key, or arbitrary unique identifier) in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data, unlike a natural (or business) key which is derived from application data.[1]
Sep 11, 2012 Surrogate keys also provide uniformity and compatibility. If you are using several different database application development systems, drivers, and object-relational mapping systems, it can be simpler to use integer surrogate keys for every table instead of natural keys to support object-relational mapping.
Definition
There are at least two definitions of a surrogate:
- Surrogate (1) – Hall, Owlett and Todd (1976)
- A surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible to the user or application.[2]
- Surrogate (2) – Wieringa and De Jonge (1991)
- A surrogate represents an object in the database itself. The surrogate is internally generated by the system and is invisible to the user or application.
The Surrogate (1) definition relates to a data model rather than a storage model and is used throughout this article. See Date (1998).
An important distinction between a surrogate and a primary key depends on whether the database is a current database or a temporal database. Since a current database stores only currently valid data, there is a one-to-one correspondence between a surrogate in the modeled world and the primary key of the database. In this case the surrogate may be used as a primary key, resulting in the term surrogate key. In a temporal database, however, there is a many-to-one relationship between primary keys and the surrogate. Since there may be several objects in the database corresponding to a single surrogate, we cannot use the surrogate as a primary key; another attribute is required, in addition to the surrogate, to uniquely identify each object.
Although Hall et al. (1976) say nothing about this, others have argued that a surrogate should have the following characteristics:
- the value is unique system-wide, hence never reused
- the value is system generated
- the value is not manipulable by the user or application
- the value contains no semantic meaning
- the value is not visible to the user or application
- the value is not composed of several values from different domains.
Surrogates in practice
In a current database, the surrogate key can be the primary key, generated by the database management system and not derived from any application data in the database. The only significance of the surrogate key is to act as the primary key. It is also possible that the surrogate key exists in addition to the database-generated UUID (for example, an HR number for each employee other than the UUID of each employee).
A surrogate key is frequently a sequential number (e.g. a Sybase or SQL Server 'identity column', a PostgreSQL or Informix serial, an Oracle or SQL Server SEQUENCE, or a column defined with AUTO_INCREMENT in MySQL). Some databases provide UUID/GUID as a possible data type for surrogate keys (e.g. PostgreSQL UUID or SQL Server UNIQUEIDENTIFIER). Having the key independent of all other columns insulates the database relationships from changes in data values or database design (making the database more agile) and guarantees uniqueness.
In a temporal database, it is necessary to distinguish between the surrogate key and the business key. Every row has both a business key and a surrogate key. The surrogate key identifies one unique row in the database; the business key identifies one unique entity of the modeled world. One table row represents a slice of time holding all the entity's attributes for a defined timespan. Those slices depict the whole lifespan of one business entity. For example, a table EmployeeContracts may hold temporal information to keep track of contracted working hours. The business key for one contract will be identical (non-unique) in both rows; however, the surrogate key for each row is unique.
SurrogateKey | BusinessKey | EmployeeName | WorkingHoursPerWeek | RowValidFrom | RowValidTo |
---|---|---|---|---|---|
1 | BOS0120 | John Smith | 40 | 2000-01-01 | 2000-12-31 |
56 | P0000123 | Bob Brown | 25 | 1999-01-01 | 2011-12-31 |
234 | BOS0120 | John Smith | 35 | 2001-01-01 | 2009-12-31 |
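A minimal sketch of the EmployeeContracts example in Python's sqlite3 module, using the rows from the table above. The UNIQUE (BusinessKey, RowValidFrom) constraint and the contract_as_of helper are illustrative assumptions, not part of the original example; they show the surrogate key identifying a single row (a time slice), while the business key plus a date identifies the entity's state at a point in time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE EmployeeContracts (
           SurrogateKey INTEGER PRIMARY KEY,   -- unique per row (time slice)
           BusinessKey TEXT NOT NULL,          -- repeats across slices of one entity
           EmployeeName TEXT,
           WorkingHoursPerWeek INTEGER,
           RowValidFrom TEXT,                  -- ISO dates compare correctly as text
           RowValidTo TEXT,
           UNIQUE (BusinessKey, RowValidFrom)  -- assumed: one slice per start date
       )"""
)
conn.executemany(
    "INSERT INTO EmployeeContracts VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "BOS0120", "John Smith", 40, "2000-01-01", "2000-12-31"),
        (56, "P0000123", "Bob Brown", 25, "1999-01-01", "2011-12-31"),
        (234, "BOS0120", "John Smith", 35, "2001-01-01", "2009-12-31"),
    ],
)


def contract_as_of(business_key, as_of):
    """Return (SurrogateKey, WorkingHoursPerWeek) of the slice valid on as_of, or None."""
    return conn.execute(
        """SELECT SurrogateKey, WorkingHoursPerWeek FROM EmployeeContracts
           WHERE BusinessKey = ? AND ? BETWEEN RowValidFrom AND RowValidTo""",
        (business_key, as_of),
    ).fetchone()


print(contract_as_of("BOS0120", "2000-06-15"))  # (1, 40)
print(contract_as_of("BOS0120", "2005-03-01"))  # (234, 35)
```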
Some database designers use surrogate keys systematically regardless of the suitability of other candidate keys, while others will use a key already present in the data, if there is one.
Some of the alternate names ('system-generated key') describe the way of generating new surrogate values rather than the nature of the surrogate concept.
Approaches to generating surrogates include:
- Universally Unique Identifiers (UUIDs)
- Globally Unique Identifiers (GUIDs)
- Object Identifiers (OIDs)
- Sybase or SQL Server identity column IDENTITY or IDENTITY(n,n)
- Oracle SEQUENCE, or GENERATED AS IDENTITY (starting from version 12.1)[3]
- SQL Server SEQUENCE (starting from SQL Server 2012)[4]
- PostgreSQL or IBM Informix serial
- MySQL AUTO_INCREMENT
- SQLite AUTOINCREMENT
- AutoNumber data type in Microsoft Access
- GENERATED BY DEFAULT AS IDENTITY in IBM DB2
- Identity column (implemented in DDL) in Teradata
- Table Sequence when the sequence is calculated by a procedure and a sequence table with fields: id, sequenceName, sequenceValue and incrementValue
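The last approach, a sequence table advanced by a procedure, can be sketched as follows. SQLite has no stored procedures, so a Python function plays that role here. The table and column names follow the list item above, while the sequence name order_sk is hypothetical. The increment and the read happen inside one write transaction, so two callers should not receive the same value.

```python
import sqlite3

# isolation_level=None: autocommit mode, so transactions are managed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    """CREATE TABLE SequenceTable (
           id INTEGER PRIMARY KEY,
           sequenceName TEXT UNIQUE NOT NULL,
           sequenceValue INTEGER NOT NULL,
           incrementValue INTEGER NOT NULL
       )"""
)
conn.execute(
    "INSERT INTO SequenceTable (sequenceName, sequenceValue, incrementValue) "
    "VALUES ('order_sk', 0, 1)"
)


def next_value(sequence_name):
    """Advance the named sequence and return its new value atomically."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before touching the row
    try:
        conn.execute(
            "UPDATE SequenceTable SET sequenceValue = sequenceValue + incrementValue "
            "WHERE sequenceName = ?",
            (sequence_name,),
        )
        # Unpacking fails (TypeError) if the sequence does not exist; we roll back.
        (value,) = conn.execute(
            "SELECT sequenceValue FROM SequenceTable WHERE sequenceName = ?",
            (sequence_name,),
        ).fetchone()
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise


first_three = [next_value("order_sk") for _ in range(3)]
print(first_three)  # [1, 2, 3]
```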
Advantages
Immutability
Surrogate keys do not change while the row exists. This has the following advantages:
- Applications cannot lose their reference to a row in the database (since the identifier never changes).
- The primary or natural key data can always be modified, even with databases that do not support cascading updates across related foreign keys.
Requirement changes
Attributes that uniquely identify an entity might change, which might invalidate the suitability of natural keys. Consider the following example:
- An employee's network user name is chosen as a natural key. Upon merging with another company, new employees must be inserted. Some of the new network user names create conflicts because their user names were generated independently (when the companies were separate).
In these cases, generally a new attribute must be added to the natural key (for example, an original_company column). With a surrogate key, only the table that defines the surrogate key must be changed. With natural keys, all tables (and possibly other, related software) that use the natural key will have to change.
Some problem domains do not clearly identify a suitable natural key. Surrogate keys avoid choosing a natural key that might be incorrect.
Performance
Surrogate keys tend to be a compact data type, such as a four-byte integer. This allows the database to query the single key column faster than it could multiple columns. Furthermore, a non-redundant distribution of keys causes the resulting B-tree index to be completely balanced. Surrogate keys are also less expensive to join (fewer columns to compare) than compound keys.
Compatibility
While using several database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate, it is much easier to use integer or GUID surrogate keys for every table instead of natural keys in order to support database-system-agnostic operations and object-to-row mapping.
Uniformity
When every table has a uniform surrogate key, some tasks can be easily automated by writing the code in a table-independent way.
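For instance, if every table's surrogate key column is named id (an assumed convention, not something the text prescribes), a single lookup or delete helper can serve every table. The table names below are hypothetical. Because table names cannot be bound as SQL parameters, the sketch checks them against an allow-list before building the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("customer", "product"):
    # Assumed convention: every table's surrogate key column is called "id".
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Ann')")
conn.execute("INSERT INTO product (name) VALUES ('Widget')")

ALLOWED_TABLES = {"customer", "product"}


def _checked(table):
    # Identifiers cannot be passed as ? parameters, so validate them instead.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    return table


def fetch_by_id(table, key):
    """Table-independent lookup; works only because every table shares the key column."""
    return conn.execute(f"SELECT * FROM {_checked(table)} WHERE id = ?", (key,)).fetchone()


def delete_by_id(table, key):
    """Table-independent delete, relying on the same convention."""
    conn.execute(f"DELETE FROM {_checked(table)} WHERE id = ?", (key,))


print(fetch_by_id("customer", 1))  # (1, 'Ann')
print(fetch_by_id("product", 1))   # (1, 'Widget')
```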
Validation
It is possible to design key values that follow a well-known pattern or structure which can be automatically verified. For instance, the keys that are intended to be used in some column of some table might be designed to 'look different from' those that are intended to be used in another column or table, thereby simplifying the detection of application errors in which the keys have been misplaced. However, this characteristic of surrogate keys should never be used to drive any of the logic of the applications themselves, as this would violate the principles of database normalization.
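A sketch of such a self-checking key format in Python. The scheme (a two-letter prefix per table, six digits, and one weighted check digit) is entirely hypothetical; it lets a validator flag a customer key stored where an invoice key was expected, or many keys with a mistyped digit. In line with the caution above, a check like this belongs in error detection, not in application logic.

```python
import re

PREFIXES = {"customer": "CU", "invoice": "IN"}  # hypothetical per-table prefixes
KEY_PATTERN = re.compile(r"^([A-Z]{2})([0-9]{6})([0-9])$")


def _check_digit(number):
    """Weighted sum of the six zero-padded digits, mod 10 (illustrative only)."""
    return sum((i + 1) * int(d) for i, d in enumerate(f"{number:06d}")) % 10


def make_key(table, number):
    """Build a key such as CU0000422: prefix, six digits, check digit."""
    return f"{PREFIXES[table]}{number:06d}{_check_digit(number)}"


def is_valid_key(table, key):
    """True if key is well-formed, carries the table's prefix and a matching check digit."""
    match = KEY_PATTERN.match(key)
    if match is None or match.group(1) != PREFIXES[table]:
        return False
    return int(match.group(3)) == _check_digit(int(match.group(2)))


key = make_key("customer", 42)
print(key)                            # CU0000422
print(is_valid_key("customer", key))  # True
print(is_valid_key("invoice", key))   # False: misplaced key
```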
Disadvantages
Disassociation
The values of generated surrogate keys have no relationship to the real-world meaning of the data held in a row. When inspecting a row holding a foreign key reference to another table using a surrogate key, the meaning of the surrogate key's row cannot be discerned from the key itself. Every foreign key must be joined to see the related data item. If appropriate database constraints have not been set, or if data were imported from a legacy system where referential integrity was not employed, it is possible to have a foreign-key value that does not correspond to a primary-key value and is therefore invalid. (In this regard, C.J. Date regards the meaninglessness of surrogate keys as an advantage.[5])
To discover such errors, one must perform a query that uses a left outer join between the table with the foreign key and the table with the primary key, showing both key fields in addition to any fields required to distinguish the record; all invalid foreign-key values will have the primary-key column as NULL. The need to perform such a check is so common that Microsoft Access actually provides a 'Find Unmatched Query' wizard that generates the appropriate SQL after walking the user through a dialog. (It is, however, not too difficult to compose such queries manually.) 'Find Unmatched' queries are typically employed as part of a data cleansing process when inheriting legacy data.
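A "Find Unmatched" query of the kind described above, sketched against an in-memory SQLite database from Python. The customer and orders tables are hypothetical; order 101 references a customer that does not exist, as can happen with legacy data loaded without referential integrity. Rows whose foreign key is itself NULL are excluded, since a missing reference is not the same as an invalid one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bo');
    -- Order 101 points at customer 7, which does not exist; order 103 has no customer.
    INSERT INTO orders VALUES (100, 1, 9.5), (101, 7, 3.0), (102, 2, 1.25), (103, NULL, 0.5);
    """
)

unmatched = conn.execute(
    """
    SELECT o.order_id, o.customer_id AS fk_value, c.customer_id AS pk_value
    FROM orders AS o
    LEFT OUTER JOIN customer AS c ON c.customer_id = o.customer_id
    WHERE o.customer_id IS NOT NULL  -- a missing reference is not an invalid one
      AND c.customer_id IS NULL      -- no matching primary key: invalid FK
    ORDER BY o.order_id
    """
).fetchall()
print(unmatched)  # [(101, 7, None)]
```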
Surrogate keys are unnatural for data that is exported and shared. A particular difficulty is that tables from two otherwise identical schemas (for example, a test schema and a development schema) can hold records that are equivalent in a business sense, but have different keys. This can be mitigated by NOT exporting surrogate keys, except as transient data (most obviously, in executing applications that have a 'live' connection to the database).
When surrogate keys supplant natural keys, then domain specific referential integrity will be compromised. For example, in a customer master table, the same customer may have multiple records under separate customer IDs, even though the natural key (a combination of customer name, date of birth, and E-mail address) would be unique. To prevent compromise, the natural key of the table must NOT be supplanted: it must be preserved as a unique constraint, which is implemented as a unique index on the combination of natural-key fields.
Query optimization
Relational databases assume a unique index is applied to a table's primary key. The unique index serves two purposes: (i) to enforce entity integrity, since primary key data must be unique across rows and (ii) to quickly search for rows when queried. Since surrogate keys replace a table's identifying attributes—the natural key—and since the identifying attributes are likely to be those queried, then the query optimizer is forced to perform a full table scan when fulfilling likely queries. The remedy to the full table scan is to apply indexes on the identifying attributes, or sets of them. Where such sets are themselves a candidate key, the index can be a unique index.
These additional indexes, however, will take up disk space and slow down inserts and deletes.
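The effect of indexing the identifying attributes can be observed with SQLite's EXPLAIN QUERY PLAN, sketched below in Python. The customer table and the index name ix_customer_natural are hypothetical, and the exact plan wording varies between SQLite versions, so only the presence of a full scan before and of the index afterwards is worth relying on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT, email TEXT)"
)
QUERY = "SELECT customer_id FROM customer WHERE name = ? AND birth_date = ?"
PARAMS = ("Ann", "1980-01-01")


def plan_details(sql, params):
    """Return the human-readable detail column (the last one) of each plan row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan_details(QUERY, PARAMS)  # a full table scan, e.g. 'SCAN customer'
conn.execute("CREATE INDEX ix_customer_natural ON customer (name, birth_date)")
after = plan_details(QUERY, PARAMS)  # a SEARCH that uses ix_customer_natural
print(before)
print(after)
```

If (name, birth_date) were a candidate key, CREATE UNIQUE INDEX would serve both lookup speed and uniqueness.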
Normalization
Surrogate keys can result in duplicate values in any natural keys. To prevent duplication, one must preserve the role of the natural keys as unique constraints when defining the table, using either SQL's CREATE TABLE statement or, if the constraints are added as an afterthought, an ALTER TABLE ... ADD CONSTRAINT statement.
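A sketch of keeping the natural key as a unique constraint next to a surrogate primary key, using the customer example from the Disassociation section (name, date of birth and e-mail address). It uses SQLite from Python. Note that SQLite does not support ALTER TABLE ... ADD CONSTRAINT, so there the after-the-fact equivalent is CREATE UNIQUE INDEX on the same columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customer (
           customer_id INTEGER PRIMARY KEY,  -- surrogate key
           name TEXT NOT NULL,
           birth_date TEXT NOT NULL,
           email TEXT NOT NULL,
           CONSTRAINT uq_customer_natural UNIQUE (name, birth_date, email)
       )"""
)
INSERT_SQL = "INSERT INTO customer (name, birth_date, email) VALUES (?, ?, ?)"
ann = ("Ann", "1980-01-01", "ann@example.com")
conn.execute(INSERT_SQL, ann)

rejected = False
try:
    # Same real-world customer again: a fresh surrogate key alone would allow it,
    # but the preserved natural-key constraint does not.
    conn.execute(INSERT_SQL, ann)
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```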
Business process modeling
Because surrogate keys are unnatural, flaws can appear when modeling the business requirements. Business requirements, relying on the natural key, then need to be translated to the surrogate key. A strategy is to draw a clear distinction between the logical model (in which surrogate keys do not appear) and the physical implementation of that model, to ensure that the logical model is correct and reasonably well normalised, and to ensure that the physical model is a correct implementation of the logical model.
Inadvertent disclosure
Proprietary information can be leaked if sequential key generators are used. By subtracting a previously generated sequential key from a recently generated sequential key, one could learn the number of rows inserted during that time period. This could expose, for example, the number of transactions or new accounts per period. There are a few ways to overcome this problem:
- Increase the sequential number by a random amount.
- Generate a random key such as a UUID
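Both mitigations can be sketched in a few lines of Python. RandomGapSequence is a hypothetical in-process helper (a real system would keep the counter in the database), and it draws its step from the secrets module. Averaged over many keys, random steps can still hint at volume, whereas a version 4 UUID from uuid.uuid4() carries no count or ordering information.

```python
import secrets
import uuid


class RandomGapSequence:
    """Increasing keys with random steps, so differences don't reveal exact row counts."""

    def __init__(self, start=0, max_step=1000):
        self._value = start
        self._max_step = max_step

    def next_key(self):
        # randbelow(n) is in [0, n), so each step is in [1, max_step].
        self._value += 1 + secrets.randbelow(self._max_step)
        return self._value


seq = RandomGapSequence()
keys = [seq.next_key() for _ in range(5)]
print(keys)  # strictly increasing, irregular gaps

# Alternative: a random (version 4) UUID.
random_key = uuid.uuid4()
print(random_key)
```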
Inadvertent assumptions
Sequentially generated surrogate keys can imply that events with a higher key value occurred after events with a lower value. This is not necessarily true, because such values do not guarantee time sequence as it is possible for inserts to fail and leave gaps which may be filled at a later time. If chronology is important then date and time must be separately recorded.
References
Citations
- [1] 'What is a Surrogate Key? - Definition from Techopedia'. Techopedia.com. Retrieved 2020-02-21.
- [2] P. A. V. Hall, J. Owlett, S. J. P. Todd. 'Relations and Entities'. In Modelling in Data Base Management Systems (ed. G. M. Nijssen). North-Holland, 1976.
- [3] http://docs.oracle.com/database/121/SQLRF/statements_7002.htm#SQLRF01402
- [4] https://msdn.microsoft.com/en-us/library/ff878091.aspx
- [5] C. J. Date. 'The Primacy of Primary Keys'. In Relational Database Writings, 1991-1994. Addison-Wesley, Reading, MA.
Sources
- This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the 'relicensing' terms of the GFDL, version 1.3 or later.
- Nijssen, G.M. (1976). Modelling in Data Base Management Systems. North-Holland Pub. Co. ISBN 0-7204-0459-2.
- Engles, R.W. (1972). A Tutorial on. CiteSeerX 10.1.1.16.3195.
- Date, C. J. (1998). 'Chapters 11 and 12'. Relational Database Writings 1994–1997. ISBN 0201398141.
- Carter, Breck. 'Intelligent Versus Surrogate Keys'. Retrieved 2006-12-03.
- Richardson, Lee. 'Create Data Disaster: Avoid Unique Indexes – (Mistake 3 of 10)'. Archived from the original on 2008-01-30. Retrieved 2008-01-19.
- Berkus, Josh. 'Database Soup: Primary Keyvil, Part I'. Retrieved 2006-12-03.
Retrieved from 'https://en.wikipedia.org/w/index.php?title=Surrogate_key&oldid=949211050'
By: Daniel Farina | Updated: 2019-09-12
Problem
You have always heard that you should avoid cursors in your T-SQL code as a SQL Server best practice, because cursors are detrimental to performance and sometimes cause issues. But sometimes there is a need to loop through the data one row at a time, so in this tip we will look at how to write such a loop without a cursor and compare the two approaches.
Solution
We all know that SQL Server, like every relational database, allows the user to perform set-based operations. Also, like many database vendors, SQL Server includes a procedural extension, the T-SQL language. It adds constructs found in procedural languages, which makes coding more straightforward for developers. These constructs were added for a reason, and sometimes they are the only approach to the task at hand.
Using a While Loop Instead of Cursors in SQL Server
If you have ever worked with cursors, you may find this title a bit confusing because, after all, cursors use WHILE constructs to iterate between rows. But besides that, I want to show you that in some circumstances, when we use a cursor to iterate over a set of rows, we can replace it with a WHILE loop. In such cases, the only challenge will be to choose a proper exit condition.
Pros and Cons of Using Cursors to Iterate Through Table Rows in SQL Server
Not everything is wrong with cursors; they also have some advantages over other looping techniques.
- Cursors are updatable: When you create a cursor, you define it with a query in the DECLARE CURSOR statement. By adding the FOR UPDATE clause to the cursor declaration, you can update the underlying columns through the cursor (with UPDATE ... WHERE CURRENT OF).
- You can move forward and backward in a cursor: By using the SCROLL option in the DECLARE CURSOR statement you can navigate across the cursor records in both directions with the fetch options FIRST, LAST, PRIOR, NEXT, RELATIVE and ABSOLUTE. Keep in mind that the SCROLL option is incompatible with the FORWARD_ONLY and FAST_FORWARD options.
- Cursors can be passed to stored procedures: If you use the GLOBAL option to create a cursor, it can be used in any stored procedure or batch executed in the same connection. This allows you to use cursors on nested stored procedures.
- Cursors have a lot of different options: With cursors you can choose among options (such as READ_ONLY, SCROLL_LOCKS and OPTIMISTIC) that affect how they behave with regard to locking.
- Cursors don’t need a condition: With a cursor, you process a set of rows one record at a time. This allows you to move across the cursor without needing a Boolean condition. For example, you can create a cursor over the names of the databases residing on a SQL Server instance without needing a surrogate key to serve as the test condition, as a WHILE loop would.
There are also some negative aspects that you should be aware of when using cursors instead of other looping options.
- If you use global cursors in your code you are taking the risk of facing errors due to a cursor being closed by some stored procedure nested in your code.
- Cursors usually perform worse than an equivalent WHILE loop or CTE.
Pros and Cons of Using a While Loop to Iterate Through Table Rows in SQL Server
There are also benefits to using a WHILE loop compared to a cursor.
- While loops are generally faster than cursors.
- While loops use fewer locks than cursors.
- Less usage of tempdb: While loops don’t create a copy of the data in tempdb as a cursor can. Remember that cursors, depending on the options used to create them, can cause temporary tables to be created in tempdb.
The next list details the negative aspects of WHILE loops.
- Moving forward or backward is complex: To move forward or backward in a loop you need to dynamically change the iteration condition inside the loop. This requires extra care; otherwise you can end up in an infinite loop.
- The risk of an infinite loop: Compared to a cursor, you don’t have a fixed set of data to loop (i.e. the data returned by the SELECT statement in the cursor declaration), instead when using a WHILE loop you have to define a boundary with an expression that is evaluated to true or false.
Building the Test Environment for Cursors and Loops
To test this, I will use a table with an identity column (CursorTestID), a varchar column (Filler) and a bigint column (RunningTotal).
The idea is to loop through the table rows ordered by the CursorTestID column and update the RunningTotal column with the sum of the CursorTestID column value and the value of the RunningTotal column of the previous row.
But before starting, first we need to generate some test rows with the next script.
In the script above you will notice that I only used a single insert statement and took advantage of the batch separator (the GO 500000 command) as a shortcut to execute this insert statement 500000 times. You can read more about this method of repeating batch execution in this tip: Executing a T-SQL batch multiple times using GO.
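The tip's T-SQL scripts are not reproduced in this text. As a rough stand-in, the following Python sketch builds an equivalent CursorTest table in SQLite (INTEGER PRIMARY KEY standing in for the identity column), fills it with one repeated parameterized INSERT, and provides a checker for the running-total rule. The Filler content, the initial RunningTotal of 0 and the 1,000-row size are assumptions. A set-based UPDATE is included as a reference result.

```python
import sqlite3

ROW_COUNT = 1000  # the tip inserts 500,000 rows; a smaller count keeps the sketch quick


def build_test_table():
    """Create CursorTest and fill it with ROW_COUNT rows whose RunningTotal starts at 0."""
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY auto-assigns 1, 2, 3, ... like an identity column.
    conn.execute(
        "CREATE TABLE CursorTest (CursorTestID INTEGER PRIMARY KEY, Filler TEXT, RunningTotal INTEGER)"
    )
    # One parameterized INSERT executed ROW_COUNT times (the role of GO 500000).
    conn.executemany(
        "INSERT INTO CursorTest (Filler, RunningTotal) VALUES (?, 0)",
        [("x" * 100,)] * ROW_COUNT,
    )
    conn.commit()
    return conn


def running_totals_are_correct(conn):
    """Check RunningTotal = CursorTestID + previous row's RunningTotal, in key order."""
    previous = 0
    for test_id, total in conn.execute(
        "SELECT CursorTestID, RunningTotal FROM CursorTest ORDER BY CursorTestID"
    ):
        if total != test_id + previous:
            return False
        previous = total
    return True


def fill_set_based(conn):
    """Set-based reference: one UPDATE with a correlated SUM, no loop at all."""
    conn.execute(
        """UPDATE CursorTest SET RunningTotal = (
               SELECT SUM(t.CursorTestID) FROM CursorTest AS t
               WHERE t.CursorTestID <= CursorTest.CursorTestID)"""
    )
    conn.commit()
```

For example, after conn = build_test_table() and fill_set_based(conn), running_totals_are_correct(conn) should return True, and the last row's RunningTotal should be 500500 (1 + 2 + ... + 1000).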
Example of a Basic Cursor to Loop through Table Rows in SQL Server
Let’s create a cursor to fill the RunningTotal column. Notice in the next script that I declared the cursor with the FAST_FORWARD option. This is done to enhance the performance of the cursor because, according to Microsoft, the FAST_FORWARD argument “Specifies a FORWARD_ONLY, READ_ONLY cursor with performance optimizations enabled”. In other words, we are instructing SQL Server to use a read-only cursor that can only move forward, from the first row to the last.
The next image is a screen capture showing the execution of the script above. As you can see, it took three minutes and five seconds to update the 500,000 rows of our test table.
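Python's sqlite3 module has no DECLARE CURSOR, but stepping through a SELECT result is a comparable forward-only, read-only pass. The sketch below is an analogy to the FAST_FORWARD cursor approach under that assumption, not the tip's script. It recreates a 1,000-row CursorTest table so it runs on its own, and it issues each UPDATE as a separate statement while the reading statement stays open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CursorTest (CursorTestID INTEGER PRIMARY KEY, Filler TEXT, RunningTotal INTEGER)"
)
conn.executemany("INSERT INTO CursorTest (Filler, RunningTotal) VALUES (?, 0)", [("x",)] * 1000)
conn.commit()

# Forward-only pass over the keys in order, the job FAST_FORWARD does in T-SQL.
reader = conn.execute("SELECT CursorTestID FROM CursorTest ORDER BY CursorTestID")
running_total = 0
for (cursor_test_id,) in reader:  # one "FETCH NEXT" per iteration
    running_total += cursor_test_id
    # Updating the current row's RunningTotal does not change its key,
    # so the open reader keeps its place.
    conn.execute(
        "UPDATE CursorTest SET RunningTotal = ? WHERE CursorTestID = ?",
        (running_total, cursor_test_id),
    )
conn.commit()
```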
Example of a Basic While Loop to Cycle through Table Rows in SQL Server
Now I will rewrite the previous script avoiding the use of a cursor. You will notice that it contains a While loop which is almost identical to the one in the cursor script. This is, as I previously said, because even when working with cursors you need to use an iterative control structure.
The next image is a screen capture of the execution of the script above. It took less time to run the while loop than the cursor.
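The same work as a WHILE-style loop, again sketched in Python against SQLite rather than T-SQL. Instead of fetching from a cursor, each iteration asks for the next existing key greater than the current one, and the loop exits when that query returns NULL. Two deleted rows show that this exit condition copes with gaps in the identity values, which a simple current_id + 1 step would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CursorTest (CursorTestID INTEGER PRIMARY KEY, Filler TEXT, RunningTotal INTEGER)"
)
conn.executemany("INSERT INTO CursorTest (Filler, RunningTotal) VALUES (?, 0)", [("x",)] * 1000)
conn.execute("DELETE FROM CursorTest WHERE CursorTestID IN (10, 11)")  # leave a gap

running_total = 0
current_id = conn.execute("SELECT MIN(CursorTestID) FROM CursorTest").fetchone()[0]
# Exit condition: stop when there is no next key (SQL NULL arrives as None).
while current_id is not None:
    running_total += current_id
    conn.execute(
        "UPDATE CursorTest SET RunningTotal = ? WHERE CursorTestID = ?",
        (running_total, current_id),
    )
    # Advance to the next existing key instead of current_id + 1, so gaps are skipped.
    current_id = conn.execute(
        "SELECT MIN(CursorTestID) FROM CursorTest WHERE CursorTestID > ?",
        (current_id,),
    ).fetchone()[0]
conn.commit()
```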
Another SQL Server Cursor Example
Let’s take for example the cursor in the tip Standardize SQL Server data with text lookup and replace function. A word of advice: to run this code, you should first follow the steps in that tip to create the test environment.
And here is the cursor code:
If we dissect this code, we can see that there is one cursor that goes through the Products table, which I copied below.
SQL Server Cursor Example Converted to a While Loop
In order to replace this cursor with a WHILE loop, we need to create a temporary table to implement a tally table. For those of you who don’t know what a tally table is, we can define it as a table that contains a pair of columns consisting of a key and its value. In our particular case we will use a sequential integer key starting from 1, so we can use it as an iterator. This key will be associated with a ProductID from the Products table.
At first, since the Products table has the ProductID key defined as an identity column, you may be tempted to skip this step, but you have to consider that in a real case a row could have been deleted, leaving a gap in the identity values; therefore you won’t be able to use the identity column as an iterator. Additionally, a row can be deleted while our code is running, which could lead to execution errors. To avoid this we are going to add a TRY...CATCH block. I will go into this further on.
Before starting the WHILE loop, we need to set its start and stop condition. For this matter I added two new integer variables named @Iterator and @MaxIterator. The @MaxIterator variable is used to keep the number of items in the #TallyTable table and we set its value only once before starting the loop. The @Iterator variable is initialized to 1, as we defined it as the starting number on the sequence and we are going to increment its value at each iteration.
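A sketch of the tally-table loop described above, written in Python against SQLite. The Products rows, and the strip() call standing in for the tip's text standardization, are hypothetical. TallyID numbers the surviving ProductIDs 1..N, @MaxIterator and @Iterator become max_iterator and iterator, and a try/except plays the part of TRY...CATCH. Because a lookup of a deleted row simply returns no row rather than failing, the sketch raises an error explicitly so the except branch can skip it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT)")
conn.executemany(
    "INSERT INTO Products (ProductName) VALUES (?)",
    [("  widget ",), ("gadget",), (" gizmo",), ("doohickey  ",)],
)
conn.execute("DELETE FROM Products WHERE ProductID = 2")  # identity values now have a gap

# Temporary tally table: TallyID runs 1..N and maps to the surviving ProductIDs.
conn.execute("CREATE TEMP TABLE TallyTable (TallyID INTEGER PRIMARY KEY, ProductID INTEGER)")
conn.execute("INSERT INTO TallyTable (ProductID) SELECT ProductID FROM Products ORDER BY ProductID")

# Simulate a row disappearing after the tally table was built.
conn.execute("DELETE FROM Products WHERE ProductID = 4")

max_iterator = conn.execute("SELECT COUNT(*) FROM TallyTable").fetchone()[0]  # set once
iterator = 1
processed, skipped = [], []
while iterator <= max_iterator:
    (product_id,) = conn.execute(
        "SELECT ProductID FROM TallyTable WHERE TallyID = ?", (iterator,)
    ).fetchone()
    try:
        row = conn.execute(
            "SELECT ProductName FROM Products WHERE ProductID = ?", (product_id,)
        ).fetchone()
        if row is None:
            # SQL returns no row rather than an error, so raise one ourselves.
            raise LookupError(f"ProductID {product_id} no longer exists")
        # Hypothetical per-row work standing in for the tip's text standardization.
        conn.execute(
            "UPDATE Products SET ProductName = ? WHERE ProductID = ?",
            (row[0].strip(), product_id),
        )
        processed.append(product_id)
    except LookupError:
        skipped.append(product_id)  # like a CATCH block: note the failure, keep looping
    iterator += 1
conn.commit()
print(processed, skipped)  # [1, 3] [4]
```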
Next Steps
- Are you new to cursors and need some practice? In the next tip you will find an explanation, an easy to understand cursor example and more recommended readings: SQL Server Cursor Example.
- If you want to convert the existing cursors in your code to set based queries, take a look at this chapter SQL Server Convert Cursor to Set Based from the SQL Server Database Design Best Practices Tutorial.
- Do you need another example on using a While loop? Take a look at this tip that will show you how to split DML statements in batches: Optimize Large SQL Server Insert, Update and Delete Processes by Using Batches.
- In case you don’t know how to use TRY..CATCH exception handling, take a look at this tip: SQL Server Try and Catch Exception Handling.
- Stay tuned to the SQL Server T-SQL Tips category to get more coding ideas.
About the author
Daniel Farina was born in Buenos Aires, Argentina. Self-educated, since childhood he showed a passion for learning.