
Oracle to SQL Server and SQL Server to Oracle Migration Guide - Pt. 2

Leonid has 20+ years as a multi-platform developer and DBA, with special expertise in Oracle and SQL Server cross-platform migrations.

The first part of this series discussed the differences between Oracle Database and Microsoft SQL Server in their implementation of transactions, with a focus on pitfalls one may encounter during Oracle to SQL Server migrations and vice versa. This next installment covers a number of commonly used SQL syntax elements that have no match, or a completely different meaning or usage, across the Oracle–SQL Server divide.

Sequences in Oracle and Identity Columns in SQL Server

There’s a long-standing divide in the database community between two camps: the partisans of natural keys and the proponents of artificial (or “surrogate”) keys.

I myself defend natural keys but often find myself creating surrogates for one reason or another. But leaving the substance of this debate aside, let’s look at the standard mechanisms for generating artificial keys: Oracle sequences and SQL Server identity columns.

An Oracle sequence is a first-class database-level object. In contrast, an SQL Server identity column is a column property, not a standalone object.

When an Oracle sequence is used to generate a table key—usually a primary key—it’s guaranteed to increment, and therefore be unique. But it’s not guaranteed to be consecutive. In fact, even in well-designed implementations, it’s most likely to have some gaps. So no Oracle implementation should ever rely on sequence-generated values to be consecutive.

Also, a sequence is managed through an Oracle database’s data dictionary, so it would be too resource-consuming (and cumbersome) to create a dedicated sequence to support each and every surrogate key. A single sequence object can support multiple or even all surrogate keys.

On the other hand, when multiple processes need to access NEXTVAL (the next incremental value) from a sequence, the sequence will become a critical, single-access resource. It will effectively make all processes accessing it strictly sequential, turning any multi-threaded (single or multi-server) implementation into a single-threaded process, with long wait times and high memory/low CPU use.

Such implementations do actually happen. The solution for this issue is to define the sequence object in question with a reasonable cache value, meaning that a defined range of values (be it 100 or 100 thousand) is selected into a cache for a calling process, recorded in the data dictionary as used, and made available to this particular process without the need to access the data dictionary every time NEXTVAL is called.
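As a sketch, here is what a cached, shared sequence might look like (the sequence and table names are hypothetical):

```sql
-- One shared sequence backing surrogate keys, with a sizable cache
-- so sessions don't hit the data dictionary on every NEXTVAL call.
CREATE SEQUENCE app_surrogate_seq
  START WITH 1
  INCREMENT BY 1
  CACHE 1000
  NOCYCLE;

-- Drawing a surrogate key value during an insert:
INSERT INTO customers (customer_id, customer_name)
VALUES (app_surrogate_seq.NEXTVAL, 'Acme Corp');
```

Any unused portion of a session's cached range is discarded when the session ends, which is one source of the gaps described above.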

But this is exactly why gaps will be created as not all cached values are likely to be used. It also means that across multiple processes in parallel sessions, some recorded sequence values can be chronologically inverted. This inversion can’t happen within a single process unless a sequence value was reset or backshifted. But this last scenario amounts to looking for trouble: It should be unnecessary, and if implemented incorrectly, it can result in generating duplicate values.

So, the only correct way to use Oracle sequences is for the generation of surrogate keys: keys that are unique but are not assumed to hold any other reliably usable information.

Identity Columns in SQL Server

What about SQL Server? While sequences with very similar functionality and implementation to their Oracle counterpart have been introduced in SQL Server 2012, they are not a first-rate, go-to technique. Like other feature additions, they make sense for conversion from Oracle, but when implementing surrogate keys from scratch on SQL Server, IDENTITY is a much better option.

IDENTITY is a “child” object of a table. It accesses no resources outside of its table, and its values are guaranteed to increment unless deliberately manipulated, though, like a sequence, it can still leave gaps (for example, after rolled-back inserts or a server restart). And it’s specifically designed for this very task, rather than for semantic compatibility with Oracle.
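A minimal sketch of an IDENTITY-based surrogate key (the table and column names are hypothetical):

```sql
-- The IDENTITY(seed, increment) property lives on the column itself;
-- no separate database object is involved.
CREATE TABLE dbo.customers (
    customer_id   BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    customer_name NVARCHAR(200) NOT NULL
);

-- The key is generated automatically; it is not listed in the insert.
INSERT INTO dbo.customers (customer_name) VALUES (N'Acme Corp');

-- The value just generated in the current scope:
SELECT SCOPE_IDENTITY();
```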

With Oracle having implemented IDENTITY functionality in version 12.1, it’s natural to wonder how it did without it before, why it implemented it now, and why SQL Server needed it from the very beginning (from its Sybase SQL Server origins).

The reason is that Oracle always had an identity key feature: the ROWID pseudocolumn, having a datatype of ROWID or UROWID. This value is non-numeric (ROWID and UROWID are proprietary Oracle datatypes) and uniquely identifies a data record.

Unlike SQL Server’s IDENTITY, Oracle’s ROWID can’t be easily manipulated (it can be queried, but not inserted or modified), and it’s created in the background for every row in every Oracle table. Also, the most efficient way to access any data row in an Oracle database is by its ROWID, so it’s used as a performance optimization technique. Lastly, because it reflects the low-level storage of row data, it often determines the order in which rows come back from a full scan when no ORDER BY is specified (although, per the SQL standard, no output order is ever guaranteed without an explicit ORDER BY).
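For illustration, ROWID can be queried like any other column, and a row can then be fetched by it directly (the table name and the literal ROWID value below are hypothetical):

```sql
-- ROWID can be selected but never inserted or updated.
SELECT ROWID, customer_name
FROM   customers
WHERE  customer_name = 'Acme Corp';

-- The single fastest way to reach a row: address it by its ROWID,
-- using a value obtained from a prior query.
SELECT customer_name
FROM   customers
WHERE  ROWID = 'AAAR5sAAEAAAACXAAA';
```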

If Oracle’s ROWID is so important, how did SQL Server survive all those years and releases without it? By using IDENTITY columns as primary (surrogate) keys.

It’s important to note the difference in index structure implementation between Oracle and SQL Server.

In SQL Server, the first index—the primary key, more often than not—is clustered; this means that most commonly, data in the primary data file is ordered by this key. On the Oracle side, the equivalent of a clustered index is an index-organized table. This is an optional construct in Oracle that is used sporadically, only as needed—for read-only lookup tables, for example.
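For reference, an Oracle index-organized table is declared with the ORGANIZATION INDEX clause (a hypothetical read-only lookup table):

```sql
-- The rough Oracle equivalent of a SQL Server clustered index:
-- the table data itself is stored in primary-key order.
CREATE TABLE country_lookup (
  country_code CHAR(2)       PRIMARY KEY,
  country_name VARCHAR2(100) NOT NULL
) ORGANIZATION INDEX;
```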

All design patterns in Oracle that are based on the use of ROWID (such as data deduplication) should be implemented based on IDENTITY columns when migrating to SQL Server.

While migrating from using IDENTITY on SQL Server to using IDENTITY on Oracle could produce functionally correct code, it’s not optimal, because on the Oracle side, ROWID will perform much more efficiently.

The same is true when doing a simple SQL syntax conversion to move Oracle sequences into SQL Server: The code will run, but using IDENTITY is the much-preferred option in terms of both code simplicity and performance.

Filtered Indexes in Microsoft SQL Server

Years ago, Microsoft SQL Server 2008 introduced a number of significant features that turned it into a truly first-rate enterprise database. One that has saved my day more than once has been filtered indexes.

A filtered index is a non-clustered index (i.e., one that exists as its own data file) that has a WHERE clause, meaning the index file contains only the data records relevant to that clause. To take full advantage of filtered indexes, the index should also have an INCLUDE clause listing all the columns that are needed when returning a dataset. When your query is optimized to use a specific filtered index that includes all the needed data points, the database engine only needs to access a (small) index file without even looking at the primary table data file.
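A sketch of such a covering filtered index, assuming a hypothetical orders table in which only a small fraction of rows is active:

```sql
-- Only active rows enter the index; INCLUDE makes it covering
-- for queries that read these three columns.
CREATE NONCLUSTERED INDEX ix_orders_active
ON dbo.orders (order_date)
INCLUDE (customer_id, order_total)
WHERE is_active = 1;

-- Served entirely from the index file; the base table is never touched:
SELECT order_date, customer_id, order_total
FROM   dbo.orders
WHERE  is_active = 1
  AND  order_date >= '20240101';
```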

This was particularly valuable for me a few years back when working with a terabyte-sized table. The client in question frequently needed to access only a fraction of a percent of records active at any given time. The initial implementation of this access (triggered by end-user UI actions) was not merely painfully slow—it was just plain unusable. When I added a filtered index with the needed INCLUDEs, it became a sub-millisecond search. The time I spent on this optimization task was only an hour.

Sure, filtered indexes have some limitations. They can’t include LOB columns, there are limits on the conditions their WHERE clauses can contain, and they add to a database’s storage footprint. But provided a use case fits within these parameters, the storage tradeoff is usually quite minor compared to the significant performance boost filtered indexes can provide.

What About Filtered Indexes in Oracle Database?

Later I found myself in a large team at a Fortune 500 company as a developer/DBA on an SQL Server–to-Oracle migration project. The code surrounding the source database—SQL Server 2008—was poorly implemented, with meager performance that made conversion imperative: The daily back-end synchronization job was running longer than 23 hours. It had no filtered indexes, but in the new—Oracle 11g—system, I saw multiple cases where filtered indexes would be very beneficial. But Oracle 11g has no filtered indexes!

Nor are filtered indexes implemented in the latest Oracle 18c.

But our task as technical professionals is to make the best use of what we have. So I implemented the equivalent of filtered indexes in my Oracle 11g system (and the same technique I used later in 12c). The idea is based on how Oracle handles NULLs, and can be used in any version of Oracle.

Oracle doesn’t treat a NULL value the same way as regular data. A NULL in Oracle is nothing; it doesn’t exist. In particular, a row is entered into a B-tree index only if at least one of its key columns is non-NULL. As a result, if you define your indexed column as NULLABLE and you’re searching by non-NULL values, your index data file will only contain records of interest. As an Oracle index definition has no INCLUDE clause, you’ll need to create a composite index with all columns that need to be included in a resultset; for the filtering to hold, those extra key columns must also be NULL in the rows you want excluded, which is usually achieved by wrapping each of them in a conditional expression. (This technique has some overhead compared with SQL Server’s INCLUDE clause, but it’s reasonably insignificant.)

Such a workaround implementation does add a limitation: The leading index column must allow NULLs and, therefore, can’t be the table’s primary key. However, it can be a derived or calculated column created specifically to support this performance optimization method. In some sense, the index’s leading column is logically binary: non-NULL values for data included in your search, and NULL for any data that should be “invisible.”
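One way to sketch this workaround (again on a hypothetical orders table) is a function-based index that wraps every key column in a conditional expression, so that rows outside the filter produce all-NULL index entries and are excluded from the index entirely:

```sql
-- For inactive rows, every key expression evaluates to NULL,
-- so those rows are not stored in the index at all.
CREATE INDEX ix_orders_active ON orders (
  CASE WHEN is_active = 1 THEN order_date  END,
  CASE WHEN is_active = 1 THEN customer_id END,
  CASE WHEN is_active = 1 THEN order_total END
);

-- Queries must repeat the same expressions to use the index
-- (selecting the expressions keeps the access index-only):
SELECT CASE WHEN is_active = 1 THEN order_date  END AS order_date,
       CASE WHEN is_active = 1 THEN customer_id END AS customer_id,
       CASE WHEN is_active = 1 THEN order_total END AS order_total
FROM   orders
WHERE  CASE WHEN is_active = 1 THEN order_date END >= DATE '2024-01-01';
```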

The other possible option in migrating SQL Server filtered index logic to Oracle is to implement an index (or the table in full) as partitioned. In this case, only the relevant index partition will be accessed by the database engine—provided queries are implemented correctly by using the exact partitioning condition in their WHERE clauses.
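A partitioning-based sketch of the same idea, using Oracle list partitioning on a hypothetical table:

```sql
-- Active rows isolated in their own (small) partition.
CREATE TABLE orders (
  order_id    NUMBER    NOT NULL,
  is_active   NUMBER(1) NOT NULL,
  order_date  DATE,
  customer_id NUMBER
)
PARTITION BY LIST (is_active) (
  PARTITION p_active   VALUES (1),
  PARTITION p_inactive VALUES (0)
);

-- Partition pruning limits the scan to p_active only because the
-- WHERE clause repeats the exact partitioning condition:
SELECT order_date, customer_id
FROM   orders
WHERE  is_active = 1;
```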

This will work well, even at scale, on relatively static data, but can place a high maintenance load on a DBA team if applied to frequently changing data. An example would be when optimizing access to today’s data in a time-centric application: The DBA team will need to redefine partitions on a daily basis. While this redefinition can be scripted in a nightly maintenance job, it does make your system more complex and introduces new potential systemic failure points.

So, one needs to be very specific and careful whenever SQL Server filtered index logic needs to be migrated to Oracle.

How To Handle Conversions

With an Oracle to SQL Server migration, look for opportunities for optimization using filtered indexes. You won’t see filtered indexes in Oracle, but you may see indexes that include NULL values. Don’t copy them as-is: It may be the best place where you can get a performance boost and design improvement in your conversion.

For SQL Server to Oracle migrations, if you see filtered indexes, look for how to avoid a performance bottleneck in your corresponding Oracle code. See how you can redesign data flow to compensate for the missing performance boost that filtered indexes had given in the source implementation.

SQL Server to Oracle / Oracle to SQL Server Migration Challenges Demystified

For migration projects between Oracle and SQL Server in either direction, it’s important to have a deeper understanding of the mechanics involved. When the current releases of the respective databases (Oracle 18c and Microsoft SQL Server 2017*) contain lexical equivalents of each other’s functionality—e.g., in sequences and identity—it might seem like an easy win. But copying a good design on one RDBMS directly across to the other can result in unnecessarily complicated, poorly performing code.

In the next and final part of this series, I’ll be covering read consistency and the use of migration tools. Stay tuned!

* SQL Server 2019 (or “15.x”) hasn’t been out long enough for widespread enterprise adoption.

Understanding the basics

What is an Oracle sequence cycle?

An Oracle sequence cycle allows a sequence to restart from the beginning when its whole range of values has been used, which means the sequence can produce duplicate values over time. NOCYCLE (the default) disallows duplication but raises an error once all values have been used.

How does an identity column work in SQL Server?

An SQL identity column in SQL Server has a value that’s automatically generated and incremented when a new row is inserted. An identity column value can’t be updated, and explicit values can be inserted only with the table’s IDENTITY_INSERT setting enabled.

What is an identity column in a database?

An SQL identity column is a column whose integer values (commonly int or bigint) are automatically incremented by the database as rows are inserted.

What is a filtered index in SQL Server?

A filtered index in SQL Server is an index with a WHERE clause.

What is SQL data conversion?

Data conversion is an operation that changes the data type of a value. Example: conversion of string/character data into a number or a date.

What is an SQL Server key?

A key is a value that uniquely identifies a row in an SQL Server table.