Marian Paul
Marian is a senior developer, expert in data migrations and back-end solutions. He has Oracle SQL and Microsoft SQL expert certifications.
Nobody wants to leave valuable customer data behind. Unfortunately, though, the hardest part of data migration to a complex CRM system, such as Salesforce, is the handling of legacy data.
In this article, Toptal Software Engineer Marian Paul provides 10 tips for successful legacy data migration to Salesforce.
Migrating legacy data is hard.
Many organizations have old and complex on-premise business CRM systems. Today, there are plenty of cloud SaaS alternatives, which come with many benefits: You pay as you go, and you pay only for what you use. So organizations decide to move to these new systems.
Nobody wants to leave valuable data about customers in the old system and start with the empty new system, so we need to migrate this data. Unfortunately, data migration is not an easy task, as around 50 percent of deployment effort is consumed by data migration activities. According to Gartner, Salesforce is the leader of cloud CRM solutions. Therefore, data migration is a major topic for Salesforce deployment.
So, how can we ensure a successful transition of legacy data into a shiny new system while preserving all of its history? In this article, I provide 10 tips for successful data migration. The first five tips apply to any data migration, regardless of the technology used.
On the software deployment checklist, data migration is not just an “export and import” item handled by a clever “push one button” data migration tool with predefined mappings for target systems.
Data migration is a complex activity, deserving a separate project, plan, approach, budget, and team. An entity-level scope and plan must be created at the project’s beginning, so there are no surprises two weeks before the deadline, such as, “Oh, we forgot to load those clients’ visit reports. Who will do that?”
The data migration approach defines whether we will load the data in one go (also known as the big bang), or whether we will load small batches every week.
This is not an easy decision, though. The approach must be agreed upon and communicated to all business and technical stakeholders so that everybody is aware of when and what data will appear in the new system. This applies to system outages too.
Do not underestimate the complexity of the data migration. Many time-consuming tasks accompany this process, which may be invisible at the project’s beginning.
For example, you may need to load specific data sets for training purposes: a batch of realistic data, but with sensitive items obfuscated so that training activities do not generate email notifications to clients.
The basic factor for estimation is the number of fields to be transferred from a source system to a target system.
Some amount of time is needed in different stages of the project for every field, including understanding the field, mapping the source field to the target field, configuring or building transformations, performing tests, measuring data quality for the field, and so on.
Using clever tools, such as Jitterbit, Informatica Cloud Data Wizard, Starfish ETL, Midas, and the like, can reduce this time, especially in the build phase.
In particular, understanding the source data – the most crucial task in any migration project – cannot be automated by tools, but requires analysts to take time going through the list of fields one by one.
The simplest estimate of the overall effort is one man-day for every field transferred from the legacy system.
An exception is data replication between the same source and target schemas without further transformation – sometimes known as 1:1 migration – where we can base the estimate on the number of tables to copy.
A detailed estimate is an art of its own.
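As a quick illustration of this rule of thumb, a back-of-the-envelope calculation can be scripted in a few lines of Python; the object names and field counts below are hypothetical example values, not figures from a real project.

```python
# Back-of-the-envelope migration estimate: roughly one man-day per migrated field.
# Object names and field counts are hypothetical example values.
fields_per_object = {
    "Account": 60,
    "Contact": 45,
    "Opportunity": 70,
    "Case": 55,
}

total_fields = sum(fields_per_object.values())   # 230 fields in scope
effort_man_days = total_fields * 1               # ~1 man-day per field
effort_person_months = effort_man_days / 20      # ~20 working days per month

print(f"{total_fields} fields -> ~{effort_man_days} man-days "
      f"(~{effort_person_months:.1f} person-months)")
```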
Do not overestimate the quality of source data, even if no data quality issues are reported from the legacy systems.
New systems have new rules, which may be violated with legacy data. Here’s a simple example. Contact email can be mandatory in the new system, but a 20-year-old legacy system may have a different point of view.
There can be mines hidden in historical data that have not been touched for a long time but that activate when transferred to the new system. For example, old data using European currencies that no longer exist needs to be converted to euros; otherwise, those currencies must be added to the new system.
Data quality significantly influences effort, and the simple rule is: The further we go in history, the bigger mess we will discover. Thus, it is vital to decide early on how much history we want to transfer into the new system.
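Before mapping starts, it pays to profile the legacy extract for exactly these kinds of issues. Below is a minimal, illustrative Python sketch that scans a hypothetical CSV export of legacy contacts for missing email addresses and obsolete pre-euro currency codes; the file name, column names, and currency list are assumptions made for the example.

```python
import csv

# Currencies replaced by the euro; illustrative subset only.
OBSOLETE_CURRENCIES = {"DEM", "FRF", "ITL", "ESP", "NLG", "ATS", "SKK"}

missing_email = 0
obsolete_currency = 0
total = 0

# "legacy_contacts.csv" and its column names are hypothetical.
with open("legacy_contacts.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        total += 1
        if not (row.get("email") or "").strip():
            missing_email += 1       # will violate a mandatory-email rule in the new system
        if row.get("currency_code") in OBSOLETE_CURRENCIES:
            obsolete_currency += 1   # needs conversion to EUR or a new currency record

print(f"{total} rows: {missing_email} missing emails, "
      f"{obsolete_currency} rows with obsolete currencies")
```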
Business people are the only ones who truly understand the data and who can therefore decide what data can be thrown away and what data to keep.
It is important to have somebody from the business team involved during the mapping exercise, and for future backtracking, it is useful to record mapping decisions and the reasons for them.
Since a picture is worth more than a thousand words, load a test batch into the new system, and let the business team play with it.
Even if data migration mapping is reviewed and approved by the business team, surprises can appear once the data shows up in the new system’s UI.
“Oh, now I see, we have to change it a bit,” becomes a common phrase.
Failing to engage subject matter experts, who are usually very busy people, is the most common cause of problems after a new system goes live.
Data migration is often viewed as a one-time activity, and developers tend to end up with solutions full of manual actions hoping to execute them only once. But there are many reasons to avoid such an approach.
Even if migration is a one-time activity by nature, the solution will be executed many times in development, testing, and rehearsal runs before the final load, so manual actions can significantly slow down your operations.
Next we will cover five tips for a successful Salesforce migration. Keep in mind, these tips are likely applicable to other cloud solutions as well.
Performance is one of the biggest tradeoffs, if not the biggest, when moving from an on-premise to a cloud solution – and Salesforce is no exception.
On-premise systems usually allow for direct data load into an underlying database, and with good hardware, we can easily reach millions of records per hour.
But, not in the cloud. In the cloud, we are heavily limited by several factors.
As a result, load performance can be thousands of accounts per hour.
It can be less, or it can be more, depending on factors such as the number of fields, validations, and triggers. But it is several orders of magnitude slower than a direct database load.
Performance degradation, which is dependent on the volume of the data in Salesforce, must also be considered.
It is caused by indexes in the underlying RDBMS (Oracle), which are used for checking foreign keys and unique fields and for evaluating duplication rules. The basic formula is approximately a 50 percent slowdown for every tenfold increase in data volume, caused by the O(log N) time complexity of sort and B-tree index operations.
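One possible reading of this rule of thumb can be expressed as a quick calculation; the baseline throughput and data volumes below are assumed example figures, not measurements.

```python
import math

def estimated_throughput(base_records_per_hour: float,
                         base_volume: int,
                         target_volume: int) -> float:
    """Apply the rough '~50% slowdown per tenfold data-volume increase' rule of thumb."""
    orders_of_magnitude = math.log10(target_volume / base_volume)
    return base_records_per_hour / (1.5 ** orders_of_magnitude)

# Assumed baseline: 10,000 records/hour with 1 million records already in the org.
for volume in (1_000_000, 10_000_000, 100_000_000):
    print(f"{volume:>11,} records in org -> "
          f"~{estimated_throughput(10_000, 1_000_000, volume):,.0f} records/hour")
```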
Moreover, Salesforce has many resource usage limits.
One of them is the Bulk API limit, set to 5,000 batches in a rolling 24-hour window, with a maximum of 10,000 records in each batch.
So, the theoretical maximum is 50 million records loaded in 24 hours.
In a real project, the maximum is much lower due to limited batch size when using, for example, custom triggers.
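The arithmetic behind these limits is worth sketching, because it dictates how many migration waves a dataset will need. The limit values below are the ones quoted above, while the effective batch size and record count are hypothetical example figures; always check your org’s current limits.

```python
# Bulk API rolling 24-hour limits quoted above (check your org's current limits).
MAX_BATCHES_PER_DAY = 5_000
MAX_RECORDS_PER_BATCH = 10_000

theoretical_daily_ceiling = MAX_BATCHES_PER_DAY * MAX_RECORDS_PER_BATCH  # 50 million

# In practice, triggers and timeouts often force much smaller batches.
effective_batch_size = 500          # hypothetical example value
records_to_migrate = 4_000_000      # hypothetical example value

batches_needed = -(-records_to_migrate // effective_batch_size)   # ceiling division
days_needed = -(-batches_needed // MAX_BATCHES_PER_DAY)

print(f"Theoretical ceiling: {theoretical_daily_ceiling:,} records/day")
print(f"{records_to_migrate:,} records at {effective_batch_size} per batch: "
      f"{batches_needed:,} batches, i.e., at least {days_needed} day(s) of Bulk API quota")
```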
This has a strong impact on the data migration approach.
Even for medium-sized datasets (from 100,000 to 1 million accounts), the big bang approach is out of the question, so we must split data into smaller migration waves.
This, of course, impacts the entire deployment process and increases the migration complexity because we will be adding data increments into a system already populated by previous migrations and data entered by users.
We must also consider this existing data in the migration transformations and validations.
Further, lengthy loads can mean we cannot perform migrations during a system outage.
If all users are located in one country, we can leverage an eight-hour outage during the night.
But for a company, such as Coca-Cola, with operations all over the world, that is not possible. Once we have U.S., Japan, and Europe in the system, we span all time zones, so Saturday is the only outage option that doesn’t affect users.
And even that may not be enough, so we must load data while the system is online and users are working with it.
Application components, such as validations and triggers, should be able to handle data migration activities. Hard disablement of validations at the time of the migration load is not an option if the system must be online. Instead, we have to implement different logic in validations for changes performed by a data migration user.
Another trick is using a Legacy ID or Migration ID field in every migrated object. There are two reasons for this. The first is obvious: to keep the ID from the old system for backtracking. After the data is in the new system, people may still want to search for their accounts using the old IDs, found in places such as emails, documents, and bug-tracking systems. Bad habit? Maybe. But users will thank you if you preserve their old IDs. The second reason is technical and comes from the fact that Salesforce does not accept explicitly provided IDs for new records (unlike Microsoft Dynamics) but generates them during the load. The problem arises when we want to load child objects, because we have to assign them the IDs of their parent objects. Since we know those IDs only after the parents have been loaded, we would have to export the generated IDs and map them back onto the child records before loading them, which is a tedious exercise.
Let’s use Accounts and their Contacts as an example. The straightforward approach is to load the Accounts first, export their newly generated Salesforce IDs, and patch those IDs into the Contact records before loading the Contacts.
We can do this more simply by loading Accounts with their Legacy IDs stored in a special external ID field. This field can then be used as a parent reference, so when loading Contacts, we simply use the Account Legacy ID as a pointer to the parent Account.
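Below is a minimal sketch of this pattern, assuming the simple_salesforce Python library and a hypothetical custom external ID field named Legacy_ID__c on both objects; the credentials and record values are placeholders.

```python
from simple_salesforce import Salesforce

# Placeholder credentials; use your own org and a dedicated migration user.
sf = Salesforce(username="migration@example.com",
                password="***",
                security_token="***")

# 1) Upsert Accounts keyed by their legacy ID (no Salesforce IDs needed up front).
sf.Account.upsert("Legacy_ID__c/ACC-1001", {
    "Name": "Acme Corporation",
})

# 2) Upsert Contacts and point the Account lookup at the parent's legacy ID.
#    Salesforce resolves the external ID to the real Account ID during the load.
sf.Contact.upsert("Legacy_ID__c/CON-2001", {
    "FirstName": "Jane",
    "LastName": "Doe",
    "Email": "jane.doe@example.com",
    "Account": {"Legacy_ID__c": "ACC-1001"},
})
```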
The nice thing here is that we have separated the generation and loading phases, which allows for better parallelism, decreased outage time, and so on.
Like any system, Salesforce has plenty of tricky parts of which we should be aware in order to avoid unpleasant surprises during data migration. Here is one example: The audit fields Created Date, Created By ID, Last Modified Date, and Last Modified By ID can be explicitly written only after granting the system permission “Set Audit Fields upon Record Creation.” The list goes on, but the bottom line is: Get familiar with the system, and learn what it can do and what it cannot do before you make assumptions. Do not assume standard behavior, especially for core objects. Always research and test.
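To illustrate the audit-field point above: once “Set Audit Fields upon Record Creation” has been granted to the migration user, the original timestamps can be supplied when records are created. A minimal sketch, again assuming the simple_salesforce library and placeholder values:

```python
from simple_salesforce import Salesforce

sf = Salesforce(username="migration@example.com", password="***", security_token="***")

# Works only while "Set Audit Fields upon Record Creation" is enabled for the
# migration user, and only when the record is created, not on later updates.
sf.Account.create({
    "Name": "Acme Corporation",
    "Legacy_ID__c": "ACC-1001",             # hypothetical external ID field
    "CreatedDate": "2008-03-15T10:30:00Z",  # preserve the legacy creation timestamp
})
```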
It is very tempting to use Salesforce as a platform for building a data migration solution, especially for Salesforce developers. It is the same technology for the data migration solution as for the Salesforce application customization, the same GUI, the same Apex programming language, the same infrastructure. Salesforce has objects which can act as tables, and a kind of SQL language, Salesforce Object Query Language (SOQL). However, please do not use it; it would be a fundamental architectural flaw.
Salesforce is an excellent SaaS application with a lot of nice features, such as advanced collaboration and rich customization, but mass processing of data is not one of them, largely because of the performance and resource limits described in the previous tips.
Instead, build the data migration solution in a separate instance (cloud or on-premise) using an RDBMS or ETL platform. Connect it to the source systems and to the target Salesforce environments, move the data you need into your staging area, and process it there. This gives you the full power of SQL or ETL transformations and keeps heavy data processing outside of Salesforce’s limits.
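As a rough outline of what such a staging-based solution can look like, the Python sketch below extracts legacy rows into a local staging table, transforms them there, and only then pushes the result to Salesforce. The table name, connection details, and field mappings are hypothetical, and the load step assumes the simple_salesforce Bulk API wrapper.

```python
import sqlite3
from simple_salesforce import Salesforce

# --- Staging area: any RDBMS works; sqlite stands in here for brevity. ---
staging = sqlite3.connect("staging.db")
staging.execute("""CREATE TABLE IF NOT EXISTS stg_account (
                       legacy_id TEXT PRIMARY KEY,
                       name      TEXT,
                       phone     TEXT)""")

# 1) Extract: in a real project this would read from the legacy CRM database.
legacy_rows = [("ACC-1001", "  Acme Corporation ", "00421 2 1234567"),
               ("ACC-1002", "Globex", None)]

# 2) Transform: cleansing, defaulting, and validation happen in the staging area,
#    where full SQL power is available and Salesforce limits do not apply.
cleaned = [(lid, name.strip(), phone or "") for lid, name, phone in legacy_rows]
staging.executemany("INSERT OR REPLACE INTO stg_account VALUES (?, ?, ?)", cleaned)
staging.commit()

# 3) Load: push only the finished, validated records to Salesforce via the Bulk API.
sf = Salesforce(username="migration@example.com", password="***", security_token="***")
records = [{"Legacy_ID__c": lid, "Name": name, "Phone": phone}
           for lid, name, phone in staging.execute("SELECT * FROM stg_account")]
sf.bulk.Account.upsert(records, "Legacy_ID__c", batch_size=10000)
```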
At the project’s beginning, we usually grab a list of Salesforce fields and start the mapping exercise. During the project, it often happens that new fields are added into Salesforce by the application development team, or that some field properties are changed. We can ask the application team to notify the data migration team about every data model change, but that doesn’t always work. To be safe, we need to have all data model changes under supervision.
A common way to do this is to download, on a regular basis, migration-relevant metadata from Salesforce into some metadata repository. Once we have this, we can not only detect changes in the data model, but we can also compare data models of two Salesforce environments.
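A minimal sketch of such a metadata snapshot, assuming the simple_salesforce library and the describe web service discussed below; it dumps per-field metadata for a few objects into a JSON file that can be committed to version control and diffed between runs or between environments (the object list and file name are example choices).

```python
import json
from simple_salesforce import Salesforce

sf = Salesforce(username="migration@example.com", password="***", security_token="***")

# Objects in migration scope; extend this list as mapping progresses.
objects_in_scope = ["Account", "Contact", "Opportunity"]

snapshot = {}
for obj in objects_in_scope:
    describe = getattr(sf, obj).describe()
    snapshot[obj] = {
        f["name"]: {
            "label": f["label"],
            "type": f["type"],
            "nillable": f["nillable"],
            "createable": f["createable"],
            "updateable": f["updateable"],
        }
        for f in describe["fields"]
    }

# Commit this file to version control and diff it to spot data model changes.
with open("salesforce_metadata_snapshot.json", "w", encoding="utf-8") as out:
    json.dump(snapshot, out, indent=2, sort_keys=True)
```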
What metadata to download? At a minimum, the list of objects and their fields, with their labels, technical names, and attributes such as whether a field is creatable or updatable.
How to download metadata from Salesforce? Well, there is no standard metadata method, but there are multiple options:
- Go to the Setup / API menu and download the strongly typed Enterprise WSDL, which describes all the objects and fields in Salesforce (but not picklist values nor validations).
- Call the describeSObjects web service, directly or by using a Java or C# wrapper (consult the Salesforce API). This way, you get exactly what you need, and it is the recommended way to export the metadata.

Cloud solutions, such as Salesforce, are ready instantly. If you are happy with the built-in functionality, just log in and use it. However, Salesforce, like any other cloud CRM solution, brings specific data migration problems to be aware of, particularly regarding performance and resource limits.
Moving legacy data into a new system is always a journey, sometimes a journey into history hidden in data from past years. In this article, based on a dozen migration projects, I’ve presented ten tips for migrating legacy data while successfully avoiding the most common pitfalls.
The key is to understand what the data reveals. So, before you start the data migration, make sure your Salesforce development team is well-prepared for the potential problems your data may hold.