Migrating Data from SQL Server to Salesforce.com using Talend – Part 1

At Third Sector IT we’re often asked to migrate data from a client’s existing system to a new Salesforce.com org.

Sometimes this is because they are changing databases, and need all of their old data in Salesforce. Other times, they are considering adopting Salesforce and want to use the system with their own data in it. Sometimes customers have a second databases or web service, and want a daily or near-live synchronisation between the two.

If the source is something simple like an Excel spread this can often be achieved using spreadsheets, the Salesforce Data Loader and a lot of patience!

When migrating from a more complex source with multiple relational tables such as Access, MySQL, or MS SQL Server, using data loader can be time consuming and error prone. This is where Talend comes to the rescue! At Third Sector IT we love Talend and use it for virtually all of our complex data migration work both in house and for clients. Here are just a few of its many advantages for this type of work:

  • Once a Talend job is set-up you can run it as many times as you like and each run takes a matter of minutes. This allows an iterative and flexible approach to data migration projects.
  • Talend comes with built-in connectors for pretty much any data source you can think of (including SQL Server.)
  • Talend natively supports the Salesforce.com Bulk Data API.
  • Talend is open source and completely free!

To expand on the first bullet point, sometimes a ‘single’ migration will take dozens of migrations. A recent project we did involving an off-the-shelf database had over 50 tables and dirty data within it. After each migration, we would check with the client, ask them to clean the data based on errors, and add a few more tables and fields to the migration.

Jeff Douglas wrote a great blog post on the basics of migrating data from a .csv file to Salesforce. Jeff’s post really helped me and it’s a great place to start if you’re new to Talend.

My blog post today is the first in a series I’ll be writing on using Talend to migrate data to Salesforce from SQL Server. This method has a couple of advantages over using .csv files:

  • You won’t have to prepare a new .csv file every time you want to do an upload of data. Talend will pull data directly from your SQL tables and views.
  • You can automate the upload, so it runs nightly or even hourly
  • Talend can retrieve schemas for objects in your SQL database and keep them in sync with your Talend job.

When the series is finished your Talend job will look something like the one below (click to enlarge images). In this post I’ll cover the first section highlighted in red, creating a connection and retrieving the schema of a SQL Server data source.

 

 

 

 

 

Connecting to SQL Server
Create a new Talend job and expand the Metada menu in the left sidebar. Right-click Db Connections and select Create connection.

 

 

 

 

Give your connection a helpful name, fill in your SQL Server credentials and hit Check. If everything’s ok click finish to create your database connection.

 

 

 

 

 

 

Retrieving the Schema
The next step is to retrieve a schema for each of the SQL objects you want to import from. Handily Talend can do this for you. Right click your Db Connection and select Retrieve Schema.

 

 

 

 

 

 

 

Select which types of objects you want to work with (tables, views and synonyms are supported). On the next screen select the objects you want to import from. For the purposes of this demo I’m going to select a single table. If you’re working on a more complex migration you can select as many objects as you like.

 

 

 

 

 

 

Click the Retrieve Schema button to, you guessed it, retrieve the schema! Talend will pull in all the fields, keys, data types etc for the objects you selected.

Note – You may have to modify data types here. Salesforce doesn’t allow all of the data types available in SQL e.g. changing Big Integer to Integer.

 

 

 

 

 

 

 

You should now have a connection to your SQL database and a schema for each object you want to import from. To get this into your Talend job, drag it from the left sidebar onto the pallet. When you do this you’ll be asked what type of component you want to create, select tMSSQLInput. You should now have a SQL input component that looks something like this.

 

 

 

 

 

Next Steps

This input isn’t connected to anything at the moment. In the next blog post I’ll cover how to filter your data and map it onto the fields of a Salesforce.com object. A future post will also cover uploading the data to Salesforce via the Bulk API.

 

Be Sociable, Share!

4 Comments

  1. Hi,
    I am doing the same activity ( sql server to SFDC data migration ). And often ending up with wrong logic or new column addition after a complete load.

    So , i am forced to delete the existing data and reload the entire stuff again . Being a ETL Guy , i am not worried about the load time or rerun , but deleting the records from salesforce really kills my patience.

    The delete process takes 2 days to complete for 150k records , which is not at a good sign for the project during testing phase.

    So,Do you have any good idea about deleting the records from salesforce in a bulk manner or in a faster way . i tried Talend Bulk option , it is still the same.

    -Naveen

    • Hi Naveen,

      Thank you for your post, as I am sure you are aware Talend provides many features to make data migrations quick and manageable.

      In terms of your input testing I would suggest adding a tSampleRow which will allow you to subject your job to specific records you identify or a limited number of records, this way you can preview any amendments much quicker.

      You mention that it is taking 2 days to delete 150k records, this does seem like an awfully long time and my immediate thought is that this is due to the heap size of your web calls, there are two main methods I use to delete records from Salesforce, the first method is to use the external ID in the source structure i.e. MSSQL and perform a left inner join lookup based on that (providing you are storing the external ID in Salesforce (this is a must if you plan on making accurate references from the source at a later date)). The second method is to simply create a Salesforce input with a tMap and have the output direct to a tSalesforceOutputBulkExec with the concurrency mode in advanced settings set to ‘Parallel’ and action set to delete, I prefer to use tSalesforceOutputBulkExec as when you make changes in the tMap the propagation’s flow throughout the remainder of the job.

      In either of these cases we have not yet addressed the heap size to do this we need to open the settings to our Salesforce input, we then need to ensure the query mode is set to ‘Bulk Query’ and check the box for manual entry of SOQL as a default the query will be “select FirstName, LastName from Contact” depending on what method you use you want to amend the query to reflect the following “Select Id, EXTERNAL_ID__c from SpecificObject” (when using with the inner join) or alternatively “Select Id from SpecificObject”, all we need to delete records is the Id and once you have mapped them your job should run much faster.

      I hope this helps you, if you have any other queries please let me know.

      Thank you

      Brendan

    • Hello All,

      We are doing CRM migration from Siebel to Salesforce. There are quote records which need to be migrate as well. I don’t find Quote Module in the drop down for tSalesforceBulkExec component. Same for tSalesforceOutput and tSalesforceOutputBulkExec components too.
      Could some one please tell me what to do? is there any work around for this?

      Thanks,
      M

      • Hi Manoj, thank you for your comment. One of the features of Talend with regard to the module set it returns is that it will only retrieve modules that have been activated from within the destination instance of Salesforce. My first step of guidance would be to ensure that you have enabled quotes within the org this is achieved by clicking setup=>customise=>quotes=>settings=>enable quotes, once complete you need to add the related listed to the opportunity or relative opportunity record type layout pages.

        Once you have completed this step if you go back to Talend and refresh the module list you should notice that three new objects will appear; Quote, Quote line item and Quote document. It should be noted that an opportunity must first exist for the relationship between itself and a quote to exist and the quote must be created before a quote line item and a quote line item has direct reference to products therefore products should be the first phase of the migration.

        As for the tSalesforceBulkExec, you will want to select ‘custom module’ and enclose “Quote” or “QuoteLineItem” into the section for the name of the custom module then copy the relevant schema across from your list of modules.

        If you have any trouble please do not hesitate to get in touch.

        Many Thanks

Submit a Comment

Your email address will not be published. Required fields are marked *