How to read data from ElasticSearch using SSIS

Introduction

In our previous article, we see how to load data into Elastic Search using SSIS. Now let’s look at how to read data from ElasticSearch using SSIS and load response into SQL Server.

Elasticsearch is a powerful engine that allows you to store, aggregate and, most importantly, search data in a very analytical way. In this tutorial, you will learn how to load Elasticsearch data to SQL Server with SSIS (part of SQL Server) and ZappySys PowerPack. We also see how to set the maximum result window of the index using the Rest API Task.

In nutshell, this post will focus on how to Make Elasticsearch Search API call using SSIS.

 

Prerequisites

Before we perform steps listed in this article, you will need to make sure following prerequisites are met:
  1. SSIS designer installed. Sometimes it is referred as BIDS or SSDT (download it from Microsoft site).
  2. Basic knowledge of SSIS package development using Microsoft SQL Server Integration Services.
  3. Make sure ZappySys SSIS PowerPack is installed (download it).
  4. Optional (If you want to Deploy and Schedule ) - Deploy and Schedule SSIS Packages

Step-By-Step to Call Elasticsearch API call using SSIS

Read from Elasticsearch Search API call

Let’s make the Elasticsearch Search API call using the JSON source to get all the records of the index with pagination.

  1. First of All, Drag and drop Data Flow Task from SSIS Toolbox and double click it to edit.

    Dragging and dropping Data Flow Task into Control Flow

  2. From the SSIS toolbox drag and drop JSON Source on the data flow designer surface.
    Drag and Drop JSON Source Component

    Drag and Drop JSON Source Component

  3. Double click JSON Source and enter the following URL as below to make Search API Call on the index.
  4. Now check Use credentials and Select Connection section press <New ZS-HTTP Connection>.
    SSIS JSON Source - Passing Basic Credentials (Supply UserID / Password) using HTTP Connection

    SSIS JSON Source – Passing Basic Credentials (Supply UserID / Password) using HTTP Connection

  5. Once the HTTP Connection Manager window opens configure the connection to your Elasticsearch instance:
    • Set Web Url, which points to your Elasticsearch instance.
    • Set Credentials Type to Basic - UserID/Password (or other appropriate authentication methods).
    • Finally, set User Name and Password:

      Configure SSIS HTTP Connection to connect to Elasticsearch

      Configure SSIS HTTP Connection to connect to Elasticsearch

  6. Now, select method as POST and set for and size value in the body to get data using pagination. Set the content type as JSON (application/json).
  7. Furthermore, go to the pagination tab and configure pagination like the below screenshot makes sure we need to enter increment size the same as given in body size value therefor 100.
    Pagination Mode : POST data in body

    Pagination Mode: POST data in the body

  8. That’s it now select the desired filter and click on the Preview button.
  9. That’s it we are ready to load the data in the SQL Server.

Loading Elasticsearch API data into SQL Server / Other Target

Now let's look at how to load data into target like SQL Server, Oracle or Flat File. In below example we will see loading data into SQL Server database but steps may remain same for other targets which can be accessed using OLEDB Drivers (e.g. Oracle).
  1. Inside Data Flow, Drag OLEDB Destination from SSIS Toolbox
  2. Connect our Source component to OLEDB Destination
  3. Double click OLEDB Destination to configure it
  4. Select Target Connection or click NEW to create new connection
    Configure SSIS OLEDB Destination Connection - Loading data (REST / SOAP / JSON / XML /CSV) into SQL Server or other target using SSIS

    Configure SSIS OLEDB Destination Connection - Loading data (REST / SOAP / JSON / XML /CSV) into SQL Server or other target using SSIS

  5. Select Target Table or click NEW to create new table based on source columns
  6. Click on Mappings Tab to Auto map columns by name. You can change mappings as you need
    Create NEW Table SSIS OLEDB Destination

    Create NEW Table SSIS OLEDB Destination

  7. Click OK to Save OLEDB Destination Settings
  8. That's it, You are now ready to run data flow. NOTE: If you wish to debug data flow and see records when you run, add data viewer by right click on blue arrow > Click Enable Data Viewer
  9. To execute data flow, Right click anywhere inside Data Flow Surface and click Execute Task
 

Enable Deep Pagination Setting

By default index, the maximum result window size is 10,000 means you can paginate upto 10000 rows. If you ever want to get more than 10000 result then you have to perform the following steps to change default setting from 10000 to 500000 (that’s max).

We need to send PUT request to below URL:

Need to pass below body as I am setting max result window size value as 500000.

And set content-type as JSON (application/json) and click on Test Request/Response and it will make the API call and set the max window size.

Rest API Task PUT Call

Rest API Task PUT Call

That’s it we have successfully made the call to set the maximum result window size of the index.

Common Errors

Truncation related error

The most common error you may face when you run an SSIS package is truncation error. During the design time only 300 rows are scanned from a source (a file or a REST API call response) to detect datatypes but at runtime, it is likely you will retrieve far more records. So it is possible that you will get longer strings than initially expected. For detailed instructions on how to fix common metadata related errors read an article "How to handle SSIS errors (truncation, metadata issues)".

Authentication related error

Another frequent error you may get is an authentication error, which happens when you deploy/copy a package to another machine and run it there. Check the paragraph below to see why it happens and how to solve this problem.

Deployment to Production

In SSIS package sensitive data such as tokens and passwords are by default encrypted by SSIS with your Windows account which you use to create a package. So SSIS will fail to decrypt tokens/passwords when you run it from another machine using another Windows account. To circumvent this when you are creating an SSIS package which uses authentication components (e.g. an OAuth Connection Manager or an HTTP Connection Manager with credentials, etc.), consider using parameters/variables to pass tokens/passwords. In this way, you won’t face authentication related errors when a package is deployed to a production server.

Check our article on how to configure packages with sensitive data on your production or development server.

Conclusion

After all, we saw you How to Make Elasticsearch Search API call using SSIS JSON Source and load response into SQL Server. In this article, we have seen that how to make a setting call to set the maximum result window of the index. To explore many other scenarios not discussed in this article download SSIS PowerPack from here (includes 70+ Components).

References

Finally, you can use the following links for more information:

Posted in REST API, REST API Integration, SSIS JSON Source (File/REST), SSIS REST API Task and tagged , , , .