Introduction
In this article you will learn how to read Twitter data in SSIS using SSIS JSON Source and SSIS REST API Web Service Task. You will also learn about the OAuth 2.0 protocol, which simplifies REST API access.
Twitter REST API Authentication
To fetch any data from Twitter using OAuth REST API calls, you have to obtain two keys: a Consumer Key (like a user ID) and a Consumer Secret (like a password). There are two main methods to read data from Twitter: Application-user authentication and Application-only authentication. Use the method that matches the type of information you want to pull from Twitter. The most common method is Application-user authentication. If you want to read more about other methods then click here
Method 1 – Read Twitter data in SSIS using Application-user authentication
In this method you need a Twitter user account to connect to the Twitter API. Once the first-time authorization is done, you don’t have to re-authenticate. Certain types of API calls are only allowed by this method (such as posting a new tweet through the API). For any Twitter API call, the very first step is to create an OAuth App (i.e. register a new App in the Twitter developer portal).
Using Default OAuth App Created by ZappySys
To make your life easy, ZappySys provides a default App for certain OAuth providers (e.g. Google, Twitter). If you decide to create your own app for whatever reason, check the next section on how to register a Twitter OAuth Application. The screenshot below shows how to use a Twitter OAuth connection with the Default App. Once you click Generate Token, it prompts you to log in with your Twitter account so you can grant permission.
Step-By-Step
- Download and Install SSIS PowerPack FREE Trial from this link
- Create new SSIS Project and add new package.
- Open Package. Go to control flow. Drag and drop Data Flow task from SSIS Toolbox
- Go to the Data Flow designer. Drag and drop the ZS JSON Source component and double-click it to configure it.
- Enter any Twitter API URL you want to call, such as the one below:
https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=zappysys&count=4
- Now check the Use Credentials option
- Select the new ZS-OAuth connection option from the dropdown
- When the new connection dialog box pops up, select Twitter from the providers dropdown. Select the Default App option for now.
- Click Generate Token. It may ask you to log in using your Twitter account credentials. If prompted, click Approve.
- If the Twitter OAuth grant is approved, you will see both Access Token and Access Token Secret populated
- Click Test Connection, and if it works, click OK to close the connection dialog
- On JSON Source you can now click Preview to see data from Twitter
- You can optionally specify Filter expression [Click Select Filter button] to select result from specific JSON Node. For example if you want to select all Hashtags used inside each Tweet then select Filter like $.entities.hashtags[*]
- Click OK to close the UI
- Drag ZS Trash Destination from Toolbox (Or you can use OLEDB Destination if you want to load inside SQL Server)
- Connect JSON Source to ZS Trash Destination
- Execute SSIS package
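To make the Filter step above more concrete, here is a minimal Python sketch of what a filter like $.entities.hashtags[*] selects from a single tweet object. The sample tweet is a hypothetical, trimmed-down record invented for illustration, and the select_hashtags helper is not part of SSIS PowerPack; it only mimics the effect of the JSON Path expression.

```python
import json

# Hypothetical, trimmed-down tweet object (real responses carry many more fields).
sample_tweet = json.loads("""
{
  "id_str": "1",
  "text": "Loading #JSON with #SSIS",
  "entities": {"hashtags": [{"text": "JSON"}, {"text": "SSIS"}]}
}
""")


def select_hashtags(tweet):
    """Return the items a filter like $.entities.hashtags[*] would match."""
    return tweet.get("entities", {}).get("hashtags", [])


# Each matched item becomes one output row.
hashtag_rows = select_hashtags(sample_tweet)
hashtag_texts = [row["text"] for row in hashtag_rows]
print(hashtag_texts)
```

Each element of the hashtags array becomes one row, which is why a tweet with two hashtags yields two rows.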
Using Custom OAuth App Created by you
If for some reason you don’t want to use the Default Twitter App, you can register a Custom App. It requires a few extra steps, listed in the next section, but it won’t take more than a few minutes. With the Custom App option you have to specify the ClientId and Client Secret (in the above screenshot). Once you click Generate Token, it prompts you to log in with your Twitter account so you can grant permission.
Register/Create OAuth Application for twitter API Access
Now let’s look at how to register an OAuth Application for Twitter API access.
Go to https://apps.twitter.com/app/new and create a new app by providing the necessary information. Don’t get confused by the name App; it’s just Twitter’s way of creating multiple API access keys so you can grant different levels of access to different users. Once the app is created, you will be taken to a page where your Consumer Key and Consumer Secret are listed.
Obtaining Consumer Key and Consumer Secret from Twitter
Once the app is created, go to the Keys and Access Tokens tab. Here you will find the Consumer Key and Consumer Secret.
The Consumer Key can be public, but the Consumer Secret must not be shared (think of it like a password).
Method 2 – Read Twitter data in SSIS – Application-only authentication
Sometimes you don’t want to use a Twitter user account to access data (e.g. when giving access to a consultant so he can access your company Twitter account via the API). In this scenario you have to use the Application-only method. In this method you don’t authorize the application using a login form.
Here is our basic flow to access Twitter data.
- Get a Bearer Access Token by calling the https://api.twitter.com/oauth2/token service (use the POST method and pass the BASE64-encoded Consumer Key and Consumer Secret)
- Call any Twitter service by passing the Bearer Access Token we retrieved in the previous step.
NOTE: Any access to a Twitter service is over HTTPS, so the tokens passed along with your request are automatically encrypted before being sent over the wire, unless someone knows how to hack SSL 🙂
Application Only Authentication using REST API task – Get Access Token
Once we create the SSIS package, the first step to access any Twitter data is to get an Access Token. As you can see from the screenshot below, we call the https://api.twitter.com/oauth2/token service URL with the POST method. Notice how we supply POST data and 2 headers. The Authorization header contains the BASE64-encoded value of YourConsumerKey:YourConsumerSecret
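Outside SSIS, the same token request can be sketched in a few lines of Python. This is only an illustration and it builds the request without sending it. The body grant_type=client_credentials and the Content-Type value are assumptions taken from Twitter’s application-only authentication documentation rather than from the screenshot, and build_token_request is a hypothetical helper name.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.twitter.com/oauth2/token"


def build_token_request(consumer_key, consumer_secret):
    """Build (but do not send) the POST request that asks for a bearer token."""
    # URL-encode each value, join them with a colon, then BASE64-encode the pair.
    pair = (urllib.parse.quote(consumer_key, safe="") + ":"
            + urllib.parse.quote(consumer_secret, safe=""))
    encoded = base64.b64encode(pair.encode("ascii")).decode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=b"grant_type=client_credentials",  # assumed POST body
        headers={
            "Authorization": "Basic " + encoded,
            "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        },
        method="POST",
    )


# Placeholder credentials, same as in the text above.
request = build_token_request("YourConsumerKey", "YourConsumerSecret")
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

Sending this request with real keys (for example through urllib.request.urlopen) should return a small JSON document that carries the token, which is then passed as Authorization: Bearer followed by the token value on later calls.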
Fetch data from Twitter using JSON Source – De-normalize nested JSON
Once we have the authentication token, we are ready to pull Twitter data using JSON Source. The screenshot below shows how we supplied the token in the Authorization header. JSON Source can make your JSON look like a normal table (it also de-normalizes nested JSON into a flat dataset). If you want to extract a subset of the JSON, simply specify a JSON Path expression, e.g. $.data.users[*]
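As a rough picture of what de-normalizing means here, the Python sketch below selects the array at $.data.users[*] from a hypothetical document and flattens each nested object into one flat row. The sample data, the dotted column names and the flatten helper are all assumptions made for illustration; the column names JSON Source actually generates may differ.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into a single level, joining key names with a dot."""
    flat = {}
    for key, value in node.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical document shaped to match the example path $.data.users[*].
document = {
    "data": {
        "users": [
            {"id": 1, "name": "alice", "profile": {"lang": "en", "verified": False}},
            {"id": 2, "name": "bob", "profile": {"lang": "fr", "verified": True}},
        ]
    }
}

# One flat row per array element selected by the path.
rows = [flatten(user) for user in document["data"]["users"]]
for row in rows:
    print(row)
```

Every element of the selected array becomes one row, and nested properties such as profile.lang turn into ordinary columns.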
Load Twitter JSON data to SQL Server
You can easily connect your SSIS JSON Source to an OLEDB Destination if you want to load Twitter data into an RDBMS such as SQL Server or MySQL.
Handling paging of large REST API result set with twitter data – looping/cursoring
Most REST APIs limit the total data sent in a single response, so if you wish to get all records you have to loop through multiple results. Twitter provides a looping mechanism using a cursor. Click here to read more
In our case SSIS JSON Source supports paging very well, so we are covered. To loop through multiple result sets of Twitter data, simply configure the following 3 properties. See the screenshot below.
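For readers who want to see the cursoring idea itself, here is a minimal Python sketch of the loop. It assumes the convention described in Twitter’s cursoring documentation: start with cursor -1, read next_cursor from each response, and stop when it is 0. The fetch_page function is a stand-in that returns hypothetical canned pages instead of calling the API.

```python
def fetch_page(cursor):
    """Stand-in for one HTTP call; returns a hypothetical page of a cursored response."""
    fake_pages = {
        -1: {"ids": [101, 102], "next_cursor": 5001},
        5001: {"ids": [103, 104], "next_cursor": 5002},
        5002: {"ids": [105], "next_cursor": 0},
    }
    return fake_pages[cursor]


def read_all_ids(fetch):
    """Follow next_cursor from page to page until the API reports 0."""
    all_ids = []
    cursor = -1  # -1 requests the first page
    while cursor != 0:
        page = fetch(cursor)
        all_ids.extend(page["ids"])
        cursor = page["next_cursor"]  # 0 means there are no more pages
    return all_ids


all_ids = read_all_ids(fetch_page)
print(all_ids)
```

The paging properties of JSON Source describe the same loop declaratively: where the next cursor appears in the response, how it is passed on the next request, and when to stop.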
Word of caution about too many requests
Twitter does not allow you to request too much data too quickly, so be careful how many requests you make :). Check their official page on Twitter rate limits for the REST API. You can add a delay after each request if you are doing pagination: go to the Throttling tab of JSON/XML Source and change the settings there based on the API restriction. For example, if the API allows only 30 requests per minute, then adding a 2-second delay makes sure you won’t exceed 30 requests in 1 minute.
To learn more about rate limit of Twitter API check this table
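The delay arithmetic from the throttling example (30 requests per minute means 2 seconds between requests) can be sketched in Python as follows. Both helper names are hypothetical, and the sleep function is injectable so the demonstration records the waits instead of actually pausing.

```python
import time


def delay_for_limit(max_requests, window_seconds=60.0):
    """Seconds to wait between requests to stay within max_requests per window."""
    return window_seconds / max_requests


def run_throttled(calls, delay_seconds, sleep=time.sleep):
    """Run each call in order, pausing between consecutive calls."""
    results = []
    for index, call in enumerate(calls):
        if index > 0:
            sleep(delay_seconds)  # no wait is needed before the first request
        results.append(call())
    return results


delay = delay_for_limit(30)  # 60 seconds / 30 requests
waits = []                   # records the pauses instead of really sleeping
results = run_throttled(
    [lambda: "page 1", lambda: "page 2", lambda: "page 3"],
    delay,
    sleep=waits.append,
)
print(delay, waits, results)
```

With the default time.sleep the same loop would really pause between requests, which is roughly what the delay setting on the Throttling tab does during pagination.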
Handling Twitter API Date Format (Parse Twitter Date/Time)
Twitter returns dates in the following format: Fri May 03 15:22:09 +0000 2013. To parse this into a proper date/time datatype, use the Date/Time Handling tab of JSON/XML Source.
Change Custom Date Format as below and preview Twitter data.
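To check the layout of that date string independently of SSIS, the same value can be parsed in Python. Note that the format string below uses Python’s strptime codes and is not the value to type into the Custom Date Format box; parse_twitter_date is a hypothetical helper, and the day and month names assume an English locale.

```python
from datetime import datetime

# Python strptime codes matching "Fri May 03 15:22:09 +0000 2013":
# weekday, month name, day, time, UTC offset, year.
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_twitter_date(value):
    """Parse a Twitter created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_DATE_FORMAT)


created_at = parse_twitter_date("Fri May 03 15:22:09 +0000 2013")
print(created_at.isoformat())  # prints 2013-05-03T15:22:09+00:00
```

The unusual part of the format is that the UTC offset sits between the time and the year, which is why generic date parsers often need an explicit custom format.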
Below is sample SSIS package
Twitter Demo SSIS 2012
Conclusion
So you have now seen how easy it is to access Twitter data with OAuth 2.0 using SSIS JSON Source and SSIS REST API Web Service Task. We have also seen how to loop through a large result set using the built-in paging support of JSON Source.