Copy data from an OData source by using Azure Data Factory

 







Hey guys, time to share something new that I have learned in the past few days. 

I am going to share my thought process on how I viewed the problem statement and arrived at the solution. 

So, let's get rolling, as part of the requirements, we were given an API endpoint and were told that we need to ingest data and show it on the power bi report. 

I began brainstorming on how do we pull data from API either to ADLS storage in the form of CSV or txt files or to Azure SQL DB in the form of a managed table. 

Disclaimer -  I am new to extracting information from API endpoints using Azure Data Factory, so had to read quite a few blogs.

I discovered that we could either use Rest API connectors or HTTP connectors to talk to the API endpoint and pull out the information needed. 

So I tried using both of the above connectors and found that it is becoming challenging to extract the information in the CSV file. Although I did get some success in fetching only the first row from the API endpoint (maybe a topic for the future). 

Then I used postman to check whats the output format of the request coming in.

It was in XML format. 

Until this point, I had no idea what OData is.

After a few hours of struggling and figuring out my next move. I looked at the API endpoint URL.

It had OData written in it. I looked at the excel workbook that was shared with us by the client. There also saw a function which was being used to call the API. 


Then my attention shifted to getting to know more about OData. What is OData?

You could go through this link to learn for yourself. 

What is OData?

Essentially, it is a protocol to pull data from REST APIs in an intuitive manner. 

I also found that OData connectors are available in Azure Data Factory out of the box. 

So now the focus was to learn how to implement it. 

I went through the below blog to see how to read data using OData connectors and outputs it in a tabular format. 

Copy data from an OData source by using Azure Data Factory or Synapse Analytics

Listing the step by step solution for you to follow easily.

1. Create a linked service, type OData 


Before specifying the URL, username and password. Let's try to understand the format of the URL. 

https://myapi.xyz.ondemand.com/sap/opu/odata/xyzz/XYZZ/Path1?$select=Column1,Colun2&$filter=ID eq'001' 

so the actual URL that should go in the linked service configuration should be 

https://myapi.xyz.ondemand.com/sap/opu/odata/xyzz/XYZZ

The Path has to be selected in the data set configuration which is our next step. 

2. Create a dataset, type OData 

3. Drag copy data activity onto the canvas and configure the source 

Select the data set and enter the query details

For example 

$select=Column1,Colun2&$filter=ID eq'001'

Try to preview the data to confirm the connection, path and query are accurate.

4. After configuring the source successfully, time to enter the details for the sink. 

We are using ADLS as the destination using CSV format. 


If you have already worked with CSV files before, you would know how to set this up. Nothing complex here. Just a side-note - Make sure that the first row as header checkbox is ticked. 

5. Go to the mappings tab and make sure it is accurate. 

6. Time to debug and see the magic. 

7. After a successful run, you will find a new file in the ADLS container/folder location having all the records which the API endpoint was supposed to return. 

That's it then. Feel free to give feedback, suggest corrections and share the scenario you are working on. 

Comments

Popular posts from this blog

Issues Integrating Azure Data Factory with GITHUB | IN spite of admin rights on repository

SQL QUERY NIGHTMARE

Visual Studio Git Error | "terminal prompts disabled"