Getting Started with Azure Data Catalog REST API

What is Azure Data Catalog? Simply put, it is a SaaS application hosted in Azure where enterprise customers can store information about their enterprise data source assets. It is built around the concepts of catalogs, assets and annotations. For more information, go to: https://azure.microsoft.com/en-us/services/data-catalog/

We use Azure Data Catalog to organize, discover and understand all of our backend data sources. With that in mind, I needed a way to automate the registration of data sources (databases, tables, etc.) in Azure Data Catalog, rather than spending time creating and registering assets manually. Doing this by hand is tedious, especially if you have to deal with lots of databases and stored procedures :).

Microsoft exposes an API for working with Azure Data Catalog. There is plenty of documentation out there, but it took me a while to get everything set up and working correctly, at least for searching existing assets and registering a new one.

Most of Microsoft’s documentation around the Azure Data Catalog API is located here:

https://docs.microsoft.com/en-us/rest/api/datacatalog/

This guide walks you through registering a catalog asset, with additional information on authenticating properly against Azure AD and a modified version of the request schema that includes annotations when registering or updating a catalog asset. Note that the sample below uses Native Client Authentication to Azure Active Directory.

Part 1

The first section covers creating an Azure Active Directory client app registration. We will use this to authenticate using either OAuth2 or federation.

Note: As of writing this blog post, the screenshots below were taken from the current Azure portal UI.

Register a client app in Azure Active Directory. When you register a client app in Azure Active Directory, you give your app access to the Data Catalog APIs. To register a client app:

1. Go to https://portal.azure.com

2. Click on “Azure Active Directory”

[Screenshot: ADC1]

3. Click on “App Registrations”

[Screenshot: ADC2]

4. Click on “Add” and provide a “Name”, “Application Type” and “Redirect URI”. NOTE: The redirect URI is a unique identifier that tells Azure AD where to send the access token back to the client. It doesn’t have to point to a real endpoint; however, you need to keep track of it, because you will need it later to authenticate against the catalog API.

[Screenshot: ADC3]

Grant the client app access to the Azure Data Catalog API. To do this:

1. Click on “Settings” on the newly created app registration.

2. Click on “Required Permissions” then “Add”

3. On “Select an API”, pick “Microsoft Azure Data Catalog”

4. Take the defaults

5. IMPORTANT: Make sure you click on “GRANT PERMISSIONS” once you select “Microsoft Azure Data Catalog” as seen below. If you don’t do this, then your native client will not be able to authenticate properly on the Azure Data Catalog API.

[Screenshot: ADC4]

[Screenshot: ADC5]

Part 2

The second section covers authenticating against the Azure REST API, in particular the Azure Data Catalog API. The article below guides you through the steps for calling the Azure Data Catalog API using the ADAL libraries for authentication. The information on Microsoft’s site is accurate as of this writing.

Authenticate a client app

https://docs.microsoft.com/en-us/rest/api/datacatalog/authenticate-a-client-app

A couple of notes on the steps in that article:

· “Register a Client App”. You just did this in the preceding steps. Make sure to write down the Client ID (the Application ID of the newly created app in Azure Active Directory).

· Don’t use HttpWebRequest; use HttpClient instead to make the authenticated calls. HttpClient has far more features than HttpWebRequest. Refer to this Microsoft article for examples of HttpClient.

Calling a Web API From a .NET Client (C#)

https://docs.microsoft.com/en-us/aspnet/web-api/overview/advanced/calling-a-web-api-from-a-net-client
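As a minimal sketch of the HttpClient approach recommended above, the helper below builds a request that carries the bearer token and asks for a JSON response. The class and method names (CatalogRequestFactory, Create) are hypothetical and not part of any Microsoft library; the access token is assumed to come from the ADAL AuthenticationResult (its AccessToken property).

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper, for illustration only.
public static class CatalogRequestFactory
{
    // Builds a request with an "Authorization: Bearer <token>" header
    // and an "Accept: application/json" header.
    public static HttpRequestMessage Create(HttpMethod method, Uri url, string accessToken)
    {
        if (string.IsNullOrWhiteSpace(accessToken))
        {
            throw new ArgumentException("An access token is required.", nameof(accessToken));
        }

        var request = new HttpRequestMessage(method, url);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        return request;
    }
}
```

Setting the headers on the request message rather than on HttpClient.DefaultRequestHeaders keeps the token scoped to a single call, which is convenient if one HttpClient instance is shared and tokens are refreshed.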

Part 3

Changes to the request body when registering data assets. This is the part where I spent most of my research time: modifying the schema for registering or updating assets, in this case to add annotations during the registration process. Microsoft provides basic schema definitions for registering assets but doesn’t provide enough detail on other schema values such as the experts, tags and description annotations. Here’s a modified version of the schema for registering an asset that includes annotations.

{
  "properties": {
    "fromSourceSystem": false,
    "name": "table name",
    "dataSource": {
      "sourceType": "Db2",
      "objectType": "Table"
    },
    "dsl": {
      "protocol": "db2",
      "authentication": "windows",
      "address": {
        "server": "ServerName",
        "database": "DatabaseName",
        "object": "NameOfTable",
        "schema": "dbo"
      }
    },
    "lastRegisteredBy": {
      "upn": "smtp@address.com",
      "firstName": "Don",
      "lastName": "Tan"
    },
    "containerId": "containers/<SomeGuid>"
  },
  "annotations": {
    "schema": {
      "properties": {
        "fromSourceSystem": true,
        "columns": [
          {
            "name": "identity",
            "isNullable": false,
            "type": "Int32",
            "maxLength": 0,
            "precision": 0
          },
          {
            "name": "Other Column",
            "isNullable": false,
            "type": "String",
            "maxLength": 0,
            "precision": 0
          },
          {
            "name": "short_desc",
            "isNullable": false,
            "type": "String",
            "maxLength": 0,
            "precision": 0
          }
        ]
      }
    },
    "experts": [
      {
        "properties": {
          "expert": {
            "upn": "smtp@address.com",
            "objectId": "<SomeGuid>"
          },
          "key": "<SomeGuid>",
          "fromSourceSystem": false
        }
      }
    ],
    "descriptions": [
      {
        "properties": {
          "key": "<SomeGuid>",
          "fromSourceSystem": false,
          "description": "Some Description"
        }
      }
    ],
    "tags": [
      {
        "properties": {
          "tag": "Dtan",
          "key": "<SomeGuid>",
          "fromSourceSystem": false
        }
      }
    ]
  }
}
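Hand-writing this JSON as an escaped string is error-prone, so here is a rough sketch of building a similar body with a serializer instead. It assumes System.Text.Json is available (built into .NET Core 3.0 and later, a NuGet package on .NET Framework). The AssetBodyBuilder class and its parameters are hypothetical. It also assumes that the annotation "key" values can be freshly generated GUIDs, which the schema above suggests but does not confirm. The schema columns, experts and the firstName/lastName fields are left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class AssetBodyBuilder
{
    public static string Build(
        string tableName, string server, string database, string schema,
        string registeredByUpn, string containerId,
        string description, IEnumerable<string> tags)
    {
        // One tag annotation per tag; the GUID key is an assumption (see above).
        var tagAnnotations = new List<object>();
        foreach (var tag in tags)
        {
            tagAnnotations.Add(new
            {
                properties = new { tag, key = Guid.NewGuid().ToString(), fromSourceSystem = false }
            });
        }

        var body = new
        {
            properties = new
            {
                fromSourceSystem = false,
                name = tableName,
                dataSource = new { sourceType = "Db2", objectType = "Table" },
                dsl = new
                {
                    protocol = "db2",
                    authentication = "windows",
                    // "@object" produces a JSON property named "object".
                    address = new { server, database, @object = tableName, schema }
                },
                lastRegisteredBy = new { upn = registeredByUpn },
                containerId
            },
            annotations = new
            {
                descriptions = new[]
                {
                    new
                    {
                        properties = new
                        {
                            key = Guid.NewGuid().ToString(),
                            fromSourceSystem = false,
                            description
                        }
                    }
                },
                tags = tagAnnotations
            }
        };

        return JsonSerializer.Serialize(body);
    }
}
```

Anonymous types keep the sketch short; for anything long-lived, named classes that mirror the schema would probably be easier to maintain.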

Part 4

Putting it all together: here’s a complete sample showing how to invoke the Azure Data Catalog API using HttpClient in C#.

// The ResourceUri identifies the resource (API) that the token is requested for. In this case, Azure Data Catalog.
// The ClientId is used by the application to uniquely identify itself to Azure AD.
// The AAD Instance is the instance of Azure, for example public Azure or Azure China.
// The Authority is the sign-in URL (either the tenant or OAuth2 provider)
// The RedirectUri gives AAD more details about the specific application that it will authenticate.
// NOTE: Make sure that the ClientID has sufficient permissions against the resourceURI. In this case, Azure Data Catalog
//See article: https://docs.microsoft.com/en-us/rest/api/datacatalog/Register-a-client-app?redirectedfrom=MSDN#client

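// Namespaces used by this sample (add these using directives at the top of the file):
// using System;
// using System.Configuration;
// using System.Globalization;
// using System.Net.Http;
// using System.Net.Http.Headers;
// using Microsoft.IdentityModel.Clients.ActiveDirectory;

var ClientId = ConfigurationManager.AppSettings["ClientId"];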
var ResourceUri = ConfigurationManager.AppSettings["ResourceUri"];
var RedirectUri = new Uri(ConfigurationManager.AppSettings["RedirectUri"]);
var Tenant = ConfigurationManager.AppSettings["Tenant"];
var AadInstance = ConfigurationManager.AppSettings["AADInstance"];
//OAuth2 provider
//private static readonly string Authority = String.Format(CultureInfo.InvariantCulture, "https://login.windows.net/common/oauth2/authorize");
//Tenant Authority
var Authority = String.Format(CultureInfo.InvariantCulture, AadInstance, Tenant);
var authContext = new AuthenticationContext(Authority);
var authResult =
    authContext.AcquireTokenAsync(ResourceUri, ClientId, RedirectUri,
        new PlatformParameters(PromptBehavior.RefreshSession)).Result;

using (var httpClient = new HttpClient())
{
    var requestbody = "{\"properties\":{\"fromSourceSystem\":false,\"name\":\"air_allowed\",\"dataSource\":{\"sourceType\":\"Db2\",\"objectType\":\"Table\"},\"dsl\":{\"protocol\":\"db2\",\"authentication\":\"windows\",\"address\":{\"server\":\"YourServerName\",\"database\":\"YourDatabase\",\"object\":\"YourTable\",\"schema\":\"dbo\"}},\"lastRegisteredBy\":{\"upn\":\"smtp@address.com\",\"firstName\":\"Don\",\"lastName\":\"Tan\"},\"containerId\":\"containers/42070252-e318-4a0a-8c73-a33c0dc8fd65\"},\"annotations\":{\"schema\":{\"properties\":{\"fromSourceSystem\":true,\"columns\":[{\"name\":\"Column1\",\"isNullable\":false,\"type\":\"String\",\"maxLength\":0,\"precision\":0},{\"name\":\"Column2\",\"isNullable\":false,\"type\":\"String\",\"maxLength\":0,\"precision\":0},{\"name\":\"Column3\",\"isNullable\":false,\"type\":\"String\",\"maxLength\":0,\"precision\":0}]}},\"experts\":[{\"properties\":{\"expert\":{\"upn\":\"smtp@address.com\",\"objectId\":\"fb7d1a8a-4ae6-4ee2-aaaa-9de5b4c598df\"},\"key\":\"52c4543b-ee75-42d7-95e7-3a01437fee58\",\"fromSourceSystem\":false}}],\"descriptions\":[{\"properties\":{\"key\":\"791bab95-428a-4941-b633-7d2d0cd9c75e\",\"fromSourceSystem\":false,\"description\":\"SomeDescription\"}}],\"tags\":[{\"properties\":{\"tag\":\"Dtan\",\"key\":\"a2a3f272-14a3-4a03-b85d-65af33022dc4\",\"fromSourceSystem\":false}}]}}";
    // Replace <yourcatalog> with your catalog name. The URL must not contain spaces.
    var url = "https://api.azuredatacatalog.com/catalogs/<yourcatalog>/views/tables?api-version=2016-03-30";
    httpClient.DefaultRequestHeaders.Add("Authorization", authResult.CreateAuthorizationHeader());
    httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    var stringContent = new StringContent(requestbody);
    stringContent.Headers.ContentType = new MediaTypeHeaderValue("application/json");
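    var response = httpClient.PostAsync(url, stringContent).Result;
    // Throws an HttpRequestException if the service returned a non-success status code.
    response.EnsureSuccessStatusCode();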
}

[Screenshot: ADC6]
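The sample above blocks on .Result, which is fine in a console app but can deadlock in UI or ASP.NET code. A possible async variation is sketched below. CatalogClient, BuildRegisterUrl and RegisterTableAsync are hypothetical names. The sketch assumes the service returns the URL of the registered asset in the Location response header, which you should confirm against the API reference.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class CatalogClient
{
    // Builds the "register table" URL used in the sample above.
    public static Uri BuildRegisterUrl(string catalogName)
    {
        return new Uri("https://api.azuredatacatalog.com/catalogs/"
            + Uri.EscapeDataString(catalogName)
            + "/views/tables?api-version=2016-03-30");
    }

    // Posts the JSON body and returns the Location header value (or null if absent).
    public static async Task<string> RegisterTableAsync(
        HttpClient http, string catalogName, string accessToken, string jsonBody)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Post, BuildRegisterUrl(catalogName)))
        {
            request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
            request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
            request.Content = new StringContent(jsonBody, Encoding.UTF8, "application/json");

            using (var response = await http.SendAsync(request).ConfigureAwait(false))
            {
                var responseText = response.Content == null
                    ? string.Empty
                    : await response.Content.ReadAsStringAsync().ConfigureAwait(false);

                if (!response.IsSuccessStatusCode)
                {
                    // Include the response body so schema errors are visible to the caller.
                    throw new HttpRequestException(
                        "Registration failed: " + (int)response.StatusCode + " "
                        + response.ReasonPhrase + ". " + responseText);
                }

                return response.Headers.Location == null
                    ? null
                    : response.Headers.Location.ToString();
            }
        }
    }
}
```

From an async method this would be called as await CatalogClient.RegisterTableAsync(httpClient, "<yourcatalog>", authResult.AccessToken, requestbody), reusing the variables from the sample above.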