The “How I” Files #3 – Retrieving and Deserializing JSON Data from API

Hey, everyone, it’s time for another installment of the “How I” files.  In this file, we are looking at retrieving and deserializing JSON data from a web service API.  This post will be c# heavy so where needed I will try to offer as clear of explanations as I can, but bear in mind I am no c# expert.  I just make things work, and then explain why I think they work 🙂

High-Level Overview

In this file, our goal is to create a class library that connects to an API, authenticates, retrieves JSON formatted data, and deserializes to output for use in a SSIS package.  In this particular solution, I created a separate DLL for the class library which will require me to register it in the global assembly cache on the ETL server.  If your environment doesn’t allow for this, you can still use some of the code snippets here to work with JSON data.

Our order of operations will be to do the following tasks:  Create a web request, attach authentication headers to it, retrieve the serialized JSON data, and deserialize it into an object.  I use model-view-controller (MVC) architecture to organize my code, minus the views because I am not presenting the data to a user interface.

Create a Web Request and Retrieve the Data

In order to create a web request, I created a class to represent my vendor’s API system.  This class contains one member to store my API key for authenticating requests and is set when the class is instantiated.  There are three functions in the class.  One deals with formatting a DateTime value to a string representation of the value in the format that the API requires.  The second one creates the web request and attaches the authentication headers via the API key.  The last one downloads the serialized data using the web request created by the previous function and returning the serialized data to the calling object.  The last two functions are listed below:

public HttpWebRequest CreateRequest(string url)
{
        // Create a web request to pass to the API
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // Configure the user agent
        request.UserAgent = "SSIS Package";
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        // Create the authentication header and attach to request
        string encoded = System.Convert.ToBase64String(Encoding.UTF8.GetBytes(String.Format("{0}:{1}", this.apiKey, "xxx")));
        request.Headers.Add("Authorization", "Basic " + encoded);

        return request;
}

Key Points:

  • The URL of the API is passed as a string variable.  The API I use in this example is a REST API where calling specific endpoints runs specific queries.
  • The User Agent property is required by my API so I set it with request.UserAgent = “SSIS Package”;  The actual value is not critical.
  • Connections are required to be encrypted so the security protocol is set to TLS 1.2
  • Authentication headers are encoded as UTF8 and the API requires the API key to be passed as the username with any non-null password
public string RetrieveSerializedData(HttpWebRequest request)
 {
   // This method passes the request to the API and returns serialized JSON data
   string encodedData;

   try
   {
      HttpWebResponse response = (HttpWebResponse)request.GetResponse();

      using (Stream stream = response.GetResponseStream())
      {
         StreamReader reader = new StreamReader(stream, Encoding.UTF8);
         encodedData = reader.ReadToEnd();
      }
   }
   catch
   {
      throw;
   }

   return encodedData;
 }

Key Points:

  • Using the request generated earlier, we get the response back from the API.  If we don’t get an HTTP code in the 200 range, a WebException is thrown.
  • Data in a web response is a stream, not a string, so we have to read it to a string variable before passing it back to the calling code
  • The try … catch block simply catches any exceptions and throws them back to the calling object to be dealt with by that object

Deserializing Data

At this point, we have a string variable holding our serialized data.  In order to be able to output this to SSIS Script Source component, we need to deserialize it.  The approach that I use is to use the .NET built-in objects to deserialize the data instead of a third party object.  I prefer this approach because then I am not dependent on yet another entity to make my solution work.  If Microsoft removes the code that I am using to deserialize data from .NET and I am relying on a third party object I have to wait for that object to be updated before I can update my solution.  No good.  Using native .NET objects puts a bit more burden on me but also allows me to fix any issues caused by future updates myself.

public List<Tickets> DeserializeTickets(string serializedData)
{
   try
   {
      DataContractJsonSerializer reader = new DataContractJsonSerializer(typeof(RootObjectTickets));
      MemoryStream data = new MemoryStream(Encoding.UTF8.GetBytes(serializedData));
      RootObjectTickets ticketMeta = (RootObjectTickets)reader.ReadObject(data);
      return ticketMeta.tickets;
   }
   catch
   {
      throw;
   }

}

Key Points:

  • Deserializing data requires a model of the object you are deserializing, in this case, it is the Tickets class.
  • We use data contracts to map our .NET members in our model class to the field names in the JSON data
  • Once we deserialize the object we return a List of that object type back to the calling object

The Data Contract – Be Sure to Read the Fine Print

The data contract is simply a class that models the object we are serializing or deserializing.  We must include System.Runtime.Serialization to access the [DataContract] and [DataMember] thingies.  I told you my knowledge of c# wasn’t all that sharp.

[DataContract]
public class Tickets
{
// This class serves as the data model for the Ticket entity

[DataMember]
public int id { get; set; }
 
[DataMember(IsRequired = false)]
public Person assignedTo { get; set; }

[DataMember]
public Person customer { get; set; }

[DataMember]
public string hasAttachments { get; set; }

[DataMember]
public int inboxId { get; set; }

[DataMember]
public bool isRead { get; set; }

[DataMember]
public int mergedToId { get; set; }

[DataMember]
public int numActiveTasks { get; set; }

[DataMember]
public int numCompletedTasks { get; set; }

[DataMember]
public int numThreads { get; set; }

[DataMember]
public string preview { get; set; }

[DataMember]
public string priority { get; set; }

[DataMember]
public string source { get; set; }

[DataMember(Name = "spam_score")]
public int spamScore { get; set; }

}

Key Points:

  • The Name attribute of [DataMember] allows us to map the name of our member to the name of the attribute in the JSON data.  This is used when the attribute names are not valid c# names, or if you have a different naming convention than the data that you are receiving.
  • Remember, names in JSON data are case-sensitive
  • If there is a problem with the data contract a serialization exception will be thrown.

Putting It All Together

Now that we have the individual tools for this solution, it’s time to give you the blueprint on how it all fits together.  As I mentioned earlier, I use an MVC architecture, which means each entity (noun) has a controller class that handles all the work.  In my SSIS Source component, I instantiate one of these controllers and pass my API key in the constructor.  This controller then instantiates my API class.

Depending on the type of operations that an entity supports, a controller may simply retrieve a list of objects for that entity, retrieve one object by ID, or a subset of objects by date last modified.  My SSIS component only needs to know about the model and the controller class for that entity and does not need to worry about the details of creating the web request or passing the request to the API.

Once the list of objects come back, the component simply loops through and maps the data to the output buffer.  WHen that is done, the data is sent through the data flow like any other SSIS package.

Have questions?  Let me know in the comments below!

One thought on “The “How I” Files #3 – Retrieving and Deserializing JSON Data from API

Leave a Reply