All Collections
Components
Transformer - Getting to know JOLT
Transformer - Getting to know JOLT

Understanding transformations with JOLT and how to apply them

Aquiles avatar
Written by Aquiles
Updated over a week ago

IMPORTANT: This documentation has been discontinued. Read the updated Transformer - Getting to know JOLT documentation on our new documentation portal.

This article aims to explain the concepts and applications of JOLT.

For development and tests of all examples contained in this article:

JOLT - JsOn Language for Transform

JOLT is a language used for the transformation of JSONs. It uses the following basic structure:

[ 
{
"operation": "",
"spec": {

}
}
]

Where:

  • "operations": defines the type of transformation that will be applied

  • "spec": field where to put the transformation

  • "[]": the JOLT basic structure is also a JSON: therefore, we can chain multiple operations inside the list.

The transformation details will always depend on the input JSON.

OPERATIONS

There are several types of operations in JOLT.

They are:

  • shift

  • default

  • remove

  • sort

  • cardinality

  • modify-default-beta

  • modify-overwrite-beta

Each operation has its own set of rules and will work in different ways.

However, they all follow the same principle for the transformations, which is the navigation in the input JSON structure.

shift

Used to change the structure of a JSON, keeping the values contained in that same JSON.

Its use consists in navigating the JSON structure to the field or object that we want to get its value and then informing us where this value should be placed in the new JSON that we want.

Let's see the example below:

We have an input JSON containing information about Clients:

{ 
"client": {
"name": "Sample Client",
"email": "sample-client@email.com",
"ssn": "123.456.789.10",
"birthDate": "02/15/1985",
"address": "Sample Client street, 123",
"country": "United States",
"number": "8888-8888"
}
}

And we want a new JSON with the following structure:

{
"customer" : {
"fullName" : "Sample Client",
"birthDate" : "02/15/1985",
"address" : {
"street" : "Sample Client street, 123",
"country" : "United States"
},
"phoneNumber" : "8888-8888",
"mobileNumber" : "8888-8888"
}
}

Our transformation will be:

[
{
"operation": "shift",
"spec": {
"client": {
"name": "customer.fullName",
"birthDate": "customer.birthDate",
"address": "customer.address.street",
"country": "customer.address.country",
"number": ["customer.phoneNumber", "customer.mobileNumber"]
}
}
}
]

What we did above was navigate to the fields of our interest and inform where the values of each one of these fields should be inserted.

Through the "." (dot) notation, we are able to define levels in the new JSON that we want to create. With "name":"customer.fullName" we take the value of the field name and throw it into the field fullName inside of the object customer, and in "address":"customer.address.street" we take the value of the field address and throw it into the field street inside of the object address which will also be contained in the object costumer.

Tip: We can take the same value and throw it into more than 1 field at the same time.

In "number": ["customer.phoneNumber", "customer.mobileNumber"], we take the value of the field number and throw it into the fields phoneNumber and mobileNumber, both contained in customer. This approach allows us to transpose a value into n new fields.

Obs.: On this operation, only the fields that are manipulated in the transformation will be replicated, so any data in the input JSON that is not transformed will be discarded (ie. SSN in the above example).

Final JSON:

{
"customer" : {
"fullName" : "Sample Client",
"birthDate" : "02/15/1985",
"address" : {
"street" : "Sample Client street, 123",
"country" : "United States"
},
"phoneNumber" : "8888-8888",
"mobileNumber" : "8888-8888"
}
}

default

It is used to add new fields or objects in a JSON if they don't already exist.

Its use consists in navigating the JSON structure to the desired level and inserting the field or object with its respective value.

Obs.: In case the field declared in the transformation already exists in the input JSON, the transformation will have no effect.

Let's see the example below:

We have an input JSON containing information about Customer:

{
"customer": {
"name": "Costumer Default",
"ssn": "123.456.789.10"
}
}

However, we need a JSON that in addition to the name and ssn, also contains the customer's date of birth.

{
"customer": {
"name": "Costumer Default",
"ssn": "123.456.789.10",
"birthDate": "01/01/1970"
}
}

Our transformation will be:

[
{
"operation": "default",
"spec": {
"customer": {
"birthDate": "01/01/1970"
}
}
}
]

Above we navigate to the object customer and add the field birthDate with a default value.

remove

Used to remove fields or objects from a JSON.

Its use consists in navigating the input JSON structure to the desired level and informing the field to be removed.

Let's see the example below:

We have an input JSON containing information about Customer:

{
"customer": {
"name": "Costumer Default",
"ssn": "123.456.789.10",
"birthDate": "01/01/1970"
}
}

However, we need a JSON that only contains the customer's name and ssn:

{
"customer": {
"name": "Costumer Default",
"ssn": "123.456.789.10"
}
}

Our transformation will be:

[
{
"operation": "remove",
"spec": {
"customer": {
"birthDate": ""
}
}
}
]

What we did above was navigating to the field birthDate and assign it to empty quotation marks.

The field to be removed must always be assigned to an empty String, otherwise there will be an error in the transformation and it will not occur.

sort

Used to sort fields and objects in a JSON in alphabetical order.

Obs.: The ordering of fields and objects cannot be configured, therefore all JSON will be affected. Only the fields and objects' names are ordered and not their values.

Let's see the example below:

We have an input JSON containing information about Employee:

{
"employee": {
"phone": "9 9999-9999",
"name": "Employee Sort",
"birthDate": "01/01/1980",
"role": "JOLT Analyst"
}
}

We need all fields contained in the input JSON to be ordered in alphabetical order:

{
"employee": {
"birthDate": "01/01/1980",
"name": "Employee Sort",
"phone": "9 9999-9999",
"role": "JOLT Analyst"
}
}

For the operation sort, we don't need to define the "spec" object that we always use together with the "operation" field.

Our transformation will be:

[
{
"operation": "sort"
}
]

cardinality

Used to transform simple fields and objects into lists of objects and vice-versa.

Obs.: When we transform a list of objects to a simple field or object, only the first element of the list will be considered.

Let's see the example below:

We have an input JSON containing information about Products:

{
"products": {
"name": "Product A",
"id": "123-A",
"value": 10
}
}

We need to transform the "products" object into a list:

{
"products": [
{
"name": "Product A",
"id": "123-A",
"value": 10
}
]
}


Our transformation will be:

[
{
"operation": "cardinality",
"spec": {
"products": "MANY"
}
}
]

In case we have a list of products:

{
"products": [
{
"name": "Product A",
"id": "123-A",
"value": 10
},
{
"name": "Product B",
"id": "456-B",
"value": 20
}
]
}

And we need to transform that list into a simple object:

{
"products": {
"name": "Product A",
"id": "123-A",
"value": 10
}
}

Our transformation will be:

[
{
"operation": "cardinality",
"spec": {
"products": "ONE"
}
}
]

Deepening your knowledge

Until here we saw essential concepts to understand JOLT and how to use them, however before we see the remaining operations, we need to learn a few more elaborate concepts.

As a pre requisite to moving forward, we will see 2 concepts that will be used in future explanations:

They are:

  • LHS (Left Hand Side)
    Used to reference the left side of the transformation.

  • RHS (Right Hand Side)
    Used to reference the right side of the transformation.

That is, all JSON content that is before : (colon) will be our LHS and what is after : (colon) will be our RHS.

Transformation example:

[
{
"operation": "shift",
"spec": {
LHS -> "customer": {
LHS -> "name": "client.fullName", <- RHS
LHS -> "birhtDate": "client.dateOfBirth", <- RHS
LHS -> "address": "client.address.street", <- RHS
LHS -> "country": "client.address.country" <- RHS
}
}
}
]

Now we can move forward.

The great power of JOLT it's in the possibility of dealing with transformations in a dynamic way. For this, we use wildcards, which are specific characters allocated in different ways in our transformations, each one with a different function.

One wildcard can have different functions depending on its use (LHS or RHS), in addition, we can combine different wildcards in the same transformation.


Below we will see its definitions and some usage examples.

&


It uses the content of what is declared in the LHS to compose the structure of the output JSON, without the need to make this content explicit in the transformation. This wildcard is based on the navigation made during the transformation.

Usage: RHS
Operation: shift

Example:

We have a JSON that contains the customer's data:

{
"name": "Client Example",
"email": "client-example@email.com"
}

And we need this data in an object called "client":

{
"client": {
"name": "Client Example",
"email": "client-example@email.com"
}
}

Our transformation will be:

[
{
"operation": "shift",
"spec": {
"name": "client.&",
"email": "client.&"
}
}
]

In &, we take the values of fields "name" and "email" then we assign them to the fields called "name" and "email" inside a "client" object. Thus we create a new JSON keeping the input fields structure of the input JSON.

* (asterisk)


References all fields and objects in a JSON without having to make their names explicit in the transformation.

Usage: LHS
Operations: shift, remove, cardinality, modify-default-beta e modify-overwrite-beta

Example:

We have an input JSON containing the customer's data:

{
"name": "Customer Example",
"email": "cliente-exemplo@email.com",
"document": "1234567890",
"birthDate": "10/31/1990",
"address": "Customer Example Street"
}

And we need this data in an object named "customer", but we need to change the "document" field to a field named "ssn":

{
"customer": {
"name": "Customer Example",
"email": "client-example@email.com",
"document": "1234567890",
"birthDate": "10/31/1990",
"address": "Customer Example Street"
}
}

Our transformation will be:

[
{
"operation": "shift",
"spec": {
"*": "customer.&",
"document": "customer.ssn"
}
}
]


In the line "*": "customer. &", we are taking any content that exists in the input JSON and placing it in an object named "customer" keeping the entire structure of whatever that content is. As for the "document" field, we are taking its value and assigning it to a field named "ssn" also within the "customer" object.

Using the wildcard * next to & means that for each field that * finds, & will keep its name and value. This combined use of wildcards is very useful as it allows us to manipulate a JSON without the need to know and declare its content.

@


References the value of a field or object contained in the input JSON, but has different effects depending on its usage.

Usage: LHS e RHS
Operations: shift (LHS e RHS), modify-default-beta (RHS) e modify-overwrite-beta (RHS).

Shift example:
We have a JSON containing isolated Product's information:

{
"key": "code",
"value": "123-ABC"
}


And we need to group them into a "product" object, relating the "key" field to the "value" field:

{
"product": {
"code": "123-ABC"
}
}

Our transformation will be:

[
{
"operation": "shift",
"spec": {
"value": "product.@(1,key)"
}
}
]


In "@ (1, key)" we are taking the value of the "key" field to be used as the name of the field that will receive the value of the "value" field ("@value"). The use of @ in both LHS and RHS involves declaring the level at which we are seeking information and counting levels from level 1 onwards.

In this case, the "key" field is on the same level as the "value" field, so we use the number 1.

The usage of @ in LHS follows the same way as in RHS.

modify-default-beta e modify-overwrite-beta examples:

We have a JSON containing isolated Product's information:

{
"product": {
"name": "Produto A",
"price": 10
},
"manufacturer": "Company A"
}


And we need the "product" object to contain a "company" field with the value of the "manufacturer" field, which is outside "product":

{
"product": {
"name": "Product A",
"price": 10,
"company": "Company A"
},
"manufacturer": "Company A"
}

The transformation will be:

[
{
"operation": "modify-default-beta", //for modify-overwrite-beta
"spec": { //we'd have the same transformation
"produto": { <- level 2
"company": "@(2,manufacturer)" <- level 1
}
}
}
]


What we did was create a field named "company" and assign the value of the field "manufacturer" in it. For that we went up to level 2 to be able to see the "manufacturer" field and thus get its value.

The difference between modify-default-beta and modify-overwrite-beta is that in modify-default-beta the inclusion of the "company" field is only made if there is no other field named "company" within "product".

For modify-overwrite-beta, the company field will be included even if the "company" field already exists within "product". As the name overwrite suggests, if the content we want to add already exists in the input JSON, it will always be overwritten.

$


References the name of a field or object contained in the input JSON to be used as the value of a field or object in the output JSON.

Usage: LHS
Operations: shift

Example:

We have a input JSON containing product's data:

{
"product": {
"name": "Product Example",
"value": 10,
"category": "CATEG-1",
"weight": 25
}
}

And we need a JSON to know what product information is being provided:

{
"product": [
"name",
"value",
"category",
"weight"
]
}

Our transformation will be:

[
{
"operation": "shift",
"spec": {
"product": {
"*": {
"$": "product[]"
}
}
}
}
]


What we did was select all (*) the fields of the object "product", then take the name ($) of each one of them and assign them to a list named "product".

That way we can get the name of each field and not their values.

#

If used in LHS, it has the function of entering values ​​manually in the output JSON.

In RHS, on the other hand, it is applicable only to create lists and has the function of grouping certain content of the input JSON within the list to be created.

Usage: LHS e RHS
Operations: shift


LHS example:

We have an input JSON with Product information:

{ 
"product": {
"name": "Product Example",
"value": 10,
"weight": 25
}
}


And we need a JSON that contains product name, value, weight and category information:

{ 
"product": {
"name": "Product Example",
"value": 10,
"category": "CATEG-1",
"weight": 25
}
}


However, the input JSON never provides category information, so we need to manually add this field:

[
{
"operation": "shift",
"spec": {
"product": {
"*": "product.&",
"#DEFAULT-CATEGORY": "product.category"
}
}
}
]


The value contained after the wildcard # will always be assigned to the field declared in the RHS, which in our case is the "category" field within the "product" object.


RHS Example:

We have an input JSON containing a list of products:

{
"products": [
{
"code": "PROD-A",
"value": 10
},
{
"code": "PROD-B",
"value": 20
}
]
}


And we just need to change the name of the field "value" to "price":

{
"products": [
{
"code": "PROD-A",
"price": 10
},
{
"code": "PROD-B",
"price": 20
}
]
}

Our transformation will be:

[
{
"operation": "shift",
"spec": {
"products": { <- level 2
"*": { <- level 1
"code": "products[#2].&", <- level 0
"value": "products[#2].price" <- level 0
}
}
}
}
]


The use of # in RHS involves declaring the level at where we are seeking information. The declaration [#2] represents the creation of a list ([]) and that it must group (#) all the information that is found 2 levels above.

We need this declaration in order to guarantee the correct grouping of each product with its respective "code" and "price"


That is, in "code": "products [# 2]. &" We are taking the value from the "code" field and throwing it to another "code" field within a list named "products", and in "value": "products [# 2] .price" we take the value from the "value" field and throw it into a field named "price" within the same "products" list.


However, when creating the list "products" we will look 2 levels above (level of the list "products") and the way that each item is grouped in "product list", this grouping will be maintained in the new "products" list.

| (pipe)


It allows referencing multiple fields or objects of an input JSON so that, regardless of the name of the field or object, its value is allocated to the same destination in the output JSON.

Usage: LHS
Operations: shift


Example:

We have an input JSON containing customer data:

{
"customer": {
"fullName": "Customer Example",
"email": "customer-example@email.com"
}
}


And we need a JSON of the following structure:

{
"customer": {
"name": "Customer Example",
"email": "customer-example@email.com"
}
}


However, in the input JSON, there is the possibility that the field "fullName" comes with the name "customerName", so we need the transformation to be prepared to recognize the two possibilities:

[
{
"operation": "shift",
"spec": {
"customer": {
"fullName|customerName": "customer.nome",
"email": "customer.&"
}
}
}
]

Operations modify-default-beta e modify-overwrite-beta


As was said in the @ wildcard explanation, these operations allow us to dynamically reference values. While modify-default-beta will assign a value to a field if it does not exist, modify-overwrite-beta will overwrite any value even if the field already exists.

However, modify-overwrite-beta also allows us to apply functions to our JSON.

They are:

  • String
    toLower, toUpper, concat, join, split, substring, trim, leftPad e rightPad

  • Number
    min, max, abs, avg, intSum, doubleSum, longSum, intSubtract, doubleSubtract, longSubtract, divide e divideAndRound

  • Type
    toInteger, toDouble, toLong, toBoolean, toString, recursivelySquashNulls, squashNulls, size

  • List
    firstElement, lastElement, elementAt, toList, sort

Examples:

Input JSON:

{
"STRING": {
"product": "Product A",
"company": "company a",
"value": "100",
"measureWithSpaces": " 10 meters "
},
"NUMBER": {
"array": [ 3, 5, 2, 7, 1 ],
"negativeValue": -100,
"positiveValue": 50
},
"TYPE": {
"value": 10.5,
"stringBoolean": "true",
"objectWithNull": {
"fielWithValue": "ABC",
"nullField": null
}
},
"LIST": {
"array": [ "c", "t", "m", "a" ],
"stringField": "123"
}
}

Transformations:

[
{
"operation": "modify-overwrite-beta",
"spec": {
"STRING": {
"product": "=toLower(@(1,product))",
"company": "=toUpper(@(1,company))",
"product_company": "=concat(@(1,product),'_',@(1,company))",
"joinProductCompany": "=join(' - ',@(1,product),@(1,company))",
"splitProductCompany": "=split('[-]',@(1,joinProductCompany))",
"substringProduct": "=substring(@(1,product),0,4)",
"value": "=leftPad(@(1,value),6,'A')",
"measure": "=trim(@(1,measureWithSpaces))"
},
"NUMBER": {
"minArray": "=min(@(1,array))",
"maxArray": "=max(@(1,array))",
"absoluteValue": "=abs(@(1,negativeValue))",
"averageArray": "=avg(@(1,array))",
"sumArray": "=intSum(@(1,array))",
"subtrArray": "=intSubtract(@(1,positiveValue), 20)",
"division": "=divide(@(1,positiveValue),2)",
"divisionRound": "=divideAndRound(3,@(1,positiveValue),3)"
},
"TYPE": {
"integerValue": "=toInteger(@(1,value))",
"booleano": "=toBoolean(@(1,stringBoolean))",
"stringValue": "=toString(@(1,value))",
"stringBoolean": "=size",
"objectWithNull": "=recursivelySquashNulls"
},
"LIST": {
"arrayFirstItem": "=firstElement(@(1,array))",
"arrayLastItem": "=lastElement(@(1,array))",
"arrayElement": "=elementAt(@(1,array),2)",
"fieldToList": "=toList(@(1,stringField))",
"orderedArray": "=sort(@(1,array))"
}
}
}
]

Output JSON:

{
"STRING": {
"product": "product a",
"company": "COMPANY A",
"value": "AAA100",
"measureWithSpaces": " 10 meters ",
"product_company": "product a_COMPANY A",
"joinProductCompany": "product a - COMPANY A",
"splitProductCompany": [
"product a ",
" COMPANY A"
],
"substringProduct": "prod",
"measure": "10 meters"
},
"NUMBER": {
"array": [ 3, 5, 2, 7, 1 ],
"negativeValue": -100,
"positiveValue": 50,
"minArray": 1,
"maxArray": 7,
"absoluteValue": 100,
"averageArray": 3.6,
"sumArray": 18,
"subtrArray": 30,
"division": 25,
"divisionRound": 16.667
},
"TYPE": {
"value": 10.5,
"stringBoolean": 4,
"objectWithNull": {
"fielWithValue": "ABC"
},
"integerValue": 10,
"booleano": true,
"stringValue": "10.5"
},
"LIST": {
"array": [
"c",
"t",
"m",
"a"
],
"stringField": "123",
"arrayFirstItem": "c",
"arrayLastItem": "a",
"fieldToList": [
"123"
],
"orderedArray": [ "a", "c", "m", "t" ]
}
}


Note: Some functions were not included in the transformation above because they follow the same principle as others that were included, such as the doubleSum and longSum functions that are applied in the same way as intSum. Regarding the recursivelySquashNulls and squashNulls functions, both are applicable only to objects and lists and serve to remove fields with null values, however while recursivelySquashNulls will look at all levels below the object or list, squashNulls will look just 1 level below.


The modify-overwrite-beta has its execution in cascade, that is, each new transformation is impacted by the previous transformations.

To understand this cascading behavior, let's take a snippet of the input JSON and the above transformations:

//JSON de entrada
"STRING": {
"product": "PRODUCT A",
"company": "company a",
"value": "100",

//Transformação
"STRING": {
"product": "=toLower(@(1,product))",
"company": "=toUpper(@(1,company))",
"product_company": "=concat(@(1,product),'_',@(1,company))",


In "product": "= toLower (@ (1, product))" we change the value from "product" to lower case and in "company": "= toUpper (@ (1, company))", we change the value of " company "to capital. So the moment we do "product_company": "= concat (@ (1, product), '_', @ (1, company))", we are using the values ​​of "product" and "company" already transformed and not its original values ​​contained in the input JSON, with that we will have "product_company": "product a_Company A".


For a more technical reading about JOLT:

Did this answer your question?