Home > Enabling Data Access with Databricks SQL REST API
Donatien Tessier
15 June 2023
Lire cet article en Français

Enabling Data Access with Databricks SQL REST API

Enabling Data Access with Databricks SQL REST API

When setting up a new data platform, the Lakehouse architecture is usually the most popular choice because it can be used in several ways:

  • Machine Learning
  • Business Intelligence
  • Data sharing

Schéma SQL Rest

 

Databricks SQL lets you expose data so BI tools like Power BI can use it within a Microsoft ecosystem. It uses an SQL warehouse, which is a kind of Spark cluster optimized for analytical workloads.

Databricks has released a public preview of REST API access for Databricks SQL so that data does not have to be duplicated and can be accessed through REST APIs.

This means that applications can query data in the Data Lake using Databricks SQL, an endpoint warehouse, via query APIs. This is shown in the diagram below.

Endpoint warehouse

 

Three actions are currently possible with the APIs:

  • Submit an SQL query
  • Cancel a query
  • Check the status of a query

 

Submitting an SQL Query

 

A query is submitted via a POST action using the /sql/statements endpoint.

Code SQL

 

To do this, you need the following:

Serverless Starter Warehouse

 

If the warehouse isn’t already running, calling the API will start it.

In response to the call, it’s likely that you won’t get the result of the query but rather its identifier, the statement_id, and the query’s status, “PENDING.”

 

Sample response:

{
"statement_id": "01ed9a50-c9c9-178e-9be7-0ab52bc998b0",
"status": { "state": "PENDING" }
}


Canceling a Query

 

When a query is running, it can be canceled using the query identifier and the PAT. This is a POST to the endpoint “/sql/statements/<statement_id>/cancel”.

 

Checking the Status of a Query

 

Once a query has been run, you can use a GET action on the “/sql/statements/<statement_id>” endpoint to check its status.

Sample query:

GET /api/2.0/sql/statements/01ed9a50-c9c9-178e-9be7-0ab52bc998b0 HTTP/1.1
Host: <your_HOST>
Authorization: Bearer <your_access_token>

 

Sample response:

Exemple de réponse SQL

 

When the query is successful (see the status in 1), a schema property (see 2) shows the format of the query result, including the number of columns returned by the query and for each column, and the name, type, position in the response starting from zero, and other information depending on the type of data (precision, scale, for example).

The result property contains information about the query execution result. Among this information, the row_count property contains the number of rows returned by the query, and the data_array property is an array of arrays. Each result row is stored in an array inside the data_array array.

 

Demonstration Project

 

We created a demo project based on the churn project available at dbdemos.ai.

This demo project aims to expose data through a frontend.

The architecture looks like this:

 

The backend is used to initiate the query execution and then to check on its status regularly until it has finished.

The frontend shows three visuals:

  • A histogram showing how revenue changes by month
  • A pie chart that shows at-risk customers by channel: mobile, web application, watch, etc.
  • A table showing how many orders were placed by country and platform

Le frontend Databricks

 

We were inspired by the dashboards used in the churn project. For the pie chart, for example:

Dashboards used in the churn project

 

We had to adapt the queries to show these visuals. Databricks SQL lets you get visuals based on the query results:

Requêtes permettant d’afficher ces visuels

 

The front end does not have the same capabilities. As a result, the responsibility had to be moved by exposing a query that “premixed” the work. In this example, we had to write a query that calculated the percentage per channel.

The Cellenza experts who worked on this backend/frontend also added a data refresh function using the blue button to the right of each visual title.

Monthly Revenues

 

This reruns the query and updates the visual. A main button can also be used to refresh all the visuals.

 

SQL REST API: Key Takeaways

 

SQL REST APIs are an interesting way to expose data via a frontend, for example. However, there are some limitations, such as the fact that an SQL warehouse can only handle ten queries simultaneously. To address this limit, you can add clusters to increase the concurrent access number, but this obviously comes at a price.

In addition, the volume of data to be queried must be reasonable. Querying large-volume tables is not the focus here.

Currently, only a personal access token can be used for authentication (PAT). Therefore, it is not possible to send the user’s identity in order to use row-level security (RLS) or column-level security (CLS), for example.

Do you need help with your data projects? Contact us!

 

This posts should interest you
Comments
Leave a Reply

Receive the best of Cloud, DevOps and IT news.
Receive the best of Cloud, DevOps and IT news.