API Overview
The dremio-arrow package is a single-module Python API that exposes two objects: a class, DremioArrowClient, and a function, dremio_query. The DremioArrowClient class implements Flight middlewares and is the gateway to the Dremio Flight server. The dremio_query function is a shorthand for DremioArrowClient and a quick way to invoke the client when its parameters do not need to be reused.
Using class DremioArrowClient
The class is most useful in larger applications where client parameters, such as the authentication token and workload management queues, need to be reused.
- The first step is to import DremioArrowClient. This makes the client available in our working environment.
- Initialize the client with the Dremio Flight server connection credentials. These are the credentials you use to access your Dremio engine.
- Define the SQL query string with the data fetch instructions. The table path must exist, either as a physical or as a virtual dataset.
- Execute the SQL query against the Dremio Flight server.
- This step is optional; it previews the dataset we just fetched. From here you have your data as a pandas.DataFrame and can proceed with analysis, processing and reporting.
'Cleaner' Credentials Provision
Dremio flight server connection parameters can be supplied using environment variables. In that case, it is not necessary to supply the credentials when initializing the client.
Either export the variables in the current terminal session or persist them in ~/.profile (Ubuntu) or ~/.zshrc (Mac). To define the variables in the current terminal session, execute the commands below, replacing the placeholder text with the actual credential values.
To persist the environment variables, write them into ~/.profile (Ubuntu) or ~/.zshrc (Mac).
- Copy the environment variables into the file, replacing the placeholder text with the actual credential values.
- Refresh the active terminal session's variables. This step may deactivate the virtual environment, depending on your OS platform. If this happens, re-activate the virtual environment.
By persisting the variables, you are assured your project will keep working after the machine restarts. In addition, the chance of accidentally committing secret tokens to version control is reduced.
With the environment variables correctly set, we no longer need to pass connection credentials when creating the client. The client extracts them from the environment.
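The fallback described here (explicit arguments first, environment variables otherwise) can be sketched with the standard library alone. This is only an illustration of the pattern, not the package's own code: the variable prefix DREMIO_EXAMPLE_, the Credentials class, and the resolve_credentials helper are all hypothetical names, so check the API Reference for the variable names the client really reads.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names, for illustration only. The real package
# may read differently named variables; see its API Reference.
ENV_PREFIX = "DREMIO_EXAMPLE_"
FIELDS = ("host", "port", "username", "password")


@dataclass(frozen=True)
class Credentials:
    host: str
    port: int
    username: str
    password: str


def resolve_credentials(
    env: Optional[Mapping[str, str]] = None, **explicit: object
) -> Credentials:
    """Prefer explicitly passed values, then fall back to the environment."""
    source = os.environ if env is None else env
    resolved = {}
    missing = []
    for name in FIELDS:
        variable = ENV_PREFIX + name.upper()
        value = explicit.get(name) or source.get(variable)
        if not value:
            missing.append(variable)
        else:
            resolved[name] = value
    if missing:
        # Fail early with the names of the variables that were not found.
        raise KeyError("missing credentials: " + ", ".join(missing))
    return Credentials(
        host=str(resolved["host"]),
        port=int(resolved["port"]),
        username=str(resolved["username"]),
        password=str(resolved["password"]),
    )


# Placeholder values standing in for exported environment variables.
EXAMPLE_ENV = {
    "DREMIO_EXAMPLE_HOST": "localhost",
    "DREMIO_EXAMPLE_PORT": "32010",
    "DREMIO_EXAMPLE_USERNAME": "analyst",
    "DREMIO_EXAMPLE_PASSWORD": "not-a-real-secret",
}

if __name__ == "__main__":
    print(resolve_credentials(EXAMPLE_ENV))
```

Passing the mapping explicitly keeps the sketch testable; calling resolve_credentials() with no arguments would read from os.environ instead.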
If we had a second query to execute, we would simply reuse the client with a different SQL query string.
We didn't create a new client
In this repeat operation, we did not create a new client object; instead, we reused the one created earlier.
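Why reuse matters can be shown with a small stand-in. The sketch below does not use the package and does not talk to a server: StandInClient and one_shot_query are hypothetical names, and the sketch assumes that a client authenticates once and then keeps its token for later queries. Under that assumption, a reused client logs in once, while a one-shot helper that builds a fresh client per call logs in every time.

```python
import itertools
from typing import Dict, Optional


class StandInClient:
    """Stand-in for a query client; it does NOT contact a real server."""

    # Pretend token source shared by all stand-in clients.
    _token_numbers = itertools.count(1)

    def __init__(self, username: str, password: str) -> None:
        self.username = username
        self._password = password
        self._token: Optional[str] = None  # obtained on the first query
        self.logins = 0  # how many times this object authenticated

    def _authenticate(self) -> str:
        self.logins += 1
        return f"token-{next(self._token_numbers)}"

    def query(self, sql: str) -> Dict[str, str]:
        # Authenticate only if this client holds no token yet.
        if self._token is None:
            self._token = self._authenticate()
        return {"sql": sql, "token": self._token}


def one_shot_query(sql: str, username: str, password: str) -> Dict[str, str]:
    """Build a throwaway client for a single query (no token reuse)."""
    return StandInClient(username, password).query(sql)


if __name__ == "__main__":
    client = StandInClient("analyst", "not-a-real-secret")
    client.query("SELECT 1")
    client.query("SELECT 2")
    print("logins with a reused client:", client.logins)
```

The same trade-off is why a shorthand function suits a single ad hoc query, while the class suits applications that issue many queries.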
Getting Help
For more information on client usage, see [API Reference] or run the chunk below in your Python interpreter.
Using function dremio_query
This function is useful when we want to execute a single query and do not know when a second one might follow. It takes authentication credentials and returns data. As a consequence, the session bearer token cannot be reused, because we are not using the client directly.
- The first step is to import the dremio_query function.
- Define the SQL query string with the data fetch instructions.
- Execute the SQL query string against the Dremio Flight server.
With credentials defined as environment variables, the above is refactored to:
Getting Help
For more information on the function's usage, see [API Reference] or run the chunk below in your Python interpreter.
Created: July 4, 2023