In our previous blog post, Streaming Data Governance and DeltaStream, we discussed the importance of Data Unification and Data Governance in a stream processing data platform. In particular, we highlighted the Streaming Catalog and Role-Based Access Control (RBAC) on the Streaming Catalog tools DeltaStream exposes to users to help them govern their streaming data. In this post, we’ll go over an example use case to show off how the Streaming Catalog and RBAC work in DeltaStream.

For our use case, let’s assume we are a company that needs to do real-time analytics for ads and marketing reports. In our setup, we have the following:

  • 3 streaming Stores – ”CC_kafka” (Confluent Cloud), “kafka_store”, “kinesis_store”
  • 2 Teams – ”Reports Team” and “Ads Team”
  • 1 Organization administrator

Notice in our setup that there are 3 different Stores, 2 Kafka Stores and 1 Kinesis Store. The data in these Stores do not belong to a single team, in fact each team may actually be responsible for data in multiple Stores. For instance, our “Ads Team” needs read and write access to one topic from each of the Stores (when we say “topic” we are referring to the topics in Kafka and the streams in Kinesis).

The goal of this use case is twofold. First, to unify and organize the streaming data from the 3 Stores so that the organization of the data aligns with the team structure. Second, to set up roles for each of the teams so that users belonging to those teams can easily be granted the appropriate access to the resources that pertain to their team.

Below is a visualization of our use case.

The Administrator Role

The administrator will have access to the sysadmin and useradmin built-in roles. These, along with the securityadmin and orgadmin roles, are special roles in DeltaStream with powerful privileges that should only be given to a handful of people in an organization. To solve our use case, our administrator will first assume the useradmin role to create the appropriate roles that specific team members will be granted access to. Then, the administrator needs to use the sysadmin role to set up the Streaming Catalog and define Stores for our data sources, as well as grant the appropriate permissions for the roles created by the useradmin.

useradmin

The useradmin role has privileges to manage users and roles within the Organization. The administrator will assume the useradmin role to create new custom roles for our “Reports Team” and “Ads Team.”

We can switch to the useradmin role using the USE ROLE command before we start creating custom roles.

USE ROLE useradmin;

Following the best practices for creating custom roles, we will build out a role hierarchy where sysadmin is the top-most role. The below diagram illustrates the hierarchy of roles.

The following statements create roles to match the diagram in Figure 2:

CREATE ROLE "MarketingRole" WITH (IN ROLE (sysadmin));
CREATE ROLE "ContentRole" WITH (IN ROLE ("MarketingRole"));
CREATE ROLE "ReportsRole" WITH (IN ROLE ("MarketingRole"));
CREATE ROLE "AdsRole" WITH (IN ROLE (sysadmin));
CREATE ROLE "TrafficRole" WITH (IN ROLE ("AdsRole"));

Although we’ve created the roles, we haven’t actually assigned them any permissions. We’ll do this using the sysadmin role in the next section.

To invite our team members to our Organization, we can use the INVITE USER command. The following statement invites a new user on the “Ads” team and grants them the new “AdsRole” role.

INVITE USER '[email protected]' WITH ('roles'=("AdsRole"), 'default'="AdsRole");

Similarly, we can invite a new user on the “Reports” team and assign the “ReportsRole” role to them.

INVITE USER '[email protected]' WITH ('roles'=("ReportsRole"), 'default'="ReportsRole");

sysadmin

The sysadmin role has privileges to create, manage, and drop objects. As the administrator, we’ll be using this role to do the following:

  1. Add the connectivity to our data storage systems (ie. Kafka and Kinesis) by creating Stores
  2. Set up the Databases and Schemas in the Streaming Catalog to provide the organizational framework for step 3
  3. Define Relations for the topics in our Stores and assign them to the correct Database and Schema
  4. Grant access to these Databases and Schemas to the appropriate roles

Before we begin, let’s ensure that we are using the sysadmin role.

USE ROLE sysadmin;

First, we’ll define the Stores for our data. Since we can’t share our real Kafka or Kinesis connection configurations, the below SQL statement is a template for the CREATE STORE statement (CREATE STORE documentation).

CREATE STORE kafka_store WITH (
'type' = KAFKA, 'access_region' = "AWS us-east-1",
'kafka.sasl.hash_function' = PLAIN,
'kafka.sasl.password' = '<creds_password>',
'kafka.sasl.username' = '<creds_username>',
'uris' = '<broker_uris>'
);

The next step is to create Databases and Schemas for our Streaming Catalog. As you can see in Figure 1 above, there will be two Databases – ”Marketing” and “Ads”. Within the “Marketing” Database, there exists a “Content” Schema and a “Reports” Schema. Within the “Ads” Database, there exists a single “Traffic” Schema.

CREATE DATABASE "Marketing";
CREATE SCHEMA "Content" IN DATABASE "Marketing";
CREATE SCHEMA "Reports" IN DATABASE "Marketing";
CREATE DATABASE "Ads";
CREATE SCHEMA "Traffic" IN DATABASE "Ads";

Now that we have the namespaces in our Streaming Catalog set up, we can move on to our third task of defining Relations backed by the topics in our Stores to populate the Streaming Catalog. As you can see in Figure 1 above, there are many topics that exist in our Stores, and thus many Relations that need to be written. For the sake of brevity, we’ll just provide one example statement for CREATE STREAM (tutorial on creating Relations).

CREATE STREAM "Marketing"."Reports".reports_data (
col0 BIGINT, col1 VARCHAR, col2 VARCHAR
) WITH (
'store' = 'cc_kafka', 'topic' = 'reporting',
'value.format' = 'json'
);

This CREATE STREAM statement is creating a Stream called “reports_data” in the “Reports” Schema, which is in the “Marketing” Database. This Stream has three fields, simply called “col0”, “col1”, and “col2”, and is backed by the topic “reporting” in the “cc_kafka” Store. Similar CREATE STREAM or CREATE CHANGELOG statements can be created for the other topics in the same Store or other Stores.

For our fourth task, we must now grant the custom roles, which were created by the useradmin in the previous section, access to the Databases and Schemas. Based on the diagram in Figure 1, the following statements will grant privileges to the correct data objects corresponding to the appropriate roles. The USAGE privilege is similar to read, and the CREATE privilege is similar to write.

GRANT USAGE, CREATE ON DATABASE "Marketing" TO ROLE "MarketingRole";
GRANT USAGE, CREATE ON SCHEMA "Marketing"."Content" TO ROLE "ContentRole";
GRANT USAGE, CREATE ON SCHEMA "Marketing"."Reports" TO ROLE "ReportsRole";
GRANT USAGE, CREATE ON DATABASE "Ads" TO ROLE "AdsRole";
GRANT USAGE, CREATE ON SCHEMA "Ads"."Traffic" TO ROLE "TrafficRole";

Member of the Ads Team Role

As a new user on the “Ads” team, after accepting the invitation that the useradmin sent, I should expect the following:

  1. Access to the “AdsRole” and any roles that are granted to it, in this case the “TrafficRole”
  2. Access to the “Ads” Database and “Traffic” Schema

By listing our roles in the DeltaStream CLI, we can see which role is currently being used:

<no-db>/<no-store># LIST ROLES;

Name      | Current |      Created at
----------------+---------+-----------------------
…
Marketing     |         | 2024-01-04T22:24:16Z
sysadmin      |         | 2023-09-19T16:06:00Z
AdsRole       | ✓       | 2024-01-05T04:57:31Z
…

We can also describe the “AdsRole” role to see that “TrafficRole” is properly inherited:

<no-db>/<no-store># DESCRIBE ROLE "AdsRole";

   Name   |      Created at
----------+-----------------------
  AdsRole | 2024-01-05T04:57:31Z

Granted Roles
     Name
---------------
  TrafficRole
  public

Granted Privileges

   Type  |  Target  |   ID/Name   | Grant option
---------+----------+-------------+---------------
  Usage  | Database | Ads         |
  Usage  | Role     | TrafficRole |
  Create | Database | Ads         |

Finally, we can list the Databases and Schemas to see that we indeed have access to the “Ads” Database and “Traffic” Schema. Note that the “Marketing” Database is not visible, because only the “MarketingRole” role and any roles that inherit from the “MarketingRole” have access to that Database.

<no-db>/<no-store># LIST DATABASES;

Name | Default |  Owner   |      Created at      |      Updated at
-------+---------+----------+----------------------+-----------------------
Ads  |         | sysadmin | 2024-01-04T23:12:15Z | 2024-01-04T23:12:15Z


<no-db>/<no-store># LIST SCHEMAS IN DATABASE "Ads";

Name   | Default |  Owner   |      Created at      |      Updated at
----------+---------+----------+----------------------+-----------------------
Traffic |         | sysadmin | 2024-01-04T23:12:15Z | 2024-01-04T23:12:15Z

Member of the Reports Team Role

As a new user on the “Reports” team, after accepting the invitation that the useradmin sent, I should expect the following:

  1. Access to the “ReportsRole” only
  2. Access to the “Ads” Database and “Traffic” Schema

By listing our roles in the DeltaStream CLI, we can see which role is currently being used:

<no-db>/<no-store># LIST ROLES;

Name      | Current |      Created at
----------------+---------+-----------------------
…
ContentRole   |         | 2024-01-05T04:57:30Z
ReportsRole   | ✓       | 2024-01-05T04:57:30Z
MarketingRole |         | 2024-01-05T04:57:30Z

…

We can also describe the “AdsRole” role to see that “TrafficRole” is properly inherited:

<no-db>/<no-store># DESCRIBE ROLE "ReportsRole";

Name     |      Created at
--------------+-----------------------
ReportsRole | 2024-01-05T04:57:30Z

Granted Roles

Name
----------
public


Granted Privileges

Type  |  Target  |  ID/Name  | Grant option
---------+----------+-----------+---------------
Usage  | Database | Marketing |
Usage  | Schema   | Reports   |
Create | Schema   | Reports   |

Finally, we can list the Databases and Schemas to see that we indeed have access to the “Ads” Database and “Traffic” Schema. Note that the “Marketing” Database is not visible, because only the “MarketingRole” role and any roles that inherit from the “MarketingRole” have access to that Database.

<no-db>/<no-store># LIST DATABASES;

Name    | Default |  Owner   |      Created at      |      Updated at
------------+---------+----------+----------------------+-----------------------
Marketing |         | sysadmin | 2024-01-04T23:12:15Z | 2024-01-04T23:12:15Z


<no-db>/<no-store># LIST SCHEMAS IN DATABASE "Marketing";

Name   | Default |  Owner   |      Created at      |      Updated at
----------+---------+----------+----------------------+-----------------------
Reports |         | sysadmin | 2024-01-04T23:12:15Z | 2024-01-04T23:12:15Z

Conclusion

RBAC is one of DeltaStream’s core features that manages the access to different data objects in DeltaStream. In this example, we show off how different roles can be created to match an organization’s team structure. This way, giving permissions to specific roles inherently gives permissions to entire teams. While we focused on RBAC in the context of DeltaStream’s Streaming Catalog, giving access to Databases and Schemas in particular, RBAC can also be applied to other data assets such as Stores, Descriptors, and Queries.

If you want to learn more about DeltaStream’s RBAC, or try it for yourself, get a free trial.