Personal Blog of https://github.com/Cyberax

Multitenancy in Postgres

What is Multitenancy

Even in the microservice world there's a common requirement to host data for many tenants.
But first let's discuss what exactly is a tenant. When discussing multitenancy,
various websites and some books give examples like this: a company that resells database
access and needs to store data from multiple clients in one database. This is an extremely
naïve example and it doesn't really reflect the actual reality.

For the purpose of this document, a tenant is a group of users that work within the
same organization (or are somehow associated). For example, if we're making an enterprise
chat application (a Slack clone, why not?) then a tenant would be an organization that
subscribes for it. And the main goal would be to make sure that one tenant can't access
the data from other tenants.

We likely still need to add some kind of access control within the tenant, but we
absolutely need to make sure that data doesn't leak across the tenant boundary. Moreover,
we should make it a goal to ensure that any bug in our code does not result in leaking
data outside of the tenant. Basically, imagine how you can deal with the worst possible
scenarios: arbitrary unlimited SQL injection, or a fully exploitable buffer overflow in
server code.

PostgreSQL Row-Level Security

So with this in mind let's start designing the data access code. We're using PostgreSQL
as our main data storage, so we'll be looking at securing it. We can't do
database-per-tenant or schema-per-tenant partitioning as this will blow up the complexity
of all the routine operations like database upgrade. Instead we'll be looking at row-level
security.

Let's start with the schema and some sample data. I purposefully use human-readable
names for tenants, in actual production code it's better to use something like
uuid default uuid_generate_v4() instead (and probably for the orders table as well).

-- The application role
create role slack_app nosuperuser nocreatedb nocreaterole login password '123';

-- The tenants table
create table tenant (tenant_id varchar primary key, name varchar);
grant select on tenant to slack_app;

-- And a simple data table
create table orders (order_id int8 primary key, order_text varchar, 
    tenant_id varchar not null references tenant(tenant_id) on delete restrict);
grant select, insert, update, delete on orders to slack_app;

insert into tenant(tenant_id, name) values('HHL','Horns&Hooves Ltd.');
insert into tenant(tenant_id, name) values('CIA', 'Scary Government Agency');
insert into orders(order_id, order_text, tenant_id) values (1, 'Chairs', 'HHL');
insert into orders(order_id, order_text, tenant_id) values (2, 'Diamonds', 'HHL');
insert into orders(order_id, order_text, tenant_id) values (3, 'Killer Drones', 'CIA');

alter table tenant enable row level security;
alter table orders enable row level security;

So far so good. We can log into the database as slack_app and do the CRUD operations
on tenants and orders.

Now we need to add some kind of security. This how-to guide has a nice tutorial,
so we'll follow it.

create policy tenant_isolation_policy on tenant using (tenant_id = current_setting('app.current_tenant'));
create policy tenant_isolation_policy on orders using (tenant_id = current_setting('app.current_tenant'));

Now let's test it by logging in as slack_app and trying to do something:

cyberax@CybMac:/tmp$ psql --user slack_app slackapp
slackapp=> select * from orders;
ERROR:  unrecognized configuration parameter "app.current_tenant"

Good, we can't see all the data. Now let's try to change the tenant:

slackapp=> set app.current_tenant = 'HHL';
SET
slackapp=> select * from orders;
 order_id | order_text | tenant_id
----------+------------+-----------
        1 | Chairs     | HHL
        2 | Diamonds   | HHL
(2 rows)

Great! We can only see our own data. But there's one small problem, as nothing whatsoever
stops an attacker that gained ability to do arbitrary SQL injection from doing this:

slackapp=> set app.current_tenant = 'CIA';
SET
slackapp=> select * from orders;
 order_id |  order_text   | tenant_id
----------+---------------+-----------
        3 | Killer Drones | CIA
(1 row)

There is no way to limit the SET operations in PostgreSQL to be one-time only. Also
there are no ways to prohibit running SET commands altogether, or at least limit them
to a set of whitelisted options.

Tokenizing Everything

One possible solution is to use unguessable tenant names (e.g. UUIDs), so this way the
attacker likely won't know the other tenants' IDs right off the bat. But this is not a
good solution, any tenant ID leak would give an attacker access to all the tenant's
documents.

But this approach seems to be on the right track. What if instead of tenants we use
one-time tokens? These tokens can be populated by a small highly-secure service and
passed to the main application.

Let's try it! We need to create a table for the tokens and a stored procedure to
check them:

create table token(token varchar primary key, tenant_id varchar not null references tenant(tenant_id) on delete restrict, valid_until timestamp);

create or replace function get_tenant()
returns varchar language plpgsql security definer
as $$ declare 
  tenant_res varchar;
begin
select tenant_id into tenant_res from token where token = current_setting('app.token', true) and now() < valid_until;
return tenant_res;
end $$;

Some explanations: security definer modifier means that the function is always invoked
with the permissions of the user that defined it (the superuser in this case). This is
necessary because we absolutely DO NOT want to give the slack_app user permissions to
do select on our tokens table.

The rest is straightforward, we need to modify the row-level security policy on the
tables and insert some test tokens.

drop policy tenant_isolation_policy on tenant;
drop policy tenant_isolation_policy on orders;

create policy tenant_isolation_policy on tenant using (tenant_id = get_tenant());
create policy tenant_isolation_policy on orders using (tenant_id = get_tenant());

-- Insert some test tokens (they MUST be unguessable cryptographically random strings in a real application)
insert into token (tenant_id, token, valid_until) values ('CIA', 'token-high', now() + interval '2 hours');
insert into token (tenant_id, token, valid_until) values ('HHL', 'token-low', now() + interval '2 hours');

And now let's test it!

cyberax@CybMac:/tmp$ psql --user slack_app slackapp
psql (13.3)
Type "help" for help.

slackapp=> select * from orders ;
 order_id | order_text | tenant_id
----------+------------+-----------
(0 rows)

slackapp=> set app.token = 'token-high';
slackapp=> select * from orders;
 order_id |  order_text   | tenant_id
----------+---------------+-----------
        3 | Killer Drones | CIA
(1 row)

slackapp=> set app.token = 'token-low';
slackapp=> select * from orders;
 order_id | order_text | tenant_id
----------+------------+-----------
        1 | Chairs     | HHL
        2 | Diamonds   | HHL
(2 rows)

And this is exactly what we want! The limited time tokens provide authorization to access
the data for individual tenants. The tokens can be generated by a small service and
communicated to the app over a secure channel. And there's nothing an attacker can do
without knowing a token.

Performance

Row-level security is implemented as a hidden where clause, that is visible in
the explain statement:

slackapp=> explain select * from orders;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on orders  (cost=0.00..222.62 rows=4 width=72)
   Filter: ((tenant_id)::text = (get_tenant())::text)
(2 rows)

This hidden clause needs to be taken into account when designing queries and indexes.

Discuss...