Stable functions do not seem to be honored in RLS in basic form... #9311
Replies: 11 comments 51 replies
-
I've found workarounds for many cases in testing, but at the moment there is this fairly common RLS policy I can't get to optimize...
Putting two policies on select does not help: all bad cases are 200 msec or more, versus <1 msec for the cases that work.
Edit: this could have serious impacts on Storage RLS policies, as they are all on the same table... So far the only thing that seems to work is this:
Edit2: Even worse on ...
-
Hi @GaryAustin1, this is some great investigative work:
This is unfortunately the case, and you've dug up a lot of interesting examples of this. Do you mind pg_dumping the test database you are using and zipping it up to michel @ supabase . io? We are working on some tooling to simplify the writing of "optimal" RLS policies and I'd like to use some of your examples as a basis. It may also help with any workarounds we can suggest for you.
-
Steve, I'll check the explain wrapper again, but just to confirm it was not the test code, I ran a couple of tests going through the REST API with no explain and saw huge differences that were repeatable based on the RLS policies under test and clearly not random network delay.
On Oct 25, 2022, at 5:00 PM, Steve Chavez wrote:
I think the run_explain plpgsql function wrapper is playing tricks on us.
I can somewhat reproduce but when I get the EXPLAIN outside the run_explain I don't get the perf issue.
-- populate the test table (realtest is assumed to already exist with these columns)
insert into realtest(id, name, user_id, jsonb_col)
select x, 'name-' || x, uuid_generate_v4(), jsonb_build_object('key', x)
from generate_series(1, 25000) x;

-- wrapper so EXPLAIN can be run under the authenticated role + a fake JWT from the SQL editor
CREATE OR REPLACE FUNCTION run_explain()
RETURNS SETOF text AS
$$
BEGIN
  -- simulate the role and claims PostgREST would set for a request
  set local role authenticated;
  set local request.jwt.claims to '{"role":"authenticated", "sub":"5950b438-b07c-4012-8190-6ce79e4bd8e5"}';
  RETURN QUERY
  EXPLAIN ANALYZE
  SELECT * FROM realtest WHERE user_id = ((current_setting('request.jwt.claims', true))::jsonb->>'sub')::uuid;
END
$$ LANGUAGE plpgsql;

-- inside the wrapper...
select * from run_explain();
-- ...and the same query outside the wrapper for comparison
explain analyze SELECT * FROM realtest WHERE user_id = auth.uid();
run_explain
Seq Scan on realtest (cost=0.00..996.50 rows=1 width=62) (actual time=27.525..27.526 rows=0 loops=1)
Filter: (user_id = (NULLIF(COALESCE(current_setting('request.jwt.claim.sub'::text, true), ((current_setting('request.jwt.claims'::text, true))::jsonb ->> 'sub'::text)), ''::text))::uuid)
Rows Removed by Filter: 25001
Planning Time: 1.175 ms
Execution Time: 27.639 ms
-
I think the run_explain plpgsql function wrapper is playing tricks on us. I can somewhat reproduce, but when I get the EXPLAIN outside the run_explain I don't get the perf issue.

-- build the 25k-row test table
create table realtest as
select x as id, 'name-' || x as name, uuid_generate_v4() as user_id, jsonb_build_object('key', x) as jsonb_col
from generate_series(1, 25000) x;

-- wrapper to run EXPLAIN under the authenticated role + fake JWT
CREATE OR REPLACE FUNCTION run_explain()
RETURNS SETOF text AS
$$
BEGIN
  set local role authenticated;
  set local request.jwt.claims to '{"role":"authenticated", "sub":"5950b438-b07c-4012-8190-6ce79e4bd8e5"}';
  RETURN QUERY
  EXPLAIN ANALYZE
  SELECT * FROM realtest WHERE user_id = auth.uid();
END
$$ LANGUAGE plpgsql;

select * from run_explain();
explain analyze SELECT * FROM realtest WHERE user_id = auth.uid();
Perhaps this is related to plpgsql functions? Seems similar to this previous discussion #8733
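For reference, one way to get the EXPLAIN outside of the run_explain wrapper is to set the role and claims directly in the session before running it (the same pattern used in the larger test script further down the thread); the uuid is the sample value from above:

begin;
set local role authenticated;
set local request.jwt.claims to '{"role":"authenticated", "sub":"5950b438-b07c-4012-8190-6ce79e4bd8e5"}';
explain analyze SELECT * FROM realtest WHERE user_id = auth.uid();
rollback;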
-
@steve-chavez So it still looks like it's being called 25,000 times, but getting rid of the jsonb and conversion processing makes a relatively big difference. Note that I also get the 30 msec result if I put just the one line in your uid_matches function.
-
@GaryAustin1 There's a workaround to avoid the JSON parsing and the subsequent json arrow operator. Basically it uses a PostgREST pre-request function to convert the JSON setting to a shorter text setting.

create or replace function pre_request()
returns void as $$
select
  set_config(
    'request.jwt.claims.sub'
  , (current_setting('request.jwt.claims',true))::json->>'sub'
  , true
  );
$$ language sql;

alter role authenticator set pgrst.db_pre_request = 'pre_request';
notify pgrst, 'reload config';

Then we can use:

current_setting('request.jwt.claims.sub', true)
-- instead of (current_setting('request.jwt.claims',true))::json->>'sub'

on RLS policies or any other logic. This can also work on more complex JSON in the claims. For a customer it was used like this:

create or replace function pre_request()
returns void as $$
select
set_config(
'request.jwt.claims.app_metadata.workspace'
, (current_setting('request.jwt.claims',true))::json->'app_metadata'->>'workspace'
, true
);
$$ language sql;
-- current_setting('request.jwt.claims.app_metadata.workspace', true) is then available; it can be wrapped by a stable function too
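As an illustration (policy and table names are just placeholders), an RLS policy can then compare against the pre-computed setting directly instead of parsing the JSON claim per row:

create policy "select own rows" on realtest
for select to authenticated
using (
  -- pre_request() already extracted the sub claim into this shorter text setting
  user_id = current_setting('request.jwt.claims.sub', true)::uuid
);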
-
Just found out a solution to this without the need for the pre-request function. Essentially, from this:

create policy "Test select RLS" on realtest
to authenticated
using (auth.uid() = user_id);

To this:

create policy "Test select RLS" on realtest
to authenticated
using (
(
with cached as materialized(
select auth.uid() as val
)
select user_id = val from cached
)
);

There's a reproducible example on https://github.com/PostgREST/postgrest/issues/2590.
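A shorter variant of the same idea, which also shows up in the test script further down the thread, is a plain scalar subselect instead of the materialized CTE; a minimal sketch on the same table:

create policy "Test select RLS (subselect)" on realtest
to authenticated
using (
  -- the uncorrelated scalar subselect lets auth.uid() be evaluated once per query
  -- rather than once per row
  user_id = (select auth.uid())
);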
-
Did some more testing with a more realistic policy: 100K rows, no difference between indexed and not indexed. Offset is adding...
EDIT: adding...
Code:

drop table if exists rlstest;
create table
rlstest as
select x as id, 'name-' || x as name, uuid_generate_v4() as user_id, uuid_generate_v4() as no_index_user_id, 'user' as role
from generate_series(1, 100000) x;
create index userid on rlstest using btree (user_id) tablespace pg_default;
update rlstest set (user_id,no_index_user_id,role) = ('5950b438-b07c-4012-8190-6ce79e4bd8e5','5950b438-b07c-4012-8190-6ce79e4bd8e5','admin') where id = 1;
update rlstest set (user_id,no_index_user_id,role) = ('79530fa3-2e6a-4c26-9356-cecff8148d46','79530fa3-2e6a-4c26-9356-cecff8148d46','user') where id = 2;
-- helper used inside the policy: is the current user flagged as admin?
-- SECURITY DEFINER so the lookup on rlstest is not itself blocked by the table's RLS
CREATE OR REPLACE FUNCTION is_admin()
RETURNS boolean AS
$$
BEGIN
  return 'admin' = (select role from rlstest where user_id = auth.uid());
END
$$ LANGUAGE plpgsql SECURITY DEFINER;
------------ Change RLS here
alter table rlstest ENABLE ROW LEVEL SECURITY;
create policy "rls_test_select" on rlstest
to authenticated
using (
--is_admin() OR auth.uid() = user_id
--is_admin() OR auth.uid() = no_index_user_id
(select is_admin()) OR user_id = (select auth.uid())
--(select is_admin()) OR no_index_user_id = (select auth.uid())
/*
(
with cached as materialized(select is_admin() as val)
select val from cached
)
OR
(
with cached as materialized(select auth.uid() as val)
select user_id = val from cached
)
*/
/*
(
with cached as materialized(select is_admin() as val)
select val from cached
)
OR
(
with cached as materialized(select auth.uid() as val)
select no_index_user_id = val from cached
)
*/
);
---------------
set local role authenticated;
--set request.jwt.claims to '{"role":"authenticated", "sub":"5950b438-b07c-4012-8190-6ce79e4bd8e5"}'; --admin
set request.jwt.claims to '{"sub":"79530fa3-2e6a-4c26-9356-cecff8148d46"}'; -- not admin
explain analyze SELECT count(*) FROM rlstest;
--select count(*) from rlstest
--select * from rlstest order by id limit 10 offset 90000;
--explain analyze select * from rlstest order by id limit 10 offset 90000;
-
Unsure if this belongs here or in PostgREST/postgrest-docs#609 because it is related to both. RLS with ...
-
Reviving this old thread... I am building out a new project and I wanted to verify I can handle large tables to find projects my user is allowed to see. My observations show me that if my function is STABLE it is called exactly twice in a query no matter the number of rows (once for the planner and once for the execution), and indexes are used as per the plans below. I therefore disagree with your conclusion that stable functions are not honored in RLS.
Background: Users are called "makers", and there's a table of their metadata populated by trigger on create into user_metadata.

CREATE TABLE maker_personality_user_role (
user_id UUID NOT NULL REFERENCES user_metadata (user_id) ON DELETE CASCADE,
maker_personality_id UUID NOT NULL REFERENCES maker_personality (maker_personality_id) ON DELETE CASCADE,
permission_role user_role_type NOT NULL,
PRIMARY KEY (user_id,maker_personality_id) -- one per user/team
);
CREATE TABLE project (
project_id UUID NOT NULL DEFAULT gen_random_uuid(),
maker_personality_id UUID NOT NULL REFERENCES maker_personality (maker_personality_id) ON DELETE CASCADE,
created_time TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
... more fields...
PRIMARY KEY (project_id),
UNIQUE(maker_personality_id,project_name)
);
-- function to select what I'm allowed to access
CREATE OR REPLACE FUNCTION teams_i_can(min_role user_role_type) RETURNS UUID[] AS $$
BEGIN
RAISE NOTICE 'teams_i_can';
RETURN ARRAY(SELECT maker_personality_id FROM maker_personality_user_role WHERE user_id=(auth.jwt()->>'sub')::UUID AND permission_role >= min_role);
END
$$ LANGUAGE plpgsql STABLE PARALLEL SAFE SECURITY DEFINER SET search_path = public;
CREATE POLICY "view own maker_user_permission" ON maker_personality_user_role
FOR SELECT TO authenticated USING ( (auth.jwt()->>'sub')::UUID = user_id );
CREATE POLICY "view own project" ON project
FOR SELECT TO authenticated USING ( maker_personality_id = ANY (teams_i_can('view')) );

So what I did was create 10000 new users, with 11 projects each and each user having permission to their own projects. One randomly selected user was given permission to ⅓ of all projects, so there was a large list to pick from as well. This leaves us with about 146k rows in the permissions role table. Here's what I see when selecting all projects:
Explanation of that is: the first select found all 110000 records, as it is running as the table owner. Next I simulate logging in as one of the users who owns 11 projects. You can clearly see it is only calling the teams_i_can function twice (the RAISE NOTICE fires once for planning and once for execution).
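A minimal way to reproduce that observation from the SQL editor, reusing the RAISE NOTICE already present in teams_i_can (the uuid below is a placeholder for one of the generated maker users):

begin;
set local role authenticated;
-- replace the sub value with the user_id of one of the makers created above
set local request.jwt.claims to '{"role":"authenticated", "sub":"00000000-0000-0000-0000-000000000000"}';
-- with teams_i_can declared STABLE, the 'teams_i_can' notice should print only a couple of times,
-- not once per row of project
select count(*) from project;
rollback;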
Now, if I change the ...
And for completeness, my version of Supabase running locally on my Mac M1 laptop:
-
I was starting to do some testing on RLS optimization and there appears to be a big issue with using a stable function in RLS.
I've limited the example here to using auth.uid(), a stable sql function.
To test I am running this basic template so that I can get explain on RLS in the SQL editor.
I have a table with 25000 rows and only one uuid match. RLS is (auth.uid() = user_id).
auth.uid() is called 25000 times apparently.
After verifying that stable functions work if added to the actual query in a where clause, I wondered if not having a where clause in the RLS policy was causing the optimizer to not do its thing....
So I changed the policy to:
using (exists(select true where auth.uid() = user_id))
edit:
using (non_index_user_id in (select auth.uid()))
also works. And I get this:
So over 10x improvement (on a very simple function... imagine querying another table in the function) by moving the stable function into a where in the policy.
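Spelled out as full policies on the test table, the two wrapped variants above would look roughly like this (policy names are illustrative, and only one would be enabled at a time while testing):

create policy "rls_wrapped_exists" on realtest
for select to authenticated
using ( exists(select true where auth.uid() = user_id) );

create policy "rls_wrapped_in" on realtest
for select to authenticated
using ( non_index_user_id in (select auth.uid()) );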
I see the same results on my own stable plpgsql boolean function, even when adding a cost hint of 100000. The optimizer does not seem to honor STABLE in an RLS policy with just a bare boolean function call. As soon as I put it in the exists/where wrap it is honored. Interestingly, IMMUTABLE does work, but that is not useful in most cases. The same results occur using just a dummy RLS function doing a multiply and returning true.
I've done a lot of searching on the web and only find examples of using explain for "fake RLS" testing, where they just test the function itself in a where statement rather than as a real policy, which does work as expected.
Hopefully I'm missing something here, or my test environment is not reflective of calls coming in through PostgREST.
Also, with PostgREST 10 I can hopefully do the same test through the API.
Thoughts? @steve-chavez
Edit3:
This also "explains" why auth.role() did not work well. Running the policy auth.role()='authenticated' is over 100 msec here; putting it in the exists/where wrapper brings it to 10 msec. So it seems Postgres does not honor STABLE unless an index is involved (Edit2 below) or the call is inside a fake where statement....
Edit2:
With an index on user_id, the optimizer seems to run the auth.uid() function just once and gets a sub-millisecond result. Not sure if it is because it is an SQL function or not. I need to test with a function doing a select on another table, and with plpgsql, to see if indexing helps. But for sure it looks like you must index any column you are testing auth.uid() against for best performance.
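A sketch of the index referred to here, on the realtest test table used above:

-- indexing the column compared against auth.uid() is what allowed the planner
-- to evaluate the stable function once and use an index scan
create index if not exists realtest_user_id_idx on realtest using btree (user_id);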
Edit4:
LOL. If I create a column called dummy_boolean, populate it with true, add an index on it, and compare a stable boolean function against it in the policy:
my_function() = dummy_boolean
It goes from 193 msec to 15 msec....
Edit1:
With my boolean true-or-false stable plpgsql function I confirmed the multiple calls when used on its own, versus a single call in the exists/where wrap.
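And a sketch of the Edit4 trick above; my_function stands in for the dummy stable boolean function described in the text, and the column/index names are illustrative:

alter table realtest add column if not exists dummy_boolean boolean not null default true;
create index if not exists realtest_dummy_idx on realtest using btree (dummy_boolean);

-- comparing the stable function against an indexed constant column, instead of
-- calling it as a bare boolean in the policy, was what dropped 193 msec to 15 msec
create policy "rls_dummy_boolean" on realtest
for select to authenticated
using ( my_function() = dummy_boolean );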