
Slightly off-topic, but related to what Ory is doing in general: how do you usually do authorization-aware search?

Imagine I have a bunch of Google Docs and use https://github.com/ory/keto for authorization. I can quickly answer the question "does user X have access to document Y", but it is not easy to answer "find all documents containing the word Hello that I have access to", because access can be granted through nested groups (for example, read access is given to everyone in DepartmentA, and I am part of a child department).




Probably via a Zanzibar-based system. An excerpt from the Zanzibar paper [1]:

> [...] useful common infrastructure can be built on top of a unified access control system, in particular, a search index that respects access control and works across applications.

[1] https://authzed.com/zanzibar/28Sa8hWHLG:e:1I


> Probably via a Zanzibar-based system ...

> ... a search index that respects access control

This is exactly the part I want to understand. How are you modifying your search index so that it respects access control?

I can think of a few ways, but I want to hear from others how they are doing it:

* Each object stores metadata listing which access groups can read it. At query time, I first fetch the groups the user belongs to and send them as a filter in the search query.

* Fetch all matching objects, hoping the list is not huge, then check at run time whether the user can access each item and drop the ones they cannot.

* ...

You either compute access at query time, which might be costly, or you precompute it at write time, but then you need to keep at least two data sources in sync: objects (who can access an object can change at the object level) and groups (a group can gain or lose permissions).
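
A minimal sketch of the first option in the list above, assuming an in-memory index and made-up data: every name here (`PARENTS`, `MEMBERSHIPS`, `DOCS`, `effective_groups`, `search`) is hypothetical and not part of Keto or any search engine. The user's groups are expanded through the nesting at query time, and the expanded set is used as a filter against the reader groups stored on each document.

```python
from collections import deque

# Hypothetical nesting: child group -> parent groups it inherits access from.
PARENTS = {
    "dept-a-team-1": ["dept-a"],
    "dept-a": ["company"],
}
# Hypothetical direct memberships: user -> groups.
MEMBERSHIPS = {"alice": ["dept-a-team-1"], "bob": ["company"]}

# Hypothetical index: each document stores the groups allowed to read it.
DOCS = [
    {"id": 1, "text": "Hello from department A", "readers": {"dept-a"}},
    {"id": 2, "text": "Hello everyone", "readers": {"company"}},
    {"id": 3, "text": "Hello finance", "readers": {"finance"}},
]


def effective_groups(user):
    """Return every group the user belongs to, following nesting upward."""
    seen = set()
    queue = deque(MEMBERSHIPS.get(user, []))
    while queue:
        group = queue.popleft()
        if group in seen:
            continue
        seen.add(group)
        queue.extend(PARENTS.get(group, []))
    return seen


def search(user, word):
    """Return ids of documents containing `word` that the user may read."""
    groups = effective_groups(user)
    return [
        doc["id"]
        for doc in DOCS
        if word.lower() in doc["text"].lower() and doc["readers"] & groups
    ]
```

In a real search engine the `doc["readers"] & groups` check would become a terms filter on an indexed field. The sync problem described above remains: the `readers` field has to be rewritten whenever sharing on a document changes.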


One approach is to have the centralized service answer a broader question: given this user, what rules determine whether a document is accessible to them? The service returns a set of rules to apply, and you embed those restrictions in your search query.

An example access service response would be: this user can access data from groups they are part of + documents for which a share exists towards this user + documents for which a share exists to any of the user's groups.

Such an approach using OPA is described in https://blog.openpolicyagent.org/write-policy-in-opa-enforce....

This is not exactly the same as the first option you described, because instead of storing access controls in the index data, you use the available metadata + the rules from the access control service.
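
A rough sketch of that idea, assuming a toy rule format and in-memory documents. This is not OPA's API: `access_rules`, the `{"field", "any_of"}` rule shape, and the document fields are all hypothetical. The access service returns conditions, and the search layer applies them against metadata the documents already carry.

```python
# Hypothetical group memberships known to the access service.
USER_GROUPS = {"alice": ["eng"], "bob": ["sales"]}

# Hypothetical documents with ordinary metadata (no precomputed ACL list).
DOCS = [
    {"id": 1, "text": "Hello eng", "owner_group": "eng",
     "shared_users": [], "shared_groups": []},
    {"id": 2, "text": "Hello sales", "owner_group": "sales",
     "shared_users": ["alice"], "shared_groups": []},
    {"id": 3, "text": "Hello ops", "owner_group": "ops",
     "shared_users": [], "shared_groups": ["sales"]},
    {"id": 4, "text": "Goodbye eng", "owner_group": "eng",
     "shared_users": [], "shared_groups": []},
]


def access_rules(user):
    """Stand-in for the access service: any one matching rule grants access."""
    groups = USER_GROUPS.get(user, [])
    return [
        {"field": "owner_group", "any_of": groups},    # data from the user's groups
        {"field": "shared_users", "any_of": [user]},   # shared directly with the user
        {"field": "shared_groups", "any_of": groups},  # shared with one of the user's groups
    ]


def matches(doc, rule):
    """Check one rule against a document's metadata field."""
    value = doc[rule["field"]]
    values = value if isinstance(value, list) else [value]
    return any(v in rule["any_of"] for v in values)


def search(user, word):
    """Embed the returned rules as an OR filter on top of the text match."""
    rules = access_rules(user)
    return [
        doc["id"]
        for doc in DOCS
        if word.lower() in doc["text"].lower()
        and any(matches(doc, rule) for rule in rules)
    ]
```

With a real index, each rule would be translated into a clause of the engine's query language rather than evaluated in Python.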


When I had to do this in the past for access control and compliance reasons, it was easy to just layer them. If you have a) fast search and b) fast authorization, you can just do filter(lambda resource: can_read(resource, user), search(query)). Some tuning is needed around pagination, because you effectively have two paginations to maintain: one that is user-facing, and one over your index, which will include resources that get pruned.
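
A small sketch of the two-pagination point, under made-up assumptions: `INDEX`, `READABLE`, `search`, `can_read`, and `authorized_page` are hypothetical stand-ins, not a real search or authorization API. The cursor tracks a position in the unfiltered index, and the function keeps pulling batches until the user-facing page is full or the index is exhausted.

```python
# Hypothetical index of (doc_id, text) pairs in ranked order.
INDEX = [
    (1, "hello a"), (2, "hello b"), (3, "bye"),
    (4, "hello c"), (5, "hello d"), (6, "hello e"),
]
# Hypothetical authorization data: user -> readable doc ids.
READABLE = {"alice": {2, 5, 6}}


def search(query, offset, limit):
    """Stand-in for the index: return one batch of matching doc ids."""
    hits = [doc_id for doc_id, text in INDEX if query.lower() in text.lower()]
    return hits[offset:offset + limit]


def can_read(doc_id, user):
    """Stand-in for the fast authorization check."""
    return doc_id in READABLE.get(user, set())


def authorized_page(query, user, page_size, cursor=0, batch=3):
    """Return (results, next_cursor).

    `cursor` is an offset into the unfiltered index results, not into the
    user-facing list; `next_cursor` is None once the index is exhausted.
    """
    results = []
    while len(results) < page_size:
        hits = search(query, cursor, batch)
        if not hits:
            return results, None
        for i, doc_id in enumerate(hits):
            if can_read(doc_id, user):
                results.append(doc_id)
                if len(results) == page_size:
                    # Resume right after the last hit that was consumed.
                    return results, cursor + i + 1
        cursor += len(hits)
    return results, cursor
```

If most hits are pruned for a given user, the loop issues many index queries per page, which is where the tuning of `batch` comes in.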



