Most engineers know how to try a database. Insert some data, try some queries, play with the tooling.
But authorization as a service is a new category of software that most engineers have never tried. What are the key features? What should you look out for? What’s a good approach to testing? What are you even supposed to test?
This is a guide on how to POC authorization as a service products, based on POCs from ProductBoard, Oyster HR, and hundreds of other companies. If you’re evaluating authorization as a service options, this post will give you a framework for running that evaluation.
Note: this guide is for B2B SaaS companies that want to solve for application authorization. If you’re trying to solve for infrastructure authorization, you might look at something like Open Policy Agent (OPA).
What to Assess
Here are the 3 most important areas to assess when evaluating authorization as a service options:
- Modeling - How you define the rules for who is allowed to do what in the application. For the rest of this post, we use the sample app GitCloud, a GitHub/GitLab clone. In GitCloud, an example rule is “Actors with the reader role on a particular repository can read that repository”
- Data - Even once the system has rules, it still needs authorization data to make any authorization decisions – e.g., “Willow has the reader role on repo 123.” That data has to live somewhere. If it's in the auth system, then you'll need to fetch it any time you need it in your application. If it's in your application database, then you'll need to synchronize it to the authorization system whenever you need to make an auth decision. This is an important consideration and can be a tricky balancing act.
- Enforcement - The third piece of the puzzle is enforcement: combining your rules and data at runtime and rendering decisions back to your application – e.g., “Yes, Willow can read repo 123.” You can also imagine rotating this question in 3D space to create other questions that applications need to answer all the time – e.g., return all the files of which Willow is an owner, or return all the permissions Willow has on repo 777 (to render a UI).
We recommend focusing on these areas, as they make up the core of the authorization problem itself, and thus any authorization solution must address them. Beyond these, organizations typically look at all the surrounding scaffolding too, like developer experience, ops, and maturity of the product.
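To make the three areas concrete, here is a minimal sketch of how they fit together for the GitCloud example. It is an illustration only: the names `ROLE_PERMISSIONS`, `FACTS`, and `is_allowed` are hypothetical, and a real authorization service would expose its own policy language and API.

```python
from typing import NamedTuple


class RoleFact(NamedTuple):
    actor: str
    role: str
    resource: str


# Modeling: which roles grant which actions on a repository.
ROLE_PERMISSIONS = {
    "reader": {"read"},
}

# Data: who holds which role on which resource.
FACTS = {
    RoleFact("Willow", "reader", "repo:123"),
}


def is_allowed(actor: str, action: str, resource: str) -> bool:
    """Enforcement: combine the rules and the data to reach a decision."""
    return any(
        fact.actor == actor
        and fact.resource == resource
        and action in ROLE_PERMISSIONS.get(fact.role, set())
        for fact in FACTS
    )
```

Calling `is_allowed("Willow", "read", "repo:123")` returns `True` because a matching fact and rule both exist. Remove either one and the decision flips to `False`.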
We review each of these areas in detail below.
Authorization Modeling
The rules that describe who should be allowed to do what in your application make up your authorization model. Modeling is the process of building out all this logic. Every authorization as a service solution has its own approach for solving this problem.
Requirements
Ability to map application use cases and requirements to authorization patterns and primitives
The first step in defining your authorization model is mapping the concepts in your application (e.g., “When a user is added to a team, they get access to view the contents of that team’s folders”) to authorization patterns that you can implement (e.g., “a user’s role grants them specific permissions on a resource”). Teams often do this by taking a set of requirements from a product manager, reading lots of documentation on authorization concepts, then writing a design doc with a proposed implementation for each requirement. An authorization solution that facilitates this process can cut weeks off the implementation time.
While this is a key step in the process of building authorization for your application, many frameworks leave this exercise to the reader. It’s worth understanding how you will figure this piece out – either with software, documentation, or on your own.
Questions to ask:
- What kind of tooling does the solution support to draw out the authorization patterns we need?
- What kind of documentation does the solution provide here?
Out of the box patterns and primitives for application requirements and use cases
Once you know what you need to model, you need to figure out how to model it. Here are some of the authorization patterns that B2B SaaS companies most commonly need:
(1) Role-based access control (RBAC)
- Multi-tenancy - Segregate users by organization or company
- Superusers - Give internal users full access across the app and all resources in it
- Custom roles - Let users create their own roles from a set of defined permissions
(2) Relationship-based access control (ReBAC)
- Org charts - Let users inherit permissions of those "below them" in a user hierarchy
- File/folder structures - Grant permissions to child resources based on permissions on the parent resource
- Groups - Put users into teams or other groups
- Impersonation - Let internal users impersonate customers for customer service
(3) Attribute-based access control (ABAC)
- Public/private resources - Let users mark resources (e.g., page, repo, folder) as public so anyone can access them
- Entitlements - Gate users' permissions by the features they pay for
Some authorization systems support only high level RBAC. Other frameworks, like those based on Zanzibar, are oriented around ReBAC.
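As one illustration of a ReBAC pattern from the list above, the sketch below shows file/folder inheritance: a role granted on a parent folder applies to everything beneath it. The resource tree, the role grants, and the function name are all hypothetical, and we assume the closest grant wins when several ancestors carry one.

```python
from typing import Optional

# Hypothetical resource tree: child -> parent.
PARENT = {
    "file:readme": "folder:docs",
    "folder:docs": "folder:root",
}

# Hypothetical role grants: (user, resource) -> role.
ROLES = {
    ("Juno", "folder:docs"): "editor",
}


def effective_role(user: str, resource: str) -> Optional[str]:
    """Walk up the tree and return the closest role grant, if any."""
    current: Optional[str] = resource
    while current is not None:
        role = ROLES.get((user, current))
        if role is not None:
            return role
        current = PARENT.get(current)
    return None
```

Juno is an editor of `file:readme` through `folder:docs`, but has no role on `folder:root`, because inheritance only flows downward.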
It’s important to understand what kinds of patterns and primitives are natively supported so you have a sense for how opinionated the solution is. Having an opinionated solution can be helpful because it gives your team a running start and reduces the cognitive overhead of figuring out the implementation. But those opinions need to extend across all the patterns you need to satisfy your application’s requirements.
Questions to ask:
- What authorization patterns does the solution support natively? What kinds of abstractions does the solution provide to make it easier to implement these patterns?
- If a solution says it supports RBAC/ReBAC/ABAC, what flavors of those models does it support? These terms are widely known but unfortunately so high level that they leave a big gap between your application’s requirements and what you actually need to implement. For example, you may want to let a user approve a PR on a repo if they are a member of the team that owns the files in the PR and they have the editor role on the repo. This requirement combines elements of ReBAC (permissions on the PR are derived from the team that owns the repo and the files in the PR) and resource-specific roles / RBAC (assigning an editor role on a specific repo). For more on this topic, read about The 10 Types of Authorization.
- If a solution says it supports RBAC/ReBAC/ABAC, does that mean everything must be expressed in those terms? If so, that could wind up feeling like you’re trying to push a square peg through a round hole.
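The PR approval example above can be sketched in a few lines, which shows how quickly a single requirement mixes models. Everything here is hypothetical sample data, and we assume the user must belong to the owning team of every file in the PR; your requirement might instead accept any one owning team.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    repo: str
    files: list[str]


# Hypothetical sample data.
TEAM_MEMBERS = {"platform": {"Willow"}, "web": {"Holden"}}  # team -> members
FILE_OWNER_TEAM = {"ci.yml": "platform", "app.py": "web"}   # file -> owning team
REPO_ROLES = {("Willow", "gitcloud"): "editor"}             # (user, repo) -> role


def can_approve(user: str, pr: PullRequest) -> bool:
    # RBAC part: a resource-specific role on the repository.
    has_editor_role = REPO_ROLES.get((user, pr.repo)) == "editor"
    # ReBAC part: membership in the team that owns each file in the PR.
    on_owning_teams = all(
        user in TEAM_MEMBERS.get(FILE_OWNER_TEAM.get(path, ""), set())
        for path in pr.files
    )
    return has_editor_role and on_owning_teams
```

Note that a PR with no files passes the team check trivially here. Whether that is right for your application is exactly the kind of question a POC should surface.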
Ability to represent non-standard use cases without having to do gymnastics
Authorization models often start out simple and get more complex over time. So the common theme we hear from engineering teams is a desire for something that lets them put simple things in place quickly, while preserving the flexibility to support whatever complexity may be coming down the line.
For example, it’s pretty straightforward to model something like “Users can edit documents that they own,” but what happens when that evolves to “Users can edit documents that they own, or that belong to teams where they have been assigned the Editor role, unless those documents are archived or the user is blocked”?
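Written as plain code, the evolved rule might look like the sketch below. The `User` and `Document` classes and the `TEAM_ROLES` lookup are hypothetical. The point is that two new allow conditions and two new deny conditions all have to land somewhere in your model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str
    blocked: bool = False


@dataclass
class Document:
    owner: str
    team: Optional[str] = None
    archived: bool = False


# Hypothetical role assignments: (user, team) -> role.
TEAM_ROLES = {("Holden", "finance"): "Editor"}


def can_edit(user: User, doc: Document) -> bool:
    # Deny conditions win over every allow condition.
    if user.blocked or doc.archived:
        return False
    # Original rule: users can edit documents that they own.
    if doc.owner == user.name:
        return True
    # Evolved rule: Editor role on the team the document belongs to.
    return doc.team is not None and TEAM_ROLES.get((user.name, doc.team)) == "Editor"
```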
It’s important to understand what work you need to do when the patterns and primitives don’t support one of your requirements.
Questions to ask:
- What’s the process for adding new authorization patterns to my model as my requirements evolve?
- What are the options for representing authorization patterns that fall outside what’s supported natively? E.g., Can we write custom logic in the DSL? Can we call out to application code? Do we just need to handle these as special cases outside of the solution altogether?
- What is the likelihood that we wind up in a situation where there’s no good workaround?
POC Approach
Goal: Confirm that the solution allows you to:
- Model the authorization patterns that satisfy your core application requirements
- Accommodate exceptions and edge cases
- Validate that your authorization logic is correct
Recommended Steps
(1) Choose 2-3 key application requirements where you need to implement authorization. For example:
- Giving users different kinds of access to specific resources based on their role (owner, editor, etc.)
- Giving users access to a resource based on the user’s relationship in a user hierarchy and/or the resource’s relationship in a resource hierarchy
- Internal superusers
- Blocking or banning users
(2) Express the authorization requirements in terms of common authorization patterns
(3) Implement the necessary patterns in the authorization system
(4) Write unit tests to confirm that the authorization system returns the expected results for expected application states
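For step 4, a table of expected decisions is often enough. The sketch below uses Python's standard `unittest` module against a hypothetical in-process `is_allowed` function. In a real POC you would replace that function with a call to the solution's own test tooling or client, whatever form that takes.

```python
import unittest

# Hypothetical model and data standing in for the system under test.
ROLE_ACTIONS = {"owner": {"read", "edit", "delete"}, "editor": {"read", "edit"}}
ROLES = {
    ("Juno", "doc:1"): "owner",
    ("Holden", "doc:1"): "editor",
    ("Mallory", "doc:1"): "owner",
}
SUPERUSERS = {"Root"}
BANNED = {"Mallory"}


def is_allowed(user: str, action: str, resource: str) -> bool:
    if user in BANNED:
        return False
    if user in SUPERUSERS:
        return True
    role = ROLES.get((user, resource))
    return action in ROLE_ACTIONS.get(role, set())


class AuthorizationModelTest(unittest.TestCase):
    # (user, action, resource, expected decision)
    CASES = [
        ("Juno", "delete", "doc:1", True),      # owner role
        ("Holden", "edit", "doc:1", True),      # editor role
        ("Holden", "delete", "doc:1", False),   # editors cannot delete
        ("Root", "delete", "doc:1", True),      # internal superuser
        ("Mallory", "read", "doc:1", False),    # banned, despite owner role
        ("Willow", "read", "doc:1", False),     # no role at all
    ]

    def test_expected_decisions(self):
        for user, action, resource, expected in self.CASES:
            with self.subTest(user=user, action=action, resource=resource):
                self.assertEqual(is_allowed(user, action, resource), expected)
```

Saved to a file, this runs with `python -m unittest`. Include negative cases and precedence cases (like the banned owner), since those are where models tend to go wrong.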
Authorization Data
Authorization data is the input to the rules in your model. Your authorization data says who actually has what permissions and roles. It might say, “Juno is the owner of this thing” or “Holden is an analyst on the finance team.” This includes more application data than most people realize. Things like organization membership and roles are obvious examples, but an authorization decision may also depend on data like:
- File and folder relationships (e.g., A user can edit files in folders that they are an editor on)
- Status of an approval (e.g., A vacation request can be edited until it’s been approved)
- The date a form was created (e.g., After two weeks, the survey is closed to new replies)
- How active a user is (e.g., A user can’t create new topics until they’ve posted 10 replies to other users’ topics)
This data is typically stored in your application’s database/s. In order to make use of this data to answer authorization questions (i.e., do enforcement), the authorization system needs access to it.
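To see why this counts as authorization data, consider the last three examples written as checks. This is a rough sketch with hypothetical function names, and the boundary choices (a survey stays open through exactly two weeks, "approved" is the only locked status) are assumptions.

```python
from datetime import datetime, timedelta


def can_edit_vacation_request(status: str) -> bool:
    # Depends on workflow state stored in the application database.
    return status != "approved"


def survey_accepts_replies(created_at: datetime, now: datetime) -> bool:
    # Depends on a timestamp column on the survey row.
    return now - created_at <= timedelta(weeks=2)


def can_create_topic(reply_count: int) -> bool:
    # Depends on an activity counter for the user.
    return reply_count >= 10
```

None of these inputs is a role or a relationship, yet the authorization system cannot answer the question without them.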
Requirements
Ability to sync and reconcile data
Most authorization solutions solve the problem described above by synchronizing authorization data from your application. There are multiple ways to get the authorization data to your authorization system, from a custom two-phase commit to database plugins. It’s important to understand how you sync data to the authorization solution, and how you ensure it stays in sync without any kind of corruption or drift.
Questions to ask:
- How do we sync data from our application database to the authorization solution? What options are possible?
- How do we ensure we’re aware of any drift between our application database and the authorization solution? How do we remediate any discrepancies?
- What are the acceptable options internally for solving this category of problem?
- What do we have to do vs. what tooling does the solution provide to solve this category of problem?
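One of the simplest sync options is a dual write: update the authorization system inside the same code path that updates your application database. The sketch below uses SQLite and a hypothetical `InMemoryAuthzClient` as a stand-in for a vendor client. The method names are invented for illustration and do not correspond to any particular product.

```python
import sqlite3


class InMemoryAuthzClient:
    """Hypothetical stand-in for a vendor client; real products expose their own APIs."""

    def __init__(self) -> None:
        self.facts: set[tuple[str, str, str]] = set()

    def insert_fact(self, user_id: str, role: str, repo_id: str) -> None:
        self.facts.add((user_id, role, repo_id))

    def delete_fact(self, user_id: str, role: str, repo_id: str) -> None:
        self.facts.discard((user_id, role, repo_id))


def make_app_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE repo_roles (user_id TEXT, role TEXT, repo_id TEXT)")
    return db


def assign_role(db, authz, user_id: str, role: str, repo_id: str) -> None:
    # `with db` commits if the block succeeds and rolls back if it raises,
    # so a failed authorization write also undoes the application write.
    with db:
        db.execute(
            "INSERT INTO repo_roles (user_id, role, repo_id) VALUES (?, ?, ?)",
            (user_id, role, repo_id),
        )
        authz.insert_fact(user_id, role, repo_id)


def revoke_role(db, authz, user_id: str, role: str, repo_id: str) -> None:
    with db:
        db.execute(
            "DELETE FROM repo_roles WHERE user_id = ? AND role = ? AND repo_id = ?",
            (user_id, role, repo_id),
        )
        authz.delete_fact(user_id, role, repo_id)
```

This is not a two-phase commit. If the database commit itself fails after the authorization write has succeeded, the two stores disagree, which is why drift detection and reconciliation still matter.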
Data filtering
While you can sync some authorization data to a central authorization solution, it’s often not practical to sync all of it. This limitation often comes up when solving for data filtering (also known as list filtering).
For instance, say you want to display a list of repositories to the current user. You may want to display repositories that they’ve recently edited, new repositories in organizations they belong to, and public repositories that they’ve been tagged in since their last visit.
The challenge is that it’s often impractical to sync all of the data you’d need in order to display that list view. The authorization solution might store, say, a list of authorized resource IDs, which map to all the resource rows in your application database, along with the associated metadata you want to display.
Pulling out all the repositories and looping over them in your application is typically too slow in practice. So how do you solve this problem?
Questions to ask:
- Is it possible to filter using a combination of data in the authorization solution and data stored in a separate database?
- If not, are the lists we’d expect to get back from the authorization solution (e.g., a list of issues a user owns) small enough that we could stream them back to our application without exhausting memory or creating noticeable delay? If so, do we have a workaround for when the list gets too big?
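When the authorized list is small, one common workaround is to ask the authorization solution for the authorized IDs and let your own database do the rest of the filtering and sorting. Below is a minimal sketch using SQLite. The table layout, sample rows, and function names are hypothetical, and `authorized_ids` is assumed to come from the authorization solution.

```python
import sqlite3
from typing import Iterable


def make_app_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE repos "
        "(id TEXT PRIMARY KEY, name TEXT, open_issues INTEGER, last_modified TEXT)"
    )
    db.executemany(
        "INSERT INTO repos VALUES (?, ?, ?, ?)",
        [
            ("1", "api", 3, "2024-03-01"),
            ("2", "web", 0, "2024-03-05"),
            ("3", "docs", 1, "2024-03-09"),
            ("4", "infra", 7, "2024-03-10"),
        ],
    )
    return db


def repos_with_open_issues(db: sqlite3.Connection, authorized_ids: Iterable[str]) -> list[str]:
    ids = list(authorized_ids)
    if not ids:
        return []  # Nothing authorized, so skip the query entirely.
    # One bound parameter per authorized ID; values are never spliced into the SQL.
    placeholders = ", ".join("?" for _ in ids)
    query = (
        f"SELECT name FROM repos WHERE id IN ({placeholders}) "
        "AND open_issues > 0 ORDER BY last_modified DESC"
    )
    return [row[0] for row in db.execute(query, ids)]
```

Databases cap the number of bound parameters in a single statement, so this approach stops working once the authorized list grows large. That limit is worth probing during the POC.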
Fast and flexible data format
As previously discussed, most authorization solutions synchronize authorization data from your application. By definition, this requires that the solution define a shared data model. Roles are a common example of shared authorization data: you might need to know users’ roles in GitCloud’s Repo service (to determine whether the current user can view a given repo) and in an Admin service (to determine whether the current user can invite a new user to the organization).
The data model you choose needs to be both fast and flexible. It needs to be fast for all the various authorization questions you need to answer – from “Does the current user belong to this organization?” to “What are all the settings that can be viewed by a customer service rep impersonating a user with guest access to this organization?” It also needs to be flexible enough to support whatever authorization data you have. This includes not just roles and relationships, but also attributes.
Questions to ask:
- What data model/s does the authorization solution support?
- What are the data model’s performance characteristics? How much optimization do we need to do – e.g., modeling, indexing, tuning?
- How easy/hard is it to accommodate the different kinds of data we need to store?
POC Approach
Goal: Confirm that the authorization solution allows you to:
- Store and provide access to shared authorization data (e.g., roles)
- Filter lists of results based on application data that is not synchronized with the solution.
- Update shared data as it is changed by the application
- Identify and reconcile inconsistencies between application data and data in the authorization solution
Recommended Steps
For each of the application requirements that you selected for the Modeling evaluation:
(1) Identify the application data that you need to make the corresponding authorization decision. Examples include:
- Organization membership
- User roles
- File and folder relationships
- Resource attributes (public, locked, etc.)
(2) Synchronize the application data in its current state using the authorization solution’s mechanism for initializing its data.
(3) Modify the create, update, and delete code in your application to synchronize data with the authorization solution when your application data changes, using the mechanisms provided by the authorization solution (API, database replication, etc.)
- NOTE: If you’re changing application code, we recommend that you do so behind a feature flag or on a throwaway branch to minimize disruption to the rest of your team.
(4) Confirm that you can filter lists using a combination of authorization and application data as needed for your application. Examples include:
- Returning a list of all files on which the user has the “editor” role (authorization data) and sorting them by the last modified date (application data)
- Showing a list of all repositories on which the user has the owner role (authorization data) and that have unresolved issues (application data)
(5) Change shared authorization data using your application’s functionality
(6) Confirm that the changes are reflected in the authorization solution
(7) Change shared authorization data directly at the database, or using another method that circumvents the standard synchronization approach
(8) Determine whether you can detect the inconsistency
(9) Reconcile the data using the mechanisms provided by the authorization solution (or your own if it doesn’t provide one)
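If the solution doesn't provide its own tooling for steps 8 and 9, a set difference over exported facts is a reasonable starting point. The sketch below is hypothetical: it assumes you can export role facts from both systems as `(user, role, resource)` tuples, and it treats the application database as the source of truth.

```python
Fact = tuple[str, str, str]  # (user, role, resource)


def diff_facts(app_facts: set[Fact], authz_facts: set[Fact]) -> tuple[set[Fact], set[Fact]]:
    """Return (missing, stale) relative to the application database."""
    missing = app_facts - authz_facts  # in the app, absent from the authz system
    stale = authz_facts - app_facts    # in the authz system, gone from the app
    return missing, stale


def reconcile(app_facts: set[Fact], authz_facts: set[Fact]) -> None:
    """Repair the authorization facts in place, trusting the application data."""
    missing, stale = diff_facts(app_facts, authz_facts)
    authz_facts.update(missing)
    authz_facts.difference_update(stale)
```

In a real system, the two `authz_facts` mutations would be insert and delete calls against the authorization solution, and the export would likely need to be batched or scoped per tenant.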
Authorization Enforcement
Enforcement is when you combine your model and data at runtime to render a decision back to your application – e.g., yes, Holden can delete file 123 (because he’s an owner). We discussed previously the simple case: asking whether a user can take an action on a specific resource, yes or no. You can also imagine rotating this question in 3D space to create other questions that applications need to answer – e.g., return all the files of which Willow is an owner, or return all the permissions Willow has on file 777 (to render a UI). Your enforcement API needs to support all the potential authorization questions your service will ask.
Out of the box APIs for common authorization questions
There are common enforcement patterns:
- Authorize yes/no decisions, e.g. Can Holden delete file 123?
- List authorized resources, e.g. What are all the files that Holden can view?
- List authorized permissions for a UI, e.g. What are all the actions that Holden can take on this folder?
- Batch authorizations, e.g. Which of these 50 files is Holden allowed to delete?
Questions to ask:
- What authorization checks does the solution support natively? What kinds of abstractions does the solution provide to make it easier to make these authorization checks?
- What are the performance characteristics of these authorization checks? Do these meet our internal service level objectives (SLOs)?
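The four enforcement patterns listed above map to four API shapes. Here is a toy in-memory sketch of each, so you know what to look for in a vendor's client. The function names and data are hypothetical, and real services will name and shape these calls differently.

```python
# Hypothetical model and data.
ROLE_ACTIONS = {"owner": {"view", "edit", "delete"}, "viewer": {"view"}}
ROLES = {("Holden", "file:123"): "owner", ("Holden", "file:456"): "viewer"}


def authorize(user: str, action: str, resource: str) -> bool:
    """Yes/no decision: can this user take this action on this resource?"""
    return action in ROLE_ACTIONS.get(ROLES.get((user, resource)), set())


def list_resources(user: str, action: str) -> list[str]:
    """List every resource on which the user can take the action."""
    return sorted(
        resource
        for (role_user, resource) in ROLES
        if role_user == user and authorize(user, action, resource)
    )


def actions(user: str, resource: str) -> set[str]:
    """List every action the user can take on one resource (e.g., to render a UI)."""
    return set(ROLE_ACTIONS.get(ROLES.get((user, resource)), set()))


def authorize_batch(user: str, action: str, resources: list[str]) -> dict[str, bool]:
    """Answer many yes/no questions in a single call."""
    return {resource: authorize(user, action, resource) for resource in resources}
```

The toy versions all reduce to the same lookup. In a real service each shape can have very different performance characteristics, so measure them separately.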
Ability to ask arbitrary authorization questions
It’s common that 80% of the authorization checks you need to make are covered by the patterns above, but the last 20% are a touch different. For example, let’s say you want to render a screen showing the owners of all repos in GitCloud. None of the above patterns would make that possible. How would you solve this problem?
Questions to ask:
- Does the solution natively support the ability to run arbitrary authorization checks or queries?
- If not, what workarounds exist to get the data out of the authorization solution and run those queries elsewhere? Does the performance of the workarounds meet our SLOs?
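One way to picture an arbitrary query is as a pattern match over stored facts, with wildcards in the positions you don't want to fix. The sketch below answers the "owners of all repos" question that way. The fact layout and the `query` helper are hypothetical and only illustrate the idea.

```python
from typing import Optional

# Hypothetical facts: (predicate, user, role, resource).
FACTS = [
    ("has_role", "Juno", "owner", "repo:1"),
    ("has_role", "Holden", "owner", "repo:2"),
    ("has_role", "Willow", "owner", "repo:2"),
    ("has_role", "Willow", "reader", "repo:1"),
]

Pattern = tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def query(pattern: Pattern) -> list[tuple[str, str, str, str]]:
    """Return every fact matching the pattern; None acts as a wildcard."""
    return [
        fact
        for fact in FACTS
        if all(want is None or want == got for want, got in zip(pattern, fact))
    ]


def owners_by_repo() -> dict[str, list[str]]:
    """Answer the question: who owns each repo?"""
    result: dict[str, list[str]] = {}
    for _, user, _, repo in query(("has_role", None, "owner", None)):
        result.setdefault(repo, []).append(user)
    return result
```

Here neither the user nor the resource is fixed, which is why the authorize, list, and batch patterns can't express it directly.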
POC Approach
You’ve previously put in place the two puzzle pieces you need to make authorization decisions: an authorization model and authorization-relevant data. Now you can evaluate what you can do with those two pieces – i.e., what kinds of authorization decisions you can ask, and what kinds of authorization filters you can apply.
Goal: Confirm that the authorization solution:
- Resolves all the types of authorization questions that your application needs (e.g., yes/no access requests, list authorized resources, show all permissions on an object)
- Satisfies your application’s performance requirements or customer-facing SLAs
- Adapts to incorporate exceptions and changing requirements
- Returns the correct responses to authorization questions
Recommended Steps
For each application requirement that you’ve modeled:
(1) Replace the authorization code in your application with a call to the authorization service
- NOTE: We recommend that you do this behind a feature flag or on a throwaway branch to minimize disruption to the rest of your team.
(2) Use your application’s functionality to perform the operation
- Confirm that the authorization response is correct for both the positive and negative cases
- Measure the time that it takes for the authorization solution to return a response
(3) Exercise a list endpoint for cases that support them (e.g., file lists, user rosters)
- Confirm that the lists are correct
- Measure the time it takes to return the response
(4) Identify an edge case or an exception that is related to the requirement (e.g., locked files, blocked users, public resources)
(5) Incorporate this exception into the authorization solution
- Modify the model
- Update the data
- NOTE: Authorization logic always evolves over time. By iterating on your model during the POC, you can get a sense for how well the solution will support these changes after it’s in production.
(6) Repeat steps 2 and 3 for the revised model.
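For the timing measurements in steps 2 and 3, averages hide the slow requests that break SLOs, so it helps to record percentiles. Here is a small sketch using only the standard library. The `check` argument is a placeholder for whatever call your application makes to the authorization solution.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(check: Callable[[], object], samples: int = 200) -> dict[str, float]:
    """Time repeated calls to `check` and summarize the latency in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        check()
        timings.append((time.perf_counter() - start) * 1000)
    # With n=100, quantiles() returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(timings, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

For example, `measure_latency_ms(lambda: client.authorize(user, "read", repo))` would time a hypothetical `client.authorize` call. Run it from the same network location as your application, since network round trips usually dominate the result.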
Non-functional Requirements
In addition to the core functional requirements described above, most teams take into account other factors that shape the experience and confidence of rolling out an authorization solution. Here are the top 5 considerations that come up:
(1) Documentation - In API-first products, the documentation is like the frontend – if it’s not documented (or worse, if it’s poorly documented), it might as well not be there at all. Having consistent, clear, and comprehensive documentation can mean the difference between quickly getting something done or faffing about with Google and Stack Overflow for an entire afternoon. Questions to ask:
- When I have a question about the solution, how easy/hard is it to find the answer in the docs? What level of problem and solution coverage does the documentation have?
- What kind of documentation does the solution provide? Technical docs, conceptual guides, videos, sample apps and demos, blog posts, etc.
- Can I understand the documentation without having to look up a bunch of jargon or refer to external sources of information?
(2) Support - Authorization is a complex domain, and authorization as a service is a relatively new market category. Additionally, there aren’t many engineers on the market who are experts in authorization. One of the ways that organizations derisk their authorization rollouts is through technical support. Questions to ask:
- How do we ask support questions (e.g., Slack, Zendesk)?
- What kinds of support are included – e.g., just break/fix, or also advisory?
- What is the average response time to questions during the evaluation process?
- What is the quality of responses from the solution vendor?
(3) Developer experience - It’s hard to say objectively what makes for a good developer experience, but we can confidently say that engineers want an authorization system that is easy to learn and maintain. This doesn’t just provide better quality of life. Importantly, it also helps teams ship faster, ramp new engineers, and ensure that the team responsible for the solution can scale and support the rest of the org. Questions to ask:
- How does testing work? Does the solution support unit testing?
- How does debugging work? How do we debug scenarios where a user has access that they shouldn’t have, or vice versa?
(4) Ops & Security - Authorization is a critical piece of your stack. If your authorization system is down, your app is down. Outsourcing this capability is not something engineering teams take lightly. Questions to ask:
- What are the deployment options, e.g., cloud-only, hybrid, on-premises?
- Does the solution expose telemetry?
- What options are there for backup and point-in-time restore?
(5) Company and product viability - Most companies don’t switch authorization implementations every year. So it’s important to assess the viability of the product and the company to ensure that it will continue to grow and be around. Questions to ask:
- How long has the solution been in-market and generally available?
- Who else in the market is using this authorization solution? Startups? Large organizations?
- How much funding and runway does the company have?
- Who has invested in the company?
Time to Build
POCs can often feel daunting, especially for a system as critical as authorization. Fear not! You now have a roadmap for how to do the evaluation based on hundreds of engineering teams that have come before you.
If you’re still debating whether authorization as a service is right for you, you can read about why other engineering teams have decided to use Authorization as a Service.
Or if you’re ready to dive in and want a little moral support, join the Oso Community Slack, where you’ll find the Oso engineering team and almost 2,000 other developers working on the same problems as you.
You got this!