In the previous article, I introduced Biscuit, our authentication and authorization token, and mentioned its Datalog-based language for authorization policies. Let's see how it works!
From a personal blog to an entire newspaper
As an example, we will build up authorization policies, going from a small, personal blog, to a professional journal with multiple teams, editors, etc.
Since those policies will be written in Datalog, let's take a short look at that language first.
Side note: introduction to Datalog
Datalog is a declarative logic language that is a subset of Prolog. A Datalog program contains "facts", which represent data, and "rules", which can generate new facts from existing ones.
As an example, we could define the following facts, describing some relationships:
parent("Alice", "Bob");
parent("Bob", "Charles");
parent("Charles", "Denise");
This means that Alice is Bob's parent, and so on.
This could be seen as a table named parent in a relational database, with name and child columns:
name | child
---|---
Alice | Bob
Bob | Charles
Charles | Denise
We can then define rules to query our data:
parent_of_charles($name) <-
parent($name, "Charles");
This could be written in SQL as:
SELECT DISTINCT name FROM parent WHERE child = 'Charles';
(We use DISTINCT because Datalog always removes duplicate results.)
We can also use rules to create new facts, like this one (variables are introduced with the $ sign):
grandparent($grandparent, $child) <-
parent($grandparent, $parent),
parent($parent, $child);
You can read it as follows:
create the fact grandparent($grandparent, $child)
IF
there is a fact parent($grandparent, $parent)
AND there is a fact parent($parent, $child)
with matching $parent variable
or in SQL:
INSERT INTO grandparent( name, grandchild )
SELECT A.name as name, B.child as grandchild
FROM parent A, parent B
WHERE A.child = B.name;
Applying this rule will look at combinations of the parent facts as defined on the right side of the arrow (the "body" of the rule), and try to match them to the variables ($grandparent, $parent, $child):
- parent("Alice", "Bob"), parent("Bob", "Charles") matches because we can replace $grandparent with "Alice", $parent with "Bob" and $child with "Charles"
- parent("Alice", "Bob"), parent("Charles", "Denise") does not match because we would get different values for the $parent variable
For each matching combination of facts in the body, we then generate a fact, as defined on the left side of the arrow (the "head" of the rule). For parent("Alice", "Bob"), parent("Bob", "Charles"), we would generate grandparent("Alice", "Charles"). A fact can be generated by multiple rules, but we will only get one instance of it.
Going through all the combinations, we will generate:
grandparent("Alice", "Charles");
grandparent("Bob", "Denise");
which can be seen as a grandparent table with name and grandchild columns:
name | grandchild
---|---
Alice | Charles
Bob | Denise
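To make the rule application above concrete, here is a minimal sketch in plain Rust, without any Datalog engine. It is not how Biscuit evaluates rules internally; the names Fact, grandparents and example_parents are made up for this illustration. It simply tries every pair of parent facts and keeps the pairs that agree on $parent.

```rust
use std::collections::BTreeSet;

/// A `parent(name, child)` fact, stored as a pair of strings.
type Fact = (String, String);

/// Applies the rule
///   grandparent($grandparent, $child) <-
///     parent($grandparent, $parent), parent($parent, $child)
/// by trying every pair of `parent` facts.
fn grandparents(parents: &BTreeSet<Fact>) -> BTreeSet<Fact> {
    let mut derived = BTreeSet::new();
    for (grandparent, parent_a) in parents {
        for (parent_b, child) in parents {
            // Both occurrences of `$parent` must bind to the same value.
            if parent_a == parent_b {
                // The set drops duplicates, like Datalog does.
                derived.insert((grandparent.clone(), child.clone()));
            }
        }
    }
    derived
}

/// Builds the three `parent` facts used in the example above.
fn example_parents() -> BTreeSet<Fact> {
    [("Alice", "Bob"), ("Bob", "Charles"), ("Charles", "Denise")]
        .iter()
        .map(|(name, child)| (name.to_string(), child.to_string()))
        .collect()
}
```

Calling grandparents(&example_parents()) yields the two grandparent facts listed above. A real engine would index facts instead of scanning every pair, but the matching logic is the same.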
Interactions with a Datalog program are done through queries: a query contains a rule that we apply over the system, and it returns the generated facts.
First steps: personal blog
Note: you can follow along with the various steps of this tutorial in the online playground.
When we are the only user of that blog, we do not need much (honestly we could get away with just a random string in a cookie, but bear with me). We only need a way to identify ourselves to the blog engine's admin panel. So we could just consider the Biscuit token as a fancy JWT, that will only contain data (so, in Datalog, facts).
Our token will contain this fact: user(#authority, "user_1234").
Here, "user_1234" is our user id, and #authority is a special symbol that can only be added to facts in the first block of a token (or added by the verifier). A block contains facts (data), rules (to generate facts) and checks (queries used to validate the facts). Attenuation is done by adding more blocks. Since #authority facts describe the basic rights of a token, adding #authority facts would increase the number of rights, so we forbid adding them in additional blocks. Symbols, as indicated by the # prefix, are special strings that are internally replaced with integers, to compress tokens and accelerate evaluation.
The token can be serialized to a byte array (encoded with Protobuf) and then to base64 if we want to carry it in a cookie.
On the blog engine's side, we will only have this single line:
allow if user(#authority, "user_1234");
Biscuit can enforce authorization in 2 ways:
- checks, starting with check if
- allow/deny policies, starting with allow if or deny if
They work a bit like rules: if there's at least one combination of facts in the body (after the if) that fits, then it matches. They will not produce any fact.
To validate a token:
- all of the checks must match. If one does not, fail
- allow/deny policies are tried in order until one matches
  - if allow matches, succeed
  - if deny matches, fail
  - if none match, fail
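The validation order above can be sketched as a small decision function. This is only an illustration of the steps in the list, not the library's implementation: the Evaluated struct is hypothetical and assumes every check and policy has already been reduced to a boolean saying whether its body matched.

```rust
/// What a policy decides when it matches.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Effect {
    Allow,
    Deny,
}

/// Hypothetical, simplified view of an evaluated verifier: each check and
/// each policy is paired with "did its body match some facts?".
struct Evaluated {
    checks: Vec<bool>,
    policies: Vec<(Effect, bool)>,
}

/// Returns `true` when the request is authorized.
fn authorize(world: &Evaluated) -> bool {
    // Every check must match; a single failure rejects the request.
    if !world.checks.iter().all(|matched| *matched) {
        return false;
    }
    // Policies are tried in order; the first one that matches decides.
    for (effect, matched) in &world.policies {
        if *matched {
            return *effect == Effect::Allow;
        }
    }
    // No policy matched: fail.
    false
}
```

Note that the order of policies matters: a deny policy placed before a matching allow policy wins, which is why a catch-all deny belongs at the end.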
Here the allow policy will succeed if the token contains the fact user(#authority, "user_1234").
It is not very useful yet, but maybe we can add more features?
Next: multi-blog platform
After a few friends have seen your marvelous website, they ask if you could host their blogs on the same platform. So now you need more flexible authorization rules. We could keep the small tokens with the user id, but add more intelligence on the server's side.
First we need to indicate who owns which blog, with the format owner(#authority, $user_id, $blog_id). You can load this data when creating the verifier, from your database, from static files, etc.
owner(#authority, "user_1234", "blog1");
owner(#authority, "user_5678", "blog2");
owner(#authority, "user_1234", "blog3");
Here we own "blog1" and "blog3", and "user_5678" owns "blog2".
Now we need to actually validate the request, to see who has access to what. The request is represented through the #ambient facts added to the verifier: you give the verifier facts representing the current request, such as which resource is accessed, which operation is performed (read, write, etc.), the current time, the source IP address, etc. As an example, a PUT /blog1/article1 to modify an article could be translated as:
blog(#ambient, "blog1");
article(#ambient, "blog1", "article1");
operation(#ambient, #update);
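As a rough sketch of that translation step, the function below turns an HTTP method and path into the three ambient facts, as strings. The function name and the /blog_id/article_id path layout are assumptions made for this example, and only the two methods used in this article (GET and PUT) are mapped.

```rust
/// Translates an HTTP request into the ambient facts used in this article.
/// Hypothetical helper: only GET and PUT are mapped, and the path is
/// expected to be exactly "/<blog_id>/<article_id>".
fn ambient_facts(method: &str, path: &str) -> Option<Vec<String>> {
    let operation = match method {
        "GET" => "#read",
        "PUT" => "#update",
        _ => return None,
    };
    let mut segments = path.trim_start_matches('/').split('/');
    // Reject missing or empty segments.
    let blog = segments.next().filter(|s| !s.is_empty())?;
    let article = segments.next().filter(|s| !s.is_empty())?;
    // Reject paths with extra segments.
    if segments.next().is_some() {
        return None;
    }
    Some(vec![
        format!("blog(#ambient, \"{}\")", blog),
        format!("article(#ambient, \"{}\", \"{}\")", blog, article),
        format!("operation(#ambient, {})", operation),
    ])
}
```

For PUT /blog1/article1 this produces the three facts shown above. Since the path comes from the client, a real integration should validate or escape the segments before embedding them in a fact.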
In the verifier, we add a rule to indicate that the owner of a blog has full rights on it:
right(#authority, $blog_id, $article_id, $operation) <-
article(#ambient, $blog_id, $article_id),
operation(#ambient, $operation),
user(#authority, $user_id),
owner(#authority, $user_id, $blog_id);
If this rule finds a matching set of facts, it will produce a right(...) fact.
The verifier will also use an allow policy for the presence of that right (you will see why we separate them in the next section):
allow if
blog(#ambient, $blog_id),
article(#ambient, $blog_id, $article_id),
operation(#ambient, $operation),
right(#authority, $blog_id, $article_id, $operation);
// unauthenticated users have read access
allow if
operation(#ambient, #read);
// catch all rule in case the allow did not match
deny if true;
So if we tried to do a PUT /blog1/article1 with the token containing user(#authority, "user_1234"), we would end up with the following facts:
user(#authority, "user_1234");
blog(#ambient, "blog1");
article(#ambient, "blog1", "article1");
operation(#ambient, #update);
owner(#authority, "user_1234", "blog1");
owner(#authority, "user_5678", "blog2");
owner(#authority, "user_1234", "blog3");
If we applied the verifier's rule, we would end up with:
right(#authority, "blog1", "article1", #update) <-
owner(#authority, "user_1234", "blog1"),
article(#ambient, "blog1", "article1"),
user(#authority, "user_1234"),
operation(#ambient, #update);
So we end up with the new fact right(#authority, "blog1", "article1", #update).
Now the verifier applies the allow policy:
allow if
blog(#ambient, "blog1"),
article(#ambient, "blog1", "article1"),
operation(#ambient, #update),
right(#authority, "blog1", "article1", #update);
And the test succeeds! If we had tried the request with a token containing user(#authority, "user_5678"), the rule would not have produced the right() fact, and it would have failed.
Now if we did a GET /blog1/article1 request, without being the owner of the blog, we would have matched allow if operation(#ambient, #read).
But maybe we don't want to have all articles available by default, since some of them may still be drafts, so let's remove that allow policy. We want to mark an article as publicly readable by creating the fact readable(#authority, $blog_id, $article_id). We can do that with this policy:
allow if
operation(#ambient, #read),
article(#ambient, $blog_id, $article_id),
readable(#authority, $blog_id, $article_id);
So if we did a GET /blog1/article1 request with that article marked as readable, we would get the facts:
blog(#ambient, "blog1");
article(#ambient, "blog1", "article1");
operation(#ambient, #read);
owner(#authority, "user_1234", "blog1");
owner(#authority, "user_5678", "blog2");
owner(#authority, "user_1234", "blog3");
readable(#authority, "blog1", "article1");
The policy would apply as follows:
allow if
operation(#ambient, #read),
article(#ambient, "blog1", "article1"),
readable(#authority, "blog1", "article1");
And we get access. In a few lines, we created basic rules to protect our blog platform. But users need more features!
Add reviewers
Often, we'd like to ask friends and colleagues to review articles before they are published. In our system, it could be done in two ways:
- mint a token containing only right(#authority, "blog1", "article1", #read)
- derive the user's token, adding a check restricting it to the article
In the second case, the token would look like this:
Block 0 (authority):
facts: [ user(#authority, "user_1234") ]
rules: []
checks: []
Block 1:
facts: []
rules: []
checks: [
check if article(#ambient, "blog1", "article1"), operation(#ambient, #read)
]
If we tried to do a PUT /blog1/article1, the verifier's checks would succeed, but the token's check would fail, because it does not find the operation(#ambient, #read) fact. But for a GET /blog1/article1, all checks would succeed. The reviewer will not be able to remove the block while keeping a valid signature, so any alteration will result in a failed request.
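The effect of the extra block can be sketched without the library, as a list of blocks whose checks must all pass. The types below are simplified stand-ins invented for this example (a real check is a Datalog query, and signatures are not modeled); they only show that appending a block can restrict a token but never widen it.

```rust
/// Ambient facts describing one request (values are illustrative).
struct Request {
    blog: &'static str,
    article: &'static str,
    operation: &'static str,
}

/// Simplified form of
/// `check if article(#ambient, <blog>, <article>), operation(#ambient, <operation>)`.
struct ArticleCheck {
    blog: &'static str,
    article: &'static str,
    operation: &'static str,
}

impl ArticleCheck {
    /// The check matches when the request carries exactly these facts.
    fn matches(&self, request: &Request) -> bool {
        self.blog == request.blog
            && self.article == request.article
            && self.operation == request.operation
    }
}

/// A token is modeled as a list of blocks, each carrying its own checks.
/// Every check of every block must match the request.
fn token_checks_pass(blocks: &[Vec<ArticleCheck>], request: &Request) -> bool {
    blocks.iter().flatten().all(|check| check.matches(request))
}

/// The reviewer token from the text: an authority block without checks,
/// followed by one attenuation block.
fn reviewer_token() -> Vec<Vec<ArticleCheck>> {
    vec![
        vec![],
        vec![ArticleCheck { blog: "blog1", article: "article1", operation: "#read" }],
    ]
}
```

With this model, a request for blog1/article1 with #read passes the token's checks, while the same request with #update, or a read on another article, does not.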
Premium accounts
Now some of the blog authors want to make a living out of it (come on, it's 2021, do a newsletter instead) and mark some articles as "premium", so that only some users can access them.
We can do that by having premium_user(#authority, $user_id, $blog_id) facts and adding a rule on the verifier's side:
right(#authority, $blog_id, $article_id, #read) <-
article(#ambient, $blog_id, $article_id),
premium_readable(#authority, $blog_id, $article_id),
user(#authority, $user_id),
premium_user(#authority, $user_id, $blog_id);
We could even add a feature like LWN.net where a paying user can share a premium article, by deriving their tokens to only accept that article.
We're a big newspaper now, we want roles and teams
Against all odds, our blog platform is a smashing success. We need to recruit journalists, editors, copywriters… So now we might need more flexible rights management, maybe some teams and roles?
Let's define more facts and rules to encode that. As an example, let's define a "contributor" role that can only read or write articles, while owners are the only ones who can create or delete.
right(#authority, $blog_id, $article_id, $operation) <-
article(#ambient, $blog_id, $article_id),
operation(#ambient, $operation),
user(#authority, $user_id),
contributor(#authority, $user_id, $blog_id),
[#read, #update].contains($operation);
What you can see on the last line is an expression: Biscuit's Datalog implementation can require additional conditions on some values, like a string matching a regular expression, a date being earlier than an expiration date, or, as here, presence in a set. This rule will only produce a fact if the operation is #read or #update.
Now, we want to define contributor teams to manage them more easily. So we will introduce the team(#authority, $team_id), member(#authority, $user_id, $team_id) and team_role(#authority, $team_id, $blog_id, #contributor) facts.
Additionally, we insert this rule in the verifier:
contributor(#authority, $user_id, $blog_id) <-
user(#authority, $user_id),
member(#authority, $user_id, $team_id),
team_role(#authority, $team_id, $blog_id, #contributor);
This rule will generate the contributor fact for a blog if we are a member of a team that has the "contributor" team role on that blog.
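Here is a small plain-Rust sketch of how the contributor rule and the earlier right rule combine: the user must belong to a team, the team must hold the contributor role on the blog, and the operation must be in the allowed set. The Facts struct, the function name and the ids used with it are hypothetical; the facts are reduced to tuples rather than evaluated by a Datalog engine.

```rust
/// Facts the verifier would hold, reduced to plain tuples for this sketch.
struct Facts {
    /// user(#authority, $user_id)
    user: &'static str,
    /// member(#authority, $user_id, $team_id)
    members: Vec<(&'static str, &'static str)>,
    /// team_role(#authority, $team_id, $blog_id, #contributor)
    contributor_teams: Vec<(&'static str, &'static str)>,
}

/// Returns `true` when the contributor rules would produce a `right` fact
/// for this blog and operation.
fn contributor_right(facts: &Facts, blog: &str, operation: &str) -> bool {
    // The expression: [#read, #update].contains($operation)
    if !["#read", "#update"].contains(&operation) {
        return false;
    }
    // Join member and team_role on $team_id, for the token's user.
    facts.members.iter().any(|(user, team)| {
        *user == facts.user
            && facts
                .contributor_teams
                .iter()
                .any(|(role_team, role_blog)| role_team == team && *role_blog == blog)
    })
}
```

A user who is a member of a contributor team for a blog gets #read and #update on it, while any other operation, such as a hypothetical #delete, produces no right from these rules.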
We could also fold the two preceding rules into one:
right(#authority, $blog_id, $article_id, $operation) <-
article(#ambient, $blog_id, $article_id),
operation(#ambient, $operation),
user(#authority, $user_id),
member(#authority, $user_id, $team_id),
team_role(#authority, $team_id, $blog_id, #contributor),
[#read, #update].contains($operation);
And that's it! With a few rules, we can model more and more complex authorization patterns, some of them relying on user provided policies, without compromising the previous features. Rules are additive, so there's no need for a long chain of if/else and special cases hardcoded in some endpoints. Everything can be managed in one place.
To sum up the rules of our system:
// the owner has all rights
right(#authority, $blog_id, $article_id, $operation) <-
article(#ambient, $blog_id, $article_id),
operation(#ambient, $operation),
user(#authority, $user_id),
owner(#authority, $user_id, $blog_id);
// premium users can access some restricted articles
right(#authority, $blog_id, $article_id, #read) <-
article(#ambient, $blog_id, $article_id),
premium_readable(#authority, $blog_id, $article_id),
user(#authority, $user_id),
premium_user(#authority, $user_id, $blog_id);
// define teams and roles
right(#authority, $blog_id, $article_id, $operation) <-
article(#ambient, $blog_id, $article_id),
operation(#ambient, $operation),
user(#authority, $user_id),
member(#authority, $user_id, $team_id),
team_role(#authority, $team_id, $blog_id, #contributor),
[#read, #update].contains($operation);
// unauthenticated users have read access on published articles
allow if
operation(#ambient, #read),
article(#ambient, $blog_id, $article_id),
readable(#authority, $blog_id, $article_id);
// authorize if got the rights on this blog and article
allow if
blog(#ambient, $blog_id),
article(#ambient, $blog_id, $article_id),
operation(#ambient, $operation),
right(#authority, $blog_id, $article_id, $operation);
// catch all rule in case the allow did not match
deny if true;
And here is an example Rust program reproducing this authorization system:
use biscuit::{crypto::KeyPair, error, token::Biscuit, parser::parse_source};
use biscuit_auth as biscuit;
fn main() -> Result<(), error::Token> {
let start = std::time::Instant::now();
// First, let's create the root key for the system
// its public part will be used to verify the token
let root = KeyPair::new();
// Token creation
// we will add a single fact indicating identity
let mut builder = Biscuit::builder(&root);
builder.add_authority_fact("user(#authority, \"user_1234\")")?;
let token = builder.build()?;
println!("{}", token.print());
let token_bytes = token.to_vec()?;
let serialized = base64::encode_config(&token_bytes, base64::URL_SAFE);
println!("serialized ({} bytes): {}", token_bytes.len(), serialized);
let deserialized_token = Biscuit::from(&token_bytes)?;
// Token verification
// first, we validate the signature with the root public key
let mut verifier = deserialized_token.verify(root.public())?;
// simulate verification for PUT /blog1/article1
verifier.add_fact("blog(#ambient, \"blog1\")")?;
verifier.add_fact("article(#ambient, \"blog1\", \"article1\")")?;
verifier.add_fact("operation(#ambient, #update)")?;
// add ownership information
// we only need to load facts related to the blog and article we're accessing
verifier.add_fact("owner(#authority, \"user_1234\", \"blog1\")")?;
//verifier.add_fact("owner(#authority, \"user_5678\", \"blog2\")")?;
//verifier.add_fact("owner(#authority, \"user_1234\", \"blog3\")")?;
let (_remaining_input, mut policies) = parse_source("
// the owner has all rights
right(#authority, $blog_id, $article_id, $operation) <-
article(#ambient, $blog_id, $article_id),
operation(#ambient, $operation),
user(#authority, $user_id),
owner(#authority, $user_id, $blog_id);
// premium users can access some restricted articles
right(#authority, $blog_id, $article_id, #read) <-
article(#ambient, $blog_id, $article_id),
premium_readable(#authority, $blog_id, $article_id),
user(#authority, $user_id),
premium_user(#authority, $user_id, $blog_id);
// define teams and roles
right(#authority, $blog_id, $article_id, $operation) <-
article(#ambient, $blog_id, $article_id),
operation(#ambient, $operation),
user(#authority, $user_id),
member(#authority, $user_id, $team_id),
team_role(#authority, $team_id, $blog_id, #contributor),
[#read, #update].contains($operation);
// unauthenticated users have read access on published articles
allow if
operation(#ambient, #read),
article(#ambient, $blog_id, $article_id),
readable(#authority, $blog_id, $article_id);
// authorize if got the rights on this blog and article
allow if
blog(#ambient, $blog_id),
article(#ambient, $blog_id, $article_id),
operation(#ambient, $operation),
right(#authority, $blog_id, $article_id, $operation);
// catch all rule in case the allow did not match
deny if true;
").unwrap();
for (_span, fact) in policies.facts.drain(..) {
verifier.add_fact(fact)?;
}
for (_span, rule) in policies.rules.drain(..) {
verifier.add_rule(rule)?;
}
for (_span, check) in policies.checks.drain(..) {
verifier.add_check(check)?;
}
for (_span, policy) in policies.policies.drain(..) {
verifier.add_policy(policy)?;
}
let res = verifier.verify()?;
let dur = std::time::Instant::now() - start;
//println!("res: {:?}", res);
println!("{}", verifier.print_world());
println!("ran in {:?}", dur);
Ok(())
}
The entire program (key generation, token creation, serialization, deserialization, signature validation and facts verification) runs in 0.5 ms. So even with all of these features, Biscuit is fast enough to get out of your way.
What's next
You can already start using Biscuit in Rust, Java and Go.
The Rust version can also generate C bindings, currently used to develop a Haskell version, and there is a WebAssembly wrapper.
As an example integration, you can check out a Biscuit based authorization plugin for Apache Pulsar.
The specification is developed in the open, and contributions are welcome.