A fast pseudorandom generator for KASLR
A recent patchset proposed for the Linux KASLR randomizes not only the kernel base address, but also reorders every function at boot time. As such, it no longer suffices to leak an arbitrary kernel function pointer, or so the logic goes.
Along with this patchset came a custom random number generator intended to be as fast as possible, so as to keep the boot time overhead at a minimum:
|
|
This was quickly decried as dangerous, and as Andy Lutomirski puts it,
> Ugh, don’t do this. Use a real DRBG. Someone is going to break the
> construction in your patch just to prove they can.
>
> ChaCha20 is a good bet.
In the end, this random number generator was quickly removed, and that was that.
But one can still wonder—is this generator secure but unanalyzed, or would it have been broken just to prove a point?
Bob Jenkins’s Small PRNG
The above generator was, as per the comment, derived from one of Bob Jenkins’s small-state generators1. It is, in particular, the following “three rotation 64-bit variant”:
|
|
The core consists of the iteration of a permutation; we can easily compute its inverse iteration as
|
|
The core permutation present in ranval
is depicted below.
This resembles a Type-3 Feistel network2, with some added operations for extra diffusion. Nevertheless, the resemblance still means that there are relatively few changes from one state to the next.
The mode of operation, in modern terms, looks pretty much like a sponge pseudorandom generator with a capacity of 192 bits and a rate of 64 bits. As such, an ideal permutation in this mode of operation should be indistinguishable from a random stream until approximately $2^{96}$ captured 64-bit words.
Analysis
There are several ways to try and attack a pseudorandom generator:
- We can try and find a bias in its output stream;
- We can try to find a weakness in its initialization;
- We can try to recover an intermediate state from its output;
- Many more…
Our approach here will the be third one. The initialization, with its 20 rounds (or 30 in the KASLR version), is unlikely to have easily exploitable properties. Finding a bias in the output stream seems feasible, but in practical terms it has rather limited applicability.
Becase the permutation is rather simple, we will try to model the problem algebraically. This means representing the problem as a multivariate system of equations in $\mathbb{F}_2$, where $a \cdot b$ means bitwise and, and $a + b$ means bitwise xor. Since the permutation above consists only of a combination of additions, xor, and rotations, every operation is trivial to represent except addition (and subtraction).
Let $x, y$ and $z$ be 64-bit variables, and $x_i$ (resp. $y_i, z_i$) indicate the $i$th bit of $x$ (resp. $y, z$). One can represent 64-bit addition $z = x \boxplus_{64} y$ as a recursive system3:
The equation system for subtraction is the same as with addition, with a simple reordering of the variables. Alternatively, we can explicitly write it as
Now it becomes quite straightforward to model the entire round as an equation system like above, reordering the equations such that it becomes a system of the form $$ \begin{align} p_1(x_0,\dots) &= 0, \newline p_2(x_0,\dots) &= 0, \newline \dots & \newline p_l(x_0,\dots) &= 0, \newline \end{align} $$ which we call the algebraic normal form, or ANF, of the system.
Below we present a Python script that does exactly this, receiving a number of output leaks as arguments:
|
|
Having this system, we can solve it in two main ways:
- Compute its Gröbner basis, and thereby easily find a solution;
- Convert the system from its algebraic normal form to its conjunctive normal form (CNF), and use a SAT solver to find a solution.
We note that both Gröbner bases and boolean satisfiability are NP-complete problems. However, for small enough and simple enough systems, the heuristics used by good modern solvers make many of these problems tractable.
Although we tinkered with the first approach, the latter is both simpler to implement and more efficient. We also made use of the recent and quite convenient tool Bosphorus, which makes it straightforward to export a simplified CNF given an ANF equation system exported by our script above:
|
|
In the above snippet, we use ./bob
to generate a random state and leak 8 outputs, bob.py
(the script above) to create the ANF from these leaks, bosphorus
to convert the system to CNF, CaDiCaL4 to solve the system, and recover.py
to convert the output of cadical
back to readable integer values.
The number of leaked values is significant to the recovery speed. The minimum number of consecutive leaks to have a unique solution is 4—the initial value of d
plus 3 other leaks to constrain the $2^{192}$ possible initial state variables $a, b, c$ to a single value.
However, 4 leaks seems to make the problem quite hard for SAT solvers. If, instead, we use 5 leaks the problem becomes tractable. The more leaks we have, the faster it will be, until a certain point. We found, experimentally, that 8 leaks are the sweet spot for recovery time, with more leaks failing to speed things up.
The following table contains the solving speeds, on an Intel Core i7-4770, for various numbers of leaks, averaged over 100 runs:
Leaked words | Average state recovery time (seconds) |
5 | 95 |
6 | 43 |
7 | 31 |
8 | 26 |
9 | 27 |
10 | 28 |
11 | 29 |
Thus, it is safe to say that this generator is not suitable for cryptographic purposes.
We also note that SMT solvers could have been used to make the instantiation of the problem simpler. However, this results in poorer solving performance, and the performance across SMT solvers fluctuates even wilder than with our approach.
Carried Away
And now for something completely different.
While looking through the KASLR code, we find a peculiar piece of code in kaslr_get_random_long
, the function that is used to get random values for KASLR:
|
|
The random 32 or 64-bit word that is returned begins with a simple hash of the kernel build and boot information for the present kernel:
|
|
After that, it depends on which CPU features are enabled:
- If
rdrand
is available,random
is xored with its value. Under the assumption thatrdrand
works as advertised, this should result in a perfectly distributed value. - If
rdtsc
is available,random
is once again mixed in with the timestamp counter value. This is not as good of an entropy source asrdrand
, particularly sincerdtsc
is usually available system-wide. - If all else fails, use the i8254 lower-resolution timer.
After doing all this mixing, and in particular if only timers are used, random values are likely to be highly biased—the most significant bits are likely to remain relatively static over time.
To convert this lopsided entropy into a uniformly distributed value, since 2013 the function ends with a “cyclic multiplication” to smooth things over:
|
|
In short, it computes the full product of random
times 0x3f39e593
or 0x5d6008cbf3848dd3
, and adds the upper bits (in raw
) to the lower bits (in random
). This ensures that all the bits are more or less equitably mixed.
But there’s a problem. Two, in fact: one theoretical and one practical.
In theory, what is being attempted here is randomness extraction. There are two usual ways to accomplish this: using a strong hash function modeled as a random oracle, or using a universal hash function and the leftover hash lemma. Here we have neither, and it’s clear that the output only looks unbiased for a naive attacker who cannot simply (approximately) invert the transformation.
The practical issue is different: if the entropy we have is actually well-distributed (say, by using rdrand
), then the cyclic multiplication makes it worse by creating many values that are simply unreachable. Why? Because the multiplication—as implemented here—is not bijective.
Cyclic multiplication
Cyclic multiplication is best interpreted as multiplication modulo $2^n-1$, with lazy reduction. In other words,
If $b$ is relatively prime to $2^n-1$, this operation is clearly invertible. Its implementation is simple, as well: $$ a \otimes b = \left(a \times b \bmod 2^n\right) + \left\lfloor{\frac{a\times b}{2^n}}\right\rfloor \pmod{2^n - 1},. $$ This is exactly what was implemented above. But there is one problem—the sum may overflow. To keep correctness, the overflowing bit—the carry—must be added back to the result.
If the carry is not added, there are a number of values proportional to $b$ that will never be reached. In the case of $b = \text{0x3f39e593}$, there are around $2^{28}$ unreachable values—1 out of every 16. While this is not particularly concerning here, it is an unnecessary flaw and easily corrected.
Fixing it
The fix, now, becomes obvious: simply add the missing carry. This way the final transformation cannot harm a good uniform distribution, unlike before.
|
|
To be clear, Bob Jenkins never claimed these generators were cryptographically secure. ↩︎
See page 2 of On Generalized Feistel Networks for an idea of what the various Feistel network variants look like. ↩︎
Remember, again, that we are working with individual bits and that $\cdot$ means and and $+$ means xor. ↩︎
Obviously, any SAT solver could have been used here; we used the one that maximized single-thread solving speed for our problem. ↩︎