When you’re picking a method for generating unique IDs in a system, auto-incrementing IDs (1, 2, 3, …) are often a good choice. These have a number of advantages over more complicated techniques:
Of course, there are many situations in which sequential IDs aren’t appropriate, but this isn’t a post about choosing your identifiers.
As nice as auto-incrementing IDs are, they do have (at least) one unpleasant downside: they’re often too transparent.
For instance, let’s say that you’re on a website and you notice that your own user ID is 48203.
http://mysite.com/userprofile?userid=48203
Now, using only this information, you can infer a lot more:
Let’s state it clearly, then: given that your underlying storage uses a sequential ID, how do you make it opaque to the user, while minimizing the amount of complexity the obfuscation adds to your system?
I finally found a satisfactory solution to this problem here:
A Practical Use of Multiplicative Inverses (by Eric Lippert)
Note that when Eric says “multiplicative inverse” he’s referring to the modular multiplicative inverse, which caused me some confusion when looking for resources.
Basically, the principle is as follows:
For our system, let’s say we know we’ll never have more than M = 100,000 users.
“Coprime” just means “has no factors in common with”. You can pick this one more or less at random. How about N = 48029?
This is a tremendous pain to do manually since the best known algorithm is iterative. Thankfully, unless your system needs to do this on the fly, you only need to do it once. There are a number of online calculators that will do this; here’s one, and here’s another.
The inverse in this case is 34069.
The formulas we need to transform IDs are simple:
masked_id = (48029 * plain_id) % 100000;
plain_id = (34069 * masked_id) % 100000;
Let’s try a few examples. How about ID 101?
masked_id = (48029 * 101) % 100000 = 50929
plain_id = (34069 * 50929) % 100000 = 101
Nice! How does it look for users?
Plain ID | Masked ID |
---|---|
101 | 50929 |
102 | 98958 |
103 | 46987 |
104 | 95016 |
105 | 43045 |
We’ve transformed a transparent and predictable sequence into an opaque and unpredictable one–and made it trivial to transform it back.
Hopefully it’s obvious that this adds no actual security to a system; it’s obscurity, at best. However, for discouraging casual tampering, and creating opaque tokens out of transparent ones, it’s hard to do much better without considerably more complexity.