jmcphers

Obfuscating IDs

Auto-incrementing IDs are great.

When you’re picking a method for generating unique IDs in a system, auto-incrementing IDs (1, 2, 3, …) are often a good choice. These have a number of advantages over more complicated techniques:

Of course, there are many situations in which sequential IDs aren’t appropriate, but this isn’t a post about choosing your identifiers.

Until they aren’t.

As nice as auto-incrementing IDs are, they do have (at least) one unpleasant downside: they’re often too transparent.

For instance, let’s say that you’re on a website and you notice that your own user ID is 48203.

http://mysite.com/userprofile?userid=48203

Now, using only this information, you can infer a lot more:

The problem.

Let’s state it clearly, then: given that your underlying storage uses a sequential ID, how do you make it opaque to the user, while minimizing the amount of complexity the obfuscation adds to your system?

A practical method for obfuscation.

I finally found a satisfactory solution to this problem here:

A Practical Use of Multiplicative Inverses (by Eric Lippert)

Note that when Eric says “multiplicative inverse” he’s referring to the modular multiplicative inverse, which caused me some confusion when looking for resources.

Basically, the principle is as follows:

Choose the biggest number you want to generate (“M”).

For our system, let’s say we know we’ll never have more than M = 100,000 users.

Pick large number (“N”) coprime to, and smaller than, the first.

“Coprime” just means “has no factors in common with”. You can pick this one more or less at random. How about N = 48029?

Calculate the modular multiplicative inverse of N.

This is a tremendous pain to do manually since the best known algorithm is iterative. Thankfully, unless your system needs to do this on the fly, you only need to do it once. There are a number of online calculators that will do this; here’s one, and here’s another.

The inverse in this case is 34069.

Use your new pair of numbers to obfuscate your IDs.

The formulas we need to transform IDs are simple:

masked_id = (48029 * plain_id) % 100000;
plain_id = (34069 * masked_id) % 100000;

Let’s try a few examples. How about ID 101?

masked_id = (48029 * 101) % 100000 = 50929
plain_id = (34069 * 50929) % 100000 = 101

Nice! How does it look for users?

Plain IDMasked ID
10150929
10298958
10346987
10495016
10543045

We’ve transformed a transparent and predictable sequence into an opaque and unpredictable one–and made it trivial to transform it back.

Some caveats.

Hopefully it’s obvious that this adds no actual security to a system; it’s obscurity, at best. However, for discouraging casual tampering, and creating opaque tokens out of transparent ones, it’s hard to do much better without considerably more complexity.