So I had the following requirements:
- Should be easy to work out proximity of cells
- Should be easy to work out if a cell is occupied or not by it's coordinate
- Each building is a collection of cells
- Prefer to avoid multiple indexes per cell coordinate
- Maps can be sparse and of arbitrary size
For example given a 2D map of 16x16 cells, I could have buildings such as below:
The obvious way to store this is to have a 2D array in my data store. This would mean I'd have to know the size of my map before I store anything, which breaks one of my requirements to have arbitrary sized maps. This also means I'd have to fetch the whole map every time I needed to query a specific (x, y) coordinate. I threw this idea out.
Another way to achieve this would be a tuple like (building_id, x, y) in the data store. This means the map size is not defined up front. Great, but I'd need an index on x and y columns this way. I wanted to avoid having this. I also wasn't sure if (0, 0) would be stored 'close' to (0, 1) for example due to multiple indexes.
So I've started doing some research and the first thing I found was Geohash. Looked good at first, but not really applicable to my maps since I am using integer coordinates. However, related to this is another approach called Z-order curve. This was the thing I was looking for! There's even a bunch of implementations in C for it here.
What the Z-order curve does is classify each cell using a hash value that assigns similar numbers to those cells that are close by. This is done using a technique called bit-interleaving e.g. the bits of x and y (xxxxxxxx, yyyyyyyy) become (xyxyxyxyxyxyxyxy).
The output of computing the hash values for the cells in my map look like this:
So the L shaped building ends up having hashes of 50, 51, 56...quite similar numbers, great! This means the index will have entries for these cells close by, exactly what I'm after.
Each hash value is also unique so I don't have to worry about collisions. Querying this back is simple too, since I'll know my building 'shape' and starting coordinate, I can calculate all the hashes that I'm looking for.
Now when it comes to placing a new building on the map, I can easily look up the potential cells and see if they are occupied already. Removing a building is a matter of deleting all cells with my building ID. Storage is not an issue and can accommodate for arbitrary sized maps.
I've ported the code from the above link to Java here:
private static int coordToZHash(int x, int y) {
final int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF};
final int S[] = {1, 2, 4, 8};
int z;
x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];
y = (y | (y << S[3])) & B[3];
y = (y | (y << S[2])) & B[2];
y = (y | (y << S[1])) & B[1];
y = (y | (y << S[0])) & B[0];
z = x | (y << 1);
return z;
}
Note that Java doesn't have unsigned integers, however it does not matter for this purpose because the bit representation is still the same and only the left shift operator is used, so the sign bit will not get altered (compared to the right shift operator).
I'm still to do some real testing on this, so use at own risk.
-i