The raw Gaia DR3 data is ~600GB compressed, and traditional databases need 500GB+ of RAM to query it efficiently. That's insane for what should be simple questions like "show me all stars in this patch of sky."
Using succinct data structures, I compressed the entire catalog down to ~15GB while keeping queries fast: microseconds rather than seconds.
Morton Encoding: Converts 2D sky coordinates into a single 64-bit number by interleaving the bits of the two coordinates (a Z-order curve). Stars close in the sky usually have similar numbers, so a 2D spatial query turns into a handful of simple range searches.
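To make the bit interleaving concrete, here is a minimal Python sketch. The function names, the 32-bit grid per axis, and the way RA/Dec are scaled are illustrative assumptions, not necessarily how the catalog is actually built.

```python
def spread_bits(v: int) -> int:
    """Spread the low 32 bits of v onto the even bit positions of a 64-bit word."""
    v &= 0xFFFFFFFF
    v = (v | (v << 16)) & 0x0000FFFF0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0F
    v = (v | (v << 2)) & 0x3333333333333333
    v = (v | (v << 1)) & 0x5555555555555555
    return v


def morton_encode(ra_deg: float, dec_deg: float) -> int:
    """Map (RA, Dec) in degrees to a 64-bit Z-order code (assumed 32-bit grid per axis)."""
    x = min(int(ra_deg / 360.0 * 2**32), 2**32 - 1)
    y = min(int((dec_deg + 90.0) / 180.0 * 2**32), 2**32 - 1)
    # x occupies the even bits, y the odd bits.
    return spread_bits(x) | (spread_bits(y) << 1)


def cell_range(code: int, level: int) -> tuple[int, int]:
    """Inclusive [lo, hi] range of codes in the quadtree cell at `level` that contains `code`."""
    shift = 2 * (32 - level)
    lo = (code >> shift) << shift
    hi = lo | ((1 << shift) - 1)
    return lo, hi


# Every star inside one aligned quadtree cell falls in one contiguous code range,
# so "all stars in this cell" is a single range search over the sorted codes.
lo, hi = cell_range(morton_encode(10.0, 20.0), level=8)
same_cell = lo <= morton_encode(10.5, 20.2) <= hi
```

An arbitrary rectangle on the sky generally overlaps several such cells, so a real query would be split into a few ranges rather than one.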
Elias-Fano Compression: Instead of storing 1.8B full 64-bit integers, I sort the values and split each one into low bits, stored verbatim, and high bits, stored as unary-coded gaps. Gets ~2-4 bytes per star instead of 8, with O(1) random access and O(log n) searches.
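For readers who have not met Elias-Fano before, the following Python sketch shows the general shape of the encoding on a tiny list. It is a teaching version under stated assumptions: the class and method names are made up, the high bits sit in a plain bytearray with one byte per bit, and `_select1` is a linear scan. A production implementation would pack the bits and add a select index to get the O(1) access mentioned above.

```python
class EliasFano:
    """Toy Elias-Fano encoding of a sorted list of integers in [0, universe)."""

    def __init__(self, values: list[int], universe: int):
        n = len(values)
        self.n = n
        # Number of low bits kept verbatim: floor(log2(universe / n)), at least 0.
        self.low_bits = max(0, (universe // max(n, 1)).bit_length() - 1)
        mask = (1 << self.low_bits) - 1
        self.lows = [v & mask for v in values]
        # High parts in unary: the i-th value sets the bit at position (high + i).
        self.high = bytearray(n + (universe >> self.low_bits) + 1)
        for i, v in enumerate(values):
            self.high[(v >> self.low_bits) + i] = 1

    def _select1(self, i: int) -> int:
        """Position of the i-th set bit (0-based). Linear scan, for clarity only."""
        seen = -1
        for pos, bit in enumerate(self.high):
            if bit:
                seen += 1
                if seen == i:
                    return pos
        raise IndexError(i)

    def access(self, i: int) -> int:
        """Recover the i-th value from its high and low parts."""
        high = self._select1(i) - i
        return (high << self.low_bits) | self.lows[i]

    def successor_index(self, x: int) -> int:
        """Index of the first value >= x (or n if none), by binary search over access()."""
        lo, hi = 0, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.access(mid) < x:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def size_bits(self) -> int:
        """Bits used if lows and highs were packed tightly."""
        return self.n * self.low_bits + len(self.high)


values = [3, 4, 7, 13, 14, 15, 21, 43]
ef = EliasFano(values, universe=64)
decoded = [ef.access(i) for i in range(len(values))]
```

The space cost is roughly `low_bits + 2` bits per value, which is why sparse 64-bit keys shrink so much once they are sorted.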
Delta-encoded Source IDs: Gaia's star IDs have structure (nearby stars have similar IDs). Variable-length encoding cuts storage from 10.7GB to ~3-5GB.
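A small Python sketch of the idea follows. The exact variable-length format is not specified above, so this assumes a LEB128-style varint (7 payload bits per byte, high bit as a continuation flag). The function names and the sample IDs are made up for illustration and are not real Gaia source IDs.

```python
def encode_ids(sorted_ids: list[int]) -> bytes:
    """Delta-encode ascending IDs, writing each gap as a LEB128-style varint."""
    out = bytearray()
    prev = 0
    for sid in sorted_ids:
        delta = sid - prev
        prev = sid
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            delta >>= 7
        out.append(delta)  # final byte, continuation flag clear
    return bytes(out)


def decode_ids(data: bytes) -> list[int]:
    """Inverse of encode_ids: read varint gaps and accumulate them back into IDs."""
    ids = []
    prev = 0
    delta = 0
    shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += delta
            ids.append(prev)
            delta = 0
            shift = 0
    return ids


# Illustrative values only: large IDs with small gaps between neighbours.
sample_ids = [4295806720, 4295806976, 4295807104, 4295808000]
packed = encode_ids(sample_ids)
restored = decode_ids(packed)
```

Small gaps cost one or two bytes each, so the saving depends entirely on how tightly the sorted IDs cluster. The trade-off is that decoding is sequential, which is why schemes like this are usually paired with periodic checkpoints for random access.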
Only ~20% of Gaia stars have measured parallax (distance). Those are the ones rendered in 3D here.
Browser Limit: The 3D viewer only loads stars of magnitude 14 or brighter. The full 1.8B star dataset is available via the Query API.