Storing your data as text means:

- converting every number into its sequence of digits in decimal notation,
- writing those digits out one character at a time,
- writing the string representation of each value's label repeatedly, once per record,
- compressing all of this with a structure-unaware generic text compression algorithm based on longest-match search.
Each time you want to read that data, undo all of the above in reverse order.
You can optimize to some degree, but that's basically it.
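A minimal sketch of that round trip, assuming a hypothetical single-column record layout, with JSON lines as the text expansion and zlib (DEFLATE, a longest-match compressor) standing in for the generic text compressor:

```python
import json
import zlib

# Hypothetical records: the label "temperature" is re-encoded for every row,
# and every number is expanded into its decimal digit string.
records = [{"temperature": 20.125 + i} for i in range(100_000)]

# Write path: text expansion, then structure-unaware longest-match compression.
text = "\n".join(json.dumps(r) for r in records)
blob = zlib.compress(text.encode("utf-8"))

# Read path: undo all of the above in reverse order.
restored = [json.loads(line)
            for line in zlib.decompress(blob).decode("utf-8").splitlines()]
assert restored == records
```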
I expect that simply not doing any of this saves the time spent doing it. I also expect data-type-aware compression to be much more efficient than text-compressing the text expansion. In numbers, I expect a difference of two to three orders of magnitude in both time and space (for non-random data).
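A rough way to sanity-check the space claim on your own machine, assuming a hypothetical monotonically increasing integer column with the label "id"; delta encoding stands in for the type-aware tricks columnar formats use, and zlib plays the generic compressor on both routes:

```python
import array
import json
import zlib

# Hypothetical column: monotonically increasing IDs, i.e. non-random data.
values = list(range(1_000_000, 1_100_000))

# Text route: label repeated per record, numbers as decimal digit strings,
# then generic longest-match compression over the characters.
text_blob = zlib.compress(
    "\n".join(json.dumps({"id": v}) for v in values).encode("utf-8")
)

# Type-aware route: store the label once out of band, delta-encode the
# 64-bit integer column (as columnar formats do), compress the raw bytes.
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
binary_blob = zlib.compress(array.array("q", deltas).tobytes())

print(f"text+zlib: {len(text_blob):,} bytes, delta+zlib: {len(binary_blob):,} bytes")
```

On data like this the delta-encoded column typically compresses to a few kilobytes while the compressed text blob stays orders of magnitude larger; truly random data narrows the gap, since neither route finds much structure to exploit.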