That's a very bad way of solving that issue. If transmission is a problem, either use a proper retry-friendly protocol (such as BitTorrent) or split the file. Using hacks on the data format just leads to additional pain.
Splitting the file doesn’t need to be part of the file format itself. I could split a file into N parts, then concatenate the parts together at a later time, regardless of what is actually in the file.
The OP was saying that zip files can specify their own special type of splitting, done within the format itself, rather than operating on the raw bytes of a saved file.
> Splitting the file doesn’t need to be part of the file format itself. I could split a file into N parts, then concatenate the parts together at a later time, regardless of what is actually in the file.
I'm inclined to agree with you.
You can see good examples of this in the multi-part upload APIs used by cloud object storage platforms like S3. There's nothing particularly fancy about it: each part is individually retryable, with checksums for the parts and for the whole, so you get a nice, reliable approach.
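A minimal sketch of that flow with the AWS CLI (bucket, key, and part file names are made up, and the ETag bookkeeping is elided):

    # start a multi-part upload and note the UploadId it returns
    aws s3api create-multipart-upload --bucket my-bucket --key big.zip

    # upload each part independently; a failed part can simply be retried
    aws s3api upload-part --bucket my-bucket --key big.zip \
        --part-number 1 --body big.zip.part1 --upload-id "$UPLOAD_ID"

    # stitch the parts together once they have all arrived
    aws s3api complete-multipart-upload --bucket my-bucket --key big.zip \
        --upload-id "$UPLOAD_ID" --multipart-upload file://parts.json

Here parts.json would list each part number with the ETag that upload-part returned for it.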
On the *nix side, you can just run `split` over a file at whatever part size you want and later `cat` all the parts back together, super simple. It would be easy to have a CLI or full UI tool that handled the pause between `cat`s as you swapped media in and out, if we hark back to the days of zip archives spanning floppy disks.
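For example, with made-up file names (the part size here is one 1.44 MB floppy, say), plus a checksum to catch reassembly mistakes:

    split -b 1440k big.zip part_          # -> part_aa, part_ab, ...
    sha256sum big.zip > big.zip.sha256    # record a checksum of the whole

    # later, on the other side:
    cat part_* > big.zip
    sha256sum -c big.zip.sha256           # verify the reassembled file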
Without knowing the specifics of what's being talked about, I guess it makes sense that zip did that because the OS doesn't make it easy for the average user to concatenate files, and it would be hard to concatenate 10+ files in the right order. If you have to use a CLI then it's not really a solution for most people, nor is it something I want to have to do anyway.
The OS-level solution might be a naming convention like "{filename}.{ext}.{n}", e.g. "videos.zip.1", where you right-click a part, choose "concatenate {n} files", and it turns them into "{filename}.{ext}".
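Under the hood that action could be a numeric-order concatenation; a sketch in shell with hypothetical names (plain `cat videos.zip.*` wouldn't be enough once you pass nine parts, since `.10` sorts before `.2`):

    # reassemble videos.zip.1 ... videos.zip.12 in numeric suffix order
    ls videos.zip.* | sort -t. -k3 -n | xargs cat > videos.zip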
> the OS doesn't make it easy for the average user to concatenate files
Bwah! You are probably thinking too much GUI.
    X301 c:\Users\justsomehnguy>copy /?
    Copies one or more files to another location.

    COPY [/D] [/V] [/N] [/Y | /-Y] [/Z] [/L] [/A | /B ] source [/A | /B]
         [+ source [/A | /B] [+ ...]] [destination [/A | /B]]

    [skipped]

    To append files, specify a single file for destination, but multiple files
    for source (using wildcards or file1+file2+file3 format).
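So stitching split parts back together is a one-liner (file names made up); /B matters because combining files defaults to ASCII mode, which can stop at an end-of-file character in binary data:

    copy /B videos.zip.1+videos.zip.2+videos.zip.3 videos.zip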
Why would you use manual tools to achieve what the ZIP format gives you out of the box? E.g. if you do this manually you’d need to worry about checksums yourself to make sure you put the file back together correctly.
We need to separate concerns and design modules to be as single-purpose as possible:
- zip should ARCHIVE/COMPRESS, i.e. reduce the file size and create a single file from the file system point of view. The complexity should go in the compression algorithm.
- Sharding/sending multiple coherent pieces of the same file (zip or not) is a different module and should be handled by specialized, format-agnostic protocols built for that, like the ones you mentioned.
People are always building tools that handle 2 or more use cases instead of following the UNIX principle of making generic, good single-responsibility tools that can be combined together (thus allowing a 'whitelist' of combinations which is safe). Quite frankly it's annoying, and it very often leads to issues such as this one that weren't even considered in the original design, because of the exponential problem of combining tools together.
Well, 1) is zip with compression into a single file, 2) is zip without compression into multiple files. You can also combine the two. And in all cases, you need a container format.
The tasks are related enough that I don't really see the problem here.
Imagine a hypothetical standalone `shard` tool: you first `zip` the directory into one archive, then run something like `shard big.zip --size 100M -O out_shards/`. This results in `out_shards/1.shard, ..., out_shards/5.shard`, each 100 MB.
And then you have the opposite: `unshard` (back into 1 zip file) and `unzip`.
No need for 'sharding' to exist as a feature in the zip utility.
And... if you want only the shards from the get-go, without the original single-file archive, you can do something like:
`zip dir/ | shard -O out_shards/`
Now, these can be copied to the floppy disks (as discussed above) or sent over the network, etc. The main thing here is that the sharding tool works on bytes only (it doesn't know whether it's an mp4 file, a zip file, a txt file, etc.) and does no compression, while the zip tool does no sharding but optimizes compression.
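With today's real tools, `split` can already play that byte-level sharder; a sketch with made-up names (writing the zip to stdout via `-`, which Info-ZIP's zip supports):

    # stream the archive straight into 100 MB shards, no intermediate file
    mkdir -p out_shards
    zip -r - dir/ | split -b 100M - out_shards/part_

    # "unshard" and unpack on the other end
    cat out_shards/part_* > big.zip
    unzip big.zip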
The problem is that DOS (and Windows) didn't have the Unix philosophy of tools that do one thing well, and you couldn't depend on the necessary small tools being available. Thus, each compression tool also included its own file-spanning system.
The key thing you get by integrating the two tools is the ability to more easily extract a single file from a multipart archive. Instead of having to reconstruct the entire file, you can look at the part/diskette with the index to find out which other part/diskette you need in order to get at the file you want.
The problem seems to be that each individual split part is a valid archive in itself. This means that the central directory at the end of the complete file can disagree with the individual entries, and that is the original issue.
Even worse, in the general case, you should really decompress the whole tarball up to the end because the traditional mechanism for efficiently overwriting a file in a tarball is to append another copy of it to the end. (This is similar to why you should only trust the central directory for zip files.)
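A quick way to see that tar behaviour for yourself (file names made up):

    echo v1 > notes.txt
    tar -cf archive.tar notes.txt          # create the archive with the original copy
    echo v2 > notes.txt
    tar -rf archive.tar notes.txt          # "overwrite" by appending a second copy
    tar -tf archive.tar                    # lists notes.txt twice
    tar -xf archive.tar && cat notes.txt   # extraction keeps the later copy: v2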
If the point is being able to access some files even if the whole archive isn’t uploaded, why not create 100 separate archives each with a partial set of files?
Or use a protocol that supports resuming partial transfers.
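rsync is one common choice; a minimal sketch, with made-up host and paths:

    # keep partial files around so an interrupted transfer can pick up where it left off
    rsync --partial --progress big.zip user@host:/uploads/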
A single archive carries the information that all those files belong together, in an inseparable and immutable way, as opposed to encoding that in the archives' names or via some parallel channel.
Trying to transmit a 100 GB file through any service is usually a pain, especially if one end has an unstable Internet connection.