None of those support the use case I actually want: an S3-compatible server that stores data locally and replicates to a cloud provider. Ideally it should also store enough integrity-checking metadata that one can verify, after the fact, that everything has been replicated correctly to the cloud.
I don't know if it counts as a server or a client in this context, but have you looked into using rclone? Given the way it intermediates the S3 API, and (re)downloads files for some operations, rclone is probably either, or both, depending on what you're doing and how.
I think you want rclone. Specifically a combination of:
a local filesystem remote
a cloud provider remote
rclone serve s3 to serve a local s3 compatible API for the local remote
something orchestrating an rclone copy, rclone sync or rclone bisync command to sync the local and cloud remotes. Could be triggered on a timer, or by another tool that watches the local directory for changes and triggers a sync.
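A minimal sketch of that recipe. Everything here is placeholder: the `cloud` remote name, the `/srv/objects` path, and the bucket name are assumptions, not anything from the thread, and exact flags vary by rclone version.

```shell
# Serve a local directory over an S3-compatible API.
# Note: by default this endpoint is unauthenticated; see the rclone
# serve s3 docs for configuring access keys.
rclone serve s3 /srv/objects --addr :8080

# Separately, push local changes to the (pre-configured) cloud remote.
# Run this on a timer or from a file-watcher.
rclone sync /srv/objects cloud:my-bucket
```

The serve and sync sides are independent processes, which is also the main weakness for the original use case: nothing ties an object's arrival via the S3 API to its replication to the cloud.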
I swear I suggested rclone on an old refresh of the page only to find you had posted this nearly an hour ago. I wasn't sure rclone could fit this use case, but I'm glad I wasn't totally off-base in assuming it was very close.
I think you can run rclone as a service; they recommend running bisync from crontab, for example.
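For illustration, a hypothetical crontab entry along those lines (remote name, paths, and schedule are all placeholders):

```shell
# Run bisync every 15 minutes; the very first run must be done
# manually with --resync to establish the baseline listings.
*/15 * * * * /usr/bin/rclone bisync /srv/objects cloud:my-bucket >>/var/log/rclone-bisync.log 2>&1
```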
What's bisync? I don't really know, as it's new to me, but it's an experimental beta feature from rclone and sounds neat.
> bisync can keep a local folder in sync with a cloud service, but what if you have some highly sensitive files to be synched?
> Usage of a cloud service is for exchanging both routine and sensitive personal files between one's home network, one's personal notebook when on the road, and with one's work computer. The routine data is not sensitive. For the sensitive data, configure an rclone crypt remote to point to a subdirectory within the local disk tree that is bisync'd to Dropbox, and then set up a bisync for this local crypt directory to a directory outside of the main sync tree.
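The crypt part of that example boils down to an rclone.conf fragment roughly like this (remote names, paths, and the password value are placeholders; the real password line would hold rclone's obscured form):

```
[dropbox]
type = dropbox

[secret]
type = crypt
# Points at a subdirectory inside the locally bisync'd Dropbox tree,
# so the ciphertext rides along with the normal sync.
remote = /home/me/Dropbox/encrypted
password = <obscured-password>
```

Writing through `secret:` encrypts file contents (and optionally names) before they ever touch the synced tree.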
Funnily enough, MinIO supported that feature via bucket replication. You could set the bucket replication to asynchronous or synchronous, write to MinIO, and it would sync away, including custom storage classes on the target bucket. I use it to sync to an AWS S3 bucket.
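Roughly what that setup looks like with the `mc` client. Alias, bucket, credential, and storage-class values are placeholders, and `mc replicate add` syntax has changed across versions, so treat this as a sketch:

```shell
# Point mc at the local MinIO server
mc alias set local http://localhost:9000 ACCESSKEY SECRETKEY

# Replicate the local bucket to an AWS S3 target bucket,
# storing replicated objects under a cheaper storage class
mc replicate add local/mybucket \
   --remote-bucket "https://ACCESS:SECRET@s3.amazonaws.com/target-bucket" \
   --storage-class STANDARD_IA
```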
Out of curiosity, if using replication how many copies would you expect to be stored locally and how many in the cloud? If using erasure coding, which parameters locally and which in the cloud? How quickly would you expect the replication to cloud to occur?
One or more locally. One in the cloud. I want a local object store that archives, reliably, to the cloud. And I don't mean some wildly inefficient periodic job that tries to find new objects and archive them; I mean something actually integrated that reliably and verifiably replicates.
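For the "verifiably" part, one after-the-fact option with rclone is a one-way hash comparison between the local tree and the cloud bucket (names are placeholders):

```shell
# Compare the local store against the cloud copy, checking sizes and
# hashes; --one-way only reports files missing or differing on the
# destination, i.e. things not yet (correctly) replicated.
rclone check /srv/objects cloud:my-bucket --one-way
```

This is still a periodic scan rather than the integrated replication being asked for, but it does give a verifiable answer to "did everything make it to the cloud?".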
The most established option is Ceph, which has an (optional) web UI too. It's what I use, and I'm happy with it.
Before investigating it for eventual production use, I had heard quite a bit about how complicated it was to use, but for my use case (storage for enterprise and academic k8s clusters) it's actually been quite simple to deploy and use. cephadm (one of many Ceph management tools) can handle nearly all our bootstrapping and management needs. Little to no tweaking or configuration needed. Fairly low overhead. Very reliable and resilient to adverse conditions. Easy to handle different storage types and data retention needs.
One thing I will warn you about: if you go to the Ceph docs site right now and just start browsing, it is in fact quite overwhelming, because Ceph has the capacity to handle a ton of edge cases and unusual environments. I'd recommend taking 15 minutes to build this [1], which gives you a fully functional toy-sized Ceph cluster on a single node. Then hit the docs to fill in the gaps of what you need to know for your deployment.
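The linked single-node quickstart boils down to very little. The mon IP below is a placeholder, and the real guide covers prerequisites (container runtime, time sync) that this sketch skips:

```shell
# Bootstrap a one-node cluster; cephadm pulls containers for the
# mon, mgr, and dashboard automatically
cephadm bootstrap --mon-ip 192.168.1.10

# Check cluster health from cephadm's containerized shell
cephadm shell -- ceph -s
```

After that, adding OSDs on local disks and enabling the S3-compatible RGW service are each a couple more commands, which is where the docs become useful rather than overwhelming.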