Batch processing on encrypted data is absolutely OK. For example, you can pull the row-level keys and run your audited, signed binary. The pipeline should also be kept away from the frontend.
I'm not an expert, but I think this should already be documented in the compliance doc.
If you can batch process the dataset, an attacker can do the same thing. They take whatever creds the batch script is using and use them directly. If the program only runs from a certain box, then they run their modified script on that box.
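To make that concrete, here is a minimal sketch, assuming the batch job authenticates to the data box with a shared secret and an HMAC over the request. The names (`BATCH_CREDENTIAL`, `sign_request`, `data_box_accepts`) and the request shape are made up for illustration, not taken from any real system. The point is only that the data box sees a valid signature either way.

```python
import hashlib
import hmac
import json

# Hypothetical shared credential the batch job uses to talk to the data box.
BATCH_CREDENTIAL = b"example-secret-not-real"


def sign_request(credential: bytes, request: dict) -> str:
    """Sign a request the way the (assumed) batch script would."""
    body = json.dumps(request, sort_keys=True).encode()
    return hmac.new(credential, body, hashlib.sha256).hexdigest()


def data_box_accepts(request: dict, signature: str) -> bool:
    """The data box can only check the signature, not who produced it."""
    expected = sign_request(BATCH_CREDENTIAL, request)
    return hmac.compare_digest(expected, signature)


# The legitimate batch job asks for ten rows.
legit = {"job": "nightly-report", "limit": 10}
legit_sig = sign_request(BATCH_CREDENTIAL, legit)

# An attacker who read the credential off the batch host asks for a thousand.
stolen_credential = BATCH_CREDENTIAL
attack = {"job": "nightly-report", "limit": 1000}
attack_sig = sign_request(stolen_credential, attack)

print(data_box_accepts(legit, legit_sig))    # True
print(data_box_accepts(attack, attack_sig))  # True
```

Both requests verify, because the check proves possession of the credential and nothing else.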
"Well don't let that happen!" Nope, not that easy. This threat model concedes that the hackers have RCE. They're on your network. That's why they can do this.
"All binaries are audited and signed.", which makes it trackable. And the runtime environment is also isolated and uses different set of access permissions. I'm not saying it is impossible. Just layers and layers. For internal stuffs, Google's beyondcorp is a good start.
There is no such thing as a runtime environment where (a) attackers have gained RCE, and (b) the box with the sensitive data can still verify that the incoming connection comes from an audited, signed binary.
Simply put: if you have RCE into an app, then that breaks all the guarantees of that audited, signed binary. You can make it sing and dance and ask for a thousand rows instead of ten; you can do whatever you want to it. The modifications happen in memory, not on disk, so even if they don't persist you can still modify the app.
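A toy illustration of the in-memory point, assuming a Python process stands in for the signed binary. `AUDITED_SOURCE`, `fetch_rows`, and the digest check are all hypothetical stand-ins, and a real exploit would patch native code rather than a dict entry. The idea is the same though: the on-disk artifact still matches its audited digest while the running code does something else.

```python
import hashlib

# Hypothetical stand-in for the audited, signed program and its recorded digest.
AUDITED_SOURCE = "def fetch_rows(limit=10):\n    return list(range(limit))\n"
AUDITED_DIGEST = hashlib.sha256(AUDITED_SOURCE.encode()).hexdigest()

# "Load" the audited program into memory.
namespace = {}
exec(AUDITED_SOURCE, namespace)


def on_disk_check_passes() -> bool:
    """Re-hash the unmodified artifact, as an integrity checker would."""
    return hashlib.sha256(AUDITED_SOURCE.encode()).hexdigest() == AUDITED_DIGEST


# An attacker with code execution inside the process swaps the function in memory.
original = namespace["fetch_rows"]
namespace["fetch_rows"] = lambda limit=10: original(1000)

rows = namespace["fetch_rows"]()
print(on_disk_check_passes(), len(rows))  # True 1000
```

Nothing on disk changed, so the signature check keeps passing while the process hands back a thousand rows.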
Why would you ever allow public web access to a machine that can do that level of batch processing? That sort of thing should be accessible only through internal-only tools protected by a VPN.
You'd be amazed at how easy it is to exploit internal tools once you're inside someone's network. Internal tools were the juiciest targets: a perfect storm of "created by an unqualified team that no longer works here" and "nobody realized it's still been running for a year."
It can work, but evidence seems to imply this strategy makes the situation worse. I'm not sure what to do with this information other than to relay the experience. It's one of those tricky counterintuitive facts.
Oh, definitely agree. But I'm talking about "internal" tools that aren't actually internal, but are exposed on the public internet. At least if they're behind a VPN there's a hurdle to jump before you get to those tools.