
Outside of nostalgia there's no engineering reason to do this - definitely not for performance.

That same Go program can easily serve over 10k reqs/sec without having to spawn a process for each incoming request.
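For the record, that's just the standard library - a minimal sketch, one long-lived process serving every request with no fork/exec per hit:

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        // One long-lived process handles every request; no fork/exec per hit,
        // unlike CGI, which spawns a fresh process for each incoming request.
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "hello from %s\n", r.URL.Path)
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }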

CGI is insanely slow and insanely insecure.

What makes it insecure? It's a pretty simple protocol - anything in there that makes it insecure beyond naive mistakes that could be avoided with a well-designed library?

EDIT: Looks like the way CGI works made it vulnerable to Shellshock in 2014: https://en.m.wikipedia.org/wiki/Shellshock_(software_bug)
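Roughly, the issue was that CGI hands the request to the child process as environment variables (REQUEST_METHOD, QUERY_STRING, every header as HTTP_*), and Shellshock-era Bash would execute function definitions smuggled into any of those attacker-controlled values. A rough sketch of what a CGI child actually sees (Go here just for illustration):

    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    func main() {
        // Under CGI, request metadata arrives as environment variables:
        // REQUEST_METHOD, QUERY_STRING, and every header as HTTP_<NAME>.
        // Shellshock fired because Bash parsed attacker-controlled values
        // like HTTP_USER_AGENT as function definitions.
        for _, kv := range os.Environ() {
            if strings.HasPrefix(kv, "HTTP_") || strings.HasPrefix(kv, "REQUEST_") || strings.HasPrefix(kv, "QUERY_") {
                fmt.Fprintln(os.Stderr, kv) // stderr, so it doesn't corrupt the CGI response
            }
        }
        // A CGI response is just headers, a blank line, then the body on stdout.
        fmt.Print("Content-Type: text/plain\r\n\r\n")
        fmt.Println("hello from a CGI child process")
    }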

I agree that there's probably not much of an argument to switch to it from the well established alternative mechanisms we are using already.

The one thing in its favor is that it makes it easier to have a polyglot web app, with different languages used for different paths. You can get the same thing using a proxy server though.
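For example, a minimal sketch of the proxy approach with Go's net/http/httputil (the backend addresses are made up) - each path prefix goes to a service that can be written in a different language:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    // mustProxy builds a reverse proxy to the given backend address.
    func mustProxy(backend string) *httputil.ReverseProxy {
        u, err := url.Parse(backend)
        if err != nil {
            log.Fatal(err)
        }
        return httputil.NewSingleHostReverseProxy(u)
    }

    func main() {
        mux := http.NewServeMux()
        // Each path prefix can be served by a program in a different language.
        mux.Handle("/blog/", mustProxy("http://127.0.0.1:9001")) // e.g. a Python app
        mux.Handle("/shop/", mustProxy("http://127.0.0.1:9002")) // e.g. a Go app
        log.Fatal(http.ListenAndServe(":8080", mux))
    }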


CGI has a very long history of security issues stemming primarily from input validation or the lack thereof.

Right, but anything relating to input validation can be avoided by using a well designed library rather than implementing the protocol directly.
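Go's net/http/cgi is a decent example of that on the child side - it decodes the environment and stdin into a normal http.Request so you never touch the raw protocol (a minimal sketch):

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "net/http/cgi"
    )

    func main() {
        // cgi.Serve decodes the CGI environment (REQUEST_METHOD, the HTTP_*
        // variables, the request body on stdin) into an *http.Request, so
        // the script never parses the protocol by hand.
        err := cgi.Serve(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            name := r.FormValue("name") // parsed query/form data, not raw QUERY_STRING
            fmt.Fprintf(w, "hello, %q\n", name)
        }))
        if err != nil {
            log.Fatal(err)
        }
    }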

> CGI has a very long history of security issues stemming primarily from input validation or the lack thereof.

And a Go program reading from a network connection is immune from the same concerns how?


It's not, you have to use Rust :)

> It's not, you have to use Rust :)

If only I could borrow such confidence in network data... :-D


The language in use often has input validation libraries. The failure of the programmer to use them is not the fault of CGI. Further, proper administration of the machine can mitigate file injection, database injection, etc. Again, that people fail to do this isn’t the fault of CGI.

That's like saying forks and knives are dangerous because you could stab someone with them.

>EDIT: Looks like the way CGI works made it vulnerable to Shellshock in 2014:

From your linked article: If the handler is a Bash script, or if it executes Bash...

But we are talking about Python, not Bash.


Yes, Shellshock is kind of a marginal case, but it probably does qualify as a security hole due in part to CGI itself, even though it doesn't affect Python programs (unless they spawn a shell). I don't know of any other examples of security problems caused by CGI, even partly. It's a very thin layer over HTTP.

IIRC the main issue was finding ways to convince a CGI script to write something to disk, at which point you could sometimes make it be treated as another CGI script. More of an issue on Windows than UNIX.

Because... it can execute an arbitrary executable? In the old days, it also ran as root.

It definitely can’t. Either you had to put your script in cgi-bin, use an extension like .cgi in a directory with that feature explicitly enabled, or set a magic sticky bit on the file if that was enabled.

You could configure the server to be insecure by, eg, allowing cgi execution from a directory where uploaded files are stored.


No, on all the servers I have any experience with, it can only execute executables the server administrator configures as CGI programs, not executables supplied by an attacker, and they never ran as root. Apache in particular is universally run as a non-root user, since the very first release, and its suEXEC mechanism (used for running CGI programs as their owners for shared web hosting services) refuses to run any CGI program as root. I've never seen a web server on a Unix system running as root: not CERN httpd, not NCSA httpd, not Apache, not nginx, not python -m http.server, not any of the various web servers I've written myself.

I hesitate to suggest that you might be misremembering things that happened 30 years ago, but possibly you were using a very nonstandard setup?


BusyBox runs as root by default, and it's used by hundreds of millions of devices.

For embedded devices (routers, security cameras, etc), it's very common to run CGI scripts as root.

So it's not just something from 30 years ago; it's still happening today, because of bad practices carried over from the past.


Oh, that's a good point. In those cases the web server actually needs root, in the sense that it has to be able to upgrade the firmware and reconfigure the network interface.

Hey, it could be worse. Some people launch entire VMs to service individual requests.

That’s cloud-native!

You mean "Web Scale".

How is that done?

I was making a joke about AWS Lambda. It doesn't necessarily start up a new VM for each request, though; it can launch a new VM but it will reuse an existing VM if the same CGI-bin (oops I mean Lambda function) has been executed recently.
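You can actually observe the reuse: anything stored in a package-level variable survives across invocations while the same execution environment stays warm. A small sketch with the aws-lambda-go SDK (the counter is just for illustration):

    package main

    import (
        "context"
        "fmt"

        "github.com/aws/aws-lambda-go/lambda"
    )

    // invocations lives for the lifetime of the execution environment, so it
    // keeps counting up while the same "VM" is reused (warm starts) and
    // resets to zero whenever a fresh one is launched (cold start).
    var invocations int

    func handler(ctx context.Context) (string, error) {
        invocations++
        return fmt.Sprintf("invocation #%d in this environment", invocations), nil
    }

    func main() {
        lambda.Start(handler)
    }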

You're joking, but working in finance I witnessed the misuse of AWS Batch + AWS ECS to do something similar. Not gonna dox the company, but it was a German fintech unicorn.

It wasn't exactly for serving the response of the request per se, but a single customer click would launch an AWS ECS container with the whole Ruby and Rails VM just to send a single email message, rather than using a standard job queue.

It was extremely slow and super expensive. Amusingly, the UI had to be hardened so that double clicks wouldn't cause two VMs to launch.

The rationale was that they already had batch jobs running in ECS, so "why not use it for all async operations".


I mean, a process kind of is an entire VM. But yeah, serverless is a marketing term for CGI scripts.

Not everything needs 10k RPS, and in some sense there are benefits to a new process – how many security incidents have been caused by accidental cross-request state sharing?
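The classic shape of that bug in a long-lived server looks something like this (hypothetical handler, sketched for illustration) - per-request data parked in shared state can leak into another user's response, which a fresh process per request rules out by construction:

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    // currentUser is shared by every request in the long-lived process.
    // Two concurrent requests can interleave here, so one user's response
    // may be built from another user's value.
    var currentUser string

    func handler(w http.ResponseWriter, r *http.Request) {
        currentUser = r.Header.Get("X-User") // request A writes...
        // ...request B may overwrite currentUser before A reaches this line.
        fmt.Fprintf(w, "hello, %s\n", currentUser)
    }

    func main() {
        http.HandleFunc("/", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }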

And in a similar vein, Postgres (which is generally well liked!) uses a new backend process per connection. (Of course this has limitations, and sometimes necessitates pgbouncer, but not always.)


A few years ago I felt the same and created trusted-cgi.

However, through the years I learned:

- yes, forks and processes in general are fast
- yes, it saves memory and CPU on low-load sites
- yes, it’s a simple protocol and can be used even from a shell

However,

- splitting functions (to mimic serverless) into different binaries/scripts creates a mess of cross-script communication
- deployment is not that simple
- security-wise, you need to run the manager as root and use a unique user for each script, or use cgroups (or at least chroot). At that point the main question is why not just use containers as-is

Also, compute-wise, even a huge Go app with hundreds of endpoints can fit in just a few megabytes of RAM - there is not much sense in saving that little memory.

At worst, just create a single binary and run it on demand for different endpoints.
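Concretely, something like this sketch (the endpoint names are made up): one binary, run on demand, that picks its handler from the CGI path or its first argument, so there's no fleet of separate scripts to keep in sync:

    package main

    import (
        "fmt"
        "os"
    )

    // One binary covers every endpoint; which handler runs is chosen at
    // startup from the CGI PATH_INFO (or argv when run by hand).
    var handlers = map[string]func(){
        "/hello":   func() { fmt.Print("Content-Type: text/plain\r\n\r\nhello\n") },
        "/version": func() { fmt.Print("Content-Type: text/plain\r\n\r\nv1.0\n") },
    }

    func main() {
        endpoint := os.Getenv("PATH_INFO")
        if endpoint == "" && len(os.Args) > 1 {
            endpoint = os.Args[1]
        }
        if h, ok := handlers[endpoint]; ok {
            h()
            return
        }
        fmt.Print("Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\nno such endpoint\n")
    }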


Even without pgbouncer, Postgres uses long-lived connections (and long-lived processes). So it's a bad example.

Uber famously switched from pg to mysql because their SWEs couldn't properly manage connections


Was that the only reason? In our last round of testing (2021), on the same hardware and for our specific case (a database with billions of records, many tables, and specific workloads), MySQL consistently left Postgres in the dust performance-wise. Internal and external devs pointed out that Postgres (or rather, our table structures & indexes) could probably be tweaked to be faster with quite a lot of work, but MySQL performed well (for our purposes) even with some very naive options. I guess it depends on the case, but I cannot justify spending 1 cent (let alone far more, as we have 100k+ tables) on something when something else is fast enough (by quite a margin) to begin with...


