
Damn, that’s a chonky database. Have you written anything about the setup? I’d love to know more— is it running on a single machine? How many reader and writer DBs? What does the replication look like? What are the machine specs? Is it self-hosted or on AWS?

By the way, really cool website.



I'll try to get a blog post out soon!

> Damn, that’s a chonky database. Have you written anything about the setup? I’d love to know more— is it running on a single machine? How many reader and writer DBs? What does the replication look like? What are the machine specs? Is it self-hosted or on AWS?

It's self-hosted on bare metal, with standby replication, normal settings, nothing "weird" there.

6 NVMe drives in raidz-1, 1024GB of memory, a 96-core AMD EPYC CPU.

A single database with no partitioning (I avoid PostgreSQL partitioning as it complicates queries and weakens constraint enforcement, and IMHO does not provide many benefits outside of niche use cases).
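
To make the "constraint enforcement" point concrete: every unique constraint on a partitioned table has to include the partition key columns, so you give up true global uniqueness. A rough sketch (this events table is made up for illustration):

    -- Hypothetical table, partitioned by time:
    CREATE TABLE events (
        external_id text NOT NULL,
        created_at  timestamptz NOT NULL
    ) PARTITION BY RANGE (created_at);

    -- Fails: a unique constraint on a partitioned table must
    -- include all partition key columns.
    -- ALTER TABLE events ADD UNIQUE (external_id);

    -- Allowed, but two rows may share an external_id as long as
    -- their created_at differs, so external_id alone is not unique:
    ALTER TABLE events ADD UNIQUE (external_id, created_at);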

> By the way, really cool website.

Thank you!


> A single database with no partitioning (I avoid PostgreSQL partitioning as it complicates queries and weakens constraint enforcement, and IMHO does not provide many benefits outside of niche use cases).

That's kind of where I'm at now... you can vertically scale a server so much now (compared to even a decade ago) that there's really no need, IMO, to bring in a lot of complexity for databases. Simple read replicas or a hot spare should be sufficient for the vast majority of use cases, and the hardware is way cheaper than a few years ago, relatively speaking.

I spent a large part of the past decade and a half using and understanding all the NoSQL options (including sharding with PG) and where they're better or not. At this point my advice is: start with PG and grow that DB as far as real hardware will let you... if you grow to the point where you need more, then you'll have the money to deal with your use case properly.

So few applications need to handle more than a few million simultaneous users, and if you avoid certain pitfalls, it's not that hard. Especially if you're flexible enough to leverage JSONB and a bit of denormalization for fewer joins, you'll go a very, very long way.
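
To make the JSONB point concrete, a minimal sketch (table and attribute names invented here) of folding what would otherwise be a joined key/value table into a single indexed column:

    CREATE TABLE products (
        id    bigserial PRIMARY KEY,
        name  text NOT NULL,
        attrs jsonb NOT NULL DEFAULT '{}'
    );

    -- A GIN index serves containment queries without any join:
    CREATE INDEX products_attrs_idx ON products USING gin (attrs);

    -- "All red products", answered from the index:
    SELECT id, name FROM products WHERE attrs @> '{"color": "red"}';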


> you can vertically scale a server so much now

And you often don't really need to.

Just last week, for some small application, I was checking the performance of some queries, so I had to generate random data on a dev setup, which is a dockerized Postgres (with no tuning at all) in a VM on a basic Windows laptop. I inserted enough data to represent what could maybe be there in 20 years (some tables got half a billion rows; it's a small internal app). Still no problem chugging along.
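
For anyone curious: generate_series makes this kind of filler painless, something along these lines (the columns here are made up, and you'd want to insert in batches):

    INSERT INTO measurements (device_id, recorded_at, value)
    SELECT (random() * 1000)::int,
           now() - make_interval(secs => g),
           random() * 100
    FROM generate_series(1, 500000000) AS g;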

It is crazy when you compare what you can do with databases on modern hardware to how much less other software seems to have benefited. Especially on the frontend side.


Considering the front end today is an amazingly flexible client app platform, with rendering, styling, and accessibility far beyond a VB app of a few decades ago... it's kind of amazing.

One of my favorite websites in the late 90s took 15 seconds to load, because that's how long people would wait for a webpage at the time. Things have improved dramatically.


> accessibility

I'd like to see those accessible frontends. The majority is not usable keyboard-only.


I have done a lot of work for eLearning, government, and banking, which required good accessibility. Also, MUI is very good out of the box.

https://mui.com/material-ui/all-components/


I'm gonna doubt you when navigation on their own website is really bad.

Only tab and shift-tab work. Arrow keys are a bust. And the only visible shortcut is Ctrl-K for the search input, which I think is only there because it comes as an Algolia default.

For something better I only have to look just past the page at the browser itself: underlined letters in the menus tell me which alt+letter combination opens each menu. From there I can navigate using arrow keys, and most menu items show a keyboard shortcut next to them.


If I could show you some of the apps I've built with it, it would probably change your mind. A few had to go through testing and validation for accessibility. That, and I'm pretty firm on keyboard navigation for all things. We had to tweak the corporate colors a little bit to meet WCAG contrast requirements.

One thing that was crazy was having to go through verification for blind usability when the core function (validating scanned documents) requires a sighted user.

I won't say MUI is perfect... it isn't... but the only real point is that you can go a lot farther in a browser than with what's in the box in most UI component libraries.


> I'll try to get a blog post out soon!

Please do.

> It’s self-hosted on bare metal, with standby replication, normal settings, nothing “weird” there.

16TB with nothing weird is pretty impressive. Our devops team reached for Aurora way before that.

> 6 NVMe drives in raidz-1, 1024GB of memory, a 96-core AMD EPYC CPU.

Since you’re self-hosted, I assume you aren’t on AWS. How much is this setup costing you now, if you don’t mind sharing?

> A single database with no partitioning (I avoid PostgreSQL partitioning as it complicates queries and weakens constraint enforcement, and IMHO does not provide many benefits outside of niche use cases).

Beautiful!


> Since you’re self-hosted, I assume you aren’t on AWS. How much is this setup costing you now, if you don’t mind sharing?

About 28K euros of hardware per replica IIRC + colo costs.


Yearly, 28k Euros I presume.

Damn. I hope you make enough revenue to continue. This is pretty impressive.


No, one time + ongoing colocation costs.


So that's about 467 EUR per month per server, assuming a 5-year term (28,000 EUR / 60 months). Anyone know what it would be on AWS with Aurora? I had a quick go with https://calculator.aws/ and ended up with a 5-figure sum per month.


I tried for fun:

https://calculator.aws/#/estimate?id=cfc9b9e8207961f777766e1...

Seems like it would be 160k USD a month.

I could not input my actual IO stats there; I was getting:

Baseline IO rate can't be more than 1000000000 per hour.


The CPU itself is around $8-10k for a top-end AMD EPYC; $15-20k for the rest of the server, including memory and storage, is probably about right. There are still $100k+ servers, but they tend to be AI equipment at this point, not general-purpose stuff, which is sub-$30-50k now.


I mean no disrespect, but it is stunning how foreign the idea of owning your own hardware is to a large percentage of the tech population.

You can just… own servers. I have five in a rack in my house. I could pay a colo a relatively small fee per month for a higher guarantee of power and connectivity. This idea also scales.


> 16TB with nothing weird is pretty impressive. Our devops team reached for Aurora way before that.

Probably depends on the usage patterns too. Our developers commit atrocities in their 'microservices' (which are not micro, or services, but that's another discussion).


I continue to find horror shows during incidents; sometimes not even the cause, merely a tangential rabbit hole I wandered down.

“Are you… are you storing images in BLOBS?”

“Yes. Is that bad?”


Your replies are really valuable and informative. Thank you so much.

Question - what is your peak utilization % like? How close are you to saturating these boxes in terms of CPU etc?


I’d say 60-70% overall CPU usage, including the database, ingest workers, web app, and search.

> Your replies are really valuable and informative. Thank you so much.

Thank you!


I'm also self-hosting Postgres, and the project is getting to the point where a standby would be a good idea to ensure higher availability.

Did you use any particular guide for setting up replication? Also, how do you handle failover/failback to/from the standby?


Not OP, but I managed a ~1TB Postgres install for years. You should use something like pgbackrest or barman to help with replication (replicas can pull WAL from your backups when catching up), backups, and failovers.

At least for pgbackrest, set up a spool directory, which allows async WAL push/fetch.
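
The Postgres side of that is only a couple of settings. A rough sketch (the stanza name "main" is a placeholder; the matching archive-async and spool-path options go in pgbackrest.conf):

    -- On the primary (archive_mode needs a restart):
    ALTER SYSTEM SET archive_mode = 'on';
    ALTER SYSTEM SET archive_command =
        'pgbackrest --stanza=main archive-push %p';

    -- On a standby seeded with "pgbackrest restore", so it can
    -- fetch WAL from the repository while catching up:
    ALTER SYSTEM SET restore_command =
        'pgbackrest --stanza=main archive-get %f "%p"';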


> 6 NVMe drives in raidz-1

Did you benchmark IO rates with different ZFS layouts?

Six NVMe drives in mirrored pairs would probably give substantially lower latency and higher throughput.

Though mirrored pairs only give you half the raw capacity (versus 5/6 usable from six drives in raidz-1), so you'd probably need more pairs of drives to match your current storage size. Or higher-capacity NVMe drives. :)


> It's self-hosted on bare metal, with standby replication, normal settings, nothing "weird" there.

I can build scalable data storage without a flexible scalable redundant resilient fault-tolerant available distributed containerized serverless microservice cloud-native managed k8s-orchestrated virtualized load-balanced auto-scaled multi-region pubsub event-based stateless quantum-ready vectorized private cloud center? I won't believe it.


+1 as I'm hoping this is sarcastic humor.


from riches to RAG.



