At 50 Years Old, Is SQL Becoming a Niche Skill? (zwischenzugs.com)
44 points by ingve on June 27, 2024 | 104 comments


I think SQL is due for a renaissance, and I hope htmx is part of that. I know htmx has a reputation as being something only a backend developer could love, but I think that in a few years some of the most effective htmx developers are going to be front end developers who know the web stack (CSS, JS, HTML) well and who have figured out how to use SQL effectively.

If you are smart enough to figure out GraphQL, you are smart enough to figure out SQL, and SQL comes without the security/expressiveness tradeoff[1] inherent in front-end data access technologies. You have hypermedia-oriented tools like hotwire, unpoly & htmx pushing data access back onto the server, and at the same time things like React Server Components doing the same.

I'm very bullish on SQL.

1 - https://intercoolerjs.org/2016/02/17/api-churn-vs-security.h...


Coincidentally, SQL is having a "revival" in the streaming world... feldera, materialize, arroyo and many others are all betting on SQL as the language to express computation on moving data.


Just used 3 tier nested sub selects and joins to diagnose a major incident, the breadth of what it can process is awesome.

Without SQL there would have been no way to determine the extent of the issue.
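The kind of nested sub-select the comment describes can be sketched like this. The schema, data, and "incident" are entirely invented for illustration; the point is that three nested levels (bad batches, affected orders, distinct customers) answer a scoping question in one statement:

```python
import sqlite3

# Hypothetical incident diagnosis: how many customers were hit by orders
# that reference a failed batch? All names and data are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE batches (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE orders  (id INTEGER PRIMARY KEY, batch_id INTEGER, customer TEXT);
    INSERT INTO batches VALUES (1, 'ok'), (2, 'error'), (3, 'error');
    INSERT INTO orders  VALUES (10, 1, 'alice'), (11, 2, 'bob'), (12, 3, 'carol');
""")

# Three tiers: innermost finds failed batches, the middle finds affected
# orders, the outer counts distinct customers caught in the incident.
affected = con.execute("""
    SELECT COUNT(DISTINCT customer) FROM orders
    WHERE id IN (SELECT id FROM orders
                 WHERE batch_id IN (SELECT id FROM batches WHERE status = 'error'))
""").fetchone()[0]
print(affected)  # -> 2
```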

People forget that SQL is the only 4th gen language in common prod use.

Python, Java, C# are only 3rd gen.


I think SQL isn't going away anytime soon and will outlive GraphQL.

With distributed systems you probably interface with another web API of some type instead of a SQL DB, and often it is just annoying.

Some people pretend that you shouldn't use any SQL-server today and instead use current-hipster-web-accessible-database. But these often just simply don't deliver: they are expensive, have performance problems, lack compatibility, and have a whole host of other issues.

edit: Some cloud consultants increasingly say that nobody uses SQL anymore. Makes me chuckle a bit.


> Some people pretend that you shouldn't use any SQL-server today and instead use current-hipster-web-accessible-database.

More commonly, there are the people who use a SQL server at the lowest level, but then stick another database layer on top in order to provide a more friendly querying experience for the bulk of the users. The prevailing idea is absolutely that most people shouldn't have to use SQL, and most likely that they should use a hipster-web-accessible database, even if SQL exists somewhere inside of a black box.

Niche does not imply that something is nonexistent, but it does seem pretty clear that we go to great lengths to ensure that most people don't have to use SQL, limiting its use to a narrow set of developers who specialize in that niche, while everyone else uses higher level languages to communicate with the database.

Which is not at all what SQL's creators envisioned in its infancy. It was supposed to be the language for all business people far and wide.


Can you say more about (the|your perceived) relationship between SQL and htmx? When looking up HTMX[1] I was surprised to see what looks like a relatively young project with no direct relation obvious to SQL.

[1]: https://htmx.org/


htmx is a hypermedia-oriented library (like hotwire, unpoly, etc.) and so when you use htmx you exchange HTML with the server, rather than JSON. This means that HTML construction takes place on the server side, where SQL is available.

This is in contrast with traditional SPA libraries such as React, which typically access data via JSON APIs and construct a user interface on the client side. You can't make SQL generally available on the client side due to the obvious security issues, but projects like GraphQL have attempted to make the client side more flexible. Unfortunately I believe there is an inherent trade off with client-side data access that I outline here: https://intercoolerjs.org/2016/02/17/api-churn-vs-security.h...

So, as developers rediscover hypermedia & hypermedia oriented libraries, they will suddenly have access to SQL again.

And, as I mention above, new technologies like React Server Components, by putting evaluation on the server, also make SQL more easily accessible for developers.


In Codd we trust


Codd was no fan of SQL.


based


No. Given the number of tools JS developers have made to avoid learning SQL, if it was going to die out it would have done so by now.

All that collective effort to avoid learning one language and thinking about your data structures first is staggering.


Among software engineers it’s possible SQL is becoming a niche skill. I’m a data scientist and probably write SQL at least 4 hours a day, lately that’s been more like 10. SQL is an incredible language and I strongly prefer it to python for data processing.


This almost feels like a case of Betteridge's law.

Every single job I've had in the software space in the last 6 years (and I've had a bunch) has made pretty extensive use of SQL. Obviously this is a sample size of one, but me being good with SQL wasn't especially impressive to any of my coworkers at any of my jobs.


Ironically a lot of my data science peers only use SQL to extract data, then they like to bring it into excel or python to analyze. I’m pretty convinced that you should use SQL right up until you can’t. If you need to do text analysis or compute complex statistical measures you should just use SQL for extraction (to max effect) and then use the appropriate tools. Unfortunately a lot of my peers only use SQL for about 10% of the analysis that could be done in SQL which makes peer review and team work extremely difficult. I really wish data science had just been a sub category of software engineer, where programming skills were required full stop. In data science sometimes people can have terrible programming skills but good math skills which in my opinion is like an airplane with a great engine but no wings.
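The "use SQL right up until you can't" point can be made concrete: a grouped summary computed inside the database returns a handful of rows instead of every raw row. Table and values below are invented for illustration:

```python
import sqlite3

# Compute a per-group mean in the database rather than exporting raw rows
# to Excel or pandas. The database returns 2 summary rows instead of N.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")
con.executemany("INSERT INTO measurements VALUES (?, ?)",
                [("a", 1.0), ("a", 3.0), ("b", 10.0)])

rows = con.execute("""
    SELECT sensor, AVG(value), COUNT(*)
    FROM measurements
    GROUP BY sensor
    ORDER BY sensor
""").fetchall()
print(rows)  # -> [('a', 2.0, 2), ('b', 10.0, 1)]
```

Only once the question stops being expressible as set operations (text analysis, complex statistics) does pulling rows into Python start to pay for itself.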


My first engineering job was doing ColdFusion in 2012. Even at the time ColdFusion was considered dated, and it was extremely slow.

When I learned SQL at that job, and when I learned you can squeeze a lot of computation out of SQL, it was huge for me. I started doing as much pre-processing with SQL as I could, just because it felt like it was giving me a lot of free performance upgrades.


Yeah I have yet to really open the hood and learn more about query plans (which enable the computation ability you mention). I’ve been running into OOM errors all week and luckily the system gives me advice on where to find the cause. I’d like to learn more about how to determine what parameters to set (like partitioned vs broadcast joins) because even after all this time using SQL it still feels like an alien technology to me especially when my SQL is syntactically correct but causing worker explosions left and right lol. There’s no better feeling than refactoring the code and seeing a job that used to time out run and you see how much data is being processed efficiently like that.


Honestly even very crappily written SQL will probably still be faster than ColdFusion, particularly if you're using stored procedures. I'm not going to claim I'm the world expert of anything, and the SQL I write was never hyper-optimized, but ColdFusion was just ridiculously slow.


Sort of; what I've been noticing is that it's considered a background skill. Until the database and your queries need to be optimized for higher load, then suddenly everyone wants a DBA.


>> Until the database and your queries need to be optimized for higher load, then suddenly everyone wants a DBA.

And then the table structure, relations, indexes, and keys all get changed because it wasn't carefully designed the first time. Then changes propagate into a bunch of queries and code and it's a much bigger deal than doing it better the first time ;-)


Same here, but no one hires for a DBA; they just keep moving me between projects to put out fires or get them finished on time, or fix them later when they're already running. Few are actually interested in learning the details of designing schemas/indexes/queries to be good before there's a problem. I keep telling them to put example SQL queries in PR descriptions with EXPLAIN query plans, hoping it will click for them.

My view is that the schema + queries are the essential performance consideration. The code that goes from one persisted state to another is largely plumbing implementation detail. Of course there is also lots of precise logic there, but it's a healthy alternate view. The first thing I do on any project I join is make a big ERD to learn the names, cardinalities, and relations (and wonder how they keep changing those without an up-to-date picture).


I think this is one of those things that’s situation dependent. It sounds to me like your organization has a lot of data that needs to be processed. In that case it’s extremely important to write your SQL so that it executes efficiently. SQL does a lot under the hood but it won’t automatically switch a left join to an inner join if it figures out that would have no impact on the output. So if you’re in a situation where that’s important, you’ll want your pipeline building employees to be using best practices. I run into a similar problem with my peers (data scientists) when I ask them to provide a SQL query link to their charts / analysis. It’s like pulling teeth but to me it seems obvious that everyone should be doing it as mistakes would be found sooner and others could build off of your work resulting in a positive feedback loop of data understanding. But nope, not everyone seems to care about that.


People are spread too thin with their skills already, so it's no wonder they don't want to learn yet another thing (especially if you aren't the one signing their checks). The databases can be forgiving and EXPLAINs are not easy to interpret correctly for people who don't know databases that well.

A proactive solution of hiring a DBA to build things along with the FE/BE developers will likely never get traction due to costs. Businesses regularly put off performance and optimization until later and they still make good money in spite of that.


And then, ask for a backend rewrite, be it microservices or something else, just because nobody even thinks of optimizing the queries or such.


There are too many projects and not enough DBA's to go around.


So true. What I've been doing is rotating myself through different parts of the organization on projects that need a big schema/performance improvement. And hoping to have imparted some interest and knowledge in maintaining good schema design and performance along the way.


To all junior SWEs: read a SQL book over the weekend. You'll end up being the respected DB expert on most calls after literally a few weeks. Most everyone never progresses above a SELECT with a couple JOINs and then complains the DB is slow and needs to push work into the backend service, where usually the opposite is true.


Pretty senior developer here (manager now, actually) but I'm realizing that other than college, I haven't really picked up a book on SQL. What would you recommend?


For SQL Server, I would read through these articles:

https://www.itprotoday.com/sql-server/sql-server-database-en.... Explains the role of the query plan (data can be pulled either from the actual file or from indexes) and the write-ahead log (Microsoft gave it the unfortunate name of Transaction Log) for hardening.

https://learn.microsoft.com/en-us/sql/relational-databases/p.... Explains how to use the Query Store to identify poor performing queries.

https://learn.microsoft.com/en-us/sql/relational-databases/s.... Explains that SQL Server handles the problem of two concurrent users attempting to modify the same record at the same time via DEADLOCKS. You want to use Extended Events to troubleshoot deadlocks.

https://learn.microsoft.com/en-us/azure/architecture/pattern.... Explains how CQRS and Event Sourcing architectures can help minimize deadlocks.

https://www.business-case-analysis.com/accounting-cycle.html. I would argue the best architecture to minimize deadlocks is to mimic what accounting systems do. Multiple Clients submit order requests to a batch system that posts/processes them sequentially. Until the central batch system posts, the transaction is pending. Similar to how Amazon accepts your order request instantaneously; but it may take several minutes for your order to be confirmed.


For bonus minutiae, SQL Server has a problem with upserts. Here is the old school and the new way for handling it.

https://sqlperformance.com/2020/09/locking/upsert-anti-patte...

https://learn.microsoft.com/en-us/sql/relational-databases/p...
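For contrast with the SQL Server situation those links describe, SQLite and Postgres sidestep the race-prone "check then insert" upsert with a single INSERT ... ON CONFLICT statement (SQL Server instead needs MERGE or explicit locking hints). A minimal sketch with invented names:

```python
import sqlite3

# Atomic upsert: insert a counter row, or bump it if it already exists.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER)")

for _ in range(3):
    con.execute("""
        INSERT INTO counters (name, hits) VALUES ('home', 1)
        ON CONFLICT (name) DO UPDATE SET hits = hits + 1
    """)

hits = con.execute("SELECT hits FROM counters WHERE name = 'home'").fetchone()[0]
print(hits)  # -> 3
```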


The second is Azure SQL only.


Point of clarification: I've used SQL almost my entire development career, with multiple RDBMS engines, I've simply never looked at a book solely on SQL.


Just go through the W3Schools website first. It's free and you can write queries in the little window to test out things against their simple database.


I usually recommend the table of contents of your RDBMS but just grab whatever you can access quickest which has good reviews.


Many many years ago, I simply bought the O'Reilly SQL pocket guide, skimmed through it, read the section about Window Functions, and that marked the start of a moment in my career that changed my life forever.
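For anyone who hasn't met window functions yet, a small taste of why they are career-changing: rank rows within a partition without collapsing them the way GROUP BY would. Table and values are invented for illustration:

```python
import sqlite3

# Rank each car's price within its model year; every row survives.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (year INTEGER, model TEXT, price REAL)")
con.executemany("INSERT INTO cars VALUES (?, ?, ?)",
                [(2020, "x", 100.0), (2020, "y", 200.0), (2021, "z", 150.0)])

rows = con.execute("""
    SELECT model, RANK() OVER (PARTITION BY year ORDER BY price DESC) AS rnk
    FROM cars ORDER BY model
""").fetchall()
print(rows)  # -> [('x', 2), ('y', 1), ('z', 1)]
```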


Ha, SQL is one of only a handful of computer languages that is both older than most of us and probably will outlive most of us.


I just hope that PostgreSQL becomes able to do something as conveniently (keyword, "conveniently") as this within my lifetime:

    CREATE TABLE permission_overrides (
        -- (...)

        granted_by_user_id BIGINT NOT NULL REFERENCES users (id),
        granted_to_user_id BIGINT NOT NULL REFERENCES users (id),

        -- Ensure both users belong to the same tenant.
        CHECK (DEREFERENCE(granted_by_user_id).tenant_id = DEREFERENCE(granted_to_user_id).tenant_id)
    );
(Or whatever similarly-convenient variation we want to bikeshed about.)

I understand there are performance implications[1] with such a thing, but being able to prevent invalid data from even existing in the database is one of the main reasons I use a database instead of some dumb store, because I don't want to have weird data the moment the application has a major change (or worse, a rewrite).

[1]: That specific example looks like it would be slow as hell unless that table is mostly append-only. (EDIT: And also that the referenced rows don't change too much.)


While there are lots of cases where referencing "through" foreign key relations would be useful, this specific example is one for which dereferencing & check seems to be LESS convenient than what is available today with just composite foreign keys.

  CREATE TABLE permission_overrides (
    -- (...)
    granted_in_tenant_id BIGINT NOT NULL,
    granted_by_user_id BIGINT NOT NULL,
    granted_to_user_id BIGINT NOT NULL,
    FOREIGN KEY (granted_by_user_id, granted_in_tenant_id) REFERENCES users (id, tenant_id),
    FOREIGN KEY (granted_to_user_id, granted_in_tenant_id) REFERENCES users (id, tenant_id)
  );


I worked at an organization that ran an energy market, and we had a massive amount of data stored in SQL databases; this was how you did large-scale data analysis. Most electrical engineers there knew the schema backwards and forwards and could write multi-page queries (there are a LOT of tables and many many join operations) pretty quickly to go through TBs, probably at least a PB, of data. It wasn't considered a niche skill... more like bread and butter (equally useful as Python). Every now and then we'd hire someone with a "data science" background that wouldn't know SQL (just some stats and R or Python+Pandas) who would try to convince us we were doing things wrong, but it was generally because the "big data" they learned in school could fit into a dataframe. I'd say SQL is now more important than ever.


How do you do

    SELECT AVG(price) FROM cars
    WHERE brand='BMW'
    GROUP BY year
    ORDER BY year
without SQL?

So far, every alternative syntax I have encountered was less human readable and/or less concise.

I think SQL will never be replaced because it is not really something that was invented but rather the simple consequence of wanting to store and retrieve data.

You could replace "SELECT" with "GET" or "FETCH". But in the end you need to define what action you want to perform. You could replace "WHERE" with "FILTER" or "CONDITION". But in the end you need to define what you want to retrieve. Etc. So by just stating the necessary information, you always end up with SQL. And every real alternative has the problem of being less concise, less easy to read and less easy to write.


PRQL translation:

    from cars
    filter brand == "BMW"
    group year (
      aggregate [
        average_price = average price
      ]
    )
    sort year
    select average_price
A bit longer, but I love that it reads from top to bottom. When you have a complex SQL query, I’ve sometimes found it easier to write it as PRQL and then convert it to SQL

https://prql-lang.org/


A good ORM will mostly get out of the way and let you write something like

  return Cars
    .query()
    .select(avg('price'))
    .where( { brand: "BMW" })
    .groupBy('year')
    .orderBy('year')


Like all alternatives to SQL I have seen, this is not as lean and readable as SQL.


I totally agree - but this has the advantage of returning something like a well-defined object that hews to an interface, as long as you have the model set up.

The trade-offs are usually worth it, vs, doing something like:

  const row = db.query('select whatever from wherever');
  const obj = {
    id: row.id,
    brand: row.brand,
    ... etc
  };


It looks like "row" is already the object you want. Why make a copy of it? And why copy it by copying each property individually?


Row is just a dumb object, without a type.


No, "row" is a generic type, all concrete rows have concrete types narrowing the generic type with an ordered sequence of element tags and element types. (Relations are typed similarly to rows, since they are containers for rows of homogenous type.)


So is the obj you created.


Ah, I meant to specifically decorate it as a Typescript object, but forgot to add it.


The SQL is preferable in this overly simple, cherry-picked case, but now do composition.


I've done composition in Objection ORM, and honestly, it's a fair amount of hassle.

On our projects, which are moderately small, I genuinely don't know if it is/was worth the hassle of setting up, vs. writing some repeated code. (We also have a pretty decent testing culture in place, so I would be much more accepting of DRY-violations.)


I assume the above is based on said package? Wherein lies the hassle?

    bmws = Cars.query().where( { brand: "BMW" }).groupBy('year').orderBy('year')
    bmwAvg = bmws.select(avg('price'))
    bmwMin = bmws.select(min('price'))
    bmwMax = bmws.select(max('price'))


This is just a map/filter/reduce/sort, you can do that in pretty much any language you want
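True, and making that concrete shows the trade-off: the AVG-by-year query as plain map/filter/reduce in Python. It works, but it is more code than the SQL, and it runs on the client rather than in the database. Data below is invented for illustration:

```python
from collections import defaultdict

cars = [
    {"brand": "BMW", "year": 2020, "price": 100.0},
    {"brand": "BMW", "year": 2020, "price": 300.0},
    {"brand": "BMW", "year": 2021, "price": 150.0},
    {"brand": "Audi", "year": 2020, "price": 999.0},
]

totals = defaultdict(lambda: [0.0, 0])                     # year -> [sum, count]
for car in filter(lambda c: c["brand"] == "BMW", cars):    # WHERE brand='BMW'
    totals[car["year"]][0] += car["price"]
    totals[car["year"]][1] += 1

# GROUP BY year + AVG(price) + ORDER BY year, by hand:
result = [(year, s / n) for year, (s, n) in sorted(totals.items())]
print(result)  # -> [(2020, 200.0), (2021, 150.0)]
```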


The filtering on 'BMW' is the only aspect of that query that the DB server should be spending its resources on. Avg, grouping, and sorting the results are the domain of the front end that has to go through them and pretty print them anyway


You want to drop the ability to group data from database software?

Some issues come to mind:

Grouping on the client would mean that you have to send the ungrouped data over the wire/air. Which might be orders of magnitude more data than the client needs. What if "cars" is a table of car photos made over the last 15 years and there are 20 million photos of BMWs in there? You send all the 20 million rows to the client to crunch it down to 15 rows?

The ungrouped data might be something the client is not allowed to see.

Depending on the type of grouping, the client would have to reimplement fast and efficient algorithms that have been tested and optimized for decades in RDBMSs. Good luck matching that by writing some JS for the browser.


Author here. Mongodb offers primitives around filtering, group bys and orders, but calls them 'pipelines'. I'm not an expert on that, but it's quite possible. It's just a series of set and list operations after all.


It is certainly possible, but is it as lean and readable?


No.


Is it a skill, though? Ask ChatGPT, "In SQL, given a cars table, with a price field, a brand field, and a year field, how do you get the average price of each brand for each year...?"


ChatGPT and similar LLMs will get anything more complex and interesting wrong, though, even the models specialized in SQL.

And writing the query in SQL is probably also faster than trying to write down the prompt.

Sometimes it can be used to optimize your queries, but it isn't its strength either.


Use QUEL?

    range of c is cars
      retrieve (avg(c.price) where c.brand = 'BMW' by c.year)


It seems either the group or the order is missing?


Yes, QUEL is a language for querying relational databases. Relations are fundamentally unordered.

I suppose if your example is using a NoSQL (meaning non-relational) database, that is ironically queried using SQL, then you might need something else. But my understanding is that we're back on the relational kick these days.


Everyone here missing the point. Yeah it’s doable elsewhere, but the SQL is so perfectly readable.


I think relational databases as a technology are absolutely the bees knees. It still boggles my mind that devs are choosing NoSQL databases for their products right out of the gate and throwing away 50 years of relational algebra database technology.

SQL as a technology did an admirable job of allowing practitioners to express relational queries, but let's be real - the syntax of SQL, especially for advanced queries, was always pretty awful. I think that syntax, more than the underlying DB technologies, was the number one factor driving people towards NoSQL unnecessarily.

What would be amazing is if the RDBMS community could create an industry standard querying language that supported relational algebra but actually had a pleasant development experience.


For my work I use SQL pretty extensively, the PostgreSQL flavor, and it's really powerful. I wonder how it is possible to do without it except for very simple queries where the ORM might kind of work.

There is however a weakness of SQL: it's purely declarative, and it's difficult to make sure it does the right thing. I often find myself in front of queries that are badly optimized, and I'd welcome a language that would be less declarative, where you could tell the database engine explicitly which index you want to use and how. The optimizer has table metrics, but with domain-specific knowledge the programmer generally has better insights.


A basic understanding of SQL is one of the most useful skills a software engineer can have. Sooner or later, you will get to face a database system that requires it, either for actual application development or for data mining.

Even if you don't use it all that often, I think that every developer should get some exposure to Functional, Logic and Declarative languages - and SQL is a decent way to learn more about the "declarative way of thinking", even if it isn't used in a fully declarative fashion in many practical situations.


I’m getting pretty damned tired of NoSQL as a default. Picking it because it matches your use case exactly is one thing, picking it because you want to avoid modeling your domain, building a schema, and managing migrations is a fool’s errand. All you’re doing is trading problems and ass ache, but I think NoSQL is less flexible than SQL. Hell, DocumentDB is built on Postgres, allegedly (is that confirmed?).

I started at a new place that uses DynamoDB and I can’t articulate how much I loathe it. Currently rewriting it into Postgres as we speak.


> picking it because you want to avoid modeling your domain, building a schema, and managing migrations is a fool’s errand

For anyone in this situation, maybe try tricking the NoSQL advocate like I did once:

Postgres's json column is just like Mongo, except you also get the same and more because it's a relational database on top of that. We can do some stuff unstructured, then the rest structured.

I'm pretty sure that's not strictly true - I don't know a lot about Mongo, but neither did the person advocating for it on my team. It was enough though and the vast majority of the data ended up in normal columns instead of the json columns, with zero pushback.
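The hybrid arrangement described above can be sketched in SQLite rather than Postgres (Postgres would use a jsonb column and the ->> operator; SQLite uses json_extract). Structured fields get real columns; the long tail stays as JSON. Names and data are invented:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,   -- structured, indexable column
        payload TEXT                -- unstructured JSON blob
    )
""")
con.execute("INSERT INTO events VALUES (1, 42, ?)",
            (json.dumps({"kind": "click", "x": 10}),))

# Query into the unstructured part with a JSON path expression.
kind = con.execute(
    "SELECT json_extract(payload, '$.kind') FROM events WHERE user_id = 42"
).fetchone()[0]
print(kind)  # -> click
```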


When I started my dev career in the early aughts, by no particular plan, I ended up going down the route of a DBA after doing a few Access-to-MSSQL migrations. Wrote some really gnarly SQL sprocs back in those days.

In those early days, ORMs were really bad and didn't feel very ergonomic to use.

But the tooling has come such a long way. Today, ORMs are generally pretty competent for 80-90% of the use cases I would find in a typical business app. Microsoft's Entity Framework always seems to surprise me when I find that I can do fairly complicated queries and have it generate almost perfect SQL. Its migrations are pretty good, to the extent that it's rare to have to manually amend them. Even though I think I'm quite good at hand-rolling SQL, EF Core just makes it far more productive to work at the application layer instead.

I think the biggest hurdle to SQL today is that the DBA is a dying breed and SQL hasn't really had a "revolution" to meet the (sometimes perceived) needs of younger teams. For example, scaling relational database still requires more forethought and each approach comes with some limitations and gotchas. On the other hand, the rise of NoSQL databases and JavaScript in the application stack means that document-oriented databases like Firestore and CosmosDB are perhaps easier for teams to adopt, even with the limitations, because infrastructure concerns (scaling, replication, etc.) disappear.


Good/serious SQL I think yes. It's gone except in legacy systems.

We see a lot of complexity moving into the application and the rdbms used mainly as a dumb table storage. The whole idea of normalised models is out the window, which kinda makes sense because that's a carryover from the 70s when storage costs were high and now they are negligible and development costs/time are instead the barrier. Really nobody gives a crap about their database size :P

I do see some stuff like stored procedures for security but the whole complex 10-table hyper optimised join, the kind you tweaked for a week so it could finally run at an acceptable speed and you'd feel like a magician, that stuff seems to be over.

These days if you need some kind of summary you just add it to the business logic and keep a separate table updated. Of course you have total duplicate information and a risk of mismatches but at agile breakneck speeds and basically zero storage costs this is not an issue these days.


I never ended up taking those database electives at uni

In practice though, SQL just seems like a lower-barrier to entry lingua franca than C (for its use)

Unless it’s absolutely essential to your performance -like GIS or data-driven stuff (at which point you might be looking beyond SQL) - I don’t really see it as a highly specialised niche skill

I think most people see it (perhaps rightfully) as a commonly understood but seldom directly used paradigm in isolation

Picking up an ORM is comparatively child’s play but not understanding the basic concepts of SQL is an uphill battle for even the simplest CRUD applications, even if SQL isn’t used

Lingua francas such as C/SQL/HTTP in computing are perhaps even more important as paradigms of thought rather than tools to be used

Especially as we seem to year on year move higher up the tech stack for the foundations to greenfield projects


SQL is bifurcating, because it's kind of two things.

Good SQL is becoming more specialized for dev. No longer are the DBAs tutoring the devs to avoid inefficiencies. Now, the DEs handle really complex pipelining, and get really good.

But SQL isn't just used to "build" - it's also used to "access." And that's less specialized. The PM, BA, etc can all use ChatGPT/Stack Overflow and get things done quickly. And it's a virtuous cycle: the more people access data, the more the DEs are asked to clean things up, the easier it gets to query, and the more people try/succeed.

So SQL-for-development is increasingly niche, while SQL-for-access is increasingly common.


SQL is the opposite of teenage sex - everybody does it, nobody talks about it.


I find it really weird that a discussion on the state of SQL right now doesn't include any mention of Snowflake or dbt. SQL is _everywhere_ in the data engineering world right now.


Data engineering is, indeed, a niche skill.


I think some of this is due to SQL frameworks becoming a must-have. Which is kind of fine, I guess, because it's certainly nicer to have your objects coming back to you in some known format rather than manually massaging raw row responses.

But at the same time, a fairly basic SQL query I could bang out in two minutes becomes an exercise in acrobatics & research in how to do it right in this version of this particular framework.


Within the decade it's likely the vast majority of programming work will be done by AI, being considered too complex and dangerous to trust to human beings. So all such skills and knowledge, including SQL, will become equally niche as job opportunities disappear and schools abandon CS for prompt engineering and AI driven education. We're not there yet but it's going to happen.


Yes. I’d consider myself an “intermediate” DBA but I can command an insane salary and I have people tripping over themselves to hire me.

Nobody cares about SQL until it takes 15 seconds to load your user facing login dashboard.


100% this. I heard a fun saying when I became a professional dev:

“When you’re fed up of keeping up, just retire into SQL, it’s the best pension there is”


What level of optimizations do you normally need in those cases? Just adding an index, removing subqueries or something more complex?


“it depends”

but in all seriousness, it depends. adding an index. removing/consolidating indexes. breaking the queries down into individual UoW and forcing intermediate materialization. identifying platform-specific optimization barriers. rearranging sufficiently complex query semantics to force behavior you expect.

99% of the cases i’ve personally had to resolve over the last 15 years have been the result of sql hero queries that try to do everything all at once. this is exacerbated by orms that generated bad sql that was acceptable at low cardinalities. under scaled-up concurrency and data volumes they can’t deliver the necessary performance anymore.
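One of the techniques listed above, "forcing intermediate materialization," can be sketched simply: instead of one hero query that filters, joins, and aggregates in a single statement, materialize the filtered unit of work into a temp table first, then aggregate from the small intermediate. Schema and data below are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "eu", 10.0), (2, "eu", 30.0), (3, "us", 99.0)])

# Step 1: materialize the subset we care about as its own unit of work.
con.execute("""
    CREATE TEMP TABLE eu_orders AS
    SELECT id, amount FROM orders WHERE region = 'eu'
""")

# Step 2: aggregate from the small intermediate, not the big base table.
total = con.execute("SELECT SUM(amount) FROM eu_orders").fetchone()[0]
print(total)  # -> 40.0
```

On real workloads the win is that the planner handles two simple statements instead of guessing one giant plan, at the cost of writing the intermediate out.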


Where do you find insane salaries for DBAs?


You can do a lot with MongoDB, and maybe it does everything you need but you can do a lot more with a full-fledged RDBMS such as PostgreSQL.

Since MongoDB is so cheap, quick, and easy to get going my team tends to make that the default option. In many cases we've been able to run for years on MongoDB. In other cases, the needs changed such that we need updated DB capabilities and so we migrated that data from MongoDB into PostgreSQL.

That strategy has been working out really well for us.

As far as SQL itself becoming a niche skill, optimized SQL has always been a niche skill, and many data scientists don't have this skill. It seems many applications have that one "killer query" that simply takes too long to execute. A DBA will be able to optimize that query for you. They may have to create new indices and who-knows-what to optimize the query and almost always they use engine-specific SQL to optimize the query. You need people on-hand who can optimize SQL queries for whatever RDBMS you're using.


> As far as SQL itself becoming a niche skill, optimized SQL has always been a niche skill,

Skill issues are not the responsibility of SQL itself.

Also, you even mention yourself that MongoDB is not sufficient, so to me at least it sounds like you shoehorn in PostgreSQL when, in reality, using the right tool from the start would have avoided all that work.

At the end of the day it's a very intuitive query language. I've yet to see one that surpasses it, nor am I convinced there is a need to change it.


The so-called "right tool" is also considerably more expensive on AWS. We let our actual needs mandate the tools we use. We don't play the game of gold-plating our tools because we might need it someday. Also, migrating data from MongoDB to PostgreSQL is trivial. Since our architecture mandates that clients access data via APIs then the clients aren't affected by the change.


> The so-called "right tool" is also considerably more expensive on AWS.

I am currently in the process of reducing our AWS service coverage. I'm not sure if I've phrased that right, but reducing the number of scalable services that they suggest and offer and that were promoted to the company I work for.

It's a numbers game: if you are serving a lot of clients, or processing a lot of things, AWS is fine. The next question is, what is a lot?

All things are relative to workloads of course, but let's say you're serving 1 million requests a day, not concurrent but overall, and you have an API gateway tied to AWS Lambda or Step Functions etc. Okay, I can understand you might want to scale up some of the 'workload' services.

Out of those 1 million requests, it's worth a company's time to take the step to evaluate: "Hey, we have a contract with AWS, maybe a fixed-price contract, but do we need it?"

Fast forward to after you realize maybe ~10%, let's say, can be left as 'scalable' because they're CPU-intensive, or could be consolidated into a single 'microservice'. Great, now you have your scalable workload; the rest can be thrown on a single server.

> Since our architecture mandates that clients access data via APIs then the clients aren't affected by the change.

Architecture by definition should be forward stable. It shouldn't mandate anything; it should just facilitate the requirement. How you implement it isn't an architect's concern (in theory).

AWS creates a lot of tech debt here that is hard to justify to the people above, because they've already paid. So now, as a software engineer, I found a ~40% saving possible, but it doesn't sound right.

I'm honestly getting too discouraged with things. The max client list we will ever look at, I was told, is 500 people, maybe 200 concurrent.


I was specifically talking about the pricing between DynamoDB and Aurora running the PostgreSQL engine. If DynamoDB satisfies your needs, then you should run it, because it is a lot cheaper! But it's not an RDBMS. Sometimes you really do need an RDBMS. Like I said, starting off with DynamoDB and migrating to PostgreSQL is simple, and your clients aren't affected so long as they were never directly accessing the data store. That was my point. Architecture is what allows you to make these kinds of changes to your system without impacting every other component comprising the system.

Generally speaking, API-based systems, and interactions between components in the system, are very scalable. AWS facilitates the creation of such a system. Utilizing their serverless components essentially forces you to utilize best practices when building your system, which can be a benefit to teams, especially real-world teams struggling with a bit of dysfunction.


Betteridge's law of headlines: "Any headline that ends in a question mark can be answered by the word no."


"Does Betteridge's law of headlines still apply in 2024 ?"

no.


But your comment is not a headline. You gotta at least write a blog post.


That demonstrates the point because it's a boring headline. The clickbaitier version would be something like:

"It's 2024. Can We Just Forget About Betteridge's Law Already?"


I could just throw in a mention of Betteridge's law, but the question being asked is rather nuanced and is worth actually answering:

No, it is not. SQL is a way of thinking about your data, and querying against those assumptions and mistakes. If you think SQL is just a dumb row storage engine that you write garbage SQL for (ex: every single MySQL use I've ever seen), then you failed SQL 101.

A good proper use of a real SQL database (I'm obviously a huge pg and sqlite fan, but no hate towards DBA wizards that fight Oracle, DB2, and MSSQL every day at work) can shepherd your data, maintain its consistency, and conquer some of the weirdest footgun anti-patterns that keep showing up in business code that use 'modern' nosql databases.
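To make the "shepherd your data" point concrete, a minimal sketch (hypothetical accounts/transfers schema, SQLite for brevity, though pg gives you far more): declarative constraints reject bad rows that application code would otherwise let through.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
con.executescript("""
    CREATE TABLE accounts(
        id      INTEGER PRIMARY KEY,
        balance REAL NOT NULL CHECK (balance >= 0)
    );
    CREATE TABLE transfers(
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES accounts(id),
        amount     REAL NOT NULL
    );
    INSERT INTO accounts VALUES (1, 100.0);
""")

try:
    con.execute("INSERT INTO accounts VALUES (2, -5.0)")  # CHECK violation
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True

try:
    con.execute("INSERT INTO transfers VALUES (1, 999, 10.0)")  # no such account
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(check_rejected, fk_rejected)  # True True
```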

To be absolutely fair, all technologies are a set of tradeoffs, and you have to understand them; if you don't, it will be fatal. Not today, probably not tomorrow, but in 3 years when your startup runway ran away. As technology advances, the meaning of those tradeoffs change; how I would answer some questions for people 10-15 years ago isn't how I'd answer them today.

Scalable DBs that make other tradeoffs are slowly being eaten by the march of technology. When someone picks up a mishmash of parts that don't fit (a nosql row store here, a nosql columnar store there, a dedicated event logger somewhere, all glued together with some message-passing bus), it's invariably someone who doesn't understand the tradeoffs because they never learned the thought process behind how to use SQL effectively.

If you're asking "is SQL becoming a niche skill", you're really asking, "is thinking a niche skill?". Half of HN seems to absolutely adore the fake AI scams and willingly threw themselves down that flight of stairs, so I'm worried some people here really do think the answer to "thinking is a niche skill" is yes, and a dying one at that.


No. Those who refuse to learn it will be niche.


No


NoSQL application databases still a thing?

No data warehouse?

From a language/interface standpoint, there is no alternative.


SQL is not going anywhere. If anything it’s becoming more and more popular.

Every developer I’ve worked with who touches server-side code has eventually said some version of “Bah, damn these ORMs, can we just write SQL?”

And now with local-first there’s a growing trend of putting SQL on the client as well. Working directly with an in-memory (ish) SQLite and syncing to the server occasionally is super nice it turns out.
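A minimal sketch of that pattern in Python's sqlite3 (a second in-memory connection stands in for the server-side copy here; a real local-first app would sync over the network):

```python
import sqlite3

# Work against a fast in-memory database on the client...
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE todos(id INTEGER PRIMARY KEY, title TEXT, done INT)")
local.execute("INSERT INTO todos VALUES (1, 'write SQL', 0)")
local.commit()

# ...and occasionally snapshot it to the "server" copy.
remote = sqlite3.connect(":memory:")
local.backup(remote)  # consistent snapshot of the whole database

print(remote.execute("SELECT title, done FROM todos").fetchone())  # ('write SQL', 0)
```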


s/SQL/advanced SQL/ FTFY - SELECT/FROM/WHERE/GROUP BY for int/float/string is not hard or niche.

2c as DBA & researcher: advanced SQL is becoming irrelevant because LLMs are better at writing complex SQL than 99% of the available experts, who routinely botch NULL handling, datatype and index selection, performance optimization, etc. I see no reason why anybody should memorize the arcane syntax of SQL features like sliding window functions, exception handling in stored procedures, etc.
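For context, this is the kind of sliding-window query I mean; whether you memorize the syntax or have an LLM draft it, you still have to verify the frame clause does what you intend (toy sales table, SQLite >= 3.25 for window functions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales(day INT, amount INT);
    INSERT INTO sales VALUES (1, 10), (2, 20), (3, 30), (4, 40);
""")

# 3-row moving sum: current row plus the two preceding days
rows = con.execute("""
    SELECT day,
           SUM(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_sum
    FROM sales
    ORDER BY day
""").fetchall()
print(rows)  # [(1, 10), (2, 30), (3, 60), (4, 90)]
```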

If this seems radical, consider that experts already rely on EXPLAIN and the planner/optimizer for 99+% of queries. Trusting the optimizer was itself a radical position in the early 1980s.


I would love it if an AI could build what I’ve been banging my head against for over a year. How many lines of SQL can an LLM write and does it come up with the pipeline architecture all on its own? How does it incorporate subject matter experience to aggregate or transform the data? Like I said, I would love it if I didn’t have to worry about the nitty gritty stuff. But I have not seen that type of performance out of an LLM. Also I don’t get the comparison between LLMs and query optimizers which seem to be much more deterministic and logical about how they choose to build the query plan.


Copy paste the CREATE TABLE statements then ask it to write the query. It's amazing.


I wouldn't even know how to prompt the complex queries I have in mind. For simple queries that an ORM could write, I see, but for something complex that generates actual data from the tables, I don't see it coming.


Would love to know which LLMs you’re using because every single one I’ve used has been hot garbage that can’t do anything beyond joining a couple tables and fails miserably at basic GROUP BY/COUNT aggregates. Window functions? Fuggetaboutit
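For reference, this is roughly the shape of query I'm testing them with (made-up logins schema): a GROUP BY plus a window function ranking over the aggregate.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE logins(user_id INT, country TEXT);
    INSERT INTO logins VALUES (1,'DE'),(2,'DE'),(3,'US'),(4,'US'),(5,'US');
""")

# Per-country login count, ranked by that count; the window function
# runs over the already-grouped rows.
rows = con.execute("""
    SELECT country,
           COUNT(*) AS n,
           RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM logins
    GROUP BY country
    ORDER BY rnk
""").fetchall()
print(rows)  # [('US', 3, 1), ('DE', 2, 2)]
```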


Paste an example and I'll show you! For me, I use ChatGPT 4 and start with the create table statements.


LOL...

And then there is Go. sqlc ( https://sqlc.dev ) becomes a source of truth, not a sink... mix in some YAML and you have your JSON tags and validation mixed in.

Candidly, good engineers are still using SQL...


I expect that if you look at all the users of sqlc, most of them are simply using it to transform SQL queries into a different query language (e.g. REST) for everyone else to use. That supports the idea that SQL is a niche skill.



