
Anyone who's been asked more than a couple of times for data that requires a non-trivial bit of ad-hoc SQL will know the sinking "oh shit" feeling that comes when you subsequently realise you borked the query logic in some subtle way and have accordingly emailed out a completely bogus answer/report.

From the article it doesn't seem that GPT is significantly better or worse than a human in this regard, although an experienced analyst would over time decrease their number of such errors.

The best fix imo is to slather a battery of views over your data to minimise the risk of getting the joins wrong, and it'd be interesting to see how that approach could improve the bot's quality.
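
To illustrate what I mean (table and column names made up), a view like this bakes the join logic in once:

    -- One view that encodes the join and the "don't count cancelled orders"
    -- rule, so ad-hoc queries don't have to re-derive either.
    CREATE VIEW customer_orders AS
    SELECT c.customer_id,
           c.customer_name,
           o.order_id,
           o.order_date,
           o.amount
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.status <> 'cancelled';

Anyone asked "how much did customer X spend last month" then just filters and aggregates over customer_orders instead of rebuilding the join and the cancelled-order filter by hand every time.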



I've done too little SQL to be traumatized by it, but I experienced it plenty of times doing more complex data processing on the backend.

I would e.g. spend a week on first discussing a solution, then researching both the theoretical background and relevant libraries, and finally writing code to do the math and render the answers (probability distributions). I eyeball them, they all seem to match expectations, so I declare victory, post some nice screenshots to the team Slack, and close the ticket.

Feeling high on success, I expect the weekend to start nicely - except, winding down on that Friday evening, I follow that little thought at the back of my head, the one I realize has been there since the start, quietly poking for my attention. I run one more sanity check, different from the ones I did before. The results come out completely wrong. Only then does it dawn on me that I made a terrible mistake early on - I misunderstood how to implement a certain step and forgot about a theoretical assumption that must hold. And so, instead of a relaxed weekend, I now dread having to explain to everyone on Monday that a) I actually need another week for this, and b) the nice-looking graphs they saw are all pure bullshit, and the things that looked correct only seemed so, and only by accident.

After a few such cases - including one where the higher-ups were nonplussed, because after I corrected the mathematical errors someone made a long time ago, the pretty graphs that looked nice on the demo stopped looking so nice - I learned to sanity-check my understanding of the problem as early as possible, and then also sanity-check the code I'm writing, and then keep testing as I go. It's all too easy to write "trivial" code that "obviously" does what you think it does, only to realize some time later that it's throwing out an entire dimension of the input data, and nobody noticed because test examples are too trivial, results look roughly correct, and no one is paying attention.


The traumatising part of SQL is that the scenario you describe is scale-invariant. The smallest question and the largest question both have the potential to be completely wrong for any inane reason. In my experience it often turns out that metrics the company has been using for months are totally bogus too, which is not a fun conversation to have.

So the life of an SQL analyst in their early days (it gets better as they learn to trust nothing) is either continuously pumping out garbage or constantly needing to go back and, very awkwardly, correct and redo work that people thought was finished (and those people were often happy with a bad result because it looked like something interesting was going on).

This article is highly misleading: approximately none of an analyst's time is spent writing 5-line queries. The analyst is the person who knows that the data table for the series A and B rounds has different semantics depending on the round, because someone screwed up the table design beyond recovery, and therefore a plain COUNT(*) will give incorrect results. This tool would let people who don't understand that pump out garbage faster. That might be a win regardless, since most analytics requests cannot possibly be value-add.
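
To make that concrete with a made-up version of the example (the table, columns and flag are all hypothetical):

    -- Naive: counts every row, including the rows carrying the old semantics
    SELECT COUNT(*)
    FROM funding_rounds
    WHERE round IN ('A', 'B');

    -- What the analyst who knows the quirk writes
    -- (is_legacy_row is a made-up flag standing in for whatever the real
    --  workaround looks like)
    SELECT COUNT(DISTINCT company_id)
    FROM funding_rounds
    WHERE round IN ('A', 'B')
      AND NOT is_legacy_row;

Only someone who knows the quirk writes the second query; the first one looks perfectly reasonable and is quietly wrong.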


It's so refreshing to read this discussion. I've had these thoughts and feelings a lot but this is the first time I see them reflected in other people!


This is very on point.

As an analyst, your asset is owning the data and knowing all its oddities and quirks.

Very similar to how developers need to own their codebase. Yes you can ask chatGPT to write a complex function. But your job as a developer is to see how it fits in with the rest of the framework. The logic is the simple part.

As many have pointed out, you also need to be brutally honest about the numbers, and about the problems you may have caused or fixed.

Trying to hide any problems will very quickly send you downhill, into an extremely uncomfortable position.

Although, until it acquires feelings and a sense of job security, I guess chatGPT can be better at that part.


Your anecdote gave me an idea. What if we use AI to enhance the communication between people instead of replacing the people?

Imagine you give a problem statement to the LLM, and it tries to find all the subtleties, confusing parts, room for interpretation, etc. All of the people involved then revise the problem statement to make it clearer and give it back to the LLM. Iterate like that until satisfied.


Hm. I like your idea! It's one of the few that I think have a chance of working in practice.

In my experience, there is great value in describing your ideas to people who don't have the background to fully understand them, don't really care about them, and are in a bit of a trollish mood - the ways in which they misunderstand what you're saying, or pick on (what you think are) random, irrelevant things, is highly informative. The feedback you get from such people makes you think about things and in ways you wouldn't have thought otherwise.

The problem is, of course, you generally don't have a pool of such people available 24/7. However, LLMs today seem like they could fit this role just fine. They can't and won't understand the point you're trying to get across - but they can and will (if asked) pattern-match on vague language, logical jumps, sentences with multiple meanings, etc. They'll get it subtly wrong, too - much like a bored friend who's mostly focused on playing Angry Birds and hears only every third sentence of your monologue, and then blurts something to keep you talking for the next 30 seconds so they can focus on aiming the birds at the towers of pigs.

I would totally use an LLM-backed tool optimized to facilitate such conversation sessions. I actually tried this in the past, in AI Dungeons, and the results were encouraging (i.e. the responses got me to think in ways I normally don't).


If you’ve had some success with an existing model, I think I’ll explore the idea with GPT-3! Getting the prompt right is gonna be tricky - do you remember how you got AI Dungeons to play along?


I'd set up a basic story prompt, describing the roles and personalities of my character and a few NPCs, as well as the relationships between all of them. I would then introduce my idea in the form of a role-play, usually with my character blurting it out - then let the language model fill in the reactions of the other characters. From there, I'd just play along with the story.

It wasn't the most efficient way of extracting commentary out of a language model - particularly out of one that was optimized to generate plot twists and setting changes instead - but it was a very fun way.


Asking the right question is surprisingly hard sometimes. I'd love a tool I could converse with to come up with a question at the core of my thoughts.


Rubber Duck AI chat bot: you explain your problem and it critiques your wording, leading you to sharpen your argument.


> Rubber Duck AI chat bot

This sounds more like friendly AI than [different] "Devil's Advocate chat bot."


That sounds like something I’d use


I really don't know why you think an AI as we currently have it could do this when it can't get these queries exactly right.


I really don’t know why you feel the need to point this out when I made no claims about feasibility. I think it’s worth exploring even if our current tech isn’t up to it.


When I was dealing with something similar and I didn't know what the result should even look like, something I did a few times was to separate out some sample data, implement the idea using a different technology than the one I was using for that project, and then compare the numbers. Helps with debugging too.


That’s a great writeup of what it’s like to learn to do this job ahaha.

Everybody goes through this, I think (arguably I’m still going through this…)


SLPT: on the Friday night you can quieten the little voice by drinking beer, and then you will be promoted faster.

If you really want to play for keeps then on Monday you can explain that you had a fantastic idea over the weekend for how to improve the results and make them more accurate.


> b) the nice-looking graphs they saw are all pure bullshit, and the things that looked correct only seemed so, and only by accident.

They’re not “bullshit” per se, they’re just “prototypes” for further refinement :)

The road to success is trodden in stumbling half-drunken steps on the pebbles of failures.


I've been there except instead of explaining to everyone on Monday, I didn't spot my mistake for two weeks _and none of the report users did either_. Nobody died but it wasn't my finest hour.


Lol, countless times I've ended up with the right solution in my head while taking a shower, or the day after, because after chilling out my mind figured out that the previous code was wrong.


I feel the pain.

Programming back-end code in an IDE takes you very far from working with the numbers in front of you.

Is it possible to prototype in notebooks and add the results and input to unit tests?


To me this boils down to Churchill's adage: "Don't trust a statistical report unless you fabricated it yourself." There are so many subtleties when translating business questions into SQL, e.g. was the average deal size in the post's "trend" example computed using the arithmetic mean, the median, or the geometric mean? Should the trend contain only historic values, or also projected ones? No serious report comes without data definitions, and then it's an exercise for the reader to spot flaws there.
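
A sketch of how much that choice matters (hypothetical deals table, roughly PostgreSQL syntax, and LN assumes all amounts are positive):

    SELECT AVG(amount)                                         AS arithmetic_mean,
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median,
           EXP(AVG(LN(amount)))                                AS geometric_mean
    FROM deals
    WHERE closed_at >= DATE '2022-01-01';

On a skewed distribution of deal sizes those three "averages" can differ by a large factor, so the data definition has to say which one the report means.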


I had not heard that Churchill quote, but now I feel vindicated that I take the same approach as him!

I think this can also be extended to data tools. I've had so many experiences where a SaaS tool that tries to automate away the manual SQL analysis shows something totally wrong. I tend to "Not trust, verify" anything coming from a SaaS UI when serious business decisions are about to be made based on it.


> slather a battery of views over your data

One needs to be careful with this approach in terms of query performance, though. Using simple views with a couple of joins and some filtering is fine, but be very wary of stacking more than 1-2 layers of views calling each other, and especially of using things like aggregates/window functions in views, if these views are then used as building blocks for more complex queries.

That's a recipe for breaking query optimizers and ending up with very bad query plans.
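
A sketch of the anti-pattern, with made-up names:

    CREATE VIEW order_ranked AS
    SELECT o.*,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY order_date DESC) AS rn
    FROM orders o;

    CREATE VIEW latest_order_per_customer AS
    SELECT * FROM order_ranked WHERE rn = 1;

    -- Looks innocent, but the date filter generally cannot be pushed
    -- below the window function.
    SELECT *
    FROM latest_order_per_customer
    WHERE order_date >= DATE '2023-01-01';

Each piece looks reasonable on its own, but the final filter typically applies only after every order for every customer has been ranked, so far more rows get processed than the query actually needs.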


Use case dependent. When I am tasked with generating some ad-hoc analyses, performance is a non-issue. The query is only going to be run a handful of times while I iterate on the idea, and I would much prefer some convenience views rather than giving a hoot about optimal query planning.


Simple views are perfectly fine - it's mostly the nesting of views with aggregate functions and other complicated stuff that is bad. And if ad-hoc queries are a big part of what users are doing with an app/database and you don't care about performance, your angle sounds reasonable.

As an app developer/development DBA, I care mostly about performance of the queries that are known at development time, though, so I'm a bit biased.


> doesn't seem that GPT is significantly better or worse than a human in this regard

Probably, but I think a human is much more likely to realize that they made a mistake, and good luck convincing GPT that it's wrong about something non-trivial.


Well it's trained to simulate humans and that includes simulating the errors.


This is an exceptionally deep, interesting, and important statement. Is it true? Does GPT-3 occasionally make spelling errors, for instance? Is the rate of spelling errors higher or lower than the average human's? How about versus the average in the training data? How does the prompt affect the frequency of such errors? Ditto for other kinds of objectively measurable error.


In my experience it makes fewer errors if you prompt it to reason carefully step by step.


Opposite of my experience. If you talk it through each step you can back it into persistently claiming that 0^2=1.


I'm pretty sure they're talking about errors it makes as part of normal generation, not how much of a pushover it is.


> The best fix imo is to slather a battery of views over your data to minimise the risk of getting the joins wrong, and it'd be interesting to see how that approach could improve the bot's quality.

You're correct, but organizing queries into views and CTEs is an implementation detail that would be handled by the dev in practice, and would not show up in a Stack Overflow answer because it's probably not relevant to just answering the question.

I love how language equivalences such as informal vs formal code can throw off the quality of what the chat bots generate!


You can ask an analyst to double check their work, and a good one will find any such mistakes.

If you ask GPT-3 to correct an error it always confidently tells you you're wrong and it made no mistake (in my experience).

It's funny now, but I imagine it will be more annoying when it's eventually integrated into a paid service people rely on.


Ha! See, my experience is a bit different.

Most times, it acknowledges its error, and sometimes it successfully corrects it. That is, IF I give it specific enough information (e.g. "why do you treat this variable as a string? it's a number", or "don't you need to unpack the JSON first?").

Other times, it fixes one thing but breaks something else in the process, and no amount of coercing makes it go where I want.

But, yeah, the "confidently wrong" aspect needs to be kept in check. I tend to treat it as any other source of information... with a grain of salt.

What's neat with code is that you can prove the "truthiness" of its affirmations pretty quickly.


Did you read the blog post? By pasting in the SQL error, it got better.


>> The best fix imo is to slather a battery of views over your data to minimise the risk of getting the joins wrong, and it'd be interesting to see how that approach could improve the bot's quality.

Yes, the success of the bot's query (at least on GPT-3) is a combination of a literal and good prompt and how well you explained each of the columns to the model.


Twenty years ago I knew a SQL analyst who would write his tests ahead of time.

Given X data, the report should be Y. He would then figure out the SQL. He didn’t know any other programming. This was before TDD took off. Blew my mind.
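
A minimal sketch of that style (made-up tables; the "test" is just a tiny fixture plus the expected answer):

    -- Fixture: tiny, hand-crafted input where the right answer is obvious
    CREATE TABLE test_orders (customer_id INT, amount NUMERIC);
    INSERT INTO test_orders VALUES (1, 100), (1, 50), (2, 200);

    -- Expected: customer 1 -> 150, customer 2 -> 200

    -- Then write (and rewrite) the report query until it reproduces exactly that
    SELECT customer_id, SUM(amount) AS total
    FROM test_orders
    GROUP BY customer_id
    ORDER BY customer_id;

If the query doesn't reproduce the expected numbers on the fixture, it doesn't get run against the real data.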


People dismissed the XP book by saying it was just stating practices that already existed. Articulating and selling an idea is a huge part of the challenge. Fifteen years later, if you're having any success at all with Scrum, it's because your team adopted the half of XP that didn't outright disagree with Scrum.

I liked the Refactoring book because it gave me names for things I figured out in 1996 (unfortunately while using VI as an editor, gave myself RSI in the process). It was part of an education in the value of books that help you only by existing so you can hand them to people who ask a lot of questions.

I had a boss who was big in the Kanban 'movement'. We got along famously, in part because I re-invented a Kanban board in 1994 when faced with a convenient whiteboard for the first time.

You can do a lot of amazing things with decent first principles. You can make some amazingly large messes without them.


> I had a boss who was big in the Kanban 'movement'. We got along famously, in part because I re-invented a Kanban board in 1994 when faced with a convenient whiteboard for the first time.

I was showing slides of a pre-1900 engineering office as part of a presentation a while back when someone shouted out "look - they're doing kanban". And sure enough, there was a pinboard with process headings on it and pinned items in the background.

It would not surprise me if they found one in Pompeii


> It would not surprise me if they found one in Pompeii

"Evacuate" was stuck in the backlog column


That's more or less how I learned as well but these days we have plenty of material available.


What we need is to have ML inputs judged as they're being ingested. Like PageRank for data and sources.



