Hacker News
I made advanced BI queries with Scratch puzzle pieces (pixelspark.nl)
142 points by misterdata on July 17, 2022 | 20 comments



I'd like to take a moment to appreciate that the first thing I saw on this page was a GIF explaining exactly what the blog post was talking about, but visually. I didn't even have to scroll down, and 60 seconds later, I had a much better idea of what this post was communicating.


I'm surprised there's no mention of Ab Initio, which looks like [1], in these threads. AFAIK they were the pioneer in BI/ETL while everyone else was copying them.

Then again, they are pretty secretive, and that may be why I can't find any videos of the tool itself in use (edit: here's one [3]), maybe due to copyright takedown requests.

That software was the successor to Thinking Machines [2], which was the hot AI company of the 80s AI boom. The software itself is quite good at parallelizing logic, and the graphical front-end makes it easy for non-programmers to pick up the tool.

[1] https://3.bp.blogspot.com/_FwFkbVFfnGQ/S1qa8lgcw4I/AAAAAAAAA...

[2] https://en.wikipedia.org/wiki/Thinking_Machines_Corporation

[3] https://www.youtube.com/watch?v=tlZlpsa0jyA


Probably a bit of a tangent, but the BI world sure loves their no-code tools. It's one of the few sub-industries where they really took hold.


Not because they're any better of a fit for BI stuff mind you. At my place of work, the BI department is a black hole you can keep shoving more compute into, and they'll just come up with worse queries.


It's still early in the space, but there are real advantages to defining visualisations, metrics, and reports in a code-first way. Aside from being able to do more powerful things with something like pandas as opposed to Tableau, being able to check your dashboards into git, create reusable libraries internally, review changes in PR format etc., goes a long way to making things reproducible and less of a black hole. Also, creating reports and dashboards programmatically opens up some interesting use-cases where a drag and drop tool would break down.
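As a rough sketch of what "reports as code" looks like in practice (an illustrative stand-alone example, not any particular BI library's API — the function and metric names here are made up):

```python
# Minimal "report as code" sketch: a report defined in plain Python.
# The output is a deterministic HTML file you can check into git,
# diff, and review in a PR like any other code artifact.
import html

def render_report(title, rows):
    """Render a list of (metric, value) pairs as a self-contained HTML report."""
    body = "\n".join(
        f"<tr><td>{html.escape(k)}</td><td>{v}</td></tr>" for k, v in rows
    )
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body><table>{body}</table></body></html>"
    )

# Hypothetical metrics; in a real pipeline these would come from a query.
metrics = [("Revenue", 1200), ("Churn %", 3.4)]
print(render_report("Weekly KPIs", metrics))
```

Because the report is just a pure function of its inputs, regenerating it is reproducible, and a reusable internal library is one import away.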

We're building an open-source framework for creating reports and dashboards using Python, which you might find helpful: https://github.com/datapane/datapane. You can think of Datapane as the view layer / interface for any BI analysis you're doing using the open-source Python ecosystem. Any feedback would be much appreciated!


I mostly agree, but I think there's a reason behind the madness. Back in the early 00s to early 10s they really were more productive, "secret alien technology"-type tools.

But these days, obviously, ten lines of Python (or whatever else) calling the database do exactly the same thing, except you actually have git, debuggers, IDEs, etc.
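Literally, something like this (using SQLite as a stand-in for whatever warehouse you'd actually query):

```python
# The "ten lines of Python calling the database" from the comment above:
# an ad-hoc BI aggregation, no GUI tool required.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 100.0), ("EU", 50.0), ("US", 200.0)])

# The kind of group-by a drag-and-drop tool would build for you:
for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"):
    print(region, total)
```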

Many BI departments still cling to them because they're comparing 2020s no-code tools to early 2000s programming languages.


Tableau is way faster to demonstrate your work on.

These professions need to get shit done fast. The workplaces that need them are extremely reactive to the market, etc.

So having debuggers isn’t necessarily a thing they’re interested in.

Now if you’re in a slower paced BI environment that’s where you generally see more traditional programming tooling be used.


It surely is fast enough, I'm not blaming people who use Tableau.

But "faster" compared to what? It's not faster than pandas/seaborn, not faster than d3... even just for super simple stuff where GUI tools shine.

If the final solution needs to be somewhat self-service, then sure, you can't expect the end user to write Python or JavaScript, and Tableau is fine. If the end result is just a report that some technician has to prepare, then I sincerely doubt it's the fastest way to get there, even ignoring maintainability, source control, etc., where it's simply no contest.


I do a lot of BI and am a coder. I find these tools (Tableau) super frustrating. So much point and click. Such poor abstractions. Not at all DRY. I like the end product just not the process. Lacks a good API for doing it programmatically - which makes sense as they make money selling desktops.


The reason for point and click is that you need to be able to experiment and iterate quickly, which is what point and click is good for (ie tweaking knobs). BI is about asking lots of questions of your data and seeing which ones provide useful results.

If you already know what question you have in mind, then yeah, it's going to be a bit tedious.


Winner, winner. I am at a competitor of Tableau's, but the point remains the same. One of our favorite quotes was an upper executive at a major technology retailer who told us a version of "it's so cheap to fail with you guys".

There are absolutely needs for engineers who (deeply) understand SQL, can write python code, and can whip up a d3 chart. But that's an expensive project.

There are many, many more individuals in organizations who would make much smarter decisions if they learned a tiny bit of SQL, had access to a basic data warehouse, and were presented with a GUI tool (PowerBI, Tableau, Qlik, etc.).


Exactly, mainly because nobody has a clue about their own data, so fast iteration is a must, and the engineers in the IT departments supporting these solutions are some of the most clueless I've met.


Well, in good tools (Tableau being one of them) there is always the raw SQL query somewhere (ideally with parameters), so that you can do the heavy lifting in code but still use the GUI for quick and nice interactive rendering.

PS: Some tools are even cooler, like QGIS for map making. The GUI of this FOSS tool is heavily parametrizable and fully extensible/scriptable in Python/Qt.

There is not quite an equivalent in the BI world yet...


Did you try Shiny, Dash, or something like those tools?


Like Excel, these tools really help turn users into power users. I think the problem with no-code is that it makes developer tools less powerful instead of making regular tools more powerful.


There may be no other sub-industry that desperately craves to abstract away as much complexity as possible


Thanks for making a whole blog post out of the anecdote mentioned yesterday. Really excellent to see the whole story!


I hoped it was a follow up on yesterday's comment on this. Thanks for sharing, Mr misterdata!


A few years ago I essentially went the same way, though only up to a demo (because someone else got the contract). The target was to replace an ETL pipeline that had previously been implemented in hard-coded T-SQL stored procedures and a Java app whose source code was evidently lost and had been recreated with a disassembler.

The constructs in Blockly would have been used to generate the transform queries that turned input data into all kinds of summaries in a snowflake schema.
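Conceptually, each block tree flattens into one generated SQL string — a hypothetical sketch of that code-generation step (the function and table names are made up for illustration):

```python
# Hypothetical sketch of Blockly-style query generation: blocks supply the
# fact table, dimension columns, and measure; the generator emits the
# GROUP BY summary query for a snowflake-schema fact table.
def summary_query(fact_table, dims, measure):
    """Build a GROUP BY summary query from block-provided parameters."""
    cols = ", ".join(dims)
    return (f"SELECT {cols}, SUM({measure}) AS total "
            f"FROM {fact_table} GROUP BY {cols}")

print(summary_query("sales_fact", ["region_id", "product_id"], "amount"))
```

(Real generated queries would also carry joins to the dimension tables; this only shows the summarization core.)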


Nice work, buddy!



