Hacker News
Apache Zeppelin (apache.org)
144 points by saikatsg 7 months ago | 31 comments



The big difference between Zeppelin and Jupyter is how easily you can build interactive notebooks with input fields, checkboxes, selects, etc. This is much closer to what I thought notebooks would evolve into back when I first saw them: HyperCard for the data engineer. Observable has kind of delivered that, but on the frontend. Jupyter seems to me to have gone down the path of a code editor with cells, and Zeppelin unfortunately never got any traction.
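
For reference, Zeppelin's dynamic forms come from a `${...}` template syntax inside a paragraph (the table and column names below are made up for illustration; this only runs inside a Zeppelin interpreter):

```
%sql
SELECT ${groupCol=country} AS grp, count(*) AS n
FROM visits
WHERE year = ${year=2024,2023|2024|2025}
GROUP BY ${groupCol=country}
ORDER BY n DESC
LIMIT ${limit=10}
```

`${name=default}` renders a text input and `${name=default,a|b|c}` renders a select; Zeppelin substitutes the chosen values before the paragraph runs, so re-running the query from the form is a single click.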


This is possible to do with ipywidgets [0] and all the ipy[stuff] packages.

bqplot [1] for example is great for 2D dataviz, very responsive and updates real-time. Based on D3 I believe. Usually I can do what I want with base widgets and bqplot and the result is pretty.

ipyleaflet is another popular library for maps.

I especially enjoy using them with voila [2] to create an app, or voici [3] for a pure-frontend (wasm) version.

If you want to develop a widget, the new-ish anywidget library can prove handy [4].

For an example, see this demo [5] I made with bqplot and voici that visualizes a log-normal distribution.

[0] https://ipywidgets.readthedocs.io/en/stable/

[1] https://github.com/bqplot/bqplot

[2] https://voila.readthedocs.io/en/stable/

[3] https://voici.readthedocs.io/en/latest/

[4] https://anywidget.dev/

[5] https://horaceg.github.io/long-tail/voici/render/long_tail.h...
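
As a concrete sketch of the widget flow described above (assuming `ipywidgets` is installed; the squaring example is made up), a slider wired to a callback looks like this. In a notebook you'd `display(slider)` and `display(label)`; here we just exercise the model side:

```python
# A slider wired to a label via `observe`; traitlets fires the
# callback synchronously whenever the slider's value changes.
import ipywidgets as widgets

slider = widgets.IntSlider(value=3, min=0, max=10, description="n:")
label = widgets.Label()

def on_change(change):
    # `change["new"]` carries the updated slider value
    label.value = f"n squared = {change['new'] ** 2}"

slider.observe(on_change, names="value")
slider.value = 4  # simulate a user drag; fires the callback
```

The same pattern scales up: swap the `Label` for a bqplot figure and update its marks in the callback, and you have a live chart.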


This is a great list, thanks!

I would add two more:

1. VizHub [1]: for D3 based visualizations. I have not tried it, but I have watched some D3 videos [2] by its creator Curran Kelleher who uses it quite a bit (oh, and a shout out to the great D3 content he has!).

2. This is slightly unusual, but I have recently been using Svelte's REPL notebooks [3] to try out ideas. Yes, this is for Svelte scripts, but you can do D3 stuff too. And on that note, Svelte (which is normally seen as a UI framework) can be used for pretty interesting visualizations as well, because of how it can bind variables to SVG elements in HTML (you can get similar results with React). For example, here's a notebook I wrote for trying out k-means in pure Svelte [4]. Be warned: fairly unoptimized code, because this was supposed to be an instructive example! On a related note, Mattias Stahl has some content specifically on using Svelte with D3 [5].

[1] https://vizhub.com/

[2] https://www.youtube.com/watch?v=_ByiP7KM0So

[3] https://svelte.dev/repl

[4] https://svelte.dev/repl/1689f5c3699640ff86d9bd6a04ac8272?ver... Note that the "Iterate!" button iterates once; keep clicking it to move things along.

[5] https://www.youtube.com/watch?v=eNQQAkjxxdQ


Thanks for sharing these useful links. Bookmarked.

Any idea what the "bq" stands for in bqplot? I find that I am able to remember and recall tools and terms that I actually understand the full forms of :)


It originated at Bloomberg in a quant research group, hence the "bq".


I can't tell whether you're saying Zeppelin or Jupyter is the one that's easier for input fields, checkboxes, etc., but either way it reminds me of Mathematica (going strong since 1988, too!).


You can create interactive notebooks with marimo, an open-source reactive notebook inspired in part by Observable and Pluto.jl. We have sliders, checkboxes, selectable tables and charts, and more, built-in.

Here's our repo: https://github.com/marimo-team/marimo


Google Colab has this; I wouldn't be surprised if there were a Jupyter widget implementing something similar.

Edit: looks like Mercury (a Jupyter extension) has them: https://runmercury.com/docs/input-widgets/
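
For reference, Colab's version is the `#@param` comment syntax, which turns plain assignments into form controls in the notebook UI (the variable names and values here are arbitrary). Outside Colab these lines are just ordinary Python, so the notebook still runs anywhere:

```python
# In Colab, the #@param annotations render as a slider and a dropdown;
# elsewhere they are inert comments and the assignments run as-is.
threshold = 0.5  #@param {type:"slider", min:0, max:1, step:0.05}
dataset = "demo"  #@param ["demo", "full"]
print(threshold, dataset)
```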


Although many of these ideas appeared at Xerox PARC and on Genera machines first.

It is quite telling how long the industry takes to adopt cool ideas, while rebooting some bad ones all the time.


Another nice feature was data exchange between different kernels.


Tried deploying this in k8s for data analysts and data engineers (mostly with PySpark in mind), as a way to give a non-developer crowd a ready-made, batteries-included environment: all of the database and local S3 connections ready, popular libraries installed, secrets vault integrated, etc.

Didn't work out all that well for a number of reasons.

The most important thing is, users are used to Jupyter. Zeppelin's ui is very different, and most people are not willing to jump on yet another learning adventure just for the sake of it.

Then, it's not as widely adopted and supported as JupyterHub. With JupyterHub you can easily integrate whatever you want. Want several simultaneous Jupyters for each user? Sure. Want separate quotas, or different k8s namespaces for user groups? Easy. A shitton of plugins? Here you go. A selection of different images for each user, depending on the tooling required? Welcome.
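
That per-user image selection, for example, is a few lines of KubeSpawner config in the Zero to JupyterHub setup. A sketch (the image names are made up, and `c` is the config object JupyterHub provides to `jupyterhub_config.py`):

```python
# jupyterhub_config.py fragment: present each user with a profile menu
# at spawn time; each profile overrides the single-user image.
c.KubeSpawner.profile_list = [
    {
        "display_name": "PySpark, batteries included",
        "default": True,
        "kubespawner_override": {"image": "myorg/pyspark-notebook:latest"},
    },
    {
        "display_name": "Minimal Python",
        "kubespawner_override": {"image": "jupyter/minimal-notebook:latest"},
    },
]
```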

The third thing is really unfortunate, but Zeppelin proved to have less than stellar stability and performance, at least in my experience. People are wary of something that's often unreliable.

So I finally decided to just go with JupyterHub, and users couldn't be happier. Everything's fully customized, and things are smooth and familiar to a non-dev crowd.

Another, and in some ways better, solution would be to go with VS Code, but I doubt a typical analyst/DS would prefer it, at least for now.

All in all, I don't see a place for Zeppelin: it can't compete with what's already on the market, and yet it doesn't bring anything new and worthwhile.


If you're looking for more modern notebooks supporting Scala (and Spark):

- https://almond.sh

- https://polynote.org

Toree is mostly dead but might also get a Scala 2.13 release now that Spark 4.0 is approaching.


Unfortunately, it didn’t build enough of a community around it, and development has stalled. For some time it was sponsored by Alibaba, but at some point the main maintainer left. Similar story with other contributors.

P.S. I was a committer there until I changed jobs.


It's cool to stumble upon Apache projects every now and then.

Not all of them get that much love, but often they have pretty nice functionality.

I still remember that setting up Apache Skywalking was one of the easier ways of getting some APM and tracing in place, compared to the other options out there.

And, of course, the likes of Apache2 and Apache Tomcat are also quite useful in some circumstances.


As one of the people who got Sun to open source what became Apache Tomcat, much appreciated to hear that. =)


Even after all these years, Tomcat is still incredibly solid, so thanks for that :-)

Sometimes I do worry about the long term survival of the ASF. Many projects are largely supported by 1 person. A lot of projects are mostly abandoned (but not yet moved to the attic). Many others suffer from the blight of "what the hell is this for?", where their website is so vague that it might as well not exist.


Thanks for that. We used Tomcat as our app server at my last company from 2008 until I left in 2019. I'm guessing it's still being used...


Thanks for that. Ran my first SaaS on Tomcat.


Shout out to Apache Nifi


I did a project not long ago with NiFi and couldn't have been happier... and this was a use case not fully in line with what NiFi is designed for.

The community was especially helpful, responsive, and patient with my limited understanding of their tool.

In the end it was the most stable part of the overall project operationally speaking.


I've always wondered how far you can push it to build a distributed application considering it can be clustered and you can control threading on the processors. Seems like you could prototype out a pretty large, robust backend without dealing with the overhead of individual services.


Good old Apache Zeppelin. It’s almost a decade since I last worked with Zeppelin and Spark Notebook at Technicolor Virdata. Shout out to Eric from Datalayer.


Obligatory mention of Livebook: https://livebook.dev/


Mean of me to say, but you're just better off using Jupyter as a local notebook sandbox. For one, the relevant development Docker image bundles Spark [1], making it more convenient to fire up. More importantly, it's used way more than Zeppelin: orgs not using Jupyter are probably using Databricks notebooks instead, and the market is split between those two.

Zeppelin does make it easier to run Scala Spark, I find, but Scala Spark usage has declined rapidly.

1. https://hub.docker.com/r/jupyter/pyspark-notebook
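
For what it's worth, firing up that image locally is a one-liner (the volume mount is just an example; `/home/jovyan/work` is the notebook user's home in the Jupyter Docker stacks):

```shell
# Start the linked jupyter/pyspark-notebook image; Jupyter prints a
# tokenized localhost URL on stdout. The -v flag mounts the current
# directory into the container so notebooks survive container removal.
docker run -it --rm -p 8888:8888 \
  -v "$PWD":/home/jovyan/work \
  jupyter/pyspark-notebook
```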


I worked at a non-Databricks organization, and sharing Jupyter notebooks hosted on Kubernetes ended up being such a difficult endeavor that an ops team was hired for it. I don’t think we really got positive ROI on that, but some people felt really cool (we had too much of a bias toward self-hosting). We did need some sort of sharing and collaboration mechanism, and at least for that job Zeppelin checks a lot of the boxes, especially since our Spark SQL jobs couldn’t be visualized in Jupyter while I worked there.


We have found Zeppelin to largely be frustrating, bug riddled, and overly restrictive for normal notebook use cases.

I agree that Jupyter for PySpark makes more sense in almost every use case. We made the switch as an org about two years ago and haven’t looked back. Jupyter has its own issues, but it does feel much more usable by just about every metric.


What is its use case? Looks like a Jupyter-ish thing.


I was a user of Zeppelin for a couple of years (v0.7-0.8). We used it to run Scala Spark, and the UI has a lot of bells and whistles that make Spark easier to use: displaying Spark DataFrames and simple dataviz features out of the box. It's comparable to the notebook experience you get in Databricks.
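
A typical Zeppelin paragraph of that sort looks something like this (a sketch that only runs inside Zeppelin's Spark interpreter; `z` is the built-in ZeppelinContext, and `z.show` is what triggers the table/chart UI):

```scala
%spark
// Build a small DataFrame and hand it to Zeppelin's display system;
// the result paragraph then offers table, bar, pie, etc. views.
val df = spark.range(100).selectExpr("id", "id * id as square")
z.show(df)
```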


Oh, I see, ty. Never used Spark, so I guess that's one reason, haha.


I certainly felt like the use case of interacting directly with Spark (through Scala) with very low-friction visualizations was quite nice. Not that it's hard to get that with Jupyter, but the batteries-included, just-click-through-the-UI visualization was better in Zeppelin.


