Pluto.jl – a reactive, lightweight, simple notebook (github.com/fonsp)
235 points by dunefox on Aug 28, 2020 | 60 comments



I've switched from Jupyter to Pluto recently. Here are a few experiences with it.

* The fact that I can actually use the source files later because they're just Julia files is incredibly useful. I often copy-paste from them into actual REPL-code, and sometimes I just polish the notebook until its source becomes usable as a command-line tool.

* I like the reactive notebook concept. It does really help with bugs

* Pluto is still rough around the edges. Too few keyboard shortcuts. Buttons and text are tiny, afloat in an ocean of useless whitespace. Pushing to LOAD_PATH doesn't work properly. Pluto is a very young project and just now gaining attention in the Julia community, so I'm confident these usability issues will improve.


The inability to simply import Jupyter notebooks as Python files has always been a point of friction for me. I’m glad to see this is a main feature for Pluto.


I'm not sure if Python has something like this, but in Julia we have https://github.com/stevengj/NBInclude.jl which allows us to import Jupyter notebooks like regular files.


I did not know about this. Nice. Thanks!


What do you think about jupytext? https://github.com/mwouts/jupytext


I frequently use Jupyter Lab notebooks as "pre-programming" sessions before coding an actual app or script in Python. I'll document all the interesting bits in Markdown cells, and test the bits I don't understand entirely in code cells. Then I copy/paste chunks of working code out into my text editor or IDE to begin "real" coding.


You mean like nbdev?


> I like the reactive notebook concept. It does really help with bugs

Can you say more? An example, maybe?


The idea is to keep the amount of global state as low as possible. Pluto.jl creates a dependency graph between cells: If cell A defines foo, and cell B uses foo, then cell B depends on cell A. Whenever cell A is updated, cell B will automatically be re-evaluated.

(edit: I suppose it's not really global state I'm talking about as much as hidden state left over from overwritten or deleted cells)

For example, I typically create tonnes of code cells when I visualize and try to get a sense of some data. The large majority of cells (probably >80%) are then deleted, and whenever I find a trivial bug in some code, I fix the cell where the bug occurred.

But now - which cells had I rerun after fixing the bug? And did any of the run cells depend on some variable in a deleted cell? If so, the notebook will no longer be reproducible when I re-run it. It's impossible to keep track of. So when I use Jupyter, I frequently press the "restart kernel and run all" option. But of course, that is slow. So I need to serialize a lot of data, which is troublesome. Pluto completely circumvents that problem.
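
To make the hidden-state problem concrete, here is a minimal sketch in plain Python (no Jupyter or Pluto involved). It assumes a toy "kernel" that is just a shared dictionary; the cell contents and the names `run_cell`, `raw`, and `cleaned` are made up for illustration.

```python
# Toy model of a Jupyter-style kernel: every cell runs against one shared namespace.
def run_cell(source, namespace):
    exec(source, namespace)

cell_a = "raw = [3, 1, 2]"
cell_b = "cleaned = sorted(raw)"

kernel = {}
run_cell(cell_a, kernel)
run_cell(cell_b, kernel)

# The user now deletes cell A from the notebook. The kernel is not told,
# so `raw` survives as hidden state and cell B keeps working.
remaining_cells = [cell_b]
run_cell(cell_b, kernel)
stale_result = kernel["cleaned"]

# "Restart kernel and run all": a fresh namespace and only the cells still on screen.
fresh_kernel = {}
try:
    for source in remaining_cells:
        run_cell(source, fresh_kernel)
    reproducible = True
except NameError:
    reproducible = False

print(stale_result, reproducible)
```

The live session still produces a result, but the code left on screen can no longer reproduce it from scratch, which is the situation described above.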


What happens if you reassign a variable below? Like:

  a = 1
  println(a)
  a = 2
Does it show 1 or 2?

Edit: tested it, it throws an error "Multiple definitions for a: Combine all definitions into a single reactive cell using a `begin ... end` block."

Not sure I like that way of working.
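
A rough idea of how a notebook could detect this situation, sketched in Python with the standard `ast` module. This is not Pluto's code; the helper names are hypothetical and only plain top-level assignments are considered.

```python
import ast
from collections import defaultdict

def assigned_names(source):
    """Return the set of names a cell assigns to at top level (simplified)."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names

def multiple_definitions(cells):
    """Map each name defined in more than one cell to the indices of those cells."""
    defined_in = defaultdict(list)
    for index, source in enumerate(cells):
        for name in assigned_names(source):
            defined_in[name].append(index)
    return {name: where for name, where in defined_in.items() if len(where) > 1}

# The three cells from the question above, with print() standing in for println().
conflicts = multiple_definitions(["a = 1", "print(a)", "a = 2"])

# Combining both definitions into one cell removes the conflict.
merged = multiple_definitions(["a = 1\na = 2", "print(a)"])

print(conflicts, merged)
```

With one definition per name, every cell that reads `a` has exactly one upstream cell, which is what makes the dependency graph unambiguous.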


In a more complex example where you actually take a variable, do some operations to it, then reassign it, Pluto.jl encourages you to separate that into multiple cells. The reason is each cell marks a distinct node in the dependency graph. If you use separate cells, then the notebook can be smarter about which lines actually need to be re-run and which don't.

A downside to using multiple cells is vertical spacing/visual noise. This is something that the package authors are currently thinking about addressing.


Think of it as working with immutable data, because that's essentially what it is. Which has all the pros and cons of that approach (in my opinion a lot more pros, but YMMV).


Yes, that's the best explanation in the end I think. Maybe I'm too used to reusing variables and that's a bad habit I should work on. For example most of my counters are called i.


You just update the `a=1` cell, changing it to `a=2`. That's the whole point.

Every other cell that depends upon `a` will then automatically update.


Exactly. It requires a change in mindset wrt Jupyter, but it is totally worth it!


It completely fixes a huge number of the gripes in the JupyterCon talk "I Don't Like Notebooks" by Joel Grus.

The primary complaint there is that notebooks have a disconnect between the state of the program and the display of the cells. By using a completely reactive mode the state is no longer hidden. It's more akin to a spreadsheet than a notebook. A number of other complaints are completely circumvented by using a file format that's simply a pure Julia file with clever comments.

Video: https://www.youtube.com/watch?v=7jiPeIFXb6U

Slides: https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUh...

Previous discussion: https://news.ycombinator.com/item?id=17856700


Now we just need to port all this over to python... or switch to Julia? Which one would take the least amount of effort?


Switching to Julia. You can just use PyCall to call all of your old code: paste it into a py"..." string and you're using Pluto. It's like a 1-minute process. Then you can get fancy later.


You'll also save a lot of effort in future endeavors, IMO. Remember, once you've switched to Julia, Python is still only a `using PyCall` and `pyimport` away!


I've also switched to using Pluto, shortly after seeing the presentation during JuliaCon. It is still rough around the edges, but I've found it a lot easier to deal with than Jupyter, quite frankly.


Really happy to see that the good ideas from Observable notebooks are being copied elsewhere. I don't use Julia myself at the moment (although I think it's a beautiful language), but I know some people who will be very happy with this! Also, the playful enthusiasm in the presentation video linked in the description just makes me smile:

https://www.youtube.com/watch?v=IAF8DjrQSSk


Very interesting, thanks.

I don't understand the point of the "reactive" cell order, instead of conventionally doing top to bottom. It seems like it goes against the idea of "the program state being completely described by the code you see".


So in the context of these notebooks it is actually quite useful to not depend on cell order, because the idea is that notebooks aren't simply programs or scripts, they are (potentially interactive) documents. You can write an article that presents the results of your script, with cells that contain text, interactive widgets and plots at the top, and put the code and data that generates these plots in "appendix" cells below that. Observable, a "JavaScript ancestor" of Pluto.jl, has plenty of examples of these:

https://observablehq.com

Also, I guess that you haven't used notebook environments like Jupyter before, so a bit of historical context might help. In Jupyter, cells aren't necessarily executed top-to-bottom, they are executed when the user asks it to. The result is then stored in the global state (well, assuming there is a global variable that the data is assigned to). This means that cells that depend on other cells also depend on the order in which those cells were executed. Worse still, if you write your notebook in a sloppy manner, you can end up with a state that you cannot reproduce from the still-remaining code (for example, you can have variables A and B, B is generated from the result of A, then you remove A. Because Jupyter is not reactive this does not update B, so your notebook keeps working just fine... until you decide to edit B). So previously, notebook-like environments made it really easy to introduce bugs like this.

With reactive cells you don't have to think of state. It's kind of like pure functional programming: it removes global side-effects. And note how on a technical level, making cells execute top-to-bottom is really just a very simple way to enforce that cells must be executed in order of dependency! So either option insists that this global-state-that-does-not-respect-dependencies is a big problem that should be avoided, they just present different solutions for the problem.


Thanks a lot for the write-up.


Reactive notebooks are closer to the idea you mention than other kinds of notebooks, because global state is reflected in the outputs of each cell. The biggest source of bugs when working in Jupyter for instance is that a variable was re-defined somewhere and it's not easy to see that it is changed when you have a cell somewhere above that has the output `foo = x` when the global state claims that foo = y.


In reactive notebooks, the program state is still being completely described by the code you see - it's just that it doesn't have to be in top-down order! Instead of forcing you to manually define everything before use, a reactive notebook just looks at the references you make, forms a dependency graph, and then evaluates it in topological order (and, as optimization, only re-evaluates things that depend on the thing you've just changed).
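
As a minimal sketch of that evaluation strategy (an illustration in Python, not how Pluto or Observable are actually implemented), the snippet below takes a hand-written dependency table, finds everything downstream of a changed cell, and orders it with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The cell names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical notebook, listed in page order: each cell maps to the cells it reads from.
depends_on = {
    "plot": {"model", "data"},
    "model": {"data"},
    "data": set(),
    "title": set(),
}

def downstream(changed, depends_on):
    """Return the changed cell plus every cell that directly or indirectly reads from it."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for cell, deps in depends_on.items():
            if cell not in affected and deps & affected:
                affected.add(cell)
                grew = True
    return affected

def rerun_order(changed, depends_on):
    """Topologically ordered list of the cells that need to be re-evaluated."""
    affected = downstream(changed, depends_on)
    full_order = TopologicalSorter(depends_on).static_order()
    return [cell for cell in full_order if cell in affected]

after_data_edit = rerun_order("data", depends_on)
after_model_edit = rerun_order("model", depends_on)
print(after_data_edit, after_model_edit)
```

Note that `plot` sits at the top of the page yet always runs last, and editing `model` leaves `data` and `title` untouched.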

This leads to a much nicer code organization, particularly when you're writing something resembling an interactive document (e.g. an "explorable explanation"). Many of the documents I write on ObservableHQ have roughly the following structure:

  Title
  Prose
  Visualization
  Interactive UI elements (possibly mixed
    with further visualizations)

  ------------------------------
  All the actual source code
  powering the implementation,
  organized in readability order.


Even if you do it completely top down, it doesn't mean all cells need to update if you change something on the top, so you'll still profit from the dependency graph.

And notebooks are mostly for exploration, and you don't really have a fixed order. You start getting the data, then you run the model, then you go back and change the data in the import... The order of your internal logic (or how you want to explain it to others) isn't necessarily linear, and of course you can just go back and write like a normal program but you lose the chain of changes that led you to the result (in this case the compromise is that your chain is restricted, like you said you can't define the same thing twice, but in exchange the notebook will provide you with the program order for free).


This looks excellent.

In my opinion, these kinds of apps are the future of data science / data analyst work. Forget no-code, just enable these professionals to work in a single programming language that they're familiar with and give them visualization superpowers. The Python ecosystem has https://www.streamlit.io/ and https://gradio.app/ now. R has https://shiny.rstudio.com/. I think we'll see more.


Looks good but I haven’t tried it yet.

Julia is a remarkable programming language, and pure Julia projects like this show that it is also good for general purpose development.

I want to try this combined with the Flux DL library.


I've been using Pluto for several weeks and I absolutely love it for quickly iterating on plots and making small UIs for interactively showing off results. It's altogether a much better experience than Jupyter for me.


I wonder if you have tried it on large datasets, given your background.

Does it work well with large datasets given its reactive nature?


Everything is cached unless something upstream changes, so it should work just fine with big datasets. I've only been using relatively small datasets so far, though.


Sounds like they fixed a bunch of things that are broken about Jupyter notebooks. I still don't understand why anyone would want to do work in their browser though.


As sibling says, it is a very good way to explore data and also to iterate on the design of scripts and document them.

Two examples from a previous work experience (remote sensing):

(1) A colleague was creating an SSH tunnel to create and explore data while the Jupyter process ran on the calculation server. He was able to launch heavy calculations, get a fast feedback loop for satellite images and shapefiles, manipulate the results and write the explanation next to each cell. As you would use a real notebook, in fact. I had the same workflow, and when I realized that the script would be used more than once, I moved the code to a Python script with command-line argument support (just plug `argparse` into the script) and moved the text into comments in the script.

(2) Teaching: we held seminars about different APIs and tools and used Jupyter notebooks to teach everyone. The fast feedback loop was essential for anything with figures, plots, images, etc.
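
The notebook-to-script move described in (1) could look roughly like the sketch below. The `summarize` function and its arguments are made up for illustration; only the `argparse` calls are standard library API.

```python
import argparse

def summarize(values, scale=1.0):
    """Stand-in for the logic that used to live in notebook cells."""
    scaled = [v * scale for v in values]
    return {"count": len(scaled), "mean": sum(scaled) / len(scaled)}

def build_parser():
    # The notebook's Markdown explanations become help strings and comments.
    parser = argparse.ArgumentParser(description="Summarize a list of numbers.")
    parser.add_argument("values", nargs="+", type=float, help="numbers to summarize")
    parser.add_argument("--scale", type=float, default=1.0, help="multiplier applied first")
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    return summarize(args.values, args.scale)

# Equivalent to running a hypothetical `python summarize.py 1 2 3 --scale 2`.
result = main(["1", "2", "3", "--scale", "2"])
print(result)
```

In a real script the last two lines would typically be replaced by a call to `main()` under `if __name__ == "__main__":`, so that the arguments come from the command line.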

Pluto.jl, while not yet perfect for me, addresses a lot of the broken things that made using Jupyter notebooks drive me crazy (I had to break up my cells in a way that let me rerun everything when I needed to update the global space; it was awful).


I understand the value of notebooks, but I'd rather use them in a dedicated environment, like VS Code or RStudio.


FWIW, I thought I remembered hearing that the maintainers of the Julia extension in VSCode are working on getting embedded Pluto notebooks working.


>I still don't understand why anyone would want to do work in their browser though.

One of our colleagues had around thirty students who had to prepare their final year projects in machine learning. We deployed our internal platform and gave them access so they wouldn't lose an academic year, as they were in a zone that was hit really hard with COVID-19.

They are mostly on Windows, are not comfortable with the CLI, have never used Git, they don't do Docker, have poor connectivity [4kB/s], don't have access to powerful machines, found it difficult to handle dependencies, and needed to work on +200GB datasets. They also were split into groups of two or three, and needed to be able to share the work with each other, and with their supervisor (our colleague).

So, one reason to use a browser based solution is to de-couple the user's computer from the dependencies, internals, or infra the work happens on, and simplify collaboration. This, or the main tool you rely on does not play nice with other tools, even if you're proficient.

We started to push for remote work in late 2018, and we started really going after it in 2019 because commute was draining our colleagues' energy. It really bothered us to see them arrive at work completely washed out, or see them worry about transportation at the end of the day, so we made remote work a priority. But they mainly trained models with notebooks, and there was a need to be able to do actual work as a team, so we built the tooling around our workflow and we've had to add in missing features to accommodate our colleagues who needed the notebook.


TL;DR: Working in a notebook in a browser is a terrible way to write a program but a fantastic way to explore data.

Even though notebooks are a common cause of headache in my world (I work on ML deployment), I think they're an incredibly valuable tool, and the familiar, visual interface of the browser plays a big part.

It clicked for me when I took a statistical genetics class taught by a team member of Hail.is (open source genomic analysis library). Coming from a dev background, I found working in a browser to be a clunky, awful experience—until I saw the way my classmates, most of whom were scientists by focus, used it. My instinct is to think of code in terms of the architecture of a program, but for them, code blocks were like buttons on a calculator. The speed at which they could iterate, and their ability to jump around, really drove home the value of the browser interface.

Would I want to write an API in one? Absolutely not. But for tinkering with genomic data? They're ideal in many ways.

Jeremy Howard of fast.ai talks about this a lot: https://twitter.com/jeremyphoward/status/1072555920029376512


I would be curious which things about Jupyter notebooks are better in Pluto.jl. Can you name some concrete points?


1. Dependency graph for cells, letting it automatically rerun what's needed when you change one. This keeps everything up to date.

2. git-friendly.


I wonder how they are doing it. Simply re-evaluating every cell?


It's mentioned in the video [1]: it does static analysis of the code to create a graph of dependencies (for example, which cell uses a variable defined by another cell), so when you update any cell it will find which cells are affected by the change (the downstream nodes on a directed acyclic graph) and only evaluates the code in them (instead of running everything). Julia is particularly good for that kind of code analysis since it's a very Lispy language.

It also does a trick of creating new modules to manipulate scope to make deleted variables/import/cells invisible (and therefore free to be garbage collected).

[1] https://youtu.be/IAF8DjrQSSk?t=596
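
To give a feel for that kind of static analysis, here is a small sketch using Python's `ast` module. It is an analogy only: Pluto analyzes Julia expressions, and the function names and the three example cells here are hypothetical.

```python
import ast

def defs_and_refs(source):
    """Names a cell assigns (defs) and names it reads but does not assign (refs)."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                refs.add(node.id)
    # Simplification: loop and comprehension variables count as defs here.
    return defs, refs - defs

def dependency_edges(cells):
    """Set of (upstream, downstream) cell-name pairs derived from variable usage."""
    analysis = {name: defs_and_refs(src) for name, src in cells.items()}
    edges = set()
    for up, (up_defs, _) in analysis.items():
        for down, (_, down_refs) in analysis.items():
            if up != down and up_defs & down_refs:
                edges.add((up, down))
    return edges

cells = {
    "load": "data = [1, 2, 3]",
    "scale": "scaled = [x * factor for x in data]",
    "knob": "factor = 10",
}
edges = dependency_edges(cells)
print(sorted(edges))
```

No cell is executed to find these edges; the graph comes purely from reading the source, so it can be rebuilt every time a cell is edited.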


Reproducibility is a huge one for me. I’m often doing exploratory work that I’ll want to save in its current state and pick back up a few months (or even years) later. With Jupyter, this almost never works because I am constantly editing and running cells out of order as I’m exploring things. If I save at any given point, there is no guarantee that the notebook will be in the same state when I reopen it and re-run the cells.

With Pluto and other reactive notebooks, you have a guarantee that the code you see on the screen will produce the same results. So if you go back and edit cells out of order, save the notebook, then open it and re-run later, it will always be in the same state you left it in.


In addition to what celrod said: Jupyter doesn't have an equivalent of Pluto's @bind macro, does it?


It's not as elegant but kinda exists

https://ipywidgets.readthedocs.io/en/latest/


Thank you! I hadn't seen that.


Just tried it. A couple of things.

Why are results displayed above the code cell instead of below it (as in Jupyter/IPython)? Was a bit confusing at first.

I'm noticing it crashes a lot. Get messages like:

   Worker 2 terminated.
   Distributed.ProcessExitedException(2)
Really like the reactive aspect, though.


> Why are results displayed above the code cell instead of below it (as in Jupyter/IPython)? Was a bit confusing at first.

I also strongly agreed with this at first, but after working with it some more I've found it compelling. The key mental model is to think of the code as something akin to a "figure caption."


Pluto.jl appears to be a Julia centric notebook alternative to Jupyter and its multi-language Kernels, including IJulia [1]. I imagine there are many trade-offs between the two but the primary one I see is the runtime size/Python-dependencies of Jupyter versus the reach of the platform.

There is also a great deal of overlap between IDEs and Notebook platforms.

[1] https://github.com/JuliaLang/IJulia.jl

Edit: "pure Julia" => "Julia centric" based on jakobnissen's comment


Not quite. Pluto is also built on JavaScript (of course, since it's a browser notebook).

The main advantages of Pluto are that

* The source files are executable Julia files with minimal metadata, so it plays nice with Git. Also, the code of the source files is ordered to reflect the execution order of the cells, to keep the source code and the notebook in sync.

* It attempts to remove all global state. If you change a cell, dependent cells will change as well (similar to Excel). This makes bugs less likely.


Git friendly native Julia files and clean cell reordering/refactoring are very nice features. Julia seems like a compelling data science platform, especially for greenfield projects.


To me this makes it a fundamentally different thing to Jupyter.


Seems really smooth. I would like to see how this scales with larger chunks of code. Are there any benchmark comparisons?


It just runs Julia under the hood, so I would expect it to be simply as fast as Julia (assuming that the data processing is the more significant bottleneck compared to the HTML output that is used to present the results). Performance is more likely affected by the way the data is processed than by the language's speed.

From what I understand, results of cells are cached though, and they don't update unless something upstream changes, so there is a form of memoization happening. Which of course has implications for both performance and memory usage.
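
A minimal sketch of that form of memoization, in Python and under the assumption that a cell can be modeled as a function of its upstream values. The `CachedCell` class is hypothetical and only illustrates the trade-off; it is not how Pluto stores results.

```python
class CachedCell:
    """A cell whose result is reused until one of its inputs changes."""

    def __init__(self, compute):
        self.compute = compute
        self.last_inputs = None
        self.result = None
        self.runs = 0  # how many times the body was really evaluated

    def value(self, *inputs):
        if self.runs == 0 or inputs != self.last_inputs:
            self.result = self.compute(*inputs)
            self.last_inputs = inputs
            self.runs += 1
        return self.result

# Stand-in for an expensive load; the file name is made up and never opened.
load = CachedCell(lambda path: list(range(1_000_000)))
total = CachedCell(lambda rows, scale: sum(rows) * scale)

first = total.value(load.value("big.csv"), 2)
again = total.value(load.value("big.csv"), 2)     # nothing upstream changed: both cached
rescaled = total.value(load.value("big.csv"), 3)  # only `total` is recomputed
print(load.runs, total.runs)
```

The expensive load runs once no matter how often downstream cells change, but its million-element result stays in memory the whole time, which is the performance versus memory trade-off mentioned above.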


The thing that still frustrates me about Pluto is that I have to put a `begin ... end` block in each cell where I want multiple commands to run. It would be nice if it ran more like Jupyter notebooks in this sense, where blocks can be blocks of code and not individual lines.


This would be better if there was VS Code integration with their notebook system.

Personally, I will never code in a web browser and don't understand how people can do so with large code bases.


Nobody has "a large codebase" in a notebook. They're for explorative programming and visualisations and they're excellent for that. And VS Code is technically a web browser, so...


One thing that is pretty great about Pluto.jl is how responsive the author is (Fons van der Plas, or @fonsp on GitHub). I've been able to get great suggestions from him (as well as the fast growing community of Pluto users) on the Zulip discussion group for Julia (https://julialang.zulipchat.com)


Imagine now the same but without HTML/CSS/JS - using a native 2D and 3D rendering, with acceleration. That would be blazingly fast.


Pluto is hard to do R&D in, but great if you want to compose a report using existing code.



