
edublancas · 2025-01-24T16:08:36 1737734916

TIL there is pgvector and pgvecto.rs

edublancas · 2025-01-10T13:57:18 1736517438

Cool work!

I've been working on a similar product. Users can select between Streamlit/Shiny: https://editor.ploomber.io/ - so not necessarily for BI (although you can use it for that), but more broadly focused on data apps.

edublancas · 2024-09-18T14:09:05 1726668545

Papermill is great but has quite a few limitations because it spins up a new process to run the notebook:

- You cannot extract live variables (needed for testing)

- Cannot use pdb for debugging

- Cannot profile memory usage

You can do all of that with ploomber-engine (https://github.com/ploomber/ploomber-engine).

Disclaimer: I'm the author of this package
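
To illustrate why in-process execution makes the points above possible, here is a minimal sketch of the general idea. It is not ploomber-engine's actual implementation or API: `run_notebook_in_process` and `demo_notebook` are hypothetical names, the notebook is an inline nbformat-style dict rather than a real `.ipynb` file, and plain `exec` will not handle IPython magics.

```python
import json


def run_notebook_in_process(notebook_json: str) -> dict:
    """Run the code cells of an nbformat-style notebook in this process.

    Hypothetical helper for illustration only. Because the cells run in the
    current interpreter, the resulting variables stay accessible afterwards.
    """
    notebook = json.loads(notebook_json)
    namespace: dict = {}
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell["source"]
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        exec(source, namespace)  # all cells share one namespace
    return namespace


# Tiny inline notebook (assumed content, not from the thread)
demo_notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code", "source": ["x = 21\n", "y = x * 2\n"]},
        {"cell_type": "code", "source": "total = x + y"},
    ]
})

namespace = run_notebook_in_process(demo_notebook)
print(namespace["y"], namespace["total"])
```

Since everything runs in one interpreter, the returned namespace can be asserted on in tests, and tools such as pdb or a memory profiler can attach to the same process.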

ziddoap · 2024-09-18T14:17:30 1726669050

Not disclosed in this comment is that edublancas is

>Ploomber (YC W22) co-founder.

Kalanos · 2024-09-18T15:36:40 1726673800

who is a great technologist with a lot of hands on experience. if it made sense to leverage papermill, he would have done so and focused on something else.

ziddoap · 2024-09-18T15:45:16 1726674316

What does any of this have to do with disclosure?

Kalanos · 2024-09-18T17:00:23 1726678823

calling attention to disclosure suggests bias. i'm obviously saying that i trust him not to be biased.

fastasucan · 2024-09-20T20:16:05 1726863365

Not obvious at all. And no. It doesn't have to suggest bias, just a lack of disclosure.

throwpoaster · 2024-09-18T14:20:59 1726669259

iirc, a few years back I was able to do all of these things with the Papermill IPython runtime.

Papermill is great, but yes: lots of room to hack on it and make it better.

edublancas · 2024-09-18T14:24:00 1726669440

has papermill deprecated the ipython runtime? I used papermill extensively in the past and I never saw that in their docs.

throwpoaster · 2024-09-18T21:58:19 1726696699

It’s been a while but you do it with a custom kernel and maybe some entry point tweaks. IIRC.

edublancas · 2024-09-06T18:53:25 1725648805

I'd say how much is good enough highly depends on your use case. For something that still has to be reviewed by a human, I think even .7 is great; if you're planning to automate processes end-to-end, I'd aim for higher than .95

edublancas · 2024-09-06T18:51:40 1725648700

thanks a lot for the feedback! you're right, this is much better input data. I'll re-run the code with these tables!

andybak · 2024-09-07T13:50:42 1725717042

Also - is there a chance GPT is relying on its training data for some questions? i.e. you don't even need to give it the table.

To be sure - shouldn't you be asking questions based on data that is guaranteed not to be in its training?

edublancas · 2024-09-02T23:44:42 1725320682

author here: I'm working on a follow-up post where I benchmark pre-processing techniques (to reduce the token count). Turns out, removing all HTML works well (much cheaper and doesn't impact accuracy). So far, I've only tried gpt-4o and the mini version, but trying other models would be interesting!

edublancas · 2024-09-02T23:42:25 1725320545

author here: I'm working on a follow-up post. Turns out, removing all HTML tags works great and reduces the cost by a huge margin.

AbstractH24 · 2024-09-03T11:00:42 1725361242

Am I crazy or is there no way to “subscribe” to your site? Interested to follow your learnings in this area.

edublancas · 2024-09-03T18:32:42 1725388362

there isn't. but you can connect on X or LinkedIn.

I might add a subscribe button once I get some time :)

7thpower · 2024-09-03T02:11:34 1725329494

What do you mean? What do you use as reference points?

edublancas · 2024-09-03T03:07:54 1725332874

nothing, I strip out all the HTML tags and pass raw text
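
For reference, stripping all tags and keeping only the raw text could look roughly like the sketch below. It uses only Python's standard library; the class and function names (`TextExtractor`, `strip_html`) and the sample markup are assumptions for illustration, not the code used in the post.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content and ignores every tag (hypothetical helper)."""

    SKIP = {"script", "style"}  # their contents are code, not page text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join the chunks and collapse runs of whitespace into single spaces
        return " ".join(" ".join(self._chunks).split())


def strip_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<div class='x'><h1>Report</h1>"
    "<p>Revenue: <b>10</b> USD</p>"
    "<script>var a=1;</script></div>"
)
print(strip_html(sample))  # Report Revenue: 10 USD
```

The output is shorter than the input because all markup and attributes are gone, which is where the token savings would come from.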

isaacfung · 2024-09-03T03:45:30 1725335130

How do you keep table structure?

jaimehrubiks · 2024-09-03T09:14:12 1725354852

They should probably keep tables and lists and strip most of the rest.
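
A rough sketch of that suggestion, assuming the goal is to keep only table and list tags (without attributes) and drop everything else. The `KEEP` set, the `StructurePreserver` class, and the sample markup are hypothetical choices for illustration; nothing in the thread confirms this is how it was or would be done.

```python
from html.parser import HTMLParser

# Tags assumed worth keeping; every other tag is dropped, its text is kept
KEEP = {"table", "thead", "tbody", "tr", "th", "td", "ul", "ol", "li"}
SKIP = {"script", "style"}  # drop these together with their contents


class StructurePreserver(HTMLParser):
    """Keeps table/list tags (no attributes) plus all visible text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._tokens = []  # (is_text, value) pairs
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag in KEEP:
            self._tokens.append((False, f"<{tag}>"))  # attributes removed

    def handle_endtag(self, tag):
        if tag in SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in KEEP:
            self._tokens.append((False, f"</{tag}>"))

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and self._skip_depth == 0:
            self._tokens.append((True, text))

    def result(self) -> str:
        parts, prev_text = [], False
        for is_text, value in self._tokens:
            if is_text and prev_text:
                parts.append(" ")  # separate adjacent text chunks
            parts.append(value)
            prev_text = is_text
        return "".join(parts)


def keep_structure(html: str) -> str:
    parser = StructurePreserver()
    parser.feed(html)
    parser.close()
    return parser.result()


sample_page = (
    "<div id='main'><p>Quarterly <b>results</b></p>"
    "<table class='t'><tr><th>Quarter</th><th>Revenue</th></tr>"
    "<tr><td>Q1</td><td><span>10</span></td></tr></table>"
    "<ul><li><a href='#'>Note</a></li></ul></div>"
)
print(keep_structure(sample_page))
```

This keeps row and cell boundaries visible to the model while still removing wrappers, styling, and attributes. Whether the extra tokens are worth it would need to be measured.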

edublancas · on March 7, 2024

You can get this for pretty much any language by re-using Jupyter kernels; here's a Python example: https://hidden-truth-8699.ploomberapp.io/

edublancas · on March 7, 2024

If you're looking to try this in a Jupyter notebook: https://jupysql.ploomber.io/en/latest/integrations/chdb.html

edublancas · on Feb 21, 2024

We'd love to support this in Ploomber Cloud!

We already support frameworks like Streamlit, Gradio, Shiny, Solara, Voila, among others. Ping me if you're interested!

https://docs.cloud.ploomber.io/en/latest/intro.html