Hacker News | edublancas's comments

TIL there is pgvector and pgvecto.rs


Cool work!

I've been working on a similar product. Users can select between Streamlit/Shiny: https://editor.ploomber.io/ - so not necessarily for BI (although you can use it for that), but more broadly focused on data apps.


Papermill is great, but it has quite a few limitations because it spins up a new process to run the notebook:

- You cannot extract live variables (needed for testing)

- Cannot use pdb for debugging

- Cannot profile memory usage

You can do all of that with ploomber-engine (https://github.com/ploomber/ploomber-engine).

Disclaimer: I'm the author of this package
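To illustrate why running in the same process matters, here is a minimal sketch using only the standard library. It is not how ploomber-engine or Papermill is implemented; `run_notebook_in_process` and `demo_notebook` are hypothetical names, and plain `exec` will not handle IPython magics. The point is only that when cells run in the caller's process, the variables stay available afterwards for tests, pdb, or profilers:

```python
import json


def run_notebook_in_process(notebook_json: str) -> dict:
    """Run every code cell in one shared namespace and return that namespace."""
    notebook = json.loads(notebook_json)
    namespace: dict = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        # The notebook format stores source as a string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        # Executing here (no subprocess) keeps the variables reachable afterwards.
        exec(source, namespace)
    return namespace


# Hypothetical notebook with one markdown cell and two code cells.
demo_notebook = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Demo"]},
            {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
            {"cell_type": "code", "source": "total = x + y"},
        ]
    }
)

namespace = run_notebook_in_process(demo_notebook)
print(namespace["total"])
```

With the notebook's variables in a plain dict, a test can assert on them directly.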


Not disclosed in this comment is that edublancas is

>Ploomber (YC W22) co-founder.


who is a great technologist with a lot of hands-on experience. If it made sense to leverage Papermill, he would have done so and focused on something else.


What does any of this have to do with disclosure?


Calling attention to disclosure suggests bias. I'm obviously saying that I trust him not to be biased.


Not obvious at all. And no. It doesn't have to suggest bias, just a lack of disclosure.


iirc, a few years back I was able to do all of these things with the Papermill IPython runtime.

Papermill is great, but yes: lots of room to hack on it and make it better.


Has Papermill deprecated the IPython runtime? I used Papermill extensively in the past and never saw that in their docs.


It’s been a while but you do it with a custom kernel and maybe some entry point tweaks. IIRC.


I'd say how much is good enough depends heavily on your use case. For something that still has to be reviewed by a human, I think even 0.7 is great; if you're planning to automate processes end-to-end, I'd aim for higher than 0.95.


thanks a lot for the feedback! you're right, this is much better input data. I'll re-run the code with these tables!


Also - is there a chance GPT is relying on its training data for some questions? i.e. you don't even need to give it the table.

To be sure - shouldn't you be asking questions based on data that is guaranteed not to be in its training data?
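One way to approach this, sketched under assumptions: generate the table from random values so the answers cannot be recalled from memory. This is not what the benchmark in the post did; `make_synthetic_table` and `to_html` are hypothetical helpers, and the question is just an example:

```python
import random
import string


def make_synthetic_table(n_rows: int, seed: int) -> list:
    """Build rows of (random name, random value); the seed makes runs repeatable."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(6))
        rows.append(("item-" + suffix, rng.randint(1, 10_000)))
    return rows


def to_html(rows: list) -> str:
    """Render the rows as a small HTML table to feed to the model."""
    body = "".join(f"<tr><td>{name}</td><td>{value}</td></tr>" for name, value in rows)
    return f"<table><tr><th>name</th><th>value</th></tr>{body}</table>"


rows = make_synthetic_table(5, seed=0)
html_table = to_html(rows)

# The ground truth is computed from the generated rows, not looked up anywhere.
question = "Which name has the largest value?"
expected_answer = max(rows, key=lambda row: row[1])[0]
```

A model can only answer correctly by reading `html_table`, since the names and values did not exist before the run.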


author here: I'm working on a follow-up post where I benchmark pre-processing techniques (to reduce the token count). Turns out, removing all HTML works well (much cheaper and doesn't impact accuracy). So far, I've only tried gpt-4o and the mini version, but trying other models would be interesting!


author here: I'm working on a follow-up post. Turns out, removing all HTML tags works great and reduces the cost by a huge margin.
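For readers who want to try the idea, here is a minimal sketch of tag stripping with Python's standard `html.parser`. It is an assumption about how one might do it, not the code from the post; `TextExtractor`, `strip_html`, and the sample markup are hypothetical, and character count is used only as a rough stand-in for token count:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes and ignore every tag and attribute."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    '<div class="wrap"><table><tr><th>City</th><th>Pop</th></tr>'
    "<tr><td>Lima</td><td>10</td></tr></table></div>"
)
text = strip_html(sample)
print(text)  # City Pop Lima 10
print(len(sample), len(text))
```

Note that this sketch would also keep the contents of `<script>` and `<style>` elements, which a real pipeline would likely want to drop.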


Am I crazy or is there no way to “subscribe” to your site? Interested to follow your learnings in this area.


There isn't, but you can connect on X or LinkedIn.

I might add a subscribe button once I get some time :)


What do you mean? What do you use as reference points?


nothing, I strip out all the HTML tags and pass raw text


How do you keep table structure?


They should probably keep tables and lists and strip most of the rest.
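A rough sketch of that middle ground, again with the standard `html.parser`: keep table and list tags without their attributes, and drop every other tag. The `KEEP_TAGS` set, `StructureKeeper`, and `keep_structure` are hypothetical names chosen for illustration, and which tags to keep is a judgment call:

```python
from html.parser import HTMLParser

# Assumed whitelist: tags that carry table or list structure.
KEEP_TAGS = {"table", "thead", "tbody", "tr", "th", "td", "ul", "ol", "li"}


class StructureKeeper(HTMLParser):
    """Keep whitelisted tags (attributes dropped) plus all text."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.last_was_text = False

    def handle_starttag(self, tag, attrs):
        if tag in KEEP_TAGS:
            self.parts.append(f"<{tag}>")
            self.last_was_text = False

    def handle_endtag(self, tag):
        if tag in KEEP_TAGS:
            self.parts.append(f"</{tag}>")
            self.last_was_text = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        # Separate neighbouring text runs that a dropped tag used to split.
        self.parts.append(" " + text if self.last_was_text else text)
        self.last_was_text = True


def keep_structure(html: str) -> str:
    parser = StructureKeeper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


page = (
    '<div id="main"><p>Intro <b>text</b></p>'
    '<table class="x"><tr><td style="color:red">a</td><td>b</td></tr></table></div>'
)
slim = keep_structure(page)
print(slim)  # Intro text<table><tr><td>a</td><td>b</td></tr></table>
```

Rows and cells stay distinguishable, while wrappers, classes, and inline styles no longer cost tokens.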


You can get this for pretty much any language by re-using Jupyter kernels; here's a Python example: https://hidden-truth-8699.ploomberapp.io/


If you're looking to try this in a Jupyter notebook: https://jupysql.ploomber.io/en/latest/integrations/chdb.html


We'd love to support this in Ploomber Cloud!

We already support frameworks like Streamlit, Gradio, Shiny, Solara, Voila, among others. Ping me if you're interested!

https://docs.cloud.ploomber.io/en/latest/intro.html

