Hi, I’m George. I’d love to share the lessons we learned optimizing data pipelines with AI / embedding calls for our users, which increased pipeline throughput 5x. We did adaptive batching, and the write-up discusses in detail how we did it.
Developers still simply process data row by row; under the hood we queue requests and batch them at the right moments (the batching is effectively columnar), so there's no manual plumbing. Would love your thoughts.
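Roughly, the idea looks like this (a simplified sketch, not the exact code we ship; `embed_batch` stands in for any embedding API that accepts a list of texts): each caller awaits a single row, and whatever queued up while the previous batch was in flight becomes the next batch, so batch size adapts to load.

```python
import asyncio

# Sketch of adaptive batching (illustrative only, not the actual CocoIndex
# internals): callers submit one row at a time; while one batched call is in
# flight, new requests pile up in a queue, and everything that accumulated is
# sent as the next batch.
class AdaptiveBatcher:
    def __init__(self, embed_batch):
        # embed_batch: async callable taking a list of texts, returning a list of vectors.
        self._embed_batch = embed_batch
        self._queue: asyncio.Queue = asyncio.Queue()
        self._worker = None

    async def embed(self, text: str):
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((text, fut))
        return await fut

    async def _run(self):
        while True:
            # Wait for at least one request, then drain everything queued so far.
            batch = [await self._queue.get()]
            while not self._queue.empty():
                batch.append(self._queue.get_nowait())
            texts = [text for text, _ in batch]
            try:
                vectors = await self._embed_batch(texts)
                for (_, fut), vec in zip(batch, vectors):
                    fut.set_result(vec)
            except Exception as exc:
                for _, fut in batch:
                    fut.set_exception(exc)
```

From the per-row code, it's just `await batcher.embed(row_text)`, which is what keeps the pipeline definition row-oriented while the actual calls go out columnar.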
Hi HN, I’m George — I left Google after 10 years working on infrastructure and am building CocoIndex https://cocoindex.io with my friend Linghua.
CocoIndex is an open-source ETL framework with incremental processing, designed for AI workloads. We cut >90% of compute costs by processing only what's changed, so keeping AI context fresh is effortless.
It's easy to build scalable, production-grade pipelines like Lego in hours. Think of it as n8n with Python blocks, but for large-scale RAG pipelines.
You can build vector indexes, knowledge graphs, and custom logic over any modality in the pipeline with AI. To get started, run `pip install -U cocoindex`.
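The incremental part boils down to something like this (a generic sketch of the idea, not CocoIndex's actual API or storage layer; the fingerprint store here is just a dict): fingerprint each source row and only re-run the expensive transform, such as chunking and embedding, on rows whose content actually changed.

```python
import hashlib

# Generic sketch of incremental processing (not CocoIndex's real implementation):
# keep a content fingerprint per source row and only re-run the expensive
# transform for rows whose content changed since the last run.
def incremental_run(rows, transform, fingerprints):
    """rows: dict of row_id -> content; fingerprints: persisted dict of row_id -> hash."""
    results = {}
    for row_id, content in rows.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        if fingerprints.get(row_id) == digest:
            continue  # unchanged row: skip recomputation entirely
        results[row_id] = transform(content)
        fingerprints[row_id] = digest
    # Rows that disappeared from the source get retracted downstream.
    for row_id in set(fingerprints) - set(rows):
        del fingerprints[row_id]
        results[row_id] = None
    return results
```

In a real pipeline the fingerprints live in durable storage and retraction propagates to the target index, but the principle is the same: compute is proportional to the change, not to the dataset.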
This article offers a holistic, top-down perspective on Rust's ownership, permissions, and memory-safety model. By rethinking Rust's rules through this mental framework, it demystifies challenging concepts like lifetimes, Send/Sync, and interior mutability, making Rust's safety guarantees easier to understand without memorizing a long list of rules.
Love the idea of writing data pipelines like a spreadsheet. Spreadsheets are an amazing programming model: I can write my formulas without thinking about execution, it calculates the results in the right order, and it automatically takes care of any updates.
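A toy version of that model in Python (purely illustrative, not any particular library's API) shows why it's so pleasant: each value is a formula over other values, evaluation follows the dependencies, and changing an input invalidates only what depends on it.

```python
# Toy spreadsheet-style dataflow: cells are formulas over other cells,
# results are computed lazily in dependency order and invalidated on updates.
class Sheet:
    def __init__(self):
        self._formulas = {}   # name -> (func, dependency names)
        self._values = {}     # cached results and raw inputs

    def define(self, name, func, deps=()):
        self._formulas[name] = (func, deps)

    def set(self, name, value):
        self._values[name] = value
        self._invalidate(name)

    def get(self, name):
        if name not in self._values:
            func, deps = self._formulas[name]
            self._values[name] = func(*(self.get(d) for d in deps))
        return self._values[name]

    def _invalidate(self, name):
        # Drop cached results for anything that transitively depends on `name`.
        for other, (_, deps) in self._formulas.items():
            if name in deps and other in self._values:
                del self._values[other]
                self._invalidate(other)

s = Sheet()
s.set("text", "hello world")
s.define("words", lambda t: t.split(), deps=("text",))
s.define("count", lambda w: len(w), deps=("words",))
print(s.get("count"))   # 2
s.set("text", "a b c")  # only "words" and "count" are recomputed on next read
print(s.get("count"))   # 3
```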