LangChain: The Missing Manual (pinecone.io)
164 points by gk1 on May 19, 2023 | hide | past | favorite | 43 comments



LangChain has moved fast and made a decent first pass at a solution to the problem of LLM orchestration. But I'm skeptical that the first solution will be the best solution, and we should keep an open mind to other approaches.

Personally, I like the more declarative approach that Microsoft is taking with guidance [0]. The two projects are not substitutable at the moment, and might even complement each other, but I'm weary of building a new ecosystem on a possibly overly-complicated first pass solution to the orchestration problem.

[0] https://github.com/microsoft/guidance


I resonate with the assessment. I like guidance too so far, but the dev community is far behind langchain's. One big problem I have with langchain is that there are too many wrappers, and it's often very confusing what little additional functionality each wrapper actually adds.


Yeah, there are so many platforms out there. It is incredibly tough to figure out who will be heading in the right direction.

- https://shreyar.github.io/guardrails/
- https://github.com/NVIDIA/NeMo-Guardrails
- https://www.askmarvin.ai/


s/weary/wary


Damn, I'm weary of making that mistake so often. Thanks :)


I personally prefer MS Guidance. Langchain is just super bloated and complicated, to the point that I found it easier to just write the code myself. Plus, I don't think Langchain is a good long-term strategy given their involvement with VCs.


Same. I feel like once you strip away the abstractions that you probably aren't going to use long-term, you're left with something that competes with Python f-strings. And yep, I've tried it at multiple points in its evolution and never found it useful even in my toy apps. It was just a time sink trying to figure out why their own abstractions didn't work with each other. For weeks after they released Chat API integration, I scoured the web for working examples of using an agent with gpt-3.5-turbo. Closest I got was Microsoft's VisualChatGPT, until I found out it used langchain+text-davinci, despite its name :(

Langchain really seems to be entirely hype, to me. Like, nothing in production I've heard about actually uses it AFAIK. Not AutoGPT, not BabyAGI, nothing at any of the big companies, etc. But it's available in 2 languages and has integrations with everything under the sun, making it easy to adopt! Despite this lack of production usage with positive anecdotes, you're still hearing about this library a lot! Definitely doing the VC playbook.

EDIT: Go to their discord and read the #ask-kapa-langchain channel. This is a retrieval augmented Q&A bot, powered by langchain, which (every time I've checked) has helped ~nobody. I'm really not trying to cherry-pick - this is something that should be rock solid if this software stack is useful at all.


Example: Say you want to use Llama models in langchain. They have TWO CONFLICTING documentation pages, only one of which works:

1. https://python.langchain.com/en/latest/modules/models/llms/i...

2. https://python.langchain.com/en/latest/reference/modules/llm... → Turns out, LlamaCppEmbeddings is from langchain.embeddings, not langchain.llms!

I just gave up using langchain and one of the main reasons was its terrible docs.


Didn’t they just get funded too? Sounds like more of a side project than a serious venture.


That intro to Langchain is absolutely terrible. Like it was generated by the worst LLM they could find and pasted in:

> The first high-performance and open-source LLM called BLOOM was released. OpenAI released their next-generation text embedding model and the next generation of “GPT-3.5” models.

Just random sentences strung together delivering no overall message. Yes we know BLOOM and GPT exist, what is your point?

> LangChain appeared around the same time. Its creator, Harrison Chase, made the first commit in late October 2022. Leaving a short couple of months of development before getting caught in the LLM wave.

That's nice that the text model that wrote this knows the creator and first commit but ugh -- just say "Langchain was published in October 2022" instead of all that garbage.

Also, "Leaving a short couple of months of development before getting caught in the LLM wave." doesn't even form a complete sentence.

I'm already hating the future of blog posts and articles where we have to mentally filter out all the LLM-generated garbage around any real information.


I caught myself the other day throwing my feed of articles into an LLM to give me summaries and what it thinks are interesting points / facts. I'm not sure how to feel about this.


Why wouldn't you use a language model to summarize? It is one of the most useful things a statistical language model is capable of.

Though it might be good to tune it to help it identify the parts you find interesting. Especially if you try to identify salient details.


is the manual itself good? guess we will have to go through it.


Given the comments in the previous submission about LangChain (https://news.ycombinator.com/item?id=35820931), I am working on a much simpler/faster/cheaper alternative that doesn't require delving deep into arcane documentation and guides. LangChain's complexity and inflexibility aren't a good thing for anything beyond quick demos.

Relatedly, I am also working on a tutorial on how to do vector similarity search without having to pay for a vector store, because there are confusingly few blog articles highlighting the space between "embeddings as text in a CSV" and "full-on vector store management", which is annoying for people who want to do personal projects.
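For small personal projects, that middle ground can be as simple as an in-memory search. A minimal sketch in pure Python, assuming the embeddings have already been computed by whatever model you use:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=2):
    # corpus: list of (doc_id, embedding) pairs held in memory --
    # no vector store needed for a few thousand documents.
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Brute force like this is perfectly fine until your corpus is large enough that a tuned index (FAISS, hnswlib, etc.) actually pays for itself.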


Looking forward to seeing that. I haven't yet managed to get my head around LangChain, excited to see what your simpler alternative looks like.


Always a pleasure to read you two guys' comments on LLM posts!


it’s not intentional, I promise


I used langchain to make a pretty basic LLM augmented with a (free) local vector database: https://github.com/mkwatson/chat_any_site


That's awesome. Looking forward to trying it.


I would like to see the performance improvement of using a vector DB vs. something like a tuned FAISS index. Could poke a hole in a budding trend.


A bit odd that Pinecone is publishing this…

I can understand it if they identified that building with LangChain means more Pinecone usage, and that a barrier to building with LangChain is its documentation and ease of getting started. But if the now well-funded project isn't producing this itself (and in my own experience the TypeScript library at least doesn't feel like it's hitting the nail on the head; I ended up reading source code), then I think that's a sign we're still searching for the best way to build complex things here.


The fact that pinecone published this is proof of how many AI tooling products see Langchain as key to their distribution.

Personally, I find langchain unnecessary if you already know which tools you are going to use.

It remains to be seen how important portability of workloads and discoverability of tooling will be in this space. My experience with cloud computing has taught me that portability of workloads is overrated, but I'm not sure that lesson will translate well to AI models.


This is basic content marketing strategy: Pinecone can be used with LangChain, and a decent chunk of people searching for tutorials about LangChain will indirectly learn about Pinecone as a result of this article.


Twitter retweets and SEO boost. Pinecone has an extensive content strategy.


LangChain is undeniably the best option for building demos.


Vector stores and calculations of vector similarity are an adjacent complement to the ReAct workflow, not replacing it or being specific to LangChain.


We’re building “Langchain for Ruby” under the current working name of “Langchain.rb”: https://github.com/andreibondarev/langchainrb

People who have contributed to the project thus far each have at least a decade of experience programming in Ruby. We're trying our best to build an abstraction layer on top of all of the common emerging AI/ML techniques, tools, and providers. We're also focused on building an excellent developer experience that Ruby developers love and have come to expect.

Unlike the Python project, we'd like to avoid the deeply nested class structures that, as has been pointed out here countless times, make the code incredibly difficult to track and extend.

We’ve been pondering over the “what does Rails for Machine Learning look like?” question, and we’re taking a stab at answering this question.

We’re hyper-focused on the open source community and the developer community at large. All feedback/ideas/contributions/criticism are welcome and encouraged!


I don't get why Langchain is so popular. It's classic inner-platform effect: "a system so customizable as to become a replica, and often a poor replica, of the software development platform they are using."

Once we get better at getting structured output from LLMs, we can use the standard control flow mechanisms built into our programming languages.
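For example (a minimal sketch; the JSON string here is a hypothetical stand-in for what a real model would return when prompted to reply in JSON):

```python
import json

# Hypothetical raw completion from an LLM asked to answer in JSON.
raw = '{"intent": "search", "query": "weather in Paris"}'

parsed = json.loads(raw)

# Plain if/elif replaces a framework's router/agent abstraction.
if parsed["intent"] == "search":
    action = f"run_search({parsed['query']!r})"
elif parsed["intent"] == "calculate":
    action = f"run_calculator({parsed['query']!r})"
else:
    action = "fallback()"
```

Once the output is structured, dispatching on it is just ordinary programming.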


Much of the popularity of langchain, AutoGPT, etc. is just due to hype and appeals to no-coders. Anyone with some knowledge of programming and LLMs sees the facade and avoids these things.


Thank you for posting this. I've started out some MVPs using vector DBs (testing pinecone and supabase with pgvector) and finding a lot of things that are not obvious to me.


For anyone thinking about applications of langchain and pinecone but who are looking for something more turn-key check out https://jiggy.ai

The core is actually open source as well, allowing you to take your data back out via sqlite and hnswlib (https://github.com/jiggy-ai/hnsqlite)


Hey I’m actually interested. Could you clarify what we mean by more “turnkey”?


In this case we mean you can get some of the benefits of langchain and pinecone, such as semantic search and retrieval-augmented GPT, without needing to deal with vectors, chunking, and LLM tooling at such a low level. You can upload docs and then begin chatting against them immediately. JiggyBase is just a higher-level abstraction on top of the same type of components, which may be useful in cases where you don't need full control over the vector embeddings and just want to interact with your data.


Do we need a LangChain alternative? Nowadays I evaluate an open source project via its documentation.

My expectation for good documentation is not so much about "Reference". It's more about accessible content to get started, then a path to advanced things.

Documentation is not the same as just a reference.

A bad documentation feels like: You're not welcome to contribute (yet).


A lot of people are dissing LangChain in this thread for various reasons. I am primarily interested in building tools that couple LLMs with other things like web browsing and using Wolfram or Zapier APIs. Is there a LangChain alternative for that?


We allow users to build LLM chains at trypromptly.com as apps. Once an app is built, it can be integrated into other applications as iframe embeds or can be called via our APIs so custom frontends can be built.

We also have Zapier integration (https://zapier.com/apps/promptly/integrations) so apps on Promptly can be invoked from zaps


We have just added support for ElevenLabs. https://twitter.com/ajhai/status/1659642782607372288 is a quick demo of the platform if interested.


ChatGPT plugins will be usable through the OpenAI API, and both of those services are available as ChatGPT plugins.


Mark Watson's guide to LangChain is free to read online: https://leanpub.com/langchain/read


Has langchain had positive ROI for anyone beyond the initial prototype of their app? My experience has me skeptical - I end up feeling like I'm painted into a corner and need to start over. Maybe if I just used PromptTemplate and LLMChain, but at that point I can just use function composition and formatted strings.

Like I'd be blown away if someone had a production app where they were able to swap LLM providers (and nothing else) due to langchain. And if that expectation is too high then why not just code against the openai API?
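To illustrate what I mean, here's a sketch with a stubbed-out completion call standing in for a real API client (the stub just echoes its prompt so the example is self-contained):

```python
# Stand-in for a real completion call (e.g. the openai client);
# returns a canned string here so the sketch runs on its own.
def call_llm(prompt: str) -> str:
    return f"[completion for: {prompt}]"

def summarize(text: str) -> str:
    # An f-string does the job of a PromptTemplate...
    return call_llm(f"Summarize the following text:\n\n{text}")

def translate(text: str, language: str = "French") -> str:
    return call_llm(f"Translate into {language}:\n\n{text}")

# ...and plain function composition does the job of an LLMChain.
result = translate(summarize("LangChain has moved fast..."))
```

Swapping providers then means changing one function body, not rewiring a chain.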


I'd argue that building your app based on a quickly evolving library is not a wise idea, esp. if the library is not well documented.


Outside a prototype what’s the benefit? The steps I envision are: 1) turn prompt to embeddings 2) return examples that match from vector db 3) load as examples 4) prompt LLM with examples. Am I missing something? Why on earth would you want to import multiple dependencies for this?
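Those four steps fit in a few plain function calls. A toy sketch, with hypothetical stand-ins for the embedding model and vector store (real code would call an embedding API and an index instead):

```python
def embed(text):
    # 1) Turn the prompt into an embedding (toy stand-in: length-based).
    return [float(len(text)), 1.0]

def nearest_examples(vector, k=2):
    # 2) Return the closest examples from an in-memory "vector db".
    store = {
        "Q: 2+2? A: 4": [8.0, 1.0],
        "Q: capital of France? A: Paris": [30.0, 1.0],
    }
    return sorted(store, key=lambda doc: abs(store[doc][0] - vector[0]))[:k]

def build_prompt(question, examples):
    # 3) Load the matches as few-shot examples.
    shots = "\n".join(examples)
    return f"{shots}\nQ: {question} A:"

# 4) The finished prompt would then go to the LLM.
question = "3+3?"
prompt = build_prompt(question, nearest_examples(embed(question)))
```

No framework dependency required; each step is a function you can test and swap out on its own.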


+1 on this experience. Langchain is great at wrapping simpler tasks, but once you start to decouple components you start to run into issues.



