
Based on their past usage of "interleaved tool calling," it means that tools can be called while the model is still thinking.

https://aws.amazon.com/blogs/opensource/using-strands-agents...
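Roughly, instead of "finish the whole thinking block, then call tools," the agent loop lets tool results flow back into the same reasoning pass. A sketch of the control flow (illustrative only; model_step() and the trace format are made up, not Anthropic's or AWS's actual API):

    # Illustrative agent loop: tool calls and their results happen *inside*
    # the thinking loop rather than after it.
    def run_interleaved(prompt, model_step, tools):
        trace = [("user", prompt)]
        while True:
            kind, payload = model_step(trace)            # one reasoning step
            if kind == "tool_call":
                name, args = payload
                trace.append(("tool_result", tools[name](**args)))
            elif kind == "thinking":
                trace.append(("thinking", payload))
            else:                                        # kind == "answer"
                return payload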


AFAICT, Kimi K2 was the first to apply this technique [1]. I wonder if Anthropic came up with it independently or if they trained a model in 5 months after seeing Kimi's performance.

1: https://www.decodingdiscontinuity.com/p/open-source-inflecti...


OpenAI has been doing this since at least o3 in January, and Anthropic has been doing it since Claude 4 in May.

And the July Kimi K2 release wasn't a thinking model; the model in that article was released less than 20 days ago.


How is this different from a local Jupyter notebook? Can we not do this with ! or % in a .ipynb?

Genuine question. Not familiar with this company or the CLI product.


The main thing that keeps me from using Jupyter notebooks for anything that's not entirely Python is Python.

For me, pipenv/pyenv/conda/poetry/uv/dependencies.txt and the inevitable "I need to upgrade Python to run this notebook, ugh, well, ok -- two weeks later -- g####m, that upgrade broke that unrelated and old Ansible and now I cannot fix these fifteen barely-held-up servers" is pure hell.

I try to stay away from Python for foundational stuff, as any Python project that I work on¹ will break at least yearly on some dependency or other runtime woe. That goes for Ansible, build pipelines, deploy.py, or any such thing. I would certainly not use Jupyter notebooks for such crucial and foundational automation, as the giant tree of dependencies and requirements it comes with makes this far worse.

¹ Granted, my job makes me work on an excessive number of codebases: at least six different Python projects in the last two months, some requiring Python 2.7, some requiring deprecated versions of lib-something.h, some cutting edge, some very strict in practice but not documented (it works on the machine of the one dev who works on it as long as he never updates anything?). And Puppet and Chef, being Ruby, are just as bad, suffering from the exact same issues, except that Ruby has had one (and only one!) package management system for decades now.


Jupyter Notebooks have always felt a bit hacky for terminal purposes to me, so I'm excited to give this a shot.


How about marimo?


100% same question.

Usually, I feel like Jupyter gives both worlds -- flexible scripting and support for OS commands (either through !/% or even os.system()).
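For example (minimal illustration of mixing shell and Python in a cell):

    # In a Jupyter/IPython cell, shell commands and magics mix with Python:
    files = !ls *.csv          # "!" runs a shell command and captures output
    %pip install requests      # "%" invokes a line magic
    import os
    os.system("echo hello")    # plain-Python fallback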


This is the non-AI version of an AI CEO saying programmers will not exist in 5 years.


I recently wrote a post outlining our method to reduce hallucinations in LLM agents by leveraging a verified semantic cache. The approach pre-populates the cache with verified question-answer pairs, ensuring that frequently asked questions are answered accurately and consistently without invoking the LLM unnecessarily.

The key idea lies in dynamically determining how queries are handled:

- Strong matches (≥80% similarity): Responses are directly served from the cache.

- Partial matches (60–80% similarity): Verified answers are used as few-shot examples to guide the LLM.

- No matches (<60% similarity): The query is processed by the LLM as usual.

This not only minimizes hallucinations but also reduces costs and improves response times.
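A rough sketch of that routing logic (illustrative names, not the actual code from the notebook linked below; it assumes an embed() function and a cache of verified (question, answer, embedding) tuples):

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def route(query, verified_cache, embed, call_llm):
        q_emb = embed(query)
        scored = [(cosine(q_emb, emb), q, a) for q, a, emb in verified_cache]
        score, question, answer = max(scored, key=lambda t: t[0])
        if score >= 0.80:
            return answer                              # serve directly from cache
        if score >= 0.60:
            shot = f"Q: {question}\nA: {answer}\n\n"   # verified pair as few-shot
            return call_llm(shot + f"Q: {query}\nA:")
        return call_llm(query)                         # no useful match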

Here's a Jupyter notebook walkthrough if anyone's interested in diving deeper: https://github.com/aws-samples/Reducing-Hallucinations-in-LL...

Would love to hear your thoughts—anyone else working on similar techniques or approaches? Thanks.


Yeah, the Converse API supports all models on Bedrock, or at least all the text2text ones.


If the user asks such a question, your agent should not invoke the RAG at all, but simply answer from the history. You need to focus on your orchestration step.

Search for ReAct agents; you can build one using either LangGraph or Bedrock Agents.
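Very roughly, the orchestration step looks something like this (plain-Python sketch with made-up names, not LangGraph- or Bedrock-specific):

    # Decide whether the question needs retrieval at all or can be answered
    # from conversation history; classify() would typically be a cheap LLM call.
    def orchestrate(question, history, classify, answer_from_history, rag_answer):
        label = classify(question, history)   # e.g. "history" or "needs_retrieval"
        if label == "history":
            return answer_from_history(question, history)
        return rag_answer(question)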


Skeptical that this will be a "good" experience for everyone involved considering how generic AI openers/responses are, but also hopeful that it can reduce friction for some.

Excited to see what y'all cook up.


This might be our greatest differentiator.

We are trying to make it as personalized to you as possible. Currently it is still limited, but we gather as much context as we can about the user and their matches to personalize each message over time.

Please apply to check it out!


Not an expert by any means, but streaming HQ video is pretty expensive (even more so for live content); it seems like the only providers that can do so profitably are YouTube and Netflix. I'm sure a big reason for that is the engineering (esp. the CDN).


This is actually not true nowadays. Streaming HQ video is pretty cheap (check out per-GB pricing from CloudFront or Fastly and divide that by 5-10 to get a realistic number).
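Back-of-the-envelope, assuming a ~$0.085/GB CloudFront list price (an illustrative assumption; check current pricing) and a 5 Mbps 1080p stream:

    list_price_per_gb = 0.085                         # assumed CloudFront list price
    negotiated_price_per_gb = list_price_per_gb / 7   # the "divide by 5-10" above
    gb_per_hour = 5 / 8 * 3600 / 1000                 # 5 Mbps -> ~2.25 GB per hour
    print(f"~{gb_per_hour:.2f} GB/hour, "
          f"~${gb_per_hour * negotiated_price_per_gb:.4f} per viewer-hour")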


Can you not use cross-region inference?


90% of our customers do not allow this due to data sovereignty.

Bedrock here is lagging so far behind that several customers assume AWS simply isn't investing here anymore -- or if they are, it's an afterthought, and a very expensive one at that.

I've spoken with several account managers and SAs and they seem similarly frustrated with the continual response from above that useful models are "coming soon".

You can't even BYO models here; we usually end up spinning up big ol' GPU EC2 instances and serving our own, or for some tasks running locally, as you can get better open-weight LLMs.


Hmm interesting, didn't realize that data sovereignty requirements were so stringent. Wonder how other cloud providers are doing in this sense considering GPU shortages across the board.


I'm confused -- what's expensive about it? It's a serverless, pay-per-token model?

Do you mean specifically the Bedrock Knowledge Bases/RAG? That uses serverless OpenSearch, which costs at minimum ~$200/month because it doesn't scale to zero.

