I swim around a lot in the "XML High Priesthood" pool, and the latest new thing is this: AI (sucking down unstructured documents) can't function efficiently without a knowledge graph, and donchaknow a complex XML schema and a knowledge graph are practically the same thing.
So they're gluing on some new functionality to try and get writer teams to take the plunge and - same old same old - buy multimillion-dollar tools to make PDFs with. One sign of a terminal bagholder is seeing the same tech come up every few years with the latest fashionable thing stapled on its face. They went through a "blockchain" phase too, where all the individual document elements would be addressable "through the chain".
Anyway, thing is, there's a teensy shred of truth in what they're saying, but everything else they're suggesting would, I think, either not work at all or make retrieval even less dependable. And to do what they're trying to do, you don't actually need a gigantic full-on XML schema. Using AsciiDoc roles consistently would get you the same benefit, and would save a hell of a lot of space in a very limited context window.
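To put a rough number on the space claim, here's a minimal sketch. The snippets, the attribute names, and the cl100k_base tokenizer choice are all my own illustrative assumptions (not anyone's real doc set); it just compares token counts for the same semantic tagging expressed as DITA-flavored XML versus AsciiDoc roles:

```python
# Compare token cost of equivalent semantic tagging in XML vs. AsciiDoc.
# Hypothetical snippets; requires `pip install tiktoken` to run.
import tiktoken

xml_version = """<section id="rotor-svc">
  <title>Rotor service</title>
  <p audience="field-tech" product="mk3">Torque the hub bolts
  to <ph conref="specs.dita#specs/hub-torque"/> before reassembly.</p>
</section>"""

asciidoc_version = """[#rotor-svc]
== Rotor service

[.field-tech.mk3]
Torque the hub bolts to {hub-torque} before reassembly."""

enc = tiktoken.get_encoding("cl100k_base")
for label, text in [("XML", xml_version), ("AsciiDoc", asciidoc_version)]:
    print(f"{label}: {len(enc.encode(text))} tokens")
```

Same id, same audience/product metadata, same content reference; the XML version just spends more tokens saying so, and that overhead repeats on every element.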
Additionally, when you have strict input token limits, it's way easier to chunk Markdown while keeping track of context than it is to chunk HTML at all.
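Here's the kind of bookkeeping I mean, as a sketch: a heading-aware Markdown chunker that carries the trail of headings along with each chunk. The function name and character budget are made up for illustration; the point is that this is a few lines for Markdown and a genuinely hard problem for arbitrary HTML:

```python
import re

def chunk_markdown(text: str, max_chars: int = 1000):
    """Split Markdown into chunks, each paired with its heading trail."""
    heading_re = re.compile(r"^(#{1,6})\s+(.*)$")
    trail = {}          # heading level -> heading text
    chunks, buf = [], []

    def flush():
        # Emit the buffered lines (if any are non-blank) under the
        # heading trail that was in effect while they were buffered.
        if any(l.strip() for l in buf):
            context = " > ".join(trail[k] for k in sorted(trail))
            chunks.append((context, "\n".join(buf).strip()))
        buf.clear()

    for line in text.splitlines():
        m = heading_re.match(line)
        if m:
            flush()
            level = len(m.group(1))
            # Drop deeper headings; a new H2 invalidates the old H3s.
            trail = {k: v for k, v in trail.items() if k < level}
            trail[level] = m.group(2)
        elif sum(len(l) for l in buf) + len(line) > max_chars:
            flush()
            buf.append(line)
        else:
            buf.append(line)
    flush()
    return chunks

doc = "# API\n\n## Auth\n\nUse the token header.\n\n## Errors\n\n429 means back off."
for ctx, body in chunk_markdown(doc, max_chars=80):
    print(f"[{ctx}] {body!r}")
# [API > Auth] 'Use the token header.'
# [API > Errors] '429 means back off.'
```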
Let's see... the linked arXiv article has been withdrawn by the author with the following comment:
> Contains inappropriately sourced conjecture of OpenAI's ChatGPT parameter count from this http URL, a citation which was omitted. The authors do not have direct knowledge or verification of this information, and relied solely on this article, which may lead to public confusion
I think YAML actually uses more tokens than unindented JSON, especially with deeply nested data. For example, "," being a single token makes JSON quite compact.
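A quick way to check, assuming tiktoken and PyYAML are installed; the nested test object is invented:

```python
# Compare token counts for the same data as compact JSON vs. YAML.
import json
import tiktoken
import yaml

data = {"a": {"b": {"c": {"d": [1, 2, 3], "e": "deep"}}}}

enc = tiktoken.get_encoding("cl100k_base")
# Tight separators: no spaces after "," or ":".
as_json = json.dumps(data, separators=(",", ":"))
as_yaml = yaml.dump(data, default_flow_style=False)

print("JSON:", len(enc.encode(as_json)), "tokens ->", as_json)
print("YAML:", len(enc.encode(as_yaml)), "tokens")
```

Compact JSON tends to win because "," and "{" are single tokens, while block-style YAML pays for a newline plus indentation at every level of nesting.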