
That's using a mighty broad brush to paint folks' circumstantial situations.

Remembers everything that you say, isn't limited to an hour session, won't ruin your life if you accidentally admit something vulnerable regarding self-harm, doesn't cost hundreds of dollars per month, etc.

Healthcare is about to radically change. Well, everything is now that we have real, true AI. Exciting times.


Openly lies to you, hallucinates regularly, can barely get a task done. Such exciting.

Oh and inserts ads into conversations. Great.


> Oh and inserts ads into conversations. Great.

Are you sure you don't have browser malware?


Quick reminder that it's still just a fancy pattern matcher; there's no clear path from where we are to AGI.


>you are a stochastic parrot
>no I’m not
>yes you are


I suppose Google wants us to pretend that "agents" can't be "resources." MCP is already well established (Anthropic, OpenAI, Cursor, etc), so Google plastering their announcement with A2A endorsements just reeks of insecurity.

I figure this A2A idea will wind up in the infamous Google graveyard within 8 months.


Creating new standards is not easy, largely because everyone has to agree that they will use this particular one. Plastering it with endorsements attempts to show that there is consensus and give confidence in adoption. If they didn't put them in, you'd instead say nobody is using or going to use this.


True, but look at those "partners": most of them are lame BigCo/consultancy types with no history of technological innovation or collaboration; if anything, they're generally anti.

The list is aimed at bureaucratic manager types (which may be the correct approach, if they're generally the decision makers); it's not a list that will impress engineers much, I think.


You know how these endorsements work, right? Some comms intern writes a quote, emails it to a contact at each of the other companies for the go-ahead, and that's how you get dozens of companies all spouting BS that kinda sounds the same.


But MCP doesn't claim to address agent to agent communication, right?


It's not so much about what you _can do_ but about the messaging and posturing, which is what drives the adoption of standards as a social phenomenon.

My team's been working on MCP agents and agents-as-tools, and we consistently saw confusion from everyone we were selling this into (people already bought in to hosting an MCP server for their API or SDK): using MCP for their agents felt wrong to them because "that's not what it's for".

Kinda weird, but kinda simple.


Hi there (I work on a2a) - reposting from above.

We are working with partners on very specific customer problems. Customers are building individual agents in different frameworks OR are purchasing agents from multiple vendors. Those agents are isolated and do not share tools, or memory, or context.

For example, most companies have an internal directory and internal private APIs and tools. They can build an agent to help complete internal tasks. However, they also may purchase an "HR Agent" or "Travel Assistant Agent" or "Tax Preparation Agent" or "Facilities Control Agent". These agents aren't sharing their private APIs and data with each other.

It's also difficult to model these agents as structured tools. For example, a "Tax Preparation Agent" may need to evaluate many different options and ask for different specific documents and information based on an individual user's needs. Modeling this as hundreds of tools isn't practical. That's where we see A2A helping: talk to an agent as an agent.

This lets a user talk to only their company agent and then have that agent work with the HR Agent or Travel Booking Agent to complete complex tasks when they cannot be modeled as tools.
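To make that concrete, here's a rough sketch of the interaction in Python. The endpoint path, payload fields, and agent URL below are illustrative assumptions for this comment, not the actual A2A message format; see the spec for the real shapes.

    import requests

    # Hypothetical example: a company agent delegating a task to a vendor's
    # "Tax Preparation Agent" over HTTP. The URL and JSON shape are made up
    # for illustration; the real A2A spec defines the actual message format.
    AGENT_URL = "https://tax-agent.example.com/a2a"

    def delegate_task(user_request: str) -> dict:
        """Send a free-form task to a remote agent and return its reply."""
        payload = {
            "task": {
                "message": user_request,              # natural language, not a tool schema
                "context": {"employee_id": "E123"},   # only the context we choose to share
            }
        }
        response = requests.post(AGENT_URL, json=payload, timeout=30)
        response.raise_for_status()
        return response.json()  # may be an answer, or a follow-up request for documents

    reply = delegate_task("Prepare my 2024 taxes; I moved states in June.")

The point is that the remote agent stays a black box: one conversational surface instead of hundreds of tool definitions.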


Exposing invalid transitions to a user is a bug. Idempotency here doesn't solve anything, just hides said bug, which is arguably worse.


A race condition for which of multiple concurrent users initiated a transition is not a bug, it's a scenario that commonly needs to be handled. In many cases, idempotency can be a simple and effective approach for handling this.
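For example, a compare-and-set update makes the transition itself idempotent: whichever request loses the race matches zero rows and becomes a no-op instead of an error shown to the user. Minimal sketch (table and column names are hypothetical):

    import sqlite3

    def close_ticket(conn: sqlite3.Connection, ticket_id: int) -> bool:
        """Idempotently move a ticket from 'open' to 'closed'.

        The WHERE clause only matches while the ticket is still open, so two
        concurrent callers are both safe: one performs the transition, the
        other matches zero rows. Neither sees an invalid-transition error.
        """
        cur = conn.execute(
            "UPDATE tickets SET status = 'closed' WHERE id = ? AND status = 'open'",
            (ticket_id,),
        )
        conn.commit()
        return cur.rowcount == 1  # True only for the call that did the transition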


When you identify where the infringing party has stored the source material in their artifact.{zip,pdf,safetensor,connectome,etc}. In ML, this discovery stage is called "mechanistic interpretability", and in humans it's called "illegal."


It's not that clear cut. Since they're talking about taking lossy compression to the limit, there are ways to go so lossy that you're no longer infringing, even if you can point to exactly where it's stored.

Like CliffsNotes.


GLP-1 drugs offer palliative mercy for terminal societal malaise. You're in the bargaining stage.


And for anyone looking to dig deeper, check out "grammar-based sampling."
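The core idea, in a toy sketch: mask the model's next-token distribution down to whatever the grammar currently allows, then sample. (Real implementations, e.g. llama.cpp's GBNF grammars, track the grammar as an automaton over tokens; this only shows the masking step.)

    import math, random

    def sample_constrained(logits: dict[str, float], allowed: set[str]) -> str:
        """Sample a next token, but only from tokens the grammar allows here.

        `logits` maps candidate tokens to raw model scores; `allowed` is the
        set of tokens the grammar accepts in its current state. Disallowed
        tokens get probability zero, so output can never leave the grammar.
        """
        masked = {t: s for t, s in logits.items() if t in allowed}
        if not masked:
            raise ValueError("grammar allows no candidate token here")
        z = sum(math.exp(s) for s in masked.values())
        r, acc = random.random(), 0.0
        for token, score in masked.items():
            acc += math.exp(score) / z
            if r <= acc:
                return token
        return token  # guard against floating-point rounding

    # e.g. forcing valid JSON: right after '{', only '"' or '}' are legal.
    print(sample_constrained({'"': 1.2, '}': 0.3, 'cat': 5.0}, allowed={'"', '}'}))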


> We evaluate Qwen2.5-Max alongside leading models

> [...] we are unable to access the proprietary models such as GPT-4o and Claude-3.5-Sonnet. Therefore, we evaluate Qwen2.5-Max against DeepSeek V3

"We'll compare our proprietary model to other proprietary models. Except when we don't. Then we'll compare to non-proprietary models."


> [...] this technology which has progressed only by increasing data volume and variety

Sure, if you ignore major shifts after 2022, I guess? Test-time-compute, quantization, multimodality, RAG, distillation, unsupervised RL, state-space models, synthetic data, MoEs, etc ad infinitum. The field has rapidly blown past ChatGPT affirming the (data) scaling laws.

> [...] where when one output (, input) is obtained the search space for future outputs is necessarily constrained

It's unclear to me why this matters, or what advantage humans have over frontier sequence models here. Hell, at least the latter have grammar-based sampling, and are already adept with myriad symbolic tools. I'd say they're doing okay, relative to us stochastic (natural) intelligences.

> With relatively weak assumptions one can show the latter class of problem is not in the former

Please do! Transformers et al are models for any general sequences (e.g. protein structures, chatbots, search algorithms, etc). I'm not seeing a fundamental incompatibility here with goal generation or reasoning about hypotheticals.


If your point is that there's a very very wide class of problems whose answer is a sequence (of actions, propositions, etc.) -- then you're quite correct.

But that isn't what transformers model. A transformer is a function of historical data which returns a function of inputs by inlining that historical data. You could see it as a higher-order function: transformer : Data -> (Prompt -> Answer), which applied to historical_data yields promptable : Prompt -> Answer.

It is true that Prompt and Answer both lie within Sequence; but they do not cover Sequence (i.e., all possible sequences), nor is this strategy of computing an Answer from a Prompt even capable of searching the full space (Prompt, Answer) in a relevant way.

In particular, its search strategy (i.e., the body of `promptable`) is just a stochastic algorithm which takes a bytecode (the weights) and evaluates it by biased random jumping. These weights are an inlined subspace of (Prompt, Answer), obtained by sampling this space according to the historical frequencies of prior data.

This generates Answers which are sequenced according to "frequency-guided heuristic searching" (I guess a kind of "stochastic A* with inlined historical data"). This precludes imposing any deductive constraints on the answers; e.g., (A, notA) should never be sequenced, but it can be generated by at least one search path in this space, given a historical dataset in which A and notA both appear.
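To be concrete, the loop I mean is roughly this (a deliberate caricature of autoregressive sampling; `model` stands in for the frequency table inlined in the weights). Nothing in it can impose a deductive constraint across the sequence:

    import random

    def decode(model, prompt: list[str], max_steps: int = 50) -> list[str]:
        """Caricature of autoregressive decoding.

        `model(seq)` is assumed to return a {token: probability} distribution
        learned from historical data. Each step is a biased random draw;
        nothing forbids emitting both "A" and "not A" along one path if both
        continuations were frequent in the training data.
        """
        seq = list(prompt)
        for _ in range(max_steps):
            dist = model(seq)  # frequencies inlined in the weights
            tokens, weights = zip(*dist.items())
            seq.append(random.choices(tokens, weights=weights, k=1)[0])
        return seq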

Now, things get worse from here. What a proper simulation of counterfactuals requires is partitioning the space of relevant Sequences into coherent subsets (A, B, C, ...); (A', B', C'), but NOT (A, notA, A'), etc. This is like "super deduction", since each partition needs to be "deductively valid", and there need to be many such partitions.

And so on. As you go up the "hierarchy of constraints" of this kind, you recursively require ever more rigid logical consistency, but this is precluded even at the outset. E.g., consider that a "Goal" is going to require classes of classes of such constrained subsets, since we need to evaluate counterfactuals to determine which class of actions realises any given goal, and any given action implies many consequences.

Just try to solve the problem, "buying a coffee at 1am" using your imagination. As you do so, notice how incredibly deterministic each simulation is, and what kind of searching across possibilities is implied by your process of imagining (notice, even minimally, you cannot imagine A & notA).

The stochastic search algorithms which comprise modern AI do not model the space of, say, Actions in this way. This is only the first hurdle.


> This generates Answers which are sequenced according to "frequency-guided heuristic searching" (I guess a kind of "stochastic A* with inlined historical data")

This sounds like way too simplistic an understanding. Transformers aren't just heuristically pulling token cards out of a randomly shuffled deck; they sit upon a knowledge graph of embeddings that creates a consistent structure representing the underlying truths and relationships.

The unreliability comes from the fact that within the response tokens, "the correct thing" may be replaced by "a thing like that" without completely breaking these structures and relationships. For example: in the nightmare scenario of a STRAWBERRY, the frequency of letters themselves had very little distinction in relation to the concept of strawberries, so they got miscounted (I assume this has been fixed in every pro model). BUT I don't remember any 2023 models such as claude-3-haiku making fatal logical errors such as saying "P" and "!P" while assuming ceteris paribus, unless you jumped through hoops trying to confuse it and find weaknesses in the embeddings.
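(Part of the strawberry failure is just tokenization: the model never sees individual letters. A quick illustration with tiktoken; the exact split depends on the encoding, so treat the output as indicative only.)

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("strawberry")
    print([enc.decode([i]) for i in ids])  # a few multi-character chunks, not ten letters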


You've just given me the heuristic, and told me the graph -- you haven't said A* is a bad model, you've said it's exactly the correct one.

However, transformers do not sit on a "knowledge graph", since the space is not composed of discrete propositions set in discrete relationships. If it were, then P(PrevState|NextState) = 0 would obtain for many pairs of states -- this would destroy the transformer's ability to make progress.

So rather than 'deviation from the truth' being an accidental symptom, it is essential to its operation: there can be no distinction-making between true/false propositions for the model to even operate.

> making fatal logical errors such as saying "P" and "!P"

Since it doesn't employ propositions directly, how you interpret its output in propositional terms will determine whether you think it's saying P & !P. This "interpreting-away" effect is common in religious readings of texts, where the text is divorced from its meaning and a new one substituted to achieve apparent coherence.

Nevertheless, if you're asking (Question, Answer)-style prompts where there is a canonical answer to a common question, then you're not really asking it to "search very far away" from its inlined historical data (the ersatz knowledge graph that it does not possess).

These errors become more common when the questions require posing several counterfactual scenarios derived from the prompt, or otherwise have non-canonical answers which require integrating disparate propositions given in the prompt.

The prompt's propositions each compete to drag the search in various directions, and there is no constraint on where it can be dragged.


I am not going to engage with your A* proposition. I believe it to be irrelevant.

> However, transformers do not sit on a "knowledge graph", since the space is not composed of discrete propositions set in discrete relationships.

This is the main point of contention. By all means, embeddings are a graph, in the sense that you can use a graph (but not a tree) to represent their data structure. Sure, they are essentially points in space, but a graph emerges as the architecture selects tokens according to the learned parameters during inference. It will always be the same graph for the same set of tokens for a given data set which provides "ground truth". I know it sounds metaphorical, but bear with me.

The above process doesn't result in discrete propositions like we have in Prolog, but the point is that it is "relatively" meaningful, and you seed a traversal by bringing tokens to the attention grid. What I mean by relatively meaningful is that inverse relationships are far enough apart that they won't usually be confused, so there is less chance of meaningless gibberish emerging, which is what we observe.


If I replaced "transformer" in your comment with "human", what changes? That's my point.

Humans are a "function of historical data" (nurture). Meatbag I/O doesn't span all sequences. A person's simulations are often painfully incoherent, etc. So what? These attempts at elevating humans seem like anthropocentric masturbation. We ain't that special!


Running a 680-billion-parameter frontier model on a few Macs (at 13 tok/s!) is nuts. That's two years after ChatGPT was released. That rate of progress just blows my mind.


And those are M2 Ultras. M4 Ultra is about to drop in the next few weeks/months, and I'm guessing it might have higher RAM configs, so you can probably run the same 680b on two of those beasts.

The higher-performing chips, with one less interconnect, are going to give you significantly higher t/s.
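Back-of-the-envelope, assuming ~4-bit weights and 192 GB of unified memory per M2 Ultra (both are assumptions about this particular setup):

    params = 680e9            # parameter count quoted upthread
    bytes_per_param = 0.5     # ~4-bit quantization (assumption)
    weights_gb = params * bytes_per_param / 1e9
    print(weights_gb)         # ~340 GB of weights alone
    print(weights_gb / 192)   # ~1.8 maxed-out M2 Ultras, before KV cache and overhead

So two maxed-out machines is about the floor for the weights alone, which lines up with the setup above.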

