Okay, well. I work on Bluesky and helped build the AT Protocol. I'm sorry Sam differs with us on this, and I'm glad that ActivityPub is already there for him. However, Sam doesn't understand ATProto very well, and I want to clear it up a bit.
Before I do, let me just say: Bluesky and the AT Proto are in beta. The stuff that seems incomplete or poorly documented is incomplete and poorly documented. Everything has moved enormously faster than we expected it to. We have around 65k users on the beta server right now. We _thought_ that this would be a quiet, stealthy beta for us while we finished the technology and the client. We've instead gotten a ton of attention, and while that's wonderful it means that we're getting kind of bowled over. So I apologize for the things that aren't there yet. I haven't really rested in over a month.
ATProto doesn't use crypto in the coin sense. It uses cryptography. The underlying premise is actually pretty similar to git. Every user runs a data repository where commits to the repository are signed. The data repositories are synced between nodes to exchange data, and interactions are committed as records to the repositories.
The purpose of the data repository is to create a clear assertion of the user's records that can be gossiped and cached across the network. We sign the records so that authenticity can be determined without polling the home server, and we use a repository structure rather than signing individual records so that we can establish whether a record has been deleted (signature revocation).
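In sketch form, the idea is something like this (heavily simplified and illustrative only, not the actual atproto repo format or APIs): each commit covers the current set of records and is signed by the repo owner's key, so any node can verify it without asking the home server, and a deletion is just a later signed commit that no longer contains the record.

```ts
// A minimal sketch, not the real atproto repository structure.
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

type RepoRecord = { path: string; value: unknown };

interface Commit {
  prev: string | null;   // hash of the previous commit (like a git parent)
  root: string;          // hash over the current set of records
  sig: string;           // signature over prev + root
}

const sha256 = (data: string) => createHash("sha256").update(data).digest("hex");
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function commit(records: RepoRecord[], prev: string | null): Commit {
  const root = sha256(JSON.stringify(records));
  const sig = sign(null, Buffer.from(`${prev}:${root}`), privateKey).toString("base64");
  return { prev, root, sig };
}

function verifyCommit(c: Commit): boolean {
  // Anyone holding the public key can check authenticity without polling the home server.
  return verify(null, Buffer.from(`${c.prev}:${c.root}`), publicKey, Buffer.from(c.sig, "base64"));
}

// Deletion is a new signed commit that simply omits the record, which is what
// lets other nodes distinguish "deleted" from "never seen".
const c1 = commit([{ path: "app.bsky.feed.post/1", value: { text: "hi" } }], null);
const c2 = commit([], sha256(JSON.stringify(c1)));
console.log(verifyCommit(c1), verifyCommit(c2));
```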
Repositories are pulled through replication streams. We chose not to push events to home servers because you can easily overwhelm a home server with big bursts of load when some content goes viral, which in turn makes self-hosting too expensive. If a home server wants to crawl & pull records or repositories it can, and there's a very sensible model for doing so based on its users' social graph. However, the general goal is to create a global network that aggregates activity (such as likes) across the entire network, and so we use large-scale aggregation services to provide that aggregated firehose. Unless somebody solves federated queries with sufficient performance, any network that's trying to give a global view is going to need similarly large indexes. If you don't want a global view, that's fine; you want a different product experience, and you can do that with ATProto. You can also use a different global indexer than the one we provide, same as search engines.
The schema is a well-defined machine language which translates to static types and runtime validation through code generation. It helps us maintain correctness when coordinating across multiple servers that span orgs, and any protocol that doesn't have one is informally speccing its logic across multiple codebases and non-machine-readable specs. The schema helps the system with extensibility and correctness, and if there was something off the shelf that met all our needs we would've used it.
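As a rough illustration of what that buys you (hypothetical names, not the real Lexicon toolchain output): one schema definition drives both a compile-time type and a runtime validator, so every server agrees on what a record looks like.

```ts
// Hypothetical sketch of "schema -> static type + runtime validation".
// Imagine this interface and validator are generated from one schema file.
interface PostRecord {
  text: string;
  createdAt: string;   // ISO 8601
  langs?: string[];
}

function validatePostRecord(value: unknown): value is PostRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { [k: string]: unknown };
  if (typeof v.text !== "string") return false;
  if (typeof v.createdAt !== "string" || Number.isNaN(Date.parse(v.createdAt))) return false;
  if (v.langs !== undefined &&
      !(Array.isArray(v.langs) && v.langs.every(l => typeof l === "string"))) return false;
  return true;
}

// A server receiving a record over the wire gets compile-time types for its own
// code and can reject malformed records from peers at runtime.
const incoming: unknown = JSON.parse('{"text":"hello","createdAt":"2023-05-05T00:00:00Z"}');
if (validatePostRecord(incoming)) {
  console.log(incoming.text.toUpperCase()); // statically typed here
}
```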
The DID system uses the recovery key to move from one server to another without coordinating with the server (i.e. because it suddenly disappeared). It supports key rotations and it enables very low-friction moves between servers without any loss of past activity or data. That design is why we felt comfortable just defaulting to our hosting service: we made it easy to switch off of it after the fact if/when you learn there's a better option. Given that the number one gripe about ActivityPub's onboarding is server selection, I think we made the right call.
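To make that a bit more concrete, here's roughly the shape of the identity document (a simplified illustration, not the exact did:plc format or field names):

```ts
// Illustrative only. The document binding identifier -> keys -> current home
// server is itself signed, so it can be updated (server move, key rotation)
// with a key the user still controls, without the old server's cooperation.
const didDocumentSketch = {
  id: "did:example:alice",                    // stable identifier, never changes
  alsoKnownAs: ["at://alice.example.com"],    // human-readable handle
  rotationKeys: [
    "did:key:zRecoveryKeyHeldByUser",         // can authorize future updates
    "did:key:zKeyHeldByCurrentHost",
  ],
  services: {
    pds: { endpoint: "https://pds.current-host.example" },  // current home server
  },
};

// A move is "publish a new version of this document, signed by a rotation key,
// pointing at the new host". Followers resolve the DID, not the old hostname.
const afterMove = {
  ...didDocumentSketch,
  services: { pds: { endpoint: "https://pds.new-host.example" } },
};
console.log(afterMove.services.pds.endpoint);
```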
We'll keep writing about what we're doing and I hope we change some minds over time. The team has put a lot of thought into the work, and we really don't want to fight with other projects that have a similar mission.
Reading this I hear someone passionate about technology for the sake of technology. Which is cool, I totally get the desire to build things oneself, but it doesn't really address the substantive questions people are asking about AT:
An open protocol exists that broadly does what you want to do. That protocol is stable and widely used. That in itself, regardless of the quality of the protocol, already represents an OKish argument to strongly consider using it. If you're going to go NIH your replacement needs to not just be better but substantially better, and you should also show understanding of the original open spec.
So far, Bluesky's already been caught redoing little things in ways that show a lack of reading/understanding of open specs (.well-known domain verification), and the proposed technological improvements over ActivityPub fall into three categories:
1. Something ActivityPub already supports as a "SHOULD" or a "MAY": there are arguments to be made that these would be better as "MUST", but either way there's no reason AT couldn't have just implemented ActivityPub with these added features.
2. Highly debatable improvements - as highlighted by this article. I do think some of this article is hyperbole but it certainly highlights that some of the proposed benefits are not clear cut.
3. Such a minor improvement as to be nowhere near worth the incompatibility.
All that coupled with the continuous qualifiers of it being incomplete/beta/WIP when there's a mature alternative really just doesn't present well.
I run a single-user ActivityPub instance with a minimal following and a small number of people across multiple instances that I follow. From a user perspective ActivityPub is fine; I have no complaints.
However, from an ops perspective ActivityPub is incredibly chatty. If this had to scale to a larger instance the costs would spiral fast. Operationally and in terms of cost efficiency, ATProto is already a better-looking protocol. To a single individual user this won't necessarily be obvious right off the bat, but it will tend to manifest as either overworked operations people or slow, janky instance performance.
While it's certainly a reasonable question whether or not the world needs another federated social protocol, ATProto definitely solves real problems with the ActivityPub protocol.
Being technically better is usually not a good enough reason to be incompatible.
I'm not sure why people don't get this, but it is almost always true.
Starting from scratch, just because you can theoretically design a better system, is one of the worst things to do to users. Theoretically better also rarely wins in the marketplace anyway.
If you want a slightly lighter position:
Software needs to be built to be migrated from and to, as a base-level requirement. Communities that do this tend to end up with happy users who don't spend lots of time on toil trying to keep up. This is true regardless of whether the new thing is compatible with the old - if it doesn't take any energy or time, and just works, people care less what you change.
Yes, it is hard. Yes, there are choices you make sometimes that burn you down the road. That is 100% guaranteed to happen. So make software that enables change to occur.
The overall amount of developer toil and waste created en masse by people who think they are making something "better", usually with no before/after data (or at best, small useless metrics sampled from a very small group), almost always vastly dwarfs all improvement that occurs as a result.
If you want to spend time helping developers/users, then understand where they spend their time, not where you spend your time.
Something that I think your analysis is missing is that with a decentralized product, ops is also user experience.
The whole point is that it needs to be reasonably easy for people to run and scale their own servers. If people are constantly burning out, quitting, or running out of money, then it has an impact on regular users.
From a purely technical analysis ATProto looks better as a protocol to me. But I don't use Bluesky; I use ActivityPub, because the people I want to be connected to are there and not on Bluesky. I do think you could probably make improvements to ActivityPub that reduce operational costs. It's not something I feel the need to tackle right this moment because my usage doesn't really incur those costs.
I think this is mostly silly. Barely anyone uses ActivityPub. If BlueSky only moderately catches on and 100M people start using it, the total number of ActivityPub users would be a rounding error.
> If you want to spend time helping developers/users, then understand where they spend their time, not where you spend your time.
You're spending your time with ActivityPub, so this advice should apply to you, too. The bulk of potential users are spending their time on anything but ActivityPub. And as for developers, one of course needs to attract them, but I hear the people behind BlueSky have a couple of bucks, the ambition to create a huge potential new market, and a track record of creating a couple of things. I don't think they'll have trouble finding developers.
If BlueSky comes up with a better architecture, ActivityPub clients should rebase onto it. BlueSky should pretend ActivityPub doesn't exist, except where it has some nice schemas or solved some problem efficiently; there, try to maintain compatibility unless there's even the slightest reason to deviate.
Didn't ActivityPub have enough of a head start? Why didn't ActivityPub just use Diaspora? Why prioritize ActivityPub over OStatus?
"I think this is mostly silly. Barely anyone uses ActivityPub. If BlueSky only moderately catches on and 100M people start using it, the total number of ActivityPub users would be a rounding error."
Says everyone everywhere who thinks they made something better!
"If BlueSky comes up with a better architecture, ActivityPub clients should rebase. BlueSky should pretend they don't exist, except if they have some nice schemas or solved some problem efficiently, try to maintain compatibility with that unless there's even the slightest reason to deviate."
Look, I'm not suggesting whoever does it first gets to dictate it, but literally everyone thinks their thing will be better enough to attract lots of users or be worth it, and most never actually do/are.
They do, however, cause lots and lots and lots of toil!
Your position is exactly what leads people over this cliff - better architecture does not matter. It doesn't. Technical goodness is not an end unto itself. It's a means, often to reduce cost or increase efficiency, and unfortunately rarely, to deliver new features or a better experience. Reducing cost or increasing efficiency are great. But architecture is not the product.
The product is the product.
> The overall amount of developer toil and waste created en masse by people who think they are making something "better", usually with no before/after data […], almost always vastly dwarfs all improvement that occurs as a result.
So, why is the Fedi not built on RSS/Websub/etc. then?
I don't know enough about this particular topic to offer a view, but in general, it's because people care more about releasing their better thing and pretending they are helping than about the hard work of actually helping.
Are there benchmarks for this? What's the level of difference here? Request frequency seems closely linked to activity, and XRPC bodies are JSON just as in ActivityPub, so message size should be within an order of magnitude at least. Are there architectural differences that reduce request frequency significantly?
> If this had to scale to a larger instance the costs would spiral fast.
Are we talking bandwidth costs or processing? I know the popular ActivityPub implementation is widely considered to be pretty inefficient processing-wise for reasons unrelated to the protocol itself: is that a factor here?
One example of the chattiness is a flow where more than one person is following the same individual on another server. That person will have to push new messages to every single one of the people following them. This means that if 10 users are following me from the same server, I will not have 1 push for that instance; I'll have 10 pushes for the same single unchanged message. This is built into the protocol. That's a lot of throughput for something that could be much, much less chatty.
Now on my single instance it's not too bad because 1. I follow maybe 40 people and 2. I have like max 10 followers. For an instance with people with high follower counts across multiple other instances it could get to be a problem fast.
Edit: my previous description used fetch when it should have used push.
Isn't that "more than once instance" rather than "more than one user"?
I think the weirdness is that with Bluesky all that cost is still there, but it's now handled by a small group of massive megacorps. That is a real, tangible benefit to self-hosters, but you could have it on top of AP by running your service off what would essentially be a massive global cache of AP content, which is what the indexer is.
I should note that, as a protocol, I suspect ATProto is less chatty, which does translate to reduced costs. It adds features on top that some may or may not want, which increase costs in other ways, but only for the people who want to utilize those features. It's not exactly an apples-to-apples comparison.
... in which case it may be an implementation issue?
Mind you, there is liberal use of "MAY" there which I find is always a problem with specs: that would likely lead to mandatory chattiness of outgoing requests if you're federated with a lot of instances without shared inbox support, but should at least solve for incoming.
There can be an interrogation endpoint/message for supported versions/extensions to the base protocol; that's a very normal thing. If it supports bundled delivery, send a single bundle; if not, send them all individually.
Yep, and I'm sure there are some instances that do exactly that. But in a distributed protocol you only get the benefit if both sides of a given interaction support the optimization. For something in the spec that is optional, you can't rely on it and you aren't forced to implement it, so it's not irrational to just ignore it. Which typically means you only get an occasional marginal benefit.
I mean, it depends. The vast majority of fedi traffic is Mastodon. Add it to Mastodon and it makes an impression and a real difference. At first only Mastodon-to-Mastodon comms, but others will take notice, it will find its way into libraries, and then it's smooth sailing.
I'm not too far into the weeds of ActivityPub yet, but in my head it seems like you could add some things to the protocol to optimize this (rough sketch after the list):
1. The sending server constructs a map keyed by receiving server that contains a list of all users on that server who should receive the message.
2. The sending server iterates over the map. If a receiving server has multiple recipients, do a check to see if it supports this kind of 'bundled' delivery.
2a. If so, send the message once with a list of recipients.
2b. The receiving server processes the one message and delivers it to all of those users.
3. If not, the sending server sends it in the traditional way: multiple pushes.
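Putting that flow into code (hypothetical endpoints and field names; none of this is in the ActivityPub spec today):

```ts
// Hypothetical sketch of "bundled delivery" following the numbered list above.
type Follower = { server: string; inbox: string };

async function deliver(activity: object, followers: Follower[]): Promise<void> {
  // 1. Group recipients by their home server.
  const byServer = new Map<string, Follower[]>();
  for (const f of followers) {
    const list = byServer.get(f.server) ?? [];
    list.push(f);
    byServer.set(f.server, list);
  }

  for (const [server, recipients] of byServer) {
    // 2. Check (via a made-up capability endpoint) whether the remote server
    //    advertises support for bundled delivery.
    const supportsBundle = await fetch(`https://${server}/.well-known/bundled-delivery`)
      .then(r => r.ok, () => false);

    if (supportsBundle && recipients.length > 1) {
      // 2a/2b. One POST carrying the list of local recipients; the remote
      // server fans the message out to their inboxes itself.
      await fetch(`https://${server}/inbox-bundle`, {
        method: "POST",
        headers: { "content-type": "application/activity+json" },
        body: JSON.stringify({ activity, recipients: recipients.map(r => r.inbox) }),
      });
    } else {
      // 3. Fall back to the traditional behaviour: one push per recipient inbox.
      for (const r of recipients) {
        await fetch(r.inbox, {
          method: "POST",
          headers: { "content-type": "application/activity+json" },
          body: JSON.stringify(activity),
        });
      }
    }
  }
}
```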
Sidekiq seems to be the culprit from what I've seen and read from people who've run into this issue. It gets overloaded fast if you don't have enough processing power in front of the queue. Lighter implementations apparently do something different, or are more efficient in handling their queues without whatever overhead Sidekiq adds.
It's weird to me that all the complaints about @proto being NIH focus on ActivityPub, when if anything it's closer to an evolution of Secure Scuttlebutt. The two are so fundamentally different I do not understand the complaints.
> The two are so fundamentally different I do not understand the complaints.
You will find that many people do not dig into details. You can post on Mastodon, you can post on Bluesky, therefore they must be similar.
It does mean that learning about things becomes a superpower, because you can start to tell if a criticism is founded in actual understanding or something more superficial.
User Identity in ATProto is decentralized, it's meant to use W3C DIDs.
That's actually one of the things that bugs me about ActivityPub... unless I'm running my own single-user instance I won't have control over my own identity.
It's also very weird how even on a supposedly "federated" system the only way to ensure you can access content from all instances (even if they differ in philosophy or are on opposite sides of some "inter-instance-war") is to have separate accounts for each side... it kind of defeats the point of federation. There's even places like lemmy which instead of using blocklists use allowlists, so they will only federate with pre-approved instances.
It is, but that's largely conceptual, so it doesn't really mean anything in the context of the protocol. These are message exchange protocols, so the defining element is whether the messaging is federated or decentralised.
FWIW AP also supports DIDs; I just haven't seen any implementations use them, since the spec strongly recommends other IDs (a mistake IMO).
> It's also very weird how...
What you're describing here is a cultural phenomenon, not a technological one, so isn't really relevant to the discussion: ActivityPub siloing isn't a feature of the protocol, it's an emergent feature of the ecosystem/communities.
It's also worth mentioning it isn't usually implemented as you describe, unless you're specifically concerned with maximising your own reach from a publishing perspective: most instances allow individual users to follow individual users on another "blocked" instance - it's usually just promotion/sharing & discovery that are restricted.
> most instances allow individual users to follow individual users on another "blocked" instance - it's usually just promotion/sharing & discovery that are restricted.
If they are still allowing access to all forms of third party content through their own instance (even if they restrict the discoverability) then they are still risking being held responsible for that content. So imho, that would be a mistake.
Personally, if I were to host my own instance under such a protocol, I'd rather NOT allow any potentially illegal content that might come from an instance I don't trust to be distributed/hosted by my node.
The problem, imho, is in the way the content needs to be cached/proxied through the node of the user in order for the user to be able to consume it. This is an issue in the design of how federation typically works.
I'd rather favor a more decentralized approach that uses standards to ensure a user identity can carry over across different nodes of content providers, whether those nodes directly federate among themselves or not.
There should be a separation between identity providers and content providers, in such a way that identity providers have freedom to access different content providers, and content providers can take care of moderation without necessarily having to worry about content from other content providers with maybe different moderation standards.
I'm not saying ATProto is that solution... but it seems to me it's a step in the right direction, since they separate the "Personal Data Server" from the "Big Graph Services" that index the content. I can host my own personal single-user server without having all the baggage of federating all the content I want to consume. The protocol is better suited for that use case.
In services using ActivityPub, instances are designed for hosting communities; they come with baggage that's overkill for a single-user service but that's still mandated by how the communications work, since each instance is expected to do its own indexing/discovery/proxying. So they are bound to be heavier and more troublesome to self-host, and at the same time, from what I've seen, the cross-instance mechanisms for aggregation in services like Mastodon are lacking.
I agree that the siloing happening with AP instances is not a good thing and it's the main reason I have not bothered with the fediverse. But this isn't a technological limitation with the protocol at all but a policy chosen by the operators. What makes you think that this won't apply to BlueSky or other ATProto instances (when they allow federation at all)?
If the protocol is designed in such a way that it allows the operators of one instance to have full control over what some user identities can access in the whole network, then it's an issue in the protocol. IMHO, the problem is likely inherent to the way the AP fediverse commonly defines "federation".
I'd rather favor a more decentralized structure that allows users to directly access content from any content provider that hosts it through the protocol (i.e. without necessarily requiring another specific instance to index that content from their side; if they index it, great, but if they don't, it should still be possible to access it using the same user account from a different index), with a common protocol that allows a separation between identity management and content providers.
From what I understood, ATProto is closer to that concept.
> User Identity in ATProto is decentralized, it's meant to use W3C DIDs.
The W3C spec leaves all the hard parts to vendors, which is why the only DID implementation up to now has been Microsoft's, which relies on an AD server in Azure. Much decentralise.
Bluesky's isn't that, but a hash of some sort, which is centrally decentralised ... on their servers? I think this is one of the bits of AT that isn't finished yet.
But "W3C DID" is not a usable spec in itself, it's a sketch at best.
There's a growing base of users who have reached the epiphany (by multiple paths) that both identities & content-addressing MUST be cryptographically-rooted, or else users' privacy & communications will remain at the mercy of feudal centralizers with endless strong incentives to work against their interests.
For such users, any offering without these is a non-starter, dead-on-arrival.
People with resistance to this epiphany sound like those who used to insist, "HTTP is fine" (even when it put people at risk) or "MD5 is fine" (long after it was cryptographically broken). Most will get it eventually, either through painful tangible experiences or the gradual accumulation of social proof.
A bolt-on/fix-up of an older protocol might work, if done with extreme competence & broad consensus. And, some in the ActivityPub world had the cryptoepiphany very early! Ideas for related upgrades have been kicked around for a long time. But progress has been negligible, & knee-jerk resistance strong, & the deployed-habits/technical-debts make it harder there than in a green-field project.
Hence: a new generation of systems that bake the epiphany in at their core – which is, ultimately, a more robust approach than a bolt-on/fix-up.
Because so many of those recently experiencing this cryptoepiphany reached it via experience with cryptotokens, many of these systems enthusiastically integrate other aspects of the cryptotoken world – which of course turns off many people, for a variety of good and bad reasons.
But the link with cryptotokens is plausibly inessential, at least at the get-go. The essentials of grounding identity & addressing in cryptography predate Bitcoin by decades, and had communities-of-practice totally independent of the cryptoeconomics world.
A relative advantage Bluesky may have is their embrace of cryptographic addressing behind-the-scenes, without pushing its details to those who might confuse it with promotional crypto-froth. Users will, if all goes well, just see the extra security, scalability, and user sovereignty against abuses that it offers. We'll see.
It was clear that MD5 didn't meet the goals it was designed for in 1994, when experts recommended it be phased out for its originally intended uses.
It's not fine here in 2023.
If you need a secure hash, it's been proven broken for 10 years now.
If you don't need a secure hash, others are far more performant.
Using it, or worse, advocating for its use, is a way to signal your thinking is years behind the leading edge, and also best practices, and even justifiable practices.
HTTP's simplicity could make it tolerable for some places where world-readability is a goal - but people, echoing your sentiments here, have said it was "fine" even in situations where it was putting people at risk.
Major browser makers recognize the risk, and are now subtly discouraging HTTP, and this discouragement will grow more intense over time.
> If you're going to go NIH your replacement needs to not just be better but substantially better, and you should also show understanding of the original open spec.
I just disagree with this in principle. I wonder what the tech equivalent of "laissez faire" would be.
To this day I don't understand how anyone in the tech world thinks they can make a single demand of anyone else. Even as a customer I believe you can really only demand that the people you are paying deliver what is contractually and legally required. But outside of that ... I just don't understand people's mentality on this subject.
What is it about this specific area of the web that attracts these ideological zealots? I had the same head-scratching moment in the early 2000s when RSS and Atom were duking it out.
> I don't understand how anyone in the tech world thinks they can make a single demand of anyone else.
Ultimately, yes you're right, no-one can make actual "demands" of anyone else. My language above is demanding, certainly, but ultimately I'm just arguing opinion. I cannot control any outcomes of what Bluesky or any other enterprise choose to pursue.
> What is it about this specific area of the web that attracts these ideological zealots?
I think it stems from the unprecedented success of such zealots in the 1980s, which has differentiated the technological landscape of software technologies from previous areas of engineering by making them more approachable, accessible, interoperable and ultimately democratised. That's largely been the result of people arguing passionately on the internet to advocate for that level of openness, collab & interop.
> I think it stems from the unprecedented success of such zealots in the 1980s,
I would challenge this belief, that the success of Linux, or the Web technologies (TCP, HTTP, HTML, etc.) were primarily the result of the zealots from the 80s. I would challenge the belief that protocols for Twitter-like communication fall into the same category as things like POSIX, TCP/IP, HTTP, HTML, etc.
> That's largely been the result of people arguing passionately on the internet to advocate for that level of openness, collab & interop.
My own opinion is the key to success was a large number of people writing useful code and an even larger number of people using that code.
One example that comes to mind is how HTML was spun out from the W3C into the WHATWG. Controversial at the time, to say the least, but IMO necessary to get away from the bickering of semantic web folks who were grinding progress to a halt. HTML5 won the day over XHTML (to my disappointment, actually). The reason wasn't the impassioned arguments of the semantic web zealots; it was the working implementations delivered by the WHATWG members to serve the literal millions of people using their applications. Another example is the success of Linux over Hurd - the latter being a technology initially supported by the most vocal and ideological of all the zealots.
It is a simple fact that if AT garners sufficiently useful implementations and those implementations garner sufficient numbers of users - all of the impassioned arguments against it will have been for naught. So those arguing should probably stop so that they can focus on implementing ActivityPub (or whatever they think is best) and attracting users. I guarantee you that if ActivityPub attracts millions of users then Bluesky will suddenly see the light. They will change course just like every big tech company did when Linux became successful.
It is also why I think all of the hot-air on RSS and Atom was literally wasted. As technologies both have thus-far failed to attract enough users to make it worth it. I would bet that the same will be true of both AT and ActivityPub. Unless someone develops an application that uses one or the other and it manages to attract as many users as Twitter then people are just wasting time for nothing.
Calling out undesirable behavior and demanding people act better is how society improves. See things you don't like, campaign for change. Zealotry or activism - the label depends mostly on whether you agree with the cause.
That you have no legal basis to enforce those demands does not mean that you cannot make them. They can be ignored of course but they may not be. Not all interactions have to be governed by a literal contract as opposed to a social contract.
This is a great summary of what my concerns are as well. I'd maybe add one more point:
* Improvements that'd layer cleanly on top of ActivityPub if they'd made any attempt at all.
E.g. being able to "cheaply" ensure that you have a current view of all of a given user's objects is not covered in ActivityPub - you're expected to basically want to get the current state of one specific object, because most of the time that is what you'll want.
So maybe it falls in the "highly debatable" category, but we also have a trivial existing solution from another open spec: RemoteStorage mandates ETags where a parent "directory" object's ETag will change if a contained object changes, and embeds that in a JSON-LD collection as the directory listing. If you feel strongly that this is needed to be able to rapidly sync changes to a large collection, an ActivityPub implementation can just support that mechanism without any spec changes being needed at all (but documenting it in case other implementations want to do so would be nice). Heck, you can "just" add minimal RemoteStorage compatibility to your ActivityPub implementation, since that too uses WebFinger, and exposing your posts as objects in a RemoteStorage share would be easy enough.
Want to do a purely "pull" based ActivityPub version the way AT is "pull"? Support the above (w/fallback of assuming every object may have changed if you care about interop), make your "inbox" endpoints do nothing and tell people you've added an attribute to the actor objects to indicate you prefer to pull and so to not push to you.
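As a hedged sketch of what that pull could look like (hypothetical client code; the ETag behaviour is borrowed from RemoteStorage, not anything the ActivityPub spec defines):

```ts
// Sketch only: ETag polling layered over an ActivityPub collection URL.
const knownEtags = new Map<string, string>();   // collection URL -> last seen ETag

async function pullIfChanged(collectionUrl: string): Promise<unknown[] | null> {
  const seen = knownEtags.get(collectionUrl);
  const res = await fetch(collectionUrl, {
    headers: {
      accept: "application/activity+json",
      ...(seen ? { "if-none-match": seen } : {}),
    },
  });
  if (res.status === 304) return null;          // ETag matched: nothing changed, nothing to sync
  const etag = res.headers.get("etag");
  if (etag) knownEtags.set(collectionUrl, etag);
  const body = await res.json();
  // Real collections are paged (first/next); a full implementation would walk those pages.
  return body.orderedItems ?? [];
}
```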
Upside is, if any of the AT functionality turns out to be worthwhile, it'll be trivial to "steal" the good bits without dropping interop.
(Also, I wanted to see exactly how AT did pulls, and looked at the AT Proto spec, and now I fully concur with the title of this article)
Your added point fits in with the initial red flag for me - before I saw pfraze's (excellent) post here - that the vast majority of what I've read advocating for ATProto says very clearly: "Account portability is the major reason why we chose to build a separate protocol.".
Account portability is in no way incompatible with ActivityPub. It's not built into the spec., but it's also not forbidden / prevented by the spec. in any way. From ActivityPub's perspective it's an implementation detail.
Would it be nice if it was built into the spec: yes. Does that justify throwing out the baby with the bathwater?
I was hoping pfraze would offer some better reasoning, but while the post above is a great read for the technically curious, it's very "in the weeds" so doesn't really address the important high-level questions. Textbook "technologists just want to technologise" vibes: engineers love reinventing things because working out problems for yourself is the fun part, even if many people have come together to collaborate on solving those problems before.
An unspecified "implementation detail" is essentially another way of saying that it doesn't work.
I've ported my account on ActivityPub a couple of times, and it's a horrendous experience -- not only do I lose all my posts and have to manually move a ton of bits, but the server you port from continues to believe you have an account and doesn't like to show you direct links on there anymore.
The latter could probably be easily solved, the former needs to be built into the spec or it will continue to be broken.
100% agree with everything in your post. I don't see how it contradicts anything I've said though.
> the former needs to be built into the spec or it will continue to be broken.
Absolutely, but ActivityPub doesn't preclude that. There's no reason for that proposed feature to be incompatible with the spec., or to have to exist in an implementation that is incompatible with ActivityPub.
FWIW Mastodon, the most popular ActivityPub implementation, is (as is often the case with open standards) not actually fully compliant with the spec. They implement features they need as they need them & propose them. This is obviously a potential source of integration pains, but as long as the intent to be compatible is still there, it's still a better situation.
Even framing it as 'account portability' is missing the point. I want to own my identity and take it with me anywhere. I shouldn't need to transfer anything; it's mine. This model is fundamentally incompatible with how ActivityPub works (no, running your own server is not the same thing as decoupling identity from the concept of a server itself). Could that change in the future? Maybe, but not without prior art. Even if you think @proto, farcaster, ssb, nostr, or others are doomed, we should be applauding them for attempting to push the needle forward.
It's not incompatible with how ActivityPub works at all.
The ActivityPub spec says that object ids should be https URIs, not that they must. The underlying ActivityStreams spec just requires them to be unique.
All that's needed to provide full portability without a "transfer" is for an implementation to use URIs that are e.g. DIDs, or any other distributed URI scheme. Optionally, if you want full backwards compatibility, point the id to a proxy and add a separate URI until there's broader buy-in.
I think there'd be benefit in updating the ActivityPub spec to be less demanding about URIs: instead of saying that they "should" be https, require them to use a secure transport, and maybe provide a fallback mechanism if the specific URI scheme is not known (e.g. allow implementations to provide a fallback proxy URL). But the main challenge there is not the spec but getting buy-in from at least Mastodon. The approach of providing an https URI but giving a transport-neutral id separate from the origin https URI would, on the other hand, degrade gracefully in the absence of buy-in.
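Concretely, the kind of object I have in mind looks something like this (illustrative only, not a standardised proposal):

```ts
// An ActivityStreams-style object whose stable id is a DID-based URI, with an
// ordinary https URL kept alongside so existing implementations that only
// understand https ids still have something to fetch. Purely a sketch.
const portablePost = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "Note",
  id: "did:example:alice/posts/12345",                // survives a server move
  url: "https://current-host.example/@alice/12345",   // resolvable today, may change later
  attributedTo: "did:example:alice",
  content: "identity decoupled from the hosting server",
};
```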
I ran into this trap very early in my entrepreneurial journey. Developed a data "standard" for events in a vacuum, convinced that nothing else in the world could possibly exist. Then spent a good year learning the lessons that already-existing standards had learned and adapted from years ago.
A protocol like this doesn't work without community adoption. And the best way to get the community to adopt is to build on what's already there rather than trying to reinvent the fridge.
Not having read the article, I got pretty far into your second paragraph before realizing you all aren't talking about configuring a modem over a serial link. It's funny because the "Hayes AT command set" protocol is also an obtuse crock. I was really hoping you were going to open my mind with some deep wisdom straight out of 1981.
Yeah, I wish they hadn't clobbered the name of an existing, well-known protocol. It's still used in drivers for cellular modems (I'm working with it right now), which are getting more and more numerous for IoT applications.
They might be searchable on the wider internet if you’re looking for info on them, but I can never remember them when I need them on the actual system.
On my windows PC, I type ‘note’ and slam enter to open notepad a thousand times a day without a problem. On my KDE desktop ‘text’ seems to 50/50 bring up… whatever the text editor is called and 50/50 something else. Apparently “Kate” is what I’m after.
There is real value in naming things after what they do and it’s my sole gripe with KDE that they have stupid names.
I agree that 'go' has to be the worst naming from an SEO perspective. The only ones worse are single-letter names. And TBH, they might be _better_ simply because they aren't a common English-language word.
I think, to use the fetus metaphor, the @ should represent a kernel object, but in a generic form, much as / represents the top of the tree.
I can also see the term becoming controversial, such as male and female vs. plug and socket. Some see the former as blatantly sexual and the latter as requiring a dirty mind to be sexual.
Maybe it's best to hold off on that reference until we get more voices to chime in.
LOL I was mostly thinking of the swiss roll but I know it's only called rocambole in Brazil. It's got all sorts of names all over the world according to Wikipedia
I've certainly had to implement AT commands in C, but within a proprietary codebase.
It's tough to do in a freestanding way given that it's a command-response protocol. It's very convenient to depend on the specific UART API that's available.
I thought exactly the same. I will dive into configuring a Raspberry Pi to work with a 5G hat/SIM and will use AT commands for this (first time ever). I was looking forward to reading how horrible an experience it will be and was a bit confused :D
> the "Hayes AT command set" protocol is also an obtuse crock
Yeah, but it's an obtuse crock that you can literally still hear in your memories. How many protocols can say that? It has a special place in my heart, I think.
Yes, I'm going to call my object database the ObtuseCrock. Everything that inherits from this base will be an obtuseCrock object.
These can be addressed obtuseCrock.[1] or by name directly.
I can't say that I have had that pleasure, or if I have I didn't realize it by name. I've just spent a lot of time hammering out reliability issues on embedded cell modems with buggy firmware and power supplies that droop too much on 2G transmissions.
Having done a decent bit of hacking around ActivityPub, when I read the (thin, which 'pfraze and others have copped to) documentation I immediately went "oh, this is going to be way more scalable than ActivityPub once it's done."
It's not all roses. I'm not sold on lexicons and xrpc, but that's probably because I am up to my eyeballs in JSON Schema and OpenAPI on a daily basis and my experience and my existing toolkit probably biases me. I think starting with generally-accepted tooling probably would've been a better idea--but, in a vacuum, they're reasonably thought-out, they do address real problems, and I can't shake the feeling that the fine article is spitting mad for the sake of being spitting mad.
While federation isn't there yet, granted, the idea that you can't write code against this is hogwash. There's a crazily thriving ecosystem already from the jump, ~1K folks in the development Discord and a bunch of tooling being added on top of the platform by independent developers right now.
Calm down. Nobody's taking ActivityPub away from people who like elephants.
If they'd proposed (or even just implemented) improvements that at least suggested they'd considered existing options and wanted to try to maximise the ability to do interop (even with proxies), I'd have been more sympathetic. But AT, to me, seems to be a big ball of Not Invented Here, which makes me worry that either they didn't care to try, or that they chose to make interop worse for a non-technical reason.
During this stage of discovery I'm completely comfortable with ground up rethinks.
I don't feel we have the correct solution and there is no commercial reason to get this thing shipped. Now is the time to explore all the possibilities.
Once we have explored the problem space we should graft the best bits together for a final solution, if needed.
I'm not sure I see the value of standardizing on a single protocol. Multiple protocols can access the same data store. Adopting one protocol doesn't preclude other protocols. I believe developers should adopt all the protocols.
Ground-up rethinks that take into account whether or not there's an actual reason to make a change are good. Ground-up rethinks that throw things away for the sake of throwing them away, even when what they end up doing would layer cleanly, are not. They're at best lazy. At worst, intentional attempts at diluting effort. I'm hoping they've only been lazy.
I'm not disagreeing. To say that there is only one way or to project presumed goals and intentions is too far for me.
I firmly believe that protocols are developed through vigorous rewrites and aren't nearly as important as the data-stores they provide access to. I would like our data-stores to be stable, and as required we develop protocols. Figuring out a method to deal with whatever the hosted data-store's chosen protocol is seems correct to me.
I just don't see mutual exclusivity. Consider the power of supporting both protocols.
I think this is referring to the content-hashed user posts. Using this model one can pull content from _anywhere_ without having to worry about MITM forgeries etc. This opens up the structure of the network, basically decentralizing it even _more_.
ActivityStreams just requires an object to have a unique URI. ActivityPub says it "should" be a https URI. However, since this URI is expected to be both unique and unchanging (if you put up the same content with a different id, it's a different object), you can choose to use it as the input to a hash function and put the posts in a content-addressable store.
Mastodon will already check the local server first if you paste the URL of a post from another server in the Mastodon search bar, so it's already sort-of doing this, but only with the local server.
So you can already do that with ActivityPub. If it becomes a need, people will consider it. There's already been more than one discussion about variations over this.
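To make the "content-addressable store keyed by the id" idea concrete, a toy sketch (illustrative only, not how any existing implementation stores things):

```ts
// Key a shared cache by a hash of the object's stable id, so any node that
// has seen the post can serve it without contacting the origin server.
import { createHash } from "node:crypto";

const cache = new Map<string, unknown>();
const keyFor = (objectId: string) => createHash("sha256").update(objectId).digest("hex");

function put(object: { id: string }): void {
  cache.set(keyFor(object.id), object);
}

function getByOriginalId(objectId: string): unknown | undefined {
  // Works even if the origin server is unreachable or gone.
  return cache.get(keyFor(objectId));
}
```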
(EDIT: More so than an improvement on scaling this would be helpful in ensuring there's a mechanism for posts to still be reachable by looking up their original URI after a user moves off a - possibly closing - server, though)
The Fediverse also uses signatures, though they're not always passed on - fixing that (ensuring the JSON-LD signature is always carried with the post) would be nice.
Because the URI is only _expected_ to be immutable, not required, servers consuming these objects need to consider the case where the expectation is broken.
For example, imagine the serving host has a bug and returns the wrong content for a URI. At scale this is guaranteed to happen. Because it can happen, downstream servers need to consider this case and build infrastructure to periodically revalidate content. This then propagates into the entire system. For example, any caching layer also needs to be aware that the content isn't actually immutable.
With content hashes such a thing is just impossible. The data self-validates. If the hash matches, the data is valid, and it doesn't matter where you got it from. Data can be trivially propagated through the network.
The URI is expected to be immutable. The URI can be used as a key. Whether the object itself is immutable depends on the type of object. A hash over the content cannot directly be used that way, but it can e.g. be used to derive an original URI in a way that allows for predictable lookups without necessarily having access to the origin server.
Posts are explicitly not immutable, so they do need to be revalidated, and that's fine.
For a social network immutable content is a bad thing. People want to be able to edit, and delete, for all kinds of legitimate reasons, and while you can't protect yourself against people keeping copies you can at least make the defaults better.
> Posts are explicitly not immutable, so they do need to be revalidated, and that's fine.
OK that's my point. In the AT protocol design the data backing posts is immutable. This makes sync, and especially caching a lot easier to make correct and robust because you never need to worry about revalidation at any level.
> People want to be able to edit, and delete
Immutable in this context just means the data blocks are immutable. You can still model logically mutable things, and implement edit/delete/whatever. Just like how Git does this.
But to model mutable things over an immutable block store you need to revalidate which blocks are still valid.
You need to know that the user expects you to now have a different view. That you're not mutating individual blocks but replacing them has little practical value.
It'd be nice to implement a mechanism that made it easier to validate whole collections of ActivityPub objects in one go, but that just requires adding hashes to collections so you don't need to validate individual objects. Nothing in ActivityPub precludes an implementation from adding an optional mechanism for doing that the same way e.g. RemoteStorage does (JSON-LD directories equivalent to the JSON-LD collections in ActivityPub, with ETags at the collection level required to change if subordinate objects do).
> you need to revalidate which blocks are still valid.
No you don't. Sorry if I'm misunderstanding, but it sounds like maybe you don't have a clear idea of how systems like git work. One of their core advantages is what we're talking about here -- that they make replication so much simpler.
When you pull from a git remote you ask the remote what the root hash is, then you fetch all the chunks reachable from that hash which you don't yet have. If the remote says you need the chunk with hash X, and you have a chunk with hash X, then you have the data. You don't have to worry if it has changed. Once you have all the chunks reachable from the latest head, you have the latest state of the entire repository. That's it.
(I mean simple in the sense of clear/direct/correct, not in the sense of "easy". It's certainly the case that a design based on consuming a stream of change events is a lot less code).
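For reference, the whole sync loop described above is about this much logic (a minimal sketch with hypothetical helpers, not the real AT repo-sync API):

```ts
// Git-style sync over content-addressed chunks: fetch the remote's root hash,
// then pull only the chunks you don't already have.
type Chunk = { hash: string; children: string[]; data: unknown };

async function syncFrom(
  fetchRoot: () => Promise<string>,
  fetchChunk: (hash: string) => Promise<Chunk>,
  local: Map<string, Chunk>,
): Promise<void> {
  const pending = [await fetchRoot()];
  while (pending.length > 0) {
    const hash = pending.pop()!;
    if (local.has(hash)) continue;          // already have it, so already have everything below it
    const chunk = await fetchChunk(hash);   // by construction, the content matches its hash
    local.set(hash, chunk);
    pending.push(...chunk.children);
  }
}
```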
> When you pull from a git remote you ask the remote what the root hash is, then you fetch all the chunks reachable from that hash which you don't yet have. If the remote says you need the chunk with hash X, and you have a chunk with hash X, then you have the data. You don't have to worry if it has changed. Once you have all the chunks reachable from the latest head, you have the latest state of the entire repository. That's it.
Yes, I know how Merkle trees work and what they allow you to do. In other words, you use the hash to validate which blocks are still valid/applicable. Just as I said, you need to revalidate. In this context (a single user updating a collection that has an authoritative location at any given point in time) it effectively just serves as a shortcut to prune the tree of what you need to consider re-retrieving.
It is also exactly why I pointed at RemoteStorage, which models the same thing with a tree of ETags, rooted in the current state of a given directory, to provide the same shortcut. RemoteStorage does not require them to be hashes from a Merkle tree, as long as they are guaranteed to update if any contained object updates (you could e.g. keep a database of version numbers if you want to, as long as you propagate changes up the tree), but it's easy to model as a Merkle tree. Since RemoteStorage also uses JSON-LD as a means to provide directories of objects, it provides a ready-to-lift model for a minimally invasive way of transparently adding it to an ActivityPub implementation in a backwards-compatible way.
(In fact, I'm toying with the idea of writing an ActivityPub implementation that also supports RemoteStorage, in which case you'd get that entirely for "free").
> (I mean simple in the sense of clear/direct/correct, not in the sense of "easy". It's certainly the case that a design based on consuming a stream of change events is a lot less code).
That is, if anything, poorly fleshed out in ActivityPub. In effect you want to revalidate incoming changes with the origin server unless you have agreed on some (non-standard) authentication method, so really that part could be simplified to a notification that there has been a change. If you layer a Merkle-like hash on top of the collections you could batch those notifications further. If we ever get to the point where scaling ActivityPub becomes hard, then a combination of those two would be an easy update to add (just add a new activity type that carries a list of actor URLs and the hashes of the highest root to check for updates).
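Something like this, just to make the idea concrete (a made-up activity type, purely illustrative):

```ts
// Hypothetical batched change notification; not a real ActivityPub type.
const changeDigest = {
  "@context": "https://www.w3.org/ns/activitystreams",
  type: "ChangeDigest",                         // invented type name
  actors: [
    { id: "https://a.example/users/alice", root: "hash-aaa111" },
    { id: "https://b.example/users/bob",   root: "hash-bbb222" },
  ],
};

// The receiver compares each root against what it has cached and only pulls
// the actors whose collections actually changed.
const cachedRoots = new Map<string, string>([
  ["https://a.example/users/alice", "hash-aaa111"],
]);
const needsRefetch = changeDigest.actors
  .filter(a => cachedRoots.get(a.id) !== a.root)
  .map(a => a.id);
console.log(needsRefetch); // ["https://b.example/users/bob"]
```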
Doesn't it make it easier? A list of hashes which should be blacklisted means servers obeying rulings are never at risk of returning that data (this would also work offensively: poll for forbidden hashes and see who responds).
...and now you have to track which instance is authorized to block which hash, creating a lot of extra complexity. Plus, we need to trust all instances to really delete stuff.
My reaction whenever I see such headline is always "there must be an engineer out there who worked on this, I wonder how they feel about this". I had the same reaction today, opened comments and you were at the top. I love HN sometimes.
A few years ago I had to work with an obscure 90's era embedded microcontroller with a proprietary language and IDE, and a serial adapter based programmer and debugger. The IDE sucked, and programming would fail 9 times out of 10, but at least the debugger was solid.
By complete chance, I happened to interview someone that had, "wrote the debugger for that obscure microcontroller back in the 90's," tucked away in their resume. It was hard not to spend the entire interview session just picking their brain for anecdotes about it.
You were interviewing them for an unrelated role? Just trying to understand why that would be 'complete chance' and that you (by the sounds of it) didn't hire them to keep picking their brain on it.
The role, although also in the field of embedded software development, was indeed unrelated to that particular technology or its use within the company.
I spend my days scraping and reverse engineering various embedded legacy systems and a recurring thought is "Who is the braindead person who specified or implemented this?" Then I realize those bad technical decisions most of the time end up in production due to business concerns. Often they show a clear misunderstanding of how the underlying tech works, i.e. inexperience. Only rarely those bad designs seem to stem from pure incompetence.
That said, my frustration still gets converted into not-so-polite comments in the source code the culprits will never see.
> The schema is a well-defined machine language which translates to static types and runtime validation through code generation.
Can you elaborate on this point? There are multiple mature solutions in this space like OpenAPI+JSON Schema, GraphQL, and gRPC. All try to solve the same problems to varying degrees and provide similar benefits like generated static types and runtime validation. Was there something unique for you that made these tools not appropriate and prevented you from building upon an existing ecosystem?
I don't know what AT is doing, but I can say that while JSON Schema is okay as a validation schema, it is less okay as a codegen schema. I don't know if there is a fundamental divide between these two uses, but in JSON Schema there is definitely an impedance mismatch.
For example, the JSON Schema structures: `anyOf`, `oneOf` and `allOf` are fairly clear when applied toward validation. But how do you map these to generating code for, say, C++ data structures?
You can of course minimize the problem by restricting to a subset of JSON Schema. But that leaves others. For example, a limited JSON Schema `object` construct can be mapped to, say, a C++ `struct` but it can also map to a homogeneous associative array, eg `std::map` or a fixed heterogeneous AA, eg `std::tuple`. It can also be mapped to an open-ended hetero AA which has no standard form in C++ (maybe `std::map<string,variant<...>>`). Figuring out the mapping intended by the author of the JSON Schema is non trivial and can be ambiguous.
At some level, I think this is an inherent divide as each language one targets for codegen supports different types of data structures differently. One can not escape that codegen is not always a fully functional transformation.
There are indeed few languages that can model all of JSON Schema in their type system. TypeScript comes close. However, you can just use a subset, as you said.
I don't really understand why this is a problem. Unless you're using things like Haskell, Julia, or Shapeless Scala, you generally accept that not everything is modeled at the type level. I don't know the nuances of the C++ types you mentioned, but I have not encountered the ambiguity you described in TypeScript or on the JVM. E.g. JSON Schema is pretty clear that any object can contain additional unspecified keys (std::map I assume) unless additionalProperties: false is specified.
> `anyOf`, `oneOf` and `allOf` for [...] C++ data structures?
Like I said I don't know C++ well enough, but these have clear translations in type theory which are supported by multiple languages. I don't know if C++ types are powerful enough to express this.
allOf is an intersection type and anyOf is a union type. oneOf is challenging, but usually modelled OK enough as a union type.
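In TypeScript terms, as a sketch:

```ts
// How those JSON Schema keywords land in TypeScript's type system.
interface HasId   { id: string }
interface HasText { text: string }

type AllOfExample = HasId & HasText;   // allOf -> intersection type
type AnyOfExample = HasId | HasText;   // anyOf -> union type
type OneOfExample = HasId | HasText;   // oneOf -> also a union; the "exactly one
                                       // branch matches" constraint has to be
                                       // enforced at runtime

const both: AllOfExample = { id: "1", text: "hi" };
const either: AnyOfExample = { id: "2" };
```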
Thanks for the comment. It helps me think how to clarify what I was trying to say.
What I wanted to express is that using JSON Schema (or any such) for validation encounters a many-to-one mapping from multiple possible types across any/all given programming languages to a single JSON Schema form. That is, instances of multiple programming language types may be serialized to JSON such that their data may be validated according to a single, common JSON Schema form. This is fine, no problem.
OTOH, using JSON Schema (or any such) for codegen reverses that mapping to be one-to-many. It is this that leads to ambiguity and problems.
Restricting to a subset of JSON Schema only goes so far. For example, we can not discard JSON Schema `object` as it is too fundamental. But, given a simple `object` schema that happens to specify all properties have a common type `T`, it is ambiguous whether to generate a C++ `class` or `struct` or a `std::map<string,T>`. Likewise, a JSON Schema `array` can be mapped to a large set of possible collection types.
To fight the ambiguity, one possibility is to augment the schema with language-specific information. At least, if we have a JSON Schema `object` we may add a (non `required`) property to provide a hint. Eg, we may add a `cpp_type` property. Then again, the overhead of using a codegen schema is typically only beneficial if we will generate code in multiple languages, so this type-hinting approach means growing our hints to include a `java_type`, `python_type`, etc. This is minor overhead compared to writing language types in "long hand" but still somewhat unsatisfying. With enough type-theory expertise (which I lack) perhaps it is possible to abstractly and precisely name the desired type, which codegen for each language can then implement without ambiguity. But, given the wealth of types, even sticking with just a programming language's standard library, this abstraction may be fraught with complication. Think of the remaining ambiguity between specifying use of C++'s `std::map` vs `std::unordered_map` given an abstract type hint of, say, `associative_array`, or between `std::array`, `std::list`, `std::vector`, and `std::tuple` given a JSON Schema `array`.
I don't think this is a failing of JSON Schema per se but is an inherent problem for any codegen schema to confront. Something new must enter the picture to remove the ambiguity. In (my) practice, this ambiguity is killed simply by making limiting choices in the implementation of the codegen. This is fine until it isn't and the user says, "what do you mean I can't generate a `std::map`!". Ask me how I know. :)
AT and Bluesky are unfinished. Not ready for primetime. It's not fair to compare it to mature, well-developed stuff with W3C specs and millions of active users on thousands of servers with numerous popular forks.
But, also, everyone who hates Mastodon and spent the last months-years complaining about it is treating your project like the promised land that will lead them into the Twitterless future, somehow having gotten the impression that it's finished and ready to scale.
I think most critiques of Bluesky/AT are actually responding to this even if the authors don't realize it. They're frustrated at the discourse, the potshots, and the noise from these people.
> It's not fair to compare it to mature, well-developed stuff
Really? So rather than try to compare, contrast, and course-correct a project in its early stages by understanding the priors and alternatives, we should only do retrospectives after it has matured?
I would have thought this was the whole point of planning in early development: figuring out what you actually need to make? And that is usually a relative proposition; a project rarely exists in a vacuum!
We should never just uncritically go ahead with the first draft in anything, especially not in the protocols we use, as they have this annoying habit of sticking around once adopted and being very hard to change after the fact.
Sure, but that's not really compatible with the "quiet, stealthy beta" thing OP claims they were aiming for. If it was meant to be a quiet beta journalists should probably have been invited at a later point.
Thanks for taking the time to write this up. I’m thrilled that there is some motion happening to make social a bit less terrible and your write up is maybe the most objective, least rhetorical commentary I have seen on bluesky so far (side note, is it too late to change the name? I always end up calling it “BS”).
I’m still waiting for an invite to try it myself but I’m excited to see how it compares to mastodon.
I’m also curious to know if AT Protocol can be used for multiple service types. I’m more interested in a reddit replacement (lemmy) than a Twitter replacement (mastodon) and I’m hoping we can see another full scale fediverse come to fruition.
I have not looked much at AT, but it seems it solves many of the same problems as Matrix. Instead of redoing all the cryptography and everything, why not build on Matrix? There were other Twitter-like things on Matrix before.
As far as I'm aware it's developed on Matrix and has some connection with it, so this isn't a case of not knowing about it; there are likely some engineering reasons why this was not done.
* Both are designed as big-world communication networks. You don't have the server balkanisation that affects ActivityPub.
* Both eschew cryptocurrency systems and incentives.
* Both have names which everyone complains about being hard to google, despite "AT protocol" and "Matrix protocol" or "Matrix.org" being trivial to search for :P
There are some significant differences too:
* Matrix aspires to be the secure communication layer for the open web.
* AT aspires (I think) to be an open decentralised social networking protocol for the internet.
* AT is proposing an asymmetrical federation architecture where user data is stored on Personal Data Servers (PDS), but indexing/fan-out/etc is done by Big Graph Servers (BGS). Matrix is symmetrical and by default federates full-mesh between all servers participating in a conversation, which on one hand is arguably better from a self-sovereignty and resilience perspective - but empirically has created headaches where an underpowered server joins some massive public chatroom and then melts. Matrix has improved this by steady optimisation of both protocol and implementation (i.e. adding lazy loading everywhere - e.g. https://matrix-org.github.io/synapse/latest/development/syna...), but formalising an asymmetrical architecture is an interesting different approach :)
* AT is (today) focused on public conversations (e.g. prioritising big-world search and indexing etc), whereas Matrix focuses both on private and public communication - whether that's public chatrooms with 100K users over 10K servers, or private encrypted group conversations. For instance, one of Matrix's big novelties is decentralised access control without finality (https://matrix.org/blog/2020/06/16/matrix-decomposition-an-i...) in order to enforce access control for private conversations.
* Matrix also provides end-to-end encryption for private conversations by default, today via Double Ratchet (Olm/Megolm) and in the nearish future MLS (https://arewemlsyet.com). We're also starting to work on post quantum crypto.
* AT's lexicon approach looks to be a more modular way to extend the protocol than Matrix's extensible event schemas - in that AT lexicons include both RPC definitions as well as the schemas for the underlying datatypes, whereas in Matrix the OpenAPI evolves separately to the message schemas.
* AT uses IPLD; Matrix uses Canonical JSON (for now)
* Matrix is perhaps more sophisticated on auth, in that we're switching to OpenID Connect for all authentication (and so get things like passkeys and MFA for free): https://areweoidcyet.com
* Matrix has an open governance model with >50% of spec proposals coming from the wider community these days: https://spec.matrix.org/proposals
* AT has done a much better job of getting mainstream uptake so far, perhaps thanks to building a flagship app from day one (before even finishing or opening up the protocol) - whereas Element coming relatively late to the picture has meant that Element development has been constantly slowed by dealing with existing protocol considerations (and even then we've had constant complaints about Element being too influential in driving Matrix development).
* AT backs up all your personal data on your client (space allowing), to aid portability, whereas Matrix is typically thin-client.
* Architecturally, Matrix is increasingly experimenting with a hybrid P2P model (https://arewep2pyet.com) as our long-term solution - which effectively would end up with all your data being synced to your client. I'd assume bluesky is consciously avoiding P2P having been overextended on previous adventures with DAT/hypercore: https://github.com/beakerbrowser/beaker/blob/master/archive-.... Whereas we're playing the long game to slowly converge on P2P, even if that means building our own overlay networks etc: https://github.com/matrix-org/pinecone
I'm sure there are a bunch of other differences, but these are the ones which pop to the top of my head, plus I'm far from an expert in AT protocol.
It's worth noting that in the early days of bluesky, the Matrix team built out Cerulean (https://matrix.org/blog/2020/12/18/introducing-cerulean) as a demonstration to the bluesky team of how you could build big-world microblogging on top of Matrix, and that Matrix is not just for chat. We demoed it to Jack and Parag, but they opted to fund something entirely new in the form of AT proto. I'm guessing that the factors that went into this were: a) wanting to be able to optimise the architecture purely for social networking (although it's ironic that ATproto has ended up pretty generic too, similar to Matrix), b) wanting to be able to control the strategy and not have to follow Matrix's open governance model, c) wanting to create something new :)
From the Matrix side; we keep in touch with the bluesky team and wish them the best, and it's super depressing to see folks from ActivityPub and Nostr throwing their toys in this manner. It reminds me of the unpleasant behaviour we see from certain XMPP folks who resent the existence of Matrix (e.g. https://news.ycombinator.com/item?id=35874291). The reality is that the 'enemy' here, if anyone, are the centralised communication/social platforms - not other decentralisation projects. And even the centralised platforms have the option of seeing the light and becoming decentralised one day if we play our parts well.
What would be really cool, from my perspective, would be if Matrix ended up being able to help out with the private communication use cases for AT proto - as we obviously have a tonne of prior art now for efficient & audited E2EE private comms and decentralised access control. Moreover, I /think/ the lexicon approach in AT proto could let Matrix itself be expressed as an AT proto lexicon - providing interop with existing Matrix rooms (at least semantically), and supporting existing Matrix clients/SDKs, while using AT proto's ID model and storing data in PDSes etc. Coincidentally, this matches work we've been doing on the Matrix side as part of the MIMI IETF working group to figure out how to layer Matrix on top of other existing protocols: e.g. https://datatracker.ietf.org/doc/draft-ralston-mimi-matrix-t... and https://datatracker.ietf.org/doc/draft-ralston-mimi-matrix-m... - and if I had infinite time right now I'd certainly be trying to map Matrix's CS & SS APIs onto an AT proto lexicon to see what it looks like.
TL;DR: I think AT proto is cool, and I wish that open projects saw each other as fellow travellers rather than competitors.
> Both define strict data schemas for extensible sets of events (Matrix uses JSON schema
Matrix uses JSONSchema to define event schemas, but how can they be considered strict if the Matrix spec doesn't specify that any of them have to be validated apart from the PDU fields and a sprinkling of authorization events?
> Matrix has an open governance model with >50% of spec proposals coming from the wider community these days.
Do you have a percentage for the proportion of spec proposals from the wider community making it into spec releases?
This is a great comment, and I think one of the top comments on this post in that you actually know what you’re talking about. I found the breakdown comparing the protocols’ similarities and differences really enlightening.
It’s refreshing to see this perspective, and I appreciate trying to quash the us-vs-them thing, so thanks.
Ah, this is a wonderful analysis! Thank you, I've favorited it. :)
I'd love to get the decentralized protocols to work together. I work on braid.org, where we want to find standards for decentralized state sync, and would love to help facilitate a group dialogue. Hopefully I can connect with you more in the future.
have been tracking braid since the outset - providing state sync at the HTTP layer itself is a really interesting idea. would be great to figure out how to interop the various DAG based sync systems, or at least map between them. very happy to chat about this!
I'm still positive about Bluesky, but the company hasn't yet proven itself like many non-profits. You could clear a lot of misconceptions by having multiple demonstrations of the protocol:
If I don't want to use BGS, how can I access the PDSs of the people I follow, and get e.g. the 10 latest entries for each?
Starting from a handle, let's say @pfrazee.com, how do I fetch your content using curl, without the BGS?
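For context, here's roughly what I'd hope that to look like, assuming the `com.atproto.identity.resolveHandle` and `com.atproto.repo.listRecords` endpoints behave the way the current docs describe (a sketch, not something I've verified):

```typescript
// Sketch only: assumes the PDS hostname is already known (here bsky.social)
// and that the com.atproto.* XRPC endpoints work as currently documented.
const pds = "https://bsky.social";

// 1. Resolve the handle to a DID.
const { did } = await fetch(
  `${pds}/xrpc/com.atproto.identity.resolveHandle?handle=pfrazee.com`
).then((r) => r.json());

// 2. List that repo's latest posts directly from the PDS, no BGS involved.
const { records } = await fetch(
  `${pds}/xrpc/com.atproto.repo.listRecords?repo=${did}` +
    `&collection=app.bsky.feed.post&limit=10`
).then((r) => r.json());

for (const rec of records) console.log(rec.value.text);
```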
Of course, there are also other issues regarding monetization, and if BGS becomes the de facto way to get content, then some PDSs might become locked to your official BGS.
I hope you are willing to work with known non-profits such as Mozilla to form a wider consortium of players in the BGS space, which is mostly inaccessible to self-hosters.
I would definitely be curious for elaboration on what requirements the project had that weren't met by OpenAPI or gRPC or Cap'n Proto or Apache Thrift or any of the other existing things that solve this general category of problem.
> We sign the records so that authenticity can be determined without polling the home server, and we use a repository structure rather than signing individual records so that we can establish whether a record has been deleted (signature revocation).
Why do you need an entirely separate protocol to do this? Email had this exact same problem, yet was able to build protocols on top of it in order to fix the authenticity problem. This is the issue: instead of using ActivityPub, which is simpler to implement, more generic, and significantly easier for developers to understand, you invented an overly-complex alternative that doesn't work with the rest of the federated Internet.
> The schema is a well-defined machine language which translates to static types and runtime validation through code generation. It helps us maintain correctness when coordinating across multiple servers that span orgs, and any protocol that doesn't have one is informally speccing its logic across multiple codebases and non-machine-readable specs.
OpenAPI specs already exist and do the same job. They support much more tooling and are much easier for developers to understand. There is objectively no reason why you could not have used them; you are literally just making GET and POST requests with XRPC. If you really wanted to you could've used GraphQL.
There are plenty of protocols which do not include machine-readable specs (including TCP, IP, and HTTP) that are incredibly reliable and work just fine. If you make the protocol simple to understand and easy to implement, you really don't need this (watch Simple Made Easy by Rich Hickey).
> The DID system uses the recovery key to move from one server to another without coordinating with the server (ie because it suddenly disappeared).
Why is this necessary? The likelihood of a server just randomly disappearing is incredibly low. There are community standards and things like the Mastodon Server Covenant that make this essentially a non-issue. You're storing all of a user's post history on their own device in the case of an immediate outage. That's equivalent to Gmail storing all of your emails on your device in case you want to immediately pack up and move to another email provider. That is an extremely high cost (I have 55k tweets, that would be a nightmare to host locally) for an outcome that is very unlikely.
> It supports key rotations and it enables very low friction moves between servers without any loss of past activity or data.
This forces community servers to store even more data, data that may not even be relevant or useful. Folks might have gigabytes of attachments and hundreds of thousands of tweets. That is not a fast or easy thing to import if you're hosting a community server. This stacks the deck against community servers.
Most people want some of their content archived, not all, and there is no reason why archival can't be separate from where content is posted. Those can be two separate problems.
> That design is why we felt comfortable just defaulting to our hosting service; because we made it easy to switch off after the fact if/when you learn there's a better option. Given that the number one gripe about activitypub's onboarding is server selection, I think we made the right call.
Mastodon is able to do this on top of ActivityPub. Pleroma works with it. Akkoma works with it. There's already a standard for this. Why are you inventing an unnecessary one?
Mastodon also changed their app to use Mastodon.Social as the default server, so this is a non-issue.
I think it’s important to say this: I think asking questions is great, and I’m glad that we’re not just taking statements at face value because making social suck less is a worthy goal.
However, you are coming across as highly adversarial here. Mostly because you immediately follow your questions with assertions, indicating that your questions may be rhetorical rather than genuine.
I’m not accusing you of anything per se, but I very much want a dialog to happen in this space and I think your framing is damaging the chances of that happening.
Whether on Twitter or Mastodon, people deep into that type of social network love TO SHOUT LIKE THIS to get likes or boosts.
It is why passersby like me can't get into either Twitter or Mastodon when it is a culture of getting outraged and shouting at each other, to collect a choir of people nodding and agreeing in the replies: "well done for saying it like it is."
These people forgot how humans talk and have arguments outside of their Internet echo chambers.
Anger, insults and hate sell more (create more engagement) than reasoned arguments. No one would have posted this on HN if it was otherwise. So don't worry, they are not hurting their chances; the next topic that can be summarised with an angry title like "xxx is the most obtuse crock of shit" will get great traction on HN.
"Also I don't care if I'm spreading FUD or if I'm wrong on some of this stuff. I spent an insane amount of time reading the docs and looking at implementation code, moreso than most other people. If I'm getting anything wrong, it's the fault of the Bluesky authors for not having an understandable protocol and for not bothering to document it correctly."
> in some circles still primarily means cryptography.
It still does in my circle. The overloading of "crypto", though, has become such a source of confusion and misunderstanding that I have stopped using it and just use the full word, be it cryptography or cryptocurrency, instead.
I don't think it's "not in good faith" to say "I made a real substantial effort to understand this, and am trying to describe it accurately; if at this point my descriptions don't match the reality, it's not my fault but that of the people who made it impossible to understand".
(Of course it's perfectly possible, for all I know, that SW is not debating in good faith. But what you quote doesn't look to me like an admission of bad faith.)
Well, I thought I already described what seemed to me to be a charitable and reasonable take on it.
"I put as much effort in as can reasonably be expected; I tried to evaluate it fairly; but the documentation and supporting code is so bad that I may have made mistakes. If so, blame them for making it impossible to evaluate fairly, not me for falling over their tripwires."
If something is badly documented and badly implemented, then I think it's OK to say "I think this is badly designed" even if you found it incomprehensible enough that you aren't completely certain that some of what looks like bad design is actually bad explanation.
If some of the faults you think you see are in fact "only" bad documentation, then in some sense you're "spreading FUD". But after putting in a certain amount of effort, I think it's reasonable to say: I've tried to understand it, I've done my best, and they've made that unreasonably difficult; any mistakes in my account of what they did are their fault, not mine.
(I should reiterate that I haven't myself looked at the AT protocol or Bluesky's code or anything, and I don't know how much effort SW actually put in or how skilled SW actually is. It is consistent with what I know for SW to be just maliciously or incompetently spreading FUD, and I am not saying that that would be OK. Only that what SW is admitting to -- making a reasonable best effort, and possibly getting things wrong because the protocol is badly documented -- is not a bad thing even when described with the words "I don't care if I'm spreading FUD".)
I agree, thank you for stating that in a respectful way.
The linked article/toot and certain replies seriously make me want to just shut down my personal Mastodon server and move on from the technology altogether.
> The likelihood of a server just randomly disappearing is incredibly low. There are community standards and things like the Mastodon Server Covenant that make this essentially a non-issue.
Another example: Mastodon.lol, which had 12,000 users, literally shut down a few hours ago. They did manage to give notice, but the point remains that people had to move instances, cannot take their posts with them, and it's a giant PITA, server covenant or not.
To call this stuff a "non-issue" seems incredibly obtuse, especially when the data portability piece is clearly an afterthought by the Mastodon devs, and something that ActivityPub would need some major changes to accomplish. Changes that the project leads have been fairly against implementing.
Y’all should see the dead letters in the publish queues from dead indie servers; thousands have gone offline, but their addresses will get looked up forevermore.
> Email had this exact same problem, yet was able to build protocols on top of it in order to fix the authenticity problem.
On the contrary, email has no solution to the authenticity problem that’s being talked about. Even what there is is a right mess and not even slightly how you would choose to build such a thing deliberately.
If you want to verify authenticity via SPF/DKIM/DMARC, you have to query DNS on the sender’s domain name. This works to verify at the time you receive the email, but doesn’t work persistently: in the future those records may have changed (and regular DKIM key rotation is even strongly encouraged and widely practised).
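To illustrate the dependency, a minimal sketch (using Node's `dns` module; the selector comes from the message's DKIM-Signature header, and the values below are examples only):

```typescript
import { resolveTxt } from "node:dns/promises";

// Sketch: verifying DKIM means fetching the signer's public key from DNS at
// <selector>._domainkey.<domain>. If that record is later rotated or removed,
// the old signature can no longer be checked against anything.
async function fetchDkimKey(selector: string, domain: string): Promise<string> {
  const records = await resolveTxt(`${selector}._domainkey.${domain}`);
  // TXT records come back as chunked strings; join them into one key record.
  return records.map((chunks) => chunks.join("")).join("");
}

// e.g. fetchDkimKey("20230601", "gmail.com"); the selector value here is an example only.
```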
What you are replying to says that AT wants to be able to determine authenticity without polling the home server, and establish whether a record has been deleted. Email has nothing like either of those features.
I think they're talking about GPG, not SPF/DKIM/DMARC.
Which is a risky thing to do, because most people don't associate GPG with positive feelings about well designed solutions, but they're right in that it works well, solves the problem and is built squarely on top of email.
The reason that it's not generally well received is that there's no good social network for distributing the keys, and no popular clients integrate it transparently.
In this case GPG, DKIM and even S/MIME are on equal standing. Validity can be checked only on reception because there are no validity-stapling mechanisms.
I’m curious about this. So email that I’ve sent, let’s say from a gmail account to an iCloud account, isn’t guaranteed to be verifiable years later because of dkim key rotation?
That’s not great. I wonder if the receiver could append a signed message upon receipt with something like “the sender’s identity was valid upon receipt”.
The receiver absolutely does that with the Authentication-Results header, but can you trust its integrity in your mailbox, your email provider and all your email clients (to not modify it)? It's indeed not great for non-repudiation.
> I wonder if the receiver could append a signed message upon receipt with something like “the sender’s identity was valid upon receipt”.
That's exactly what does happen, if you view the raw message in GMail/iCloud, you should see DMARC pass/fail header added by the receiving server (iCloud in your example).
(Well not exactly, it's not signed, but I'm not sure that's necessary? Headers are applied in order, like a wrapper on all the content underneath/already present, so you know in this case it was added by iCloud not GMail, because it's coming after (above) 'message received at x from y' etc.)
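Roughly like this (the header below is hand-written for illustration, not copied from a real message):

```typescript
// Illustrative only: the kind of Authentication-Results header the receiving
// server (iCloud in the example above) prepends when the message is delivered.
const exampleHeader =
  "Authentication-Results: mx.icloud.com; " +
  "dkim=pass header.d=gmail.com; spf=pass smtp.mailfrom=gmail.com; dmarc=pass";

// A naive check of the recorded verdicts; real parsers are more involved.
const dkimPassed = /\bdkim=pass\b/.test(exampleHeader);
const dmarcPassed = /\bdmarc=pass\b/.test(exampleHeader);
console.log({ dkimPassed, dmarcPassed });
```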
Thanks for the response. Do you know if this extra “dkim sig was verified header” is part of a protocol or is it just something that is done bc otherwise bad stuff happens?
I’m also curious how the original comment about dkim/spf/dmarc not being sufficient due to key rotation factors into the conversation, now that we've discussed this?
I'm not sure off the top of my head, I'd guess it's a MAY or SHOULD. Verifying DKIM/SPF/DMARC is optional anyway, if you want to just read everything without caring you can; you've received the message by that point, I can't see what bad stuff would happen if it wasn't added.
Key rotation would have the same effect as 'DNS rotation' (if you stopped leasing the domain, or changed records) - you might get a different result if you attempted to re-verify later.
I just don't really see it as a problem, you check when you receive the message; why would you check again later? (And generally you 'can't', not as a layman user of GMail or whatever - it's not checked in the client, but the actual receiving server. Once it's received, it delivers the message, doesn't even have it to recheck any more. Perhaps a clearer example: if you use AWS SES to receive, ultimately to an S3 bucket or whatever for your client or application, SES does this check, and then you just have an eml file in S3, there's no 'hey SES take this message back and run your DKIM & virus scan on it again'.)
It's just for humans, it's not usually used for anything else. For machines we have ARC (Authenticated Received Chain) which basically contains almost the same info but signed across the entire chain.
The notion that server disappearance is a non-issue is quite misleading. Servers go offline for various reasons, such as technical difficulties, financial constraints, or legal issues. Recovering and transferring data without relying on the original server is essential for users to maintain control over their data and identities. DIDs and recovery keys provide a valuable solution to this problem, ensuring user autonomy.
Your reply fails to address that push-based systems are prone to overwhelming home servers due to burst loads when content becomes viral. By implementing pull-based federation, the AT Protocol allows for a more balanced and efficient distribution of resources, making self-hosting more affordable and sustainable in the long run.
> The likelihood of a server just randomly disappearing is incredibly low.
Everything else aside, this is completely untrue.
I self-hosted my first fediverse account on Mastodon and got fed up with the complexity of it for a single person instance and shut it off one day (2018 or so?).
On another account, at some point 50% of the people I followed vanished because two servers that everyone in that bubble was on just went offline. Took a while to recreate the list manually.
This may be anecdotal but I've seen it happen very often. Also people on small instances blocking mastodon.social for its mod policies comes close to this experience.
Alternatively: the likelihood of any one server going away tomorrow is small, but the likelihood of something in your social graph going away tomorrow is high.
> I have 55k tweets, that would be a nightmare to host locally)
they're tweets, how much could they cost? @ 280 bytes each, that's like 15MB. Double it for cryptographic signatures and reply-to metadata. Is that really too much to ask for the capacity to transfer to another host at any time?
(also, leaving aside the fact that 55k tweets puts you in the 0.1% of most prodigious users)
I have every post made on BlueSky up to a certain point last weekend and it's only 3 GB.
I have every email I've ever received or sent (and not deleted) and it's only 4GB.
Should something require I download all that every time I login? No. But having a local copy is amazing, and a truly federated system should have and even be able to depend on those.
The Mastodon Server Covenant is a joke; the only enforcement is to remove the server from the list of signup servers, which will not matter if it just fell over dead because the admin died/doesn't care/got arrested/got a job.
How did we get to 55k tweets being a nightmare for any social media platform?
A quick search got me to Twitter stats from 2013, when people were posting 200 billion tweets per year. That's 5-6 orders of magnitude more. You don't get a 10000x improvement just by federating and hosting multiple nodes.
The discussion here was about archiving each user's tweets on their own client device - this is where the 55k was brought up as a problem. I still think it's a low number, even if it includes plenty of images.
> double it for cryptographic signatures and reply-to metadata
Ah, email, where a message of 114 characters with no formatting ends up over 9KB due to authentication and signatures, spam analysis stuff, delivery path information and other metadata. Sigh. Although I doubt this will end up as large as email, the lesson is that metadata can end up surprisingly large.
In this instance, I think 1–2KB is probably more realistic than the half kilobyte of “double it”.
Sure, they could. Most people don't post tons of hi res photos. But I'm sure there are ways you could optimize to not have all the content on local device, if it's such a big deal. But this is a really strange point to me to be hung up on.
"The likelihood of a server just randomly disappearing is incredibly low."
No. Just no.
If (IF!) some distributed social network breaks through and hundreds of millions or billions of people are participating, they are going to do things that The Powers That Be don't like. For better or worse, when that happens they will target servers, and servers WILL just disappear. Domains will disappear. Hosting providers will disappear. You can take that straight to the bank and cash it.
Uncoordinated moves are table stakes for a real distributed social network at scale. The fact AT Protocol provides this affordance on day one is a great credit.
> That's equivalent to Gmail storing all of your emails on your device in case you want to immediately pack up and move to another email provider. That is an extremely high cost (I have 55k tweets, that would be a nightmare to host locally) for an outcome that is very unlikely.
If your identity is separate from your Gmail account (as it can be with a custom domain, for email and for bluesky), this seems like a very plausible and desirable thing to be able to do. Just recently there was an article about how Gmail is increasing the number of ads in the inbox; for some people that might change the equation of whether Gmail's UX is better than it is bad. If packing up and leaving is low-friction enough, people might do it (and that would also put downward pressure on the provider to not make the experience suck over time)
And that's not even getting into things like censorship, getting auto-banned because you tripped some alarm, hosts deciding they no longer want to host (which has happened to some Mastodon instances), etc.
> The likelihood of a server just randomly disappearing is incredibly low.
It happens all the time. mastodon.social, the oldest and biggest Mastodon instance, has filled up with cached ghost profiles of users on dead instances. Last I checked, I could still find my old server in there, which hasn't existed for several years.
Email has only solved the "authenticity problem" by centralizing to a tiny number of megaproviders with privileged trusted relationships. Forestalling that sort of "solution" seems to me one of the Bluesky team's design goals.
Servers go down or get flaky all the time for various reasons. Easy relocation (with no loss of content & relationships) and signed content (that remains readable/verifiable even through server bounciness) soften the frustrations.
55k tweets is little challenge to replicate, just like 50k signatures is little challenge to verify, here in the 2020s.
If Mastodon does everything better with a head start, it should have no problem continuing to serve its users, and new ones.
Alas, even just the Mastodon et al community emphasis on extreme limits on visibility & distribution – by personal preferences, by idiosyncratic server-to-server discourse standards, by sysop grudges, whatever – suppresses a lot of the 'sizzle' that initially brought people to Twitter.
Bluesky having an even slightly greater tilt towards wider distribution, easier search, and relationships that can outlive server drama may attract some users who'd never be satisfied by Mastodon's twisty little warrens & handcrafted patterns-of-trust.
There's room for multiple approaches, different strokes for different folks.
> Why do you need an entirely separate protocol to do this? Email had this exact same problem, yet was able to build protocols on top of it in order to fix the authenticity problem.
But if we started today, we wouldn't build email that way. There are so many baked-in well-intended fuckups in email that reflect a simpler time where the first spam message was met with "wtf is this, go away!" I remember pranking a teacher with a "From: president@whitehouse.gov" spoofed header in the 90s.
Email is the way it is because it can't be changed, not because it shouldn't be.
I'm sorry but this is ridiculous. Just because a protocol exists doesn't mean that if someone doesn't build on top of it, you can describe it as a crock of shit.
> > The DID system uses the recovery key to move from one server to another without coordinating with the server (ie because it suddenly disappeared).
> Why is this necessary? The likelihood of a server just randomly disappearing is incredibly low.
The likelihood of a server just randomly disappearing at any point in time is low. The likelihood of said server disappearing altogether, based on the 20+ years of the internet, can & will approach 100% as the decades go on. Most of the websites I know in the early 2000s are defunct now. Heck, I have a few webcomic sites from the 2010s in my bookmarks that are nxdomain'd.
Also, as noted by lapcat, these sudden server disappearances will happen. Marking this problem as a non-issue is not, in any realm of possibility, a good UX decision.
This is coupled with the fact that Mastodon (& ActivityPub in general) doesn't have to do anything when it comes to user migration: the current system in place on Mastodon is completely optional, wherein servers can simply choose to not allow users to migrate.
> There are community standards and things like the Mastodon Server Covenant that make this essentially a non-issue.
*The Covenant is not enforced in code by Mastodon's system, nor by ActivityPub's protocol.* It's heavily reliant on good faith & manual human review, with no system-inherent capabilities to check if the server actually allows user data to be exported.
> You're storing all of a user's post history on their own device in the case of an immediate outage. That's equivalent to Gmail storing all of your emails on your device in case you want to immediately pack up and move to another email provider. That is an extremely high cost (I have 55k tweets, that would be a nightmare to host locally) for an outcome that is very unlikely.
An outcome *that can still happen*. As noted by the incidents linked above, they're happening within the Mastodon platform itself, with many users from those incidents being unable to fully recover their own user data. Assuming that this isn't needed at all is the equivalent of playing with lightning.
The recovery key bit is the one part I actually like.
But improving on the ActivityPub user migration story is also a minor/trivial change away from doing much better than today: you just need to change ActivityPub IDs to either a fully content-addressable hash or a reference to a base that is under user control, plus a revocation-key-style mechanism for letting the user sign claims about their identity in order to allow unilateral moves.
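A minimal sketch of just the content-addressable-ID half of that idea (the DID and the canonicalisation are placeholders, and the claim-signing/revocation piece isn't shown):

```typescript
import { createHash } from "node:crypto";

// Sketch: derive the object ID from a hash of its canonical form, so the ID
// stays valid wherever the object ends up being hosted.
const activity = JSON.stringify({
  type: "Note",
  attributedTo: "did:example:alice", // some user-controlled base, per the comment above
  content: "hello",
});
const id = "sha256:" + createHash("sha256").update(activity).digest("hex");
console.log(id);
```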
> You're storing all of a user's post history on their own device in the case of an immediate outage. That's equivalent to Gmail storing all of your emails on your device in case you want to immediately pack up and move to another email provider.
In the world I currently live in, all my emails are stored locally on my devices. Also, text files take up little to no storage, so why does it matter?
> Why is this necessary? The likelihood of a server just randomly disappearing is incredibly low. There are community standards and things like the Mastodon Server Covenant that make this essentially a non-issue.
I literally read about a case over a month ago where some obscure Mastodon-server admin blocked someone's account on their server so it was impossible to move to another instance. The motivation was "I don't want capitalists here, can change my mind for money" (slightly paraphrasing). Basically, it's stupid to use any Mastodon instance other than the few largest ones or your own.
That's why BlueSky's approach makes sense.
> with the rest of the federated Internet.
You're saying that like it's a thing that won and not a niche project for <10M users globally.
Some of this seems very familiar, in a good way, which makes me very interested in Bluesky and the AT protocol. I worked with XRIs and was on the XDI TC. I also was on the fringes of some of the early DID spec work, and experimented with Telehash for a while.
I know Jeremie Miller is on the board, is Telehash or something similar being used within Bluesky?
Also, I'm sure you get this a lot, but I'd love a BlueSky invite please.
IMHO, any protocol that isn't signing content (like mastodon) is merely moving the problem. Signatures allow people to authenticate content and sources and allow people to build networks of trust on top of that. So bluesky is getting that right. Unsigned content should not be acceptable in this century.
Signed content immediately solves two issues:
- reputation: reputation is based on a history of content that is liked and appreciated by trusted sources that is associated with an identity and the associated set of public keys. You can know 100% for sure whether content is reputable or not. Either it is signed by some identity with a known reputation or it is not.
- AI/bot content: bots could of course sign content with some key, but it would be hard to fake reputation. Not impossible, of course, but you'd have to work at it for some time to build up the reputation. And people can moderate content and destroy the reputation of the identity and keys.
The whole problem with essentially all social media networks so far is a complete and utter lack of trustworthiness. You could be reading content by an AI, a seemingly bonafide comment from somebody you trust might actually come from some Chinese, Russian or North Korean troll farm, or you are just wading through mountains of click bait posted by "viral" marketing companies, scammers, or similarly obnoxious/malicious publishers. Twitter's blue tick is laughably inadequate for dealing with this problem. And people can yell whatever without having to worry about their reputation; which causes them to behave in all sorts of nasty ways.
Signed content addresses a lot of that. You can still choose to be nasty, obnoxious, misleading, malicious, etc. but not without staking and risking your reputation.
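A minimal sketch of the mechanism, using Ed25519 via Node's crypto module; this is the general idea, not Bluesky's or Mastodon's actual signing scheme:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// One long-lived identity keypair; the public key is what reputation attaches to.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const post = Buffer.from("signed content example: hello fediverse");
const signature = sign(null, post, privateKey); // Ed25519 takes no digest argument

// Anyone holding the public key can check authorship without asking any server.
console.log(verify(null, post, publicKey, signature)); // true

// Tampered content fails verification, so reputation can't be hijacked silently.
console.log(verify(null, Buffer.from("tampered"), publicKey, signature)); // false
```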
Mastodon not having a lot of these issues (yet) is more a function of its relative obscurity rather than any built in features. I like Mastodon mainly because it still feels a bit like Twitter before that got popular. But it's not sustainable. If a few hundred million people join, it will get just as bad as other networks. In other words, I don't see how this could last unless they address this. I don't think that this should be technically hard. You need some kind of key management and a few minor extensions to the protocol. The rest can be done client side.
That would be more productive than this rant against the AT protocol.
How much of this is actually implemented? Correct me if I'm wrong but bluesky doesn't actually implement federation yet?
As someone who wants to like bluesky, I feel like a lot of the scepticism comes from the seeming prioritisation of a single centralised server, over federation, for what is "sold" as a decentralised system.
my advice is don't stress, whatever you do, some people will not like it
for some reason people think what we have now is good, and they say "don't reinvent the wheel", but there is no wheel, what we have is just garbage; 50 years later we still can't beat the "unix pipe"
I want to say that "if there was something off the shelf that met all our needs we would've used it" has been the justification for many over-engineered, not-invented-here projects. On the other hand, in many cases it is completely legitimate.
Not sure if you'll see this, but I work for an internal tooling startup and I'm trying to mess around with your API. Our tool has support for REST and python/js support with a limited set of libraries. I've been trying to figure out how to connect to the API via CURL so I can write some blogposts about building with y'all and I've been struggling.
I can get com.atproto.server.createSession to return me a token, but then when I try to transition into another endpoint (such as app.bsky.richtext.facet) I only get 404s. Are there any examples of using y'all with REST? Happy to take my question elsewhere if y'all have any places for folks to ask questions?
It is literally the first result for "atproto", "at proto", "atprotocol" and "at protocol" on Google. How much more Google-able would you like it to be?
Not to derail this thread, but is there any way of seeing whether I'm actually on the waitlist? I remember signing up late last year, but there was never any confirmation and I never got the survey which I've heard people talking about. I'm super interested to try out Bluesky, but haven't been able to find anyone with an invite.
Also would like to know this, I haven’t received any confirmation email or survey either, and each time I “join the waitlist” it just gives me a success message as if I’m not already on the list.
I would like to know this as well. I never got a confirmation and I could have sworn I signed up months ago when it first opened. I signed up again a few weeks ago and didn't receive a confirmation then either.
I suspect it's due to me using an uncommon tld for my primary email.
Only 65K users? Why does NOSTR have 10 times[1] as many users already? I guess you chose to close the doors and only let 65K users come in as beta testers?
Because people are actually using this as a community now and there's a desire to not just open the floodgates before all the moderation pieces etc are in place. I joined when it was roughly 15-20k users and each wave of invites brought distinct subgroups. Then, the original short invite code tokens began getting brute forced and there was a large wave that completely disrupted the balance of things.
There's the protocol, but then there's also the community – and communities are ecosystems not just technical problems.
It's hard to remember sometimes every project is building both technology and community.
Technical merits of the different approaches aside... this is a calm, kind and well reasoned response to a rather, uh, emotional critique. Thanks for bringing the discourse back to where it should be.
> "Imagine if I had to store the 50k+ tweets I've made on Twitter on my device, and upload ALL of them to a new server whenever a community server went down."
This... doesn't seem too bad at all?
Let's assume an average of 150 bytes of text per tweet, and an additional 50 bytes of actually important metadata. That's only 10 megabytes for the entire archive of 50,000 items. A single HDR photo from a modern smartphone is larger.
IMO this was the original point of Twitter: the extreme limitation on post size (140 characters) made it feasible to build applications that work with large amounts of content. There was a time around 2010 when building a Twitter API client was the standard demo for a new UI framework — the data model was simple enough that a basic client was the next step up from "Hello world" complexity.
This was actually a nice vision for a social network! Lots of simple near-realtime data, a jungle of interesting clients to make sense of it. I wish someone was building a Twitter competitor with this kind of minimalist approach.
> There was a time around 2010 when building a Twitter API client was the standard demo for a new UI framework — the data model was simple enough that a basic client was the next step up from "Hello world" complexity.
It's fascinating the Twitter we have today and the Twitter from that era are even considered the same app. I wrote a simple Twitter client in Cocoa and Python and even NodeJS. It was such a simple way to get a working, functional, usable demo going that you could immediately show to friends.
Then they realized it's impossible to make money advertising this way and shut it down.
In this hypothetical minimalist Twitter clone in the 2009 spirit, images are elsewhere. We have a widely used hypertext protocol that lets you reference media objects from anywhere, so let’s use that. Client apps can individually solve the image upload usability question in ways that fit their user base.
Yes, there will be broken links. But IMO that’s better than having all your data in one centralized location where it can eventually be taken over by private equity looking to make a buck, or a billionaire who wants to be a media mogul.
Edit: Looking back at historical Twitter news, it’s clear that in 2010 their vision was still to have images embedded from other sites. Here’s a relevant TechCrunch article:
Note how they added display support for embedded Flickr photo set links. Your pictures could be on the dedicated photo service and you’d still get the nice set browsing UI inside Twitter. They should have stuck with this and built a protocol that lets sites interoperate on things like “here’s a set of photos to be embedded” (so you could use something else than Flickr). But instead they wanted to chase the wannabe-Facebook dream of sucking everything into their servers and using a closed client to sell inline ads.
And today both of those sites are purging historical user uploads, which is completely their right, but definitely doesn't "solve the problem" of keeping control over your social media history/legacy/output.
One thing I do like about the decentralized model is the ability to modularize this.
I would love a Bluesky/ATProto/whatever client that allowed me to specify some compatible image service where images I post get seamlessly uploaded and linked.
Then I can choose what I use, maybe it's something I pay for and have better guarantees around durability and availability.
Or maybe it's a free or ad-supported service where my image gets deleted after 6 months and I don't care that's fine with me.
I just transfer old photos to my new phones, so this doesn't really change anything for me. Considering that I have posted from different devices (PC, laptop, phone) and only copied some images or deleted them after posting, though, you are probably still correct that I don't have all of them stored on my phone.
The storage on your phone would probably still be enough for nearly everyone and for the rest doing the account moving process on a PC or through cloud storage would probably be ok.
> I was like "I'll just make a simple alternative to the BlueSky server in Elixir". But it CAN'T be a simple implementation like ActivityPub can be, because it is extraordinarily complex and requires you to make guarantees about your storage and how your application works.
You might suspect that OP has a familiarity bias here, but actually there is objective evidence that ActivityPub-based implementations are (relatively) simple: there are dozens of implementations of both servers and clients, with all sorts of functionality that is not emulating the "twitter/mastodon" experience. Heck, there's even a WordPress plugin in the works.
How well all these things will scale etc. is still somewhat of a question mark but this aspect feels important beyond specific details and choices. The simpler, more generic an approach, the more likely it is to find fertile ground and grow as a decentralized architecture. The original decentralized Web was simple and generic and this has been touted as key factor for its explosive adoption.
In a sense this was also its ultimate downfall: it did not provide (out of the box) the tools to build the connectivity / social graph experience that was so enormously desirable (and was not provided e.g., by RSS). ActivityPub is somehow making up for that original gap. There might be other ways to do this. But unless there is a bigger agenda (commercialization, financialization), gratuitous complexity driven by not-invented-here or desire for centralized control is more likely to hinder than help.
> You might suspect that OP has a familiarity bias here, but actually there is objective evidence that ActivityPub-based implementations are (relatively) simple: there are dozens of implementations of both servers and clients, with all sorts of functionality that is not emulating the "twitter/mastodon" experience. Heck, there's even a WordPress plugin in the works.
there's AP and there's AP - it's a bit underspecified. Mastodon added a lot of stuff to make it much more usable as social media, and other implementations are increasingly converging on the Mastodon API, which isn't super simple, but does at least let existing Mastodon apps and tools talk to you.
It seems to me that a lot of the problems discussed in this thread (it's using a new protocol that doesn't work with existing tools, it uses crypto, handles auth in the protocol, servers can go down) are just frustrations that don't have to do with the core innovation that Bluesky promises to deliver, and are instead mistaking the AT protocol for another ActivityPub-related protocol, rather than something completely different. In fact, this misses that having pull-based indexes is part of the idea of separating data publication vs. data curation. Yes, invite codes are kind of a weird thing to shove so deep into the protocol, but sometimes these kinds of things are necessary to bootstrap a social network, and become relatively unimportant in the long term.
But on the other hand, the AT protocol is not all what it seems either. It's fundamentally not possible to do what they promise unless there is some kind of synchronization point where your most up-to-date identity can be searched. The documentation goes on extensively about the format of DIDs and the fact that there _is_ a resolution scheme, but it fails to mention that this resolution is being done by Bluesky themselves, and is not planned to expand into something that others can control. As a decentralized protocol, you would expect this to be something DNS-based, or if you really wanted something more peer-to-peer, DHT lookup, or at the worst case, a distributed blockchain. Seeing as this is a fundamental protocol-level change, the fact that they're rolling with the current approach makes me believe they will not be changing this in the near future. Bluesky is just converting one problem into another.
The DID method has to satisfy a lot of requirements. I did a ton of research on DHT and blockchain approaches and none of them give the right performance, reliability, and cost outcomes while also supporting key rotation. DNS isn't far off but it's a little obtuse to cram it into, so we're starting with did:web and did:plc. We'll add others if they have the right characteristics.
The only options that seem reasonable for PLC are for it either to be operated by some trusted multi-stakeholder institution like ICANN or to be a closed-group blockchain. I'm open to either approach, but they require the creation of a consortium, which requires buy-in from a set of stakeholders that don't exist yet. So until that happens, we run it, and that's why we called it Placeholder. We'll backronym it into something else once we get there (the C most likely being Consortium).
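For what it's worth, did:web resolution is just an HTTPS fetch per the did:web method spec, which is part of why it was an easy first method to support; a rough sketch (did:plc works differently, via the PLC directory, and isn't shown):

```typescript
// Sketch of did:web resolution: the DID maps to an HTTPS URL and the DID
// document is fetched from there.
async function resolveDidWeb(did: string): Promise<unknown> {
  const id = did.replace(/^did:web:/, "");
  const parts = id.split(":").map(decodeURIComponent);
  const url =
    parts.length === 1
      ? `https://${parts[0]}/.well-known/did.json`
      : `https://${parts[0]}/${parts.slice(1).join("/")}/did.json`;
  return fetch(url).then((r) => r.json());
}

// e.g. resolveDidWeb("did:web:example.com") fetches
// https://example.com/.well-known/did.json
```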
What are your thoughts on KERI? Some of Phillip's slide decks are weird to go through, but I really like its security & usability goals (key rotation needs to be realistic!) and the fact that it's going through the IETF RFC process. (Also, it avoids cryptocurrency-related blockchain baggage. It's disappointing how much blockchains have infested decentralized identity efforts.)
Since you're familiar with this, what was wrong with how Farcaster approached name registration? Signing up or rotating your key is (relatively) cheap on Ethereum mainnet, and client apps could front the cost of signing users up. The user registry doesn't need to be maintained by a closed group in the long run (could start out that way, with Bluesky maintaining the contracts but eventually removing upgradeability when out of beta).
> are just frustrations that don't have to do with the core innovation that Bluesky promises to deliver, and are instead confusing the AT protocol to be another ActivityPub-related protocol, rather than something completely different.
AtProto is designed to be a federated protocol. The issue I have is that it is not interoperable with the major standard used on the federated internet right now: ActivityPub. You can build protocols on top of each other. Instead of doing that, Bluesky built a confusing alternative that is difficult to implement and difficult to work with.
> In fact, it misses that having pull-based indexes is part of the idea of separating data publication vs. data curation.
Mastodon does this already. There is literally an 'explore' part of the app that is completely separate from the feed and does not rely on ActivityPub.
You cannot 'publish' content without having some sort of a federation protocol to go overtop of it, to network things together and send your content to the people who follow you. There is nothing about ActivityPub that makes a single global view impossible or hard to implement.
As far as I can tell ActivityPub is (at least part of) the problem. If it’s not, then Mastodon is simply not trying to be a “Twitter replacement” or a useful global social network at all.
The only worse idea than Bluesky using ActivityPub would be to build something new making all the same design decisions.
Deciding “ActivityPub is the standard” (seriously?!) and demanding we give up already is the opposite of what we need.
I don’t know if AT is the best long term solution, but — and I’ve tried it multiple times — ActivityPub/Mastodon sure as hell isn’t. There’s no value in being interoperable with it that I can see, beyond the potential short term boost to vanity metrics on user numbers.
I absolutely want my identity to use public key cryptography, and I absolutely want to store all my emails, tweets (or whatever), and DMs locally first. I don’t like how accounts and servers work in Mastodon and the federation and global discovery approach sucks too.
The best thing we can do now is keep an open mind and try as many approaches as possible, because if something better than ActivityPub doesn’t come along society is stuck with centralised social media forever.
Fediverse posts are signed with public key crypto (e.g. I'm looking at signatures for my post literally in the window next to my browser right now). It could use a multi-sig model so you'd be able to unilaterally prove claims about your posts, but that's not an ActivityPub limitation or issue. If you want to store it locally first, the only thing stopping you currently is that current servers are clunky, not the protocol. You don't give much detail of the specifics of why you think the protocol is the problem, so I can't address much more than that.
ActivityPub is very generic; it can accommodate all kinds of changes. E.g. if you want to introduce activity / object types that have improvements over what Mastodon supports, you can do so. If you want to introduce vocabulary within existing object types that Mastodon wouldn't understand, you can do so without breaking federation with Mastodon or other Fediverse servers. I'd be a lot more sympathetic to anyone who decided to extend ActivityPub.
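For instance, a sketch of what an extended object could look like (the `myext` namespace and `mood` field are made up for illustration):

```typescript
// Hypothetical example of extending an ActivityPub object: extra vocabulary is
// declared in @context, and implementations that don't understand it (e.g.
// Mastodon) simply ignore the unknown fields without breaking federation.
const extendedNote = {
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    { myext: "https://example.com/ns#" }, // invented extension namespace
  ],
  type: "Note",
  id: "https://example.com/notes/1",
  attributedTo: "https://example.com/users/alice",
  content: "Standard fields federate as usual.",
  "myext:mood": "optimistic", // custom field; unknown consumers just ignore it
};
```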
The minimum viable subset of ActivityPub is largely: provide an endpoint that returns an Actor with a list of the required endpoints (inbox, outbox, followers, following, etc.) - the spec requires a list of them, but you don't even need all of them for basic interop - and handle POSTs to your announced inbox, and GET requests to the outbox. Add unique URLs as "id" fields in the JSON, and support GET to them. Follow the format of the JSON to provide at least the minimum set of fields to address the activity, provide a type, and provide the minimum fields for the given type.
For interop w/Mastodon you'd want to support Webfinger to find the Actor. Nothing stops you from also supporting other mechanisms, like BlueSky's domain validation.
Nothing stops you from supporting additional federation mechanisms. Nothing stops you from providing additional fields. Nothing stops you from storing data locally. Nothing stops you from adding additional ways of signing claims about individual objects or a whole repository of objects. Nothing stops you from providing additional mechanisms for distributed lookup of objects by id. Many of those things would be welcome if people wanted to do it on top of ActivityPub.
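To make the minimum subset above concrete, the Actor and Webfinger documents are roughly this small (domains and usernames are placeholders; in practice Mastodon also expects a `publicKey` block for HTTP signature verification):

```typescript
// Placeholder domain/username throughout; this is the shape, not a full server.

// Served at https://example.com/users/alice with
// content type application/activity+json.
const actor = {
  "@context": ["https://www.w3.org/ns/activitystreams"],
  id: "https://example.com/users/alice",
  type: "Person",
  preferredUsername: "alice",
  inbox: "https://example.com/users/alice/inbox",   // accepts POSTed activities
  outbox: "https://example.com/users/alice/outbox", // GET returns the user's activities
  followers: "https://example.com/users/alice/followers",
  following: "https://example.com/users/alice/following",
};

// Served at https://example.com/.well-known/webfinger?resource=acct:alice@example.com
// so Mastodon can discover the Actor from @alice@example.com.
const webfinger = {
  subject: "acct:alice@example.com",
  links: [
    {
      rel: "self",
      type: "application/activity+json",
      href: "https://example.com/users/alice",
    },
  ],
};
```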
Sure nothing stops you doing those things, but also nothing makes them seem like a good idea.
Why bother with Mastodon interop? Why bother building on top of a standard that doesn’t do what you want if you don’t think being part of the “Fediverse” is particularly interesting or a goal of the platform you’re building?
Something new is a better bet at this point than being anchored to or seeming like part of Mastodon IMO.
> Why bother with Mastodon interop? Why bother building on top of a standard that doesn’t do what you want if you don’t think being part of the “Fediverse” is particularly interesting or a goal of the platform you’re building?
Because they pretend to want to be open, and it's sending a very clear signal that is not their goal if they're not even trying to work with the existing ecosystem.
If they just want to be a silo, that's fine. But in that case be honest about it.
Er why are you conflating “being open” with “being part of the Fediverse”? The Fediverse has no monopoly on that notion, and in fact shaming folks for not wanting to integrate with it is the opposite of open. It’s like saying “BSDs pretend to be open, but they’re not even trying to be compatible with Linux”.
Making a conscious choice about differences is fine. The BSDs and Linux share many things, and not others, on the basis of different goals. They're not different for the sake of being different. The vibe I'm getting from the way AT is different is that they're different for the sake of being different.
The “Fediverse” is not an existing ecosystem in any way that replaces Twitter or builds an interesting social network, from my perspective at least.
No matter how open I wanted to be, if I was setting out to build a social network, I’d place precisely zero value on being connected to that “existing network”.
It is irrelevant to me, despite what a vocal minority might want me to believe.
“We have a thing that is arcane and hasn’t taken off with the public so don’t make a new one” isn’t exactly a stellar argument.
The only winning move is to onboard users, and given that hundreds of millions of people may be looking to leave Twitter, Mastodon utterly shit the bed and missed the moment.
That’s the end of the story: if the system can’t catch users raining down from the sky during a generational upheaval in social media it ain’t never gonna make it.
I wondered if Twitter would try extending its protocol so it integrated into the Fediverse, and just become the best Mastodon server ever that everyone ended up congregating on.
My experience with Mastodon is that both Explore and the feed are miles behind the experience I have on Twitter for discovering content I care about. No offense, but everybody I know who switched from Twitter to Mastodon went back. In the end I don't care what protocol you use, but the Mastodon experience really isn't great IMO, so I'm willing to give Bluesky a shot before I judge.
Creating a restriction where every new social media protocol has to be built on top of ActivityPub sounds awful. There is room for innovation outside of this narrow scope of ideas.
> the major standard used on the federated internet right now: ActivityPub
This might be a true statement for microblogging, but the major-est "standard used on the federated internet right now" is surely SMTP+IMAP, and I wouldn't be surprised if RSS/Atom were in the #2 spot even though "nobody" is using it anymore (esp. considering "nobody" includes most podcasts).
this seeming belief that activitypub - or rather, mastodon - is the be-all and end-all and the only thing that anybody should be concerned with and cater to, is kinda off-putting
personally I hope it doesn't actually become "the one thing" with how opinionated and callous it is about some features
Came here to see if anyone else was thinking of Hayes commands! Agree, those suck too. Might be worth it to keep this thread in mind next time I have to deal with them to remind myself that new shiny things can be even shittier :)
While I am here: is anyone aware of a solid client library for handling AT commands? Ideally, suited for an embedded environment so C, no POSIX, no malloc, but really anything would do. Even just a solid implementation of handling the client to steal ideas & code from?
And a script to keep doing ATDT##### in a loop till it got through a busy signal, only to find out that short DTMF tones aren't always recognized, and seemingly the tones for "9" and "1" and "1" worked when others didn't.
Explaining to my parents why there was a cop at the door complaining that the 911 operators were mad at us was fun.
It only works if you're sending it, so you have to put it in a ping payload, to make a "ping of death".
This only works because Hayes patented the idea of the escape code being +++ followed by a delay, so to evade the patent, most other modem OEMs removed the delay requirement.
I've tried to read through the AT Protocol specs but they're very vague and clearly unfinished. Their view on federation isn't exactly what I'm looking for ("a bunch of Twitters that can theoretically interoperate" rather than a decentralized platform) but I suppose only having one or two main servers does solve the "I don't know whether to pick outlook.com or gmail.com" problem that many new Mastodon users seem to face.
ActivityPub is no walk in the park either. It's a protocol closer to "the HTTP of social media" than a way to interoperate with other servers. Basic functionality is easy, all you need is a few static JSON files, but if you want to write an application around it you're going to have to dive deep into the docs.
What I don't really see is why BlueSky decided to make its own protocol. Just as Mastodon is an API that works on top of ActivityPub, BlueSky could've just been a better ActivityPub server as far as I can tell. Most of the big problems ("my toots disappeared after the server shut down") can be resolved without an entirely new protocol.
Going back to the drawing board seems like an excellent way to find all the hurdles every other protocol already encountered. The recent "official" s3 account is just one example of that, and I'm sure there will be more.
Maybe the value add of the AT protocol will become clearer once it's finished. I'm sure there will be ATProto <-> ActivityPub bridges to make both networks integrate for the people who wish to do so. As far as I can tell, BlueSky is just a new, exclusive Twitter with an API at this moment.
Yes, me. But old modems? Two months ago I was integrating a 5G modem into a bare metal embedded application. AT is also used for the WiFi and Bluetooth chips/modems we use. AT commands on two- or four-wire RS232. Not everything is Linux and PCI/USB.
Same boat. A few years back I spent a summer living off LTE internet (before it was cool!) using a Sierra Wireless USB modem bodged into barely working with the help of pirated reference docs. Man am I glad that I gave up on that idea -- the OG AT protocol is, indeed, a totally obtuse crock of shit to the point where, honestly, it barely even qualifies for the moniker of "protocol" at all.
Yes, this is a brutally poor choice of name. The AT command set is not only well known to virtually every computer person on the planet and has been for 40+ years, it is also not even obsolete but in constant use everywhere on the planet. Just... please change the name of whatever this is.
I work in IoT and write firmware, among other things, that talks to modems. I got really excited to explore a good, in-depth analysis of why the Hayes AT command set is terrible (because it objectively is); instead, we were treated to complaints about a relatively obscure social network API.
It definitely rang as the Hayes AT command set to me ( https://en.wikipedia.org/wiki/Hayes_AT_command_set ). So I was kind of expecting a dig into those commands, which I recently had to relearn for automotive OBD-II :)
Yes, I was expecting an explanation in the comments of why it is not a crock of shit, because back in the day we had only a few bytes to communicate and the "young kids" today don't know what they have, etc...
I typed in AT commands today to connect to my mobile provider. YMMV. (In fairness it's only because I hosed my ModemManager config earlier this week and haven't had time to figure out what I did wrong yet, but still.)
> It turns out using Git, which is almost always used with a centralized 'remote', to do federation, which needs to be weakly consistent, IS A BAD IDEA!!!!!
Given that a user owns their feed, I'm not sure why this is a bad idea?
I want to take the critiques here seriously but there's not a ton to grasp onto. It definitely didn't feel great that AtProto was built from bare ground and reinvented json-schema & OpenAPI for no reason. But ultimately this is one of a number of grievances that feel like they don't really matter. It's stupid & dumb, but in the end it doesn't matter. It's hard to tell which of these points have hard & real impact, and which are just bias.
The general feeling is an unfiltered airing of biases, which makes it harder to trust.
My limited understanding of the problem with git as the basis for user data storage is that it would then expose the complexity of git rebasing and the rest via federation. Git is great with a centralized remote (which I believe is what the original post is getting at) but less so in a federated setting.
Rebasing and similar operations are only a hassle because multiple authors can edit the same files, so you end up with conflicts.
In the scenario where every author can only change their own files, you avoid those sorts of conflicts, as they'll never happen.
What remains is how to handle files referencing other files (in the case of federated social media, posts that are replies to other posts), which is much easier, as those aren't really conflicts but invariants that have to be maintained.
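A rough sketch of that idea, assuming an append-only, per-author store keyed by ids that only the author generates (the types and key scheme here are made up, not the actual atproto repo format):

    // Sketch: each author appends records under keys that only they generate,
    // so another copy of the same repo (another device, or a server replaying
    // the stream) can never produce a git-style merge conflict; the worst case
    // is seeing the same record twice.
    type RepoRecord = { key: string; collection: string; value: unknown };

    class AuthorRepo {
      private records = new Map<string, RepoRecord>();

      // Keys combine a timestamp with a random suffix, so even two writers for
      // the same author are vanishingly unlikely to collide on a key.
      private newKey(): string {
        return `${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 8)}`;
      }

      append(collection: string, value: unknown): RepoRecord {
        const rec: RepoRecord = { key: this.newKey(), collection, value };
        this.records.set(rec.key, rec);
        return rec;
      }

      // Merging another copy of the same author's repo is a set union, not a rebase.
      merge(other: AuthorRepo): void {
        for (const [key, rec] of other.records) this.records.set(key, rec);
      }
    }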
> A single author can still run into these issues with git if they use multiple devices.
No, using multiple devices is fine, as long as you're not editing the same content in two different ways on them. Meaning changing "A B" to "A C" on one device, and to "A D" on the other.
> Can the same happen in this social media network for a user that uses the same account on their phone, their tablet and their computers?
When using AT protocol, you'd still be connected to the same PDS, so no conflicts. The conflicts mentioned earlier are about server<>server federation, not server<>client communication.
It must be noted that in the context of the AT protocol the "file tree" is (likely) mostly append-only, flat, and keyed by globally unique ids; that's because the commits are not for files but for database records.
If I am correct, then conflicts are impossible and you are the only one who can write to your* "branch".
* I doubt that forking and merging are possible, so it doesn't really have any practical UX relation to git rebase.
This is not true. Users won’t be exposed to rebasing, only developers. Rebasing is what powers some deletion features and is what would power the ability to edit.
The functionality would be abstracted away behind easy-to-use UX constructs.
To use the Mastodon web application, please enable JavaScript. Alternatively, try one of the native apps for Mastodon for your platform.
Good job, Mastodon. (No, I am not one of those guys who complains about not being able to run JavaScript games or utilities without JavaScript. But I think I should be able to read text on the Internet?)
You might want to try Brutaldon, a free and open source Mastodon web client that works without JavaScript. Brutaldon can optionally be self-hosted and it also supports Pleroma.
After logging in to Brutaldon, click the Search button at the top, paste the URL of the thread into the field, and submit the form. The Mastodon post appears. Click the "thread" link to expand the entire thread.
Still, it doesn't show the rest of the toots, so content posted on Mastodon is still not readable without JavaScript. Which is ridiculous: we have Medium, Twitter, Facebook, Reddit, etc., and they decided to publish somewhere not readable without JavaScript.
Sadly, like most web applications built these days, most Mastodon frontends require JavaScript to function. There are a few smaller projects that don't, but there's no nitter.net for Mastodon just yet. The people behind Mastodon don't seem to have any interest in server-side rendering, at least.
I'm really surprised every time anyone complains that a web-application (not just a plain web site) doesn't work without JS.
I mean, it's an application; applications contain code, and tracking that code's state server-side can involve a non-trivial amount of additional cost.
Sure, they could have a website-like, non-logged-in viewer mode and then, through some hydration magic, transition to being an app on demand, but that is a bunch of additional work (i.e. cost) for a very small number of people.
But here is the good thing: it's open source, so you or any of the few people who care about no-JS could try to contribute such "view-only website" functionality, or create an alternative web client which solely uses server-side rendering or similar.
But in the end that's not worth your time, right (at least for me it wouldn't be if I were in your situation)? So why expect anyone who doesn't even get any benefit from it to invest that time?
I complain about the content (which is text) not being available. I couldn't care less about the web application. In general too, but especially here, where people are supposed to mostly share text with each other. (We can share web applications as well, and you won't find me complaining there.)
At this point I could just flag every piece of Mastodon content, since it is not available. Good idea, I will just do that.
I'm glad somebody looked, because basics like when messages are actually passed around is something I didn't see in the atproto docs either.
Alice on server X follows Bob on server Y. When are Bob's posts delivered to X such that Alice may see them? Are they pushed from Y? Are they pulled by X?
Polling seems like an absolute disaster in an environment where people have an expectation of semi real-time communication.
The post alleges Bluesky is pull-based, which would imply it's like RSS but with signatures:
> It uses pull-based federation instead of push-based like Mastodon.
I was hoping they would employ both push and pull. In some scenarios I could imagine pull being more efficient, and in others push. For instance, if I run my own server I don't need to be pushed all content while I sleep; when I return it could do pulls and then continue pushing. It doesn't look like it's clever like that.
There are likely scenarios where some level of push-based comms will be necessary, as I understand it. Direct messaging, for example, seems like it'll require pinging a server to pull an asymmetrically encrypted message, or similar.
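As a toy illustration of what pull-based federation means in practice (the endpoint and response shape below are made up for the sketch, not the actual atproto sync API):

    // Toy sketch of pull-based federation: the indexer or home server polls each
    // known repo host for commits newer than what it has already seen, instead of
    // waiting for the origin server to push. Endpoint and types are hypothetical.
    type Commit = { seq: number; did: string; sig: string; records: unknown[] };

    const cursors = new Map<string, number>(); // last seen commit sequence per repo host

    async function pullOnce(host: string): Promise<void> {
      const since = cursors.get(host) ?? 0;
      const res = await fetch(`https://${host}/repo/commits?since=${since}`);
      const commits: Commit[] = await res.json();
      for (const c of commits) {
        // A real implementation would verify c.sig against the author's signing
        // key from their DID document before indexing anything.
        cursors.set(host, Math.max(cursors.get(host) ?? 0, c.seq));
        // ...index c.records into the local database...
      }
    }

    // Naive poll loop; a real crawler would back off per host and prioritise
    // the repos its own users actually follow.
    setInterval(() => pullOnce("pds.example.com"), 30_000);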
I don’t know anything about XRPC and Lexicon, but claiming that OpenAPI is better than them because it’s more flexible is not a great argument.
OpenAPI is a very complex spec, and most of the tooling around it only supports some subset of it.
Sure, it’s complex for a reason - its design goal is to be able to document the wide array of ways that HTTP APIs can be built - but if you don’t need that complexity, it absolutely makes sense to use a simpler RPC spec.
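For context, this is roughly what a Lexicon query definition looks like, going from memory of the atproto docs; treat the exact field names as approximate, and the method id "com.example.getProfile" as made up for illustration:

    // Rough sketch of a Lexicon query definition (field names approximate).
    const getProfileLexicon = {
      lexicon: 1,
      id: "com.example.getProfile",
      defs: {
        main: {
          type: "query",
          parameters: {
            type: "params",
            required: ["actor"],
            properties: { actor: { type: "string" } },
          },
          output: {
            encoding: "application/json",
            schema: {
              type: "object",
              required: ["did", "handle"],
              properties: {
                did: { type: "string" },
                handle: { type: "string" },
              },
            },
          },
        },
      },
    };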
Claiming that OpenAPI or JSONSchema can generate good client or server code automatically is a bad joke; had the author bothered to try the current code generators, they'd know that while they might work, the code they produce is really not great nor idiomatic to the target language.
> “The 'account portability' piece is bullshit! The way 'account portability' works is by having two separate keys, one for signing and one as a 'recovery' key. You're supposed to be able to use the 'recovery' key to rewrite history if your account gets hacked or some shit.
WE HAVE THE ABILITY TO DO THAT AS SERVER ADMINS!!! MASTODON HAS THIS ALREADY!!“
I’m sorry, but how can you take someone seriously who makes comments this absurd.
I'm curious how this will play out for BlueSky with some fundamental human readability vs. decentralization tradeoffs, such as the ICN naming tradeoff (http://www.icsi.berkeley.edu/pubs/networking/ICSI_naminginco...) and Zooko's triangle. Often folks point to biometric binding for DIDs to get around Zooko's triangle, which is problematic from a privacy/trust standpoint.
Other systems such as Mastodon avoid running into these problems because of direct and implicit use of DNS for the namespace. (And use of DNS embedding in DIDs ends up undermining the value of flat naming.)
The tone of that whole thread in combination with posts like
> Imagine if I had to store the 50k+ tweets I've made on Twitter on my device, and upload ALL of them to a new server whenever a community server went down.
doesn't make this person seem particularly competent. I'm getting a vague feeling that this is normal discourse on Mastodon though, and that not using social media much shifts your personal Overton window for what is acceptable communication until you essentially don't overlap with the very online crowd anymore.
For what it's worth, when a server does go down and thousands of people start uploading their 50k skeets that all need to be cryptographically verified, other major servers will have quite the scaling challenge. Cryptographic verification is intentionally compute heavy after all.
50k entries isn't a whole lot but as people hosting Mastodon servers have found out, things start slowing down when 1000 people transfer those 50k entries at the same time.
> Cryptographic verification is intentionally compute heavy after all
You're thinking of password hashing functions like argon2, there's no reason for normal signing and verification operations to be intentionally expensive as that's not where the security guarantees come from.
Indeed, verifying a signature does not rely on deliberate computational cost the way password hashing does. However, it does usually involve finite fields and large numbers that require a whole bunch of complex mathematical operations. Then there's also the fact that optimizations are often not available, as they would allow for timing attacks.
Compared to something like a simple CRC32 checksum to verify that the data was transmitted correctly, these operations will always be more complex and more time intensive than you'd prefer them to be.
That might be true - but how else can it work? The data has to get onto the server somehow, and the days of unsigned or unencrypted data ended with Snowden.
The OP was going on about storing all the data on-device and uploading it, but regardless of where it’s stored, if a bunch of people have to move, the thundering herd problem, so to speak, will still exist.
Also, signature verification is not slow. I don’t know what AT uses or how good this source from 2020 is [0], nor what machine it ran on, but ed25519 verification takes about 50us to verify a signature over a 32-byte message. That suggests this guy's 50k posts will be validated in 2.5s. If we assume just a single server, then that’s about 34k users per day. Or put another way, 30 servers to onboard one million people in a day. None of this seems outrageous to me.
BlueSky uses @noble/secp256k1 which performs this stuff in Javascript, with about 880* verifications per second on the Apple M2 (a chip with a relatively high IPC, likely higher than your average server).
Verifying those messages will take about a minute of CPU time per user (assuming no impact from cache misses due to threads swapping in and out and processing new data). I think that's quite significant.
Yes, OK I agree, that is pretty bad. That’s about 1400 users per day per server using napkin maths. Definitely not the kind of scale I would have expected.
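Spelling out the napkin maths from the two comments above (rough figures, per single server/core, matching the numbers already quoted):

    // Napkin maths from the comments above; all figures are rough.
    const postsPerUser = 50_000;

    // ~50 microseconds per ed25519 verification (native-code benchmark figure)
    const nativeSecondsPerUser = postsPerUser * 50e-6;                   // 2.5 s
    const nativeUsersPerDay = Math.floor(86_400 / nativeSecondsPerUser); // ~34,560

    // ~880 secp256k1 verifications/second with @noble/secp256k1 in JS on an M2
    const jsSecondsPerUser = postsPerUser / 880;                         // ~57 s
    const jsUsersPerDay = Math.floor(86_400 / jsSecondsPerUser);         // ~1,520

    console.log({ nativeUsersPerDay, jsUsersPerDay });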
But is this the only possible implementation? I suppose for the OP to have a reasonable point (that it’s “a crock of shit”), this would need to be an intractable problem.
I assume a server written in native code would be faster, but I don't know by how much. The normal use case for this algorithm has to do with Bitcoin, and the Bitcoin libraries aren't written for multithreaded social media servers, so it's hard to say. Theoretically it could be fast, but you'll need to find someone crazy enough to implement this protocol first.
Mastodon does not currently support importing posts from your old account. Just followers/following. It's a nuisance, and there's nothing preventing it per se, since the posts are signed, but that part could be better.
I know, and although I understand why (ActivityPub IDs are generally URLs and changing them would probably cause duplicates, but there are URLs that don't change). It's quite annoying that it can't be done right now, but I'll admit that I also don't care enough to come up with a patch for it.
1. They are URIs, and while ActivityPub says they should be https URLs, they don't need to be, and could e.g. point at IPFS or similar.
2. JSON-LD signatures are used by Mastodon and are included in the export, and nothing stops another instance from validating those and serving the objects up from new URLs with the original URIs in the id, as a means of making it clear the server didn't originate them. (There'd be a trust issue if the other server is unable to get hold of the keys because the original server is gone, but no more so than if the new server had simply republished the content, so the "worst case" is to distrust the original ids.) A rough sketch of such a signed object is below.
There are some corner cases there, around trusting that the old and new account represent the same user, so I do think a recovery-key type scheme would be nice, to allow a user to prove the old and new id are the same (if changing id). I also think we could really use decoupling the expectation that a webfinger id is inherently tied to a Mastodon account - you can sort of do that today; nothing stops you from serving up a separate webfinger result and using it as an alias, but there are usability issues to solve there.
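For reference, this is roughly the shape of such a signed object as Mastodon exports it; every value is a placeholder, and the exact property list shouldn't be treated as authoritative:

    // Rough shape of a Mastodon-style JSON-LD (Linked Data) signature attached
    // to an exported activity; all values are placeholders. The point is that
    // the signature names the original creator's key, so another server can
    // re-serve the object under its original id and anyone holding the key can
    // still verify it.
    const signedActivity = {
      "@context": ["https://www.w3.org/ns/activitystreams", "https://w3id.org/security/v1"],
      id: "https://old.example/users/alice/statuses/1/activity",
      type: "Create",
      actor: "https://old.example/users/alice",
      object: {
        id: "https://old.example/users/alice/statuses/1",
        type: "Note",
        attributedTo: "https://old.example/users/alice",
        content: "Hello from my old server",
      },
      signature: {
        type: "RsaSignature2017",
        creator: "https://old.example/users/alice#main-key",
        created: "2023-01-01T00:00:00Z",
        signatureValue: "base64-encoded-signature-goes-here",
      },
    };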
You're right, there's actually no reason why a second server couldn't serve up toots with the old IDs.
The main problem would probably be keeping track of which server to fetch these messages from after a move (or even a second move) to a different server, and keeping the attached metadata in sync.
On your last point, I think the key distinction is the number of people in the social circle.
On Twitter, it was the entire world. And if you wanted to hold the conch shell, you did what the algorithm wanted which was OUTRAGE.
On Mastodon, it's theoretically scoped down to just your server and a handful of deliberate federations. This exists. You can find these servers and have a BBS-like experience. Tooters give you the time of day and assholes are shown the door.
The Very Online folks, however, use Mastodon differently. For them it's more like "I'll build my own Twitter!" And what they want is more OUTRAGE feed to tap into 20 times a day, but with them in control of the algorithm. So they federate freely, with very large servers, making no material progress on the thing that makes Twitter unhealthy.
The fact that Mastodon can seemingly be used both ways is a good accomplishment. It's not the technology's fault the second use-case is socially toxic. But it does make it very hard to talk about Mastodon because you don't immediately know which way anyone is using it.
Everything is upper case and absolutes these days, so many appear to have lost the capacity for nuance; it's turned into a shouting match for our attention.
I suppose there are exceptions, but through my career (and life), I've found that negative blustering hyperbole like that is a pretty good indicator that I should not take the criticism (and usually the criticizer) very seriously.
It's like a person ranting about a way of building decentralized social media that is currently subpar on a decentralized social media platform so that the people building the new decentralized social media platform can redesign it to be improved such that the ranter can move onto the new social media platform that facilitates him ranting about even more things in a vastly more efficient manner because they listened to his rants and improved the federation protocol.
I wrote code for a PC and an embedded system to use modems -- relatively simple stuff like dial up, hang up, wait for a call, set baud rate. Nothing fancy.
We tested about 10 different modems. All of them had their own unique bugs I had to work around. Or maybe one of them didn't have any bugs (that I ran into) -- it's been thirty years.
It seems like Bluesky is fun right now because it’s exclusive. There doesn’t seem to be anything fundamental to the technology that differentiates the experience from Twitter.
To be fair, you can have two products that use different technologies but have a similar UI/UX, and in a way BlueSky is more similar to Twitter than to Mastodon, UI/UX-wise.
And to be honest it's a good thing: many non-tech friends were a bit confused by Mastodon, where you have to understand parts of the technology behind it to do basic things (like following someone from another instance, IIRC). BlueSky is (currently) a bit more friendly UI/UX-wise. I don't know how that will evolve with more instances, though. I also found that discovering people was easier, but YMMV.
The technology allows the community to iterate on the experience from twitter. There are already several alternate clients, people have made custom feed algorithms, custom tools and spam-fighting systems, and we're just getting started
The twitter experience has been in decline for years, and took a sharp downward turn when ownership changed. But people on twitter just have to live with it, especially now that the (already limited) API has been closed down almost completely, because it's not an open platform like bluesky. If bluesky ever starts to make the official client a bad experience, or starts to make moderation (or non-moderation!) decisions people don't like, users don't have to just live with it. They can go find or create the experience they want without losing anything
> There doesn’t seem to be anything fundamental to the technology that differentiates the experience from Twitter
It seems like you're conflating the technology with the user experience. Yes, the user experience is similar. No, the technology underpinning it isn't.
"Instead you have a 'DID:PLC', which is a kind of 'DID' (invented by, not a surprise, CRYPTO PEOPLE)"
Interesting. About two weeks ago at the HIMSS conference (Healthcare Information and Management Systems Society) in Chicago I ran into this. DID has crypto stink on it and people are actively avoiding it as a result. In this case it was the CTO of an established US consortium involved with CMS standards.
This is a shame and it seems irrational to me. Is W3C's work on DID doomed? Do they know just how bad the optics of DID are?
I was really hoping for a scathing review of the old AT modem protocol. My understanding is it’s still what’s used by cell phones to communicate with their cellular modems. Back in the early nineties I spent a lot of time learning and using these commands to send files over the phone to friends without an internet connection.
> I think the fundamental reason why we keep seeing more and more bullshit protocols and projects pop up like this is one fundamental mindset: a refusal to attribute the problems of the modern internet solely to capitalism.
I mean, I'm sure they have some points in this ranty thread of toots, but it's hard to take it seriously when it ends up blaming it all on capitalism (which, I hear you brother, I'm no fan either) and it's FILLED WITH SHOUTING FOR NO GOOD REASON.
Sometimes, when you feel strongly about something, it's useful to write a first draft, and come back to it after cooling down for a day or two, and rewrite it to be more nuanced.
The AT Protocol has many flaws, like any protocol; that much is evident. But I don't think this toot-thread gives an accurate view of those flaws. It also doesn't seem to consider that all these different protocols make different tradeoffs, and none of them tries to be "one protocol to rule them all". They are simply better at some things, and worse at others.
> Also I don't care if I'm spreading FUD or if I'm wrong on some of this stuff. I spent an insane amount of time reading the docs and looking at implementation code, moreso than most other people.
> If I'm getting anything wrong, it's the fault of the Bluesky authors
This is a really disappointing way of reviewing things, and I hope it doesn't become more popular, because no one actually learns anything from it. It ends up being just a rant, but masquerading as education.
Who is this Sam guy? He yells a lot and while he has identified some interesting points, he also seems very... Alarmist. Is there a reason this post deserves attention beyond its content?
It's a nice bit of marketing for Bluesky: in the process of reading the discussion I went from "it's some vague cryptosocial thing" to "well, it does solve things and actually might be worth a look".
We still have a ton of remote IIoT edge nodes with Ericsson f3507g modems as the local SMS gateway for the site. Every now and then they hang and require manual reinitialization with minicom and AT commands (unless you want to reboot the whole edge node).
No one outside HN cares about AT Protocol either, they just want Twitter 2.0 and have found a promising exclusive club over at BlueSky. I doubt anyone but a small minority of BS users will even know that you can theoretically run your own server and that eventually other servers will join the network.
My 5G modem needed some AT commands (AT^CUSTOMER among others) to be made compatible with the linux kernel. It turns out the exact same Foxconn device is rebranded HP, Dell, Telit and possibly some other by simply running AT^CUSTOMER=1(Dell), 2(Telit), 4(HP). The HP variant wasn't compatible with Linux, but the Dell one was.
There's large forums such as wirelessjoint, ispreview and sierra where AT is still extremely relevant.
The Hayes AT command set is still the primary way to talk to embedded modems. I would argue quite a few people outside of HN care about it to this day while very few people outside of HN know or would care about the AT Protocol.
I would say most of the people who know or care about either are on (or know about) HN - but that doesn't mean either's unimportant, it's just the nature of what it is, a technical detail behind the scenes that some of us work with.
> Imagine if I had to store the 50k+ tweets I've made on Twitter on my device, and upload ALL of them to a new server whenever a community server went down.
Twitter had what, a 140 or 280 char limit per message? 50k*280 is 14MB. What's to imagine here?
With all that screaming from a user on the main competing network, I really need to hear a response from the Bluesky team to form an opinion here. And understand better why this guy is so upset about this destined-to-fail network to curse and foam about it.
> With all that screaming from a user on the main competing network
Isn't everyone on a "competing" social network before they join Bluesky?
I managed to snag an invite (sorry, I don't have any others), so I'm on both at the moment. Though I've found Bluesky to be pretty boring and don't check in much.
> And understand better why this guy is so upset about this destined-to-fail network to curse and foam about it.
It didn't sound like he wanted to fail: 'And I went into this with an open mind. I was like "I'll just make a simple alternative to the BlueSky server in Elixir".'
Exactly this, I literally was just trying to implement a simple Elixir server that would scale pretty well. The problem is the protocol is incorrectly specced, is not thought through very well, is incredibly complex and hard to explain, and reinvents the wheel in just about every way it can. This means that in a language like Elixir, which has stuff like an ActivityPub implementation, OpenAPI libraries, and JSON Schema support, you have to write everything from scratch, because the user base is smaller than TypeScript's / whatever they're writing their implementation in, and the protocols they're layering on don't have proper Elixir implementations, because they're hard to implement right and annoying to deal with.
It's a lot better than Mastodon, where you simply lose everything.
The author totally glosses over that and scoffs at it being a real problem both in the thread linked and here in the comments, yet real people have experienced the extraordinary pain of watching their accounts evaporate overnight because of some capricious server admin. It's a total nonstarter for ever using Mastodon.
People might be imagining it as something tightly coupled with the application rather than a data file you can just backup anywhere. Could be worth chucking a "store your profile on your own device, Apple iCloud, Dropbox, Google Drive, or your preferred file storage solution" somewhere.
The issue is not "access" to your data. You're not forbidden from accessing your data in any case. The real issue is that it may be bad to have full and sole responsibility for maintaining backups of your social media data.
1) If you add image files and especially video files to the mix, the size of your data can get huge.
2) Smartphone users especially don't have a ton of free storage space.
3) In general, people aren't great at maintaining their own backups.
The question is why a so-called distributed network can't maintain distributed copies of user data, rather than forcing the burden onto the users.
It's pretty obvious that bluesky is mostly handwaving around decentralization and federation. It will all go in the too hard basket once the service gains critical mass.
Another VC-funded bait and switch; nothing to see here.
I've had a good heuristic in my life of simply writing off people that go on lengthy, cuss-filled rants about how bad something is without soberly explaining the technical components, offering alternatives, etc. This author seems no different.
A bit difficult to take anything seriously that this hyperventilating dweeb says:
> But I think what's key is to keep an anticapitalist mindset. We can make things easier for users without allowing in what makes social media so fucking awful: capitalism.
Seeing .bsky.social under every username is all I need to know that the federated aspect of Bluesky will be dumped if it ever reaches success.
Normies do not care about this stuff in the slightest; it isn't worth confusing them by including it, and it doesn't benefit the site now that it's gunning to be the "new Twitter".
I don't know, it sounds pretty interesting that every user has their own repository and can upload it to a new instance. However, how will content moderation even work on a platform like this?
I'm no expert, but piecing together what I've read, content moderation will not really be a thing. Instead there will be content labeling services, which can be provided by third parties. You would subscribe to a labeling feed and then use those labels to filter content you don't want to be visible.
Pretty much yeah. Like the web, filtering happens on the read side, not the write side (rough sketch after this list). This:
- Dodges any free-speech issues
- Gives individuals more choice and control over what they see
- Allows labellers to not worry as much about false-positives because their impact is limited by the above, which means they can use more automation, etc.
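A toy sketch of what that read-side filtering could look like; the label values, types, and the labeller endpoint here are all made up for illustration, not the actual Bluesky labelling API:

    // Toy sketch of read-side moderation: the client (or an app view acting on
    // the user's behalf) filters a timeline using labels from labellers the user
    // chose to subscribe to. The endpoint and label values are hypothetical.
    type Label = { uri: string; val: string; src: string }; // labelled post, label value, labeller
    type Post = { uri: string; text: string };

    async function fetchLabels(labeller: string, postUris: string[]): Promise<Label[]> {
      // Hypothetical endpoint exposed by a labelling service.
      const res = await fetch(`https://${labeller}/labels?uris=${postUris.join(",")}`);
      return res.json();
    }

    async function filterTimeline(posts: Post[], labellers: string[], hide: Set<string>): Promise<Post[]> {
      const uris = posts.map((p) => p.uri);
      const labels = (await Promise.all(labellers.map((l) => fetchLabels(l, uris)))).flat();
      const hidden = new Set(labels.filter((l) => hide.has(l.val)).map((l) => l.uri));
      return posts.filter((p) => !hidden.has(p.uri));
    }

    // e.g. hide anything your chosen labellers tag as "spam" or "gore":
    // filterTimeline(timeline, ["labeller.example"], new Set(["spam", "gore"]));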
I'm amused. First, that it's the AT protocol... I thought someone was getting pissy about their 45 yr old vintage 1200 baud modem. Second, that people are discovering that Twitter 2.Oh isn't truly federated but just pretends to be. The open source community's spent how many years designing a non-proprietary protocol and putting out reference implementations (and extending it to Facebook-like and Youtube-like and Wordpress-like offerings)...
But no, everyone's all just butthurt over Elon Rocketman and will accept whatever garbage anyone puts in front of them.
> The open source community's spent how many years designing a non-proprietary protocol and putting out reference implementations (and extending it to Facebook-like and Youtube-like and Wordpress-like offerings)...
And it's awkward to write code for, and the people who inhabit it range from right-wing reactionaries to, as it was so effectively put, the homeowners' association. Turns out that people don't care about "open source" unless it's easy to work with, and people especially don't care about "open source" when they want to talk and shitpost with their extended friend group. (Which, no, Mastodon doesn't do a good job of! Even setting aside that quote-toots are Good, Actually, Mastodon doesn't let me see replies to a toot unless I go digging, so I don't know whether I'd just be repeating what another reply has already said - so why post at all?)
I tend to think that the Bluesky crowd seems like they have their shit together as well as having a small beta explode can allow it (the web app is literally at `staging.bsky.app`, come on) and I think the AT Protocol docs identify real and probably intractable shortcomings in ActivityPub. But it doesn't somehow render ActivityPub moot if you want to use it. Go for it. It's still there.
> and I think the AT Protocol docs identify real and probably intractable shortcomings in ActivityPub
Except that they are objectively wrong about nearly everything they talk about with regard to ActivityPub. Quoting from the FAQ:
> Account portability is the major reason why we chose to build a separate protocol.
There is a widely-accepted account portability protocol built on top of ActivityPub that multiple servers, including Mastodon and Pleroma, all support.
> We consider portability to be crucial because it protects users from sudden bans, server shutdowns, and policy disagreements.
There is nothing inherent about their protocol that solves this. The app on iOS (the Bluesky app) solves this by downloading all tweets locally, which is incredibly space-inefficient and keeps the server from... doing the job of a server (storing that data for you). Additionally, user data is still accessible and downloadable after a suspension on Mastodon and Pleroma.
> Our solution for portability requires both signed data repositories and DIDs, neither of which are easy to retrofit into ActivityPub.
There is quite literally no need for this and they absolutely could have built something that addresses these issues on top of ActivityPub. We're talking about the people who couldn't use OpenAPI, but instead built a shittier version of GraphQL while bold-faced saying 'there was no alternative'.
> a preference for domain usernames over AP’s double-@ email usernames
No need to build a separate protocol for this.
> and the goal of having large scale search and discovery (rather than the hashtag style of discovery that ActivityPub favors).
Nothing about ActivityPub, Mastodon, or the general Fediverse prohibits you from scraping it to make this happen. There are services that do this right now. Mastodon has discovery built into it; this literally completely ignores that.
ActivityPub is the standard for federation on the internet, and for allowing interoperation between social networks. That is the key here. I don't give a shit if people use Mastodon. I myself probably wouldn't use it if I wasn't running a Mastodon server.
What I do care about is whether a service is built on ActivityPub. Even if my friends want to use another social media service (maybe they're on PixelFed or whatever it's called), I can still follow them and interact with them over there while using Mastodon. You cannot do that with Bluesky.
The problems that Bluesky identified with ActivityPub objectively could've been solved by building something on top of ActivityPub, retaining the 'interoperable social network' quality of it. Email had the same problems, and instead of throwing out the email protocol, we built DMARC and co on top of it.
So no, I don't want people to use Mastodon, I don't give a shit about people using Mastodon. I literally criticized Mastodon later on in the thread. What I care about is interoperability, ease of use, and open standards, and AtProto is objectively not that.
At an incredibly practical level, bootstrapping a new ecosystem in closed beta that isn't linked to the fediverse has been really great at attracting posters from Twitter who find the original fediverse too stale and curmudgeonly.
Just like twitter before it, and every other social network, they change in tone over time. Sure right now I guess you could argue that the fediverse is stuffy, though not in my little corner apparently. But that won't stay true. Every person that joins changes the network. Sure some servers will ban other servers for any number of reasons and it will probably fragment. Who cares it will all come out in the wash.
To add regarding the signatures and portability, Mastodon at least signs its posts with JSON-LD signatures. They're in the downloaded archive I got from mastodon.social when I moved off it, for example. So is the key required to validate them.
The only thing missing there to make this better - and this is not an ActivityPub thing - is including those JSON-LD signatures in more contexts, so that you don't need to rely on Mastodon's export functionality to get an archive; that would allow clients to choose to keep a local copy (or nominate someone to back it up for them). The other missing piece is an upload functionality for posts (Mastodon doesn't do this, but that's also a Mastodon thing, not an ActivityPub thing).
I wouldn't have had an issue if they added extensions to ActivityPub. There are even things Mastodon refuses to add that I'd applaud people for forcing the issue on by adding extensions to support. But their choice to reinvent everything puts me off.
That was the protocol I thought I was going to read a rant about by the title, and I was all ready to agree wholeheartedly with the author.
But it was a different protocol that I have no experience with. It might be a crock of shit, too, but it's hard to tell from that rant.
Interestingly, the original modem AT protocol was reasonable. Then as modems became more featureful, the protocol kept getting extended, and extended, and then I stopped using POTS modems and thought I'd never have to deal with it again.
Until I built my own cellphone and discovered that not only does it live on, but it's been extended even more.
Yep, definitely right there with you on this one. Every time I think oh this will be an interesting blast from the past, followed by bah, get off my lawn...
With the first comment mentioning XRPC, I was even more confused, since that's the name of a protocol from around the same time period. I believe TI-RPC won that battle.
Yeah, I just wanted to comment that the AT protocol is really old, and that mobile developers decided to reuse it and add a gajillion extensions to it, so it's not really the (base) protocol's fault...
I'm happy I went straight to the comments, because thanks to yours I found out the article won't talk about the "good" old AT command protocol, which, indeed, I had to integrate into a C project for AVR chips not so long ago. What else could it be? :-)
For the normal consumer, Twitter is better than Bluesky/Mastodon, Macs and Windows are better than Linux, and ChatGPT and OpenAI have better predictions than open source LLMs. It's not even competitive; the closed source solutions are so much better than the open source ones in these scenarios.
Also came here to read about AT modem commands. Still have no idea (and admittedly little interest) what this other nonsense is... Did it have to use the existing name? Did the authors not have access to a search engine?
When he was ranting about crypto, I had assumed that Bluesky had done another silly Coin or something. Nope, this is just apparently a dislike of public keys and signing messages, which seems like a great idea for a federated protocol.
Parts of the BlueSky protocol do use stuff designed for "web3" cryptocurrency stuff (https://www.w3.org/TR/did-core/), but there's no cryptocurrency in use for the AT protocol.
I'm not sure why they're even using this because the docs state that the current DID stuff is all placeholders until they can find something better.
> I'm not sure why they're even using this because the docs state that the current DID stuff is all placeholders until they can find something better.
DIDs are there to stay; it's that the DID spec supports pluggable DID methods (https://www.w3.org/TR/did-core/), and Bluesky created a method called "DID Placeholder" (did:plc) because none of the existing methods suited their goals. The idea is that if and when a better method appears, they can move to that, but that doesn't mean abandoning DIDs altogether.
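For anyone curious, this is roughly what a resolved DID document looks like. The generic fields come from the W3C DID spec; the atproto-specific service entry and type names are from memory, and every value is a placeholder:

    // Rough shape of a resolved DID document. id, alsoKnownAs, verificationMethod
    // and service are the generic W3C DID fields; the at:// alias and the PDS
    // service entry reflect how atproto appears to use them, and all concrete
    // values here are placeholders.
    const didDocument = {
      id: "did:plc:abc123exampleonly",
      alsoKnownAs: ["at://alice.example.com"],
      verificationMethod: [
        {
          id: "did:plc:abc123exampleonly#atproto",
          type: "EcdsaSecp256k1VerificationKey2019",
          controller: "did:plc:abc123exampleonly",
          publicKeyMultibase: "zExamplePublicKeyOnly",
        },
      ],
      service: [
        {
          id: "#atproto_pds",
          type: "AtprotoPersonalDataServer",
          serviceEndpoint: "https://pds.example.com",
        },
      ],
    };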
Between that and the bitching about capitalism it is hard to take the rant seriously. If you have a technical argument why the protocol is bad, adding political stuff on top of it makes it less credible.
BTW this is really a critique of Bluesky for using the AT protocol, not of the AT protocol itself. It's not like Sam has the option to use something else if he wants to interface with Bluesky.
“Twitter outrage” is found anywhere humans are found.
I think people just really wanted to believe that it was Twitter-based because then 1) it’s someone else’s fault (Twitter, not human nature), and 2) it can be magically fixed by leaving.
Centralization is inevitable, and normal users only care about whether they can use it easily, without having to choose an instance or set up their own mail server, instance, or whatever.
As is typical with techies, the emotions in this post were already running high; Bluesky itself has gotten someone extremely angry over the tiniest things.
You (probably) and I have both had a more tech oriented upbringing. Current and future urban generations seem to be more tech oriented, anyway, so perhaps we, who probably form the majority of tech consumers, will become these "normal users" we condescendingly refer to in this forum.
With sufficient investment in education or at least awareness, the "normal users" may eventually be privacy and freedom oriented.
So far that's been the case, but the spectacular failure of Twitter as a centralized service has definitely taught some people a lesson. It remains to be seen if wariness of centralization will be a major factor for people in the future.
We'll keep writing about what we're doing and I hope we change some minds over time. The team has put a lot of thought into the work, and we really don't want to fight with other projects that have a similar mission.