ricardo81's comments

Agreed. We interact with so many different types of software, and I presume that, like me, people assign a confidence score to how things will work out, because there are so many unknown quantities out there. Those little thoughts you have while an app/page is doing its thing, wondering whether it even works as it claims in the first place.

I place value on grammar, but I appreciate that on the web today surely around half of the English written is by ESL speakers (ignoring AI). And that's fine, it's a human thing: not everyone was taught English or has known it for long, and some people have dyslexia, etc.

I guess in the end, the aim is to allow end users to have full confidence in you in every way possible.


It's good that he has his own website! I can relate (for non-famous reasons) to the Facebook issues. I can't even sign up any more, at least not using my real name.

It can be a pain, as so many local organisations use Facebook as a free way to share information. Unfortunately, if you're not logged in, pages can be rate limited, you get spammed with modals to sign up, and you can't scroll very far into any feed. In his case it's probably also a nuisance as a platform for his business.


An IPv6 address would also work. Then this chap could have his website hosted on it.

That's what I read on the surface. Any useful links for context?

The best I've read is "The Eighth Day of Creation" (which is an amazing book beyond the part that covers the elucidation of the structure of DNA). The author references multiple internal data sources that establish the process by which Gosling's photo made it to Watson and Crick. Of all the accounts I've read, it seems to be the most factual. I think it's also worth reading Watson's account ("The Double Helix") and the book that originally brought the most attention to the treatment of Franklin ("Rosalind Franklin: The Dark Lady of DNA").

I believe this article has some updated results: https://www.nytimes.com/2023/04/25/science/rosalind-franklin... It also appears there was an earlier book before Dark Lady, referenced here: https://www.nytimes.com/1975/09/21/archives/rosalind-frankli...


One thing you'll have to watch for is these agents actually being a user's browser, with the browser provider just using it as a proxy.

Otherwise, there are residential IP proxy services that cost around $1/GB, which is cheap, but why pay when you can get the user to agree to be a proxy?

If the margin of error in detecting automated requests is small enough, you may as well serve up some crypto-mining code for the AI bots to work through. But again, it could easily be an (unsuspecting) user.

I haven't looked into it much, but it'd be interesting to know whether some of the AI requests are using mobile user agents (and showing genuine mobile fingerprints).
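To get a rough feel for that from server logs, something like the sketch below would do. It assumes a combined log format with the user agent as the last quoted field; the filename and the keyword list are placeholder assumptions, not an authoritative way to classify clients:

    import re
    from collections import Counter

    # Keywords that commonly appear in mobile user-agent strings
    # (an assumption, not an exhaustive or authoritative list).
    MOBILE_HINTS = ("Mobile", "Android", "iPhone", "iPad")

    # The user agent is the last double-quoted field in a combined-format log line.
    UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

    counts = Counter()
    with open("access.log", encoding="utf-8", errors="replace") as log:  # hypothetical path
        for line in log:
            match = UA_PATTERN.search(line)
            if match:
                ua = match.group(1)
                kind = "mobile" if any(hint in ua for hint in MOBILE_HINTS) else "other"
                counts[kind] += 1

    print(counts)

This only splits requests into mobile vs. other by user-agent string, so it says nothing about whether the fingerprint is genuine, but it's a starting point.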


I wonder if unknown /s powers persuaded us to homogenise things in ways that ultimately suited AI training and made AI viable.

- Search engine algorithms used to be the main place of information discovery. Before 200x that meant not using JavaScript for any text you wanted to be readable by a bot.

- "best viewed in x browser" which happened in the late 90s and early 00s. If a website looked crap, use the other browser.

- Social graph metadata: have a better image, title, and description for people who see a snippet of your page on a social network.

Nowadays everything is best viewed in Chrome/Safari; Firefox does have some issues.

Google owns the majority of the search market.

Facebook/Twitter/LinkedIn, at least in the Western world, drive most social traffic.

I would guess the 'taste' of AI has been predetermined by these strong factors on the web.

An alternative could be a DMOZ-like directory with cohorts voting on the value of things, maybe with the help of AI. It does seem like the web has been 'shaped' for the past 15 years or so.


Lol, you're giving too much credit to certain people.

People have trouble thinking 2 years out, let alone 5, 10, 15, 20 years...


Which 'certain people' do you mean?

To me it's undeniable that the web has become more centralised, more homogenised, and certain agents find that very convenient.

Even wiki(pedia|data) is very convenient for large-scale training, and most of their sources are from the 'open' web.


True. It'd be illuminating to know how far and wide it is used. It has always been my go-to library for parsing XML in a number of languages.


With Google aggressively blocking bots on search, likely due to AI scraping, it wouldn't surprise me if the view counts were also affected by bot detection.


WHOIS used to be semi-useful, though most records tend to be redacted for the average user now.

`dig` on DNS helps too: if a domain resolves to the same IP as paypal, for example, that adds confidence. Though again, it's less useful nowadays because so much sits behind Cloudflare.
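As a rough sketch of the idea (a hypothetical check using Python's standard library; the suspect domain is made up, and it only compares IPv4 addresses, ignoring CDNs, round-robin DNS and IPv6, which is exactly why Cloudflare muddies it):

    import socket

    def resolve_all(host: str) -> set:
        """Return the set of IPv4 addresses a hostname resolves to."""
        try:
            return {info[4][0] for info in socket.getaddrinfo(host, 443, socket.AF_INET)}
        except socket.gaierror:
            return set()

    suspect = "paypal-secure-login.example"  # hypothetical domain being checked
    reference = "www.paypal.com"

    shared = resolve_all(suspect) & resolve_all(reference)
    print("shared IPs:", shared or "none - treat with suspicion")

Unrelated sites behind the same CDN can share IPs, and genuinely related hosts can resolve to different edges, so it's a weak signal at best.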


I'm definitely not one who thinks about these things deeply (others surely do more), but the act of having a private conversation seems sacrosanct; why should distance or medium be a factor?

