Nevolihs · on June 1, 2024

That's not quite true. Overall there's a big, distributed and decentralized effort to archive our electronic past; it's not just archive.org. For example, it's fun to look at old newsgroup discussions from 1994, and it's something I can find at Google's archive. And more importantly, it's available to everybody. And it's not the only place to find historical internet stuff.

I do agree though that archive.org is too valuable a resource to ever lose.

Google also isn't perfect. There was a post the other week about how there were no searchable images on Google older than ... I don't remember, 2005?

Nevolihs · on June 1, 2024

I wish the author had given the model of the HDD and pictures of the PCB and connector. I think it'll probably be reasonably simple to hook that up somewhere else to copy the data. I work at a place where we do this sort of thing all the time, even with hardware that is 30 years old or older. The problem is just getting out of the software solution box. This isn't a software problem and it requires a different sort of expertise for the most part.

Nevolihs · on June 1, 2024

I've been wanting something like this for my entire life and it seems like computers are finally heading in a direction of becoming extensions of ourselves instead of just clunky tools we have to adapt ourselves to. This has been the kind of sci-fi I've dreamt of since I was a kid.

I get the privacy concerns. And security concerns. But honestly, if somebody has this kind of access to your computer, they have access to your entire life anyway. Setting up a keylogger, getting all your passwords, getting any vital documents, etc. It's trivial. When is it just pointless fear mongering? It's subject to the same security concerns as everything else and we don't suggest people switch to pen and paper instead.

usrbinbash · on June 1, 2024

> But honestly, if somebody has this kind of access to your computer, they have access to your entire life anyway.

Exactly. So please tell me why I would want to essentially allow something like this on my machine?

As for the scifi-esque abilities: All my important information is stored in a personal wiki, which is automatically backed up and encrypted on a backup server. It's fully tagged, it's fully searchable. And you know what? I have even set up a pipeline to feed it into a locally run LLM if I want to. Yes, I can do RAG on my personal wiki.

Nevolihs · on June 1, 2024

> All my important information is stored in a personal wiki

Ah, there's the crux, no? I dream of a system that automatically captures and allows me to interact with and query against everything I do, every ebook I download, every image, every document, every video, every site. What you're describing isn't this, it's not even the same ballpark.

Recall, even if it isn't perfect, is in the same ballpark.

> Exactly. So please tell me why I would want to essentially allow something like this on my machine?

Because it's useful. I don't understand this. It's like a circular argument. I want a useful feature, I install it on my PC. The security concerns apply regardless of Recall existing.

usrbinbash · on June 5, 2024

> The security concerns apply regardless of Recall existing.

Not quite. Because there is a world of difference between an attacker being able to grab information that is being entered or temporarily present, or permanently present and encrypted, and an attacker being able to, within seconds, grab everything the user ever did, watched, saw, entered, etc. on his machine in one fell swoop:

https://doublepulsar.com/recall-stealing-everything-youve-ev...

Quote from the Q&A Section of the post:

"Q. But if a hacker gains access to run code on your PC, it’s already game over!

A. If you run something like an info stealer, at present they will automatically scrape things like credential stores. At scale, hackers scrape rather than touch every victim (because there are so many) and resell them in online marketplaces.

Recall enables threat actors to automate scraping everything you’ve ever looked at within seconds.

During testing this with an off the shelf infostealer, I used Microsoft Defender for Endpoint — which detected the off the shelve infostealer — but by the time the automated remediation kicked in (which took over ten minutes) my Recall data was already long gone."

End Quote.

And this is THE core problem with such a system, that just hoovers up everything, everywhere, all at once: It creates a single point of failure so critical, it will instantly become the prime target of every attack, because no other target is needed any more.

usrbinbash · on June 2, 2024

> I dream of a system that automatically captures and allows me to interact with and query against everything I do

Well, I don't.

For starters, because not everything I do generates relevant information for me. All those hours spent surfing through random blogs or wikipedia articles, most news I read, or watching some random scifi series just isn't info that I will ever need to refer back to. It's just noise. Best case scenario, it's useless information that just bloats the database.

Alternative scenario: It messes with the results. Intelligence and statistics are not the same thing. So what if I read some scifi novel on my screen, and the embedding of that page's content just so happens to score higher against a future query than an actually useful document?

In my system, I decide what's relevant information and what isn't. Subsequently, when I use RAG on this, I use it on already curated data. That data already has structure imposed on it, so it is accessible to me with or without the help of AI.
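To make the ranking concern concrete, here is a minimal sketch, not my actual pipeline. It assumes a plain bag-of-words cosine similarity as a stand-in for real embeddings, and the document names and texts are made up for illustration. It only shows how a captured page of fiction can outrank a curated note for the same query.

```python
import math
from collections import Counter


def vectorize(text):
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_hit(query, docs):
    """Return the name of the document that scores highest for the query."""
    q = vectorize(query)
    return max(docs, key=lambda name: cosine(q, vectorize(docs[name])))


# Hypothetical curated wiki entries.
CURATED = {
    "wiki/restore-procedure": (
        "procedure to restore the wiki from the encrypted backup server "
        "after a disk failure"
    ),
    "wiki/llm-pipeline": "pipeline that feeds wiki pages into a locally run llm",
}

# The same entries plus one hypothetical screen capture of a novel.
CAPTURED = {
    **CURATED,
    "screen/scifi-novel-page": "restore the backup server or the ship is lost",
}

QUERY = "restore backup server"
print(top_hit(QUERY, CURATED))   # best match among curated notes only
print(top_hit(QUERY, CAPTURED))  # best match once everything is captured
```

In this toy setup the short novel line wins on the captured set because it packs the query words more densely than the longer, actually useful note. Real embeddings behave differently in detail, but the failure mode is the same kind.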

--------

Then there is also the whole issue of information that is simply too sensitive to store in my personal wiki. For example: I would NEVER store my bank account details or API keys in there, even though everything is encrypted. All passwords and similar information go into a separate system FOR A REASON: single points of failure are bad, and this goes double in security.

Now, a system that takes a screenshot and runs OCR on everything? Cool. What if it does so when I am just running a test script after setting a temporary env var in preparation? I don't want that info anywhere outside of my password manager. And yes, this does matter, even if everything is stored locally.

--------

And, of course, we haven't even started on the whole issue of privacy, and how people will feel about a system that basically logs and stores everything the user does. It doesn't matter how useful it could be if people don't feel comfortable with it.

Nevolihs · on May 31, 2024

Ideally OP would keep the source images of the original journal pages around even after transcription. I think ChatGPT (or LLM in general) is probably the best option, but the best overall solution would accept that LLMs are flawed and would require long-term iteration.

d1sxeyes · on June 1, 2024

The problem with ChatGPT is that you might not know to check the original.

If the original text is “I’m getting married on the 10th July”, you’ll know to check the handwritten note if it says “I’m getting married on the l@ July” but not necessarily if it says “on the 16th July”. ChatGPT seems to do the second quite often.

bckr · on May 31, 2024

Thanks all, I tried ChatGPT and it didn’t like my handwriting at all.

Which is understandable… :’)

Nevolihs · on June 1, 2024

Have you considered training a model on your handwriting?

bckr · on June 1, 2024

Yep! However that needs a ton of labeled data, so a bootstrapping method is required.

I like the idea of doing it by speech recognition, or of chopping it up for privacy and then outsourcing that to humans at cost.

One thing I … Imagine … would help—is having a private web app where I could pull up a document and then make a voice recording on my phone.

Maybe I’ll put this together on my plane trip.

Nevolihs · on May 31, 2024

We don't have the technology for that yet, unfortunately. We can sell you a second MacBook though.

Nevolihs · on May 31, 2024

I would assume any differences in service quality come down to regional differences too.

Nevolihs · on May 31, 2024

I'm at 84% on this M1 MBP that I've been using for 2 years, I'm not too far off from his numbers.

Nevolihs · on May 31, 2024

A single example of it being done well doesn't have any bearing on all the issues stemming from the use of Discord as knowledge bases, wikis and documentation.

Out of curiosity, is any of that useful information on the Elixir Discord available to be searched for with Google or any other search engine?

barkerja · on May 31, 2024

> Out of curiosity, is any of that useful information on the Elixir Discord available to be searched for with Google or any other search engine?

It is not, and that is the one major knock against Discord. But see my response above about that.

Nevolihs · on May 31, 2024

Tbh, developers just need to test their site with existing tools or just try leaving the office. My cellular data reception in Germany in a major city sucks in a lot of spots. I experience sites not loading or breaking every single day.

LtWorf · on May 31, 2024

Developers shouldn't be given those ultra performant machines. They can have a performant build server :D

Nevolihs · on May 31, 2024

Websites regularly break because I don't have perfect network coverage on my phone every single day. In a lot of places, I don't even have decent reception. This is in Germany, in and around a major city.

Why do you think this only applies to people on a boat?

chipdart · on May 31, 2024

> Websites regularly break because I don't have perfect network coverage on my phone every single day.

Indeed, that's true. However, the number of users who go through similar experiences is quite low, and even those who do are always an F5 away from circumventing that issue.

I repeat: even supporting a browser other than the latest N releases of Chrome is a hard sell to some companies. Typically the test matrix is limited to N versions of Chrome and the latest release of Safari when Apple products are supported. If budgets don't stretch even to cover the basics, then of course even rarer edge cases, such as a user accessing a service through a crappy network, will be far down the list of concerns.