I did a Show HN with a similar idea (it got a whopping 1 point and was flagged as spam, though the flag was later removed by the mods): you paste your HTML and it encodes it into the URL, so you can share the URL without any server involvement. I even added a URL shortener, because while the approach is technically feasible, the encoded URL becomes long and QR codes no longer work reliably. I also added annotations so you can add your comments and pass it to colleagues.
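The "encode HTML into the URL" trick above can be sketched like this, a minimal version, assuming the snippet fits within browser URL limits; the function names are made up for illustration:

```typescript
// Encode an HTML snippet into a URL-fragment-safe string and back, no server involved.
// base64url is used so the payload survives inside a URL fragment.

function encodeHtmlToFragment(html: string): string {
  const bytes = new TextEncoder().encode(html);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  // btoa gives standard base64; swap to the URL-safe alphabet and drop padding.
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function decodeHtmlFromFragment(fragment: string): string {
  const base64 = fragment.replace(/-/g, "+").replace(/_/g, "/");
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// In the browser you would share `location.origin + location.pathname + "#" + encoded`
// and on load read `location.hash.slice(1)` and decode it.
const encoded = encodeHtmlToFragment("<h1>héllo</h1>");
const roundTrip = decodeHtmlFromFragment(encoded);
```

A real version would compress before encoding (e.g. with CompressionStream), which is what keeps the URL-length problem mentioned above somewhat in check.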
That project is not bundling ffmpeg and is not encoding in the browser. What it does is generate an ffmpeg command that you can run in your CLI, scripts, etc.
"We ran our own analysis sampling 150 profiles per repo across 20 projects and found repos where 36-76% of stargazers have zero followers and fork-to-star ratios 10x below organic baselines"
This does not look like an appropriate signal to use on GitHub; I doubt that this is an organic baseline. If this is used as a metric, then the study might be flawed.
Correct, but it should be some ratio of the model size: if the model size is x GB, max context would occupy roughly x times some constant of RAM. For the quantized version, assuming it's 18 GB at Q4, it should be able to support a 64-128k context on this Mac.
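For intuition, the per-token KV-cache cost can be estimated from the model's architecture rather than a single constant times model size. A rough sketch (the grouped-query-attention shape below is an illustrative assumption, not a measurement of any particular model):

```typescript
// Rough KV-cache size estimate: each token stores one key vector and one value
// vector per layer, so bytes/token = 2 * layers * kvHeads * headDim * bytesPerElement.
function kvCacheBytes(
  layers: number,
  kvHeads: number,
  headDim: number,
  bytesPerElement: number,
  contextTokens: number
): number {
  return 2 * layers * kvHeads * headDim * bytesPerElement * contextTokens;
}

// Hypothetical 70B-class shape with grouped-query attention (assumed values):
// 80 layers, 8 KV heads, head dim 128, fp16 cache -> 320 KB per token.
const gb = kvCacheBytes(80, 8, 128, 2, 64 * 1024) / 1024 ** 3;
// For this made-up shape, a 64k context costs about 20 GB on top of the weights.
```

The point is that context cost scales with layer count and KV head count, which correlate with model size but aren't a fixed multiple of it, and quantizing the KV cache (e.g. to 8-bit) halves the figure.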
What was your data size? I'm surprised 800 KB made a difference. Using StringZilla was a smart approach; my guess is that it being unusually fast made all the difference.
Usually I'm dealing with about 20mb of compressed data, almost 100mb uncompressed. Even with only a couple mb of data SQLite still has a startup time of a couple hundred milliseconds on my phone. But that's a couple hundred milliseconds when loading a database that's already decompressed. When loading 100mb SQLite usually took a second or so which I didn't really like for a pwa.
It took me quite a few attempts to get something faster than SQLite.
My new format loads instantly because I'm just casting the data to a struct. The only thing that takes time is decompressing, but that's still faster than loading the uncompressed via SQLite. My phone loads 100mb from 20mb compressed in about 400ms.
But writing my own format gives other benefits, like being able to extract all the HTML tags and capital letters beforehand for fast and sensible search, and reconstructing them on render. It's also just way easier for me to edit TSVs with markers for which parts are indexed and have that transformed into an indexed format with 3 indexes.
Also, with SQLite I was just running one module, but with my new format I'm running about 20 instances of it because it keeps the data nicer, more manageable and makes everything very parallel. Though I keep the number of web workers to 2 because it doesn't seem to benefit much to increase it more.
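The "casting the data to a struct" idea above can be sketched in the browser with views over a single ArrayBuffer. The record layout below is a made-up example, not the author's actual format:

```typescript
// Instead of parsing, lay fixed-width records out in one ArrayBuffer and read
// fields through a DataView. Hypothetical layout per record (12 bytes):
// u32 id, u32 offset into a separate text blob, u32 length.

const RECORD_SIZE = 12;

function writeRecords(records: Array<[number, number, number]>): ArrayBuffer {
  const buf = new ArrayBuffer(records.length * RECORD_SIZE);
  const view = new DataView(buf);
  records.forEach(([id, offset, length], i) => {
    view.setUint32(i * RECORD_SIZE + 0, id, true); // little-endian
    view.setUint32(i * RECORD_SIZE + 4, offset, true);
    view.setUint32(i * RECORD_SIZE + 8, length, true);
  });
  return buf;
}

// "Loading" is just wrapping the bytes in a view: no per-record parsing or
// allocation, which is why it's effectively instant after decompression.
function readRecord(buf: ArrayBuffer, i: number) {
  const view = new DataView(buf, i * RECORD_SIZE, RECORD_SIZE);
  return {
    id: view.getUint32(0, true),
    offset: view.getUint32(4, true),
    length: view.getUint32(8, true),
  };
}

const buf = writeRecords([[1, 0, 42], [2, 42, 7]]);
const second = readRecord(buf, 1);
```

This also transfers cheaply between web workers, since an ArrayBuffer can be moved as a transferable without copying.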
This is really cool. I'm working on stuff that is somewhat aligned with this - offline knowledge base/educational platform focused on things like appropriate technologies for rural people in the developing world. Storing in the browser and, more importantly, searching it is definitely one of the major challenges. (it's also just a much more dynamic app)
My main question about this is whether it can be dynamically/incrementally updated within the browser? Eg new material is available or edits have been made, so sync it from backend and it gets merged in.
I've been working on using RxDB to sync and store in the browser - it can use its own IndexedDB abstraction, SQLite, or its own OPFS-based DB. It can also load any of these into memory via its memory-mapped mechanism. I've also made a mechanism to load everything into FlexSearch in a SharedWorker, so that you can do full-text search fairly performantly.
It's a lot of complexity though. I'd be curious to hear any of your thoughts. Or even to chat if you're open to it!
I'm not sure I follow exactly, but if I understand correctly, you mean that when the database file is updated, the app updates? Right now, on app load it updates the service worker and serves the files from cache first. If there's a newer file it fetches it in the background, then sends a message to the client that there is a new file. I haven't implemented the next part yet, but it should be able to invalidate the current file and load the new file without refreshing the page. Right now the new files load on refresh, after the new service worker is activated.
But the page still has to be refreshed to load the new service worker. I'm looking into ways to cut down the time to load the new files, because right now you have to refresh the page 3 times for the new files to take over.
The .peak files aren't designed to be a database that you can just add to during runtime, they're rather static and highly efficient in that context. But it's easy to edit the source files and generate a new .peak file from that.
You can take a folder of any kind of files and run peakgen on it, and it will create a compressed .slab file that you can search and fetch results from just like the .peak files. I first saw that done with SQLite, and I really liked it, so I knew I could do it too.
I also used the fragment technique for sharing HTML snippets, but the URLs became very long, so I had to implement an optional URL shortener after users complained. Unfortunately that meant server interaction.
https://sdocs.dev/s/{short id}#k={encryption key}
└────┬───┘ └───────┬──────┘
│ │
sent to never leaves
server your browser
We encrypt your document client side. The encrypted document is sent to the server with an id to save it against. The encryption key stays client side in the URL fragment. (And, probably very obviously, the encryption key is required to make the server-stored text readable again.)
You can test this by opening your browser's developer tools, switching to the Network tab, clicking Generate next to the "Short URL" heading, and inspecting the request body. You will see a base64-encoded blob of random bytes, not your document.
Re URL length: yes, I have a feeling it could become an issue. I was wondering if a browser extension might give users the ability to have shorter URLs without losing privacy, but I haven't looked into it deeply and don't know if it would be possible. (Browser extensions are decent bridges between the local machine and the browser, so maybe some sort of decryption key could be used to allow for more compressed URLs.)
I doubt it would be possible; it boils down to a compression problem, squeezing x amount of content into y bits. Since the content is unpredictable, it cannot be done without an intermediary to store it.
Wow, that looks beautiful. I want to migrate my Hugo site to Astro, because Astro is easier to manage and more friendly to local LLMs than Hugo, but I keep postponing it.
Interesting. I wanted something like this, but I'm on Linux, so I modified the whisper example to run on the CLI. It's quite basic: it uses Ctrl+Alt+S to start/stop, and when you stop, it copies the text to the clipboard; that's it. Now it's my daily driver: https://github.com/newbeelearn/whisper.cpp
https://easyanalytica.com/tools/html-playground/