For anyone who hasn't tried local models because they think it's too complicated...

vunderba · 2024-09-21T18:22:25 1726942945

If you're gonna go with a VS code extension and you're aiming for privacy, then I would at least recommend using the open source fork VS Codium.

https://vscodium.com/

unethical_ban · 2024-09-21T18:29:19 1726943359

It is true that VS Code has some non-optional telemetry, and if VS Codium works for people, that is great. However, the telemetry of VSCode is non-personal metrics, and some of the most popular extensions are only available with VSCode, not with Codium.

wkat4242 · 2024-09-21T18:48:10 1726944490

> and some of the most popular extensions are only available with VSCode, not with Codium

Which is an artificial restriction from MS that's really easily bypassed.

Personally I don't care whether the telemetry is identifiable. I just don't want it.

noman-land · 2024-09-21T18:54:28 1726944868

How is it bypassed?

wkat4242 · 2024-09-21T18:56:52 1726945012

There's a whitelist identifier that you can add bundle IDs to, to get access to the more sensitive APIs. Then you can download the extension file and install it manually. I don't have the exact process right now but just Google it :)

jjnoakes · 2024-09-22T17:42:21 1727026941

It is not just a technical limitation, it is a license limitation too, for what it is worth.

arcanemachiner · 2024-09-22T03:40:39 1726976439

Good to know. I never got too far with VSCodium because of this limitation.

eMPee584 · 2024-09-22T09:52:09 1726998729

To switch the extension market place to the non-free one, some URLs need to be swapped in a product.JSON file, as in this script: https://github.com/fsfw-dresden/usb-live-linux/commit/3e4fd4...

SoothingSorbet · 2024-09-22T01:22:59 1726968179

> However, the telemetry of VSCode is non-personal metrics

I don't care, I don't want my text editor to send _any_ telemetry, _especially_ without my explicit consent.

> some of the most popular extensions are only available with VSCode

This has never been an issue for me, fortunately. The only issue is Microsoft's proprietary extensions, which I have no interest in using either. If I wanted a proprietary editor I'd use something better.

aftbit · 2024-09-22T01:33:34 1726968814

I dropped VSCode when I found out that the remote editing and language server extensions were both proprietary. Back to vim and sorry I strayed.

causal · 2024-09-23T18:56:19 1727117779

Making the remote editing extension closed is particularly frustrating, as you have little visibility into what it's doing and it is impossible to debug obscure errors

all2 · 2024-09-22T02:49:01 1726973341

Jetbrains is pretty ok on this front. I've been enjoying using my beefy computer to do work from my potato laptop.

prometheon1 · 2024-09-22T11:23:06 1727004186

The way I read it, the message you replied to was a complaint about parts of VSCode being proprietary. Do you mean to say Jetbrains is pretty ok on the "not being proprietary" front?

aftbit · 2024-09-23T17:23:19 1727112199

Yeah, 100%. I'm not a hardcore FOSS only person, but for my core workflow, when a FOSS tool exists and works well, I am not likely to use a proprietary alternative if I can avoid it at all.

So yeah, I'll use Excel to interoperate with fancy spreadsheets, but if LibreOffice will do the job, I'll use it instead. I tried out several of the fancy proprietary editors at various times (SublimeText, VSCode, even Jetbrains), but IMO they were not better _enough_ to justify switching away from something like vim, which is both ubiquitously available and FOSS.

poincaredisk · 2024-09-21T23:56:25 1726962985

>the telemetry of VSCode is non-personal metrics

But I don't want it. I want my software to work for me, not against me.

>and some of the most popular extensions are only available with VSCode, not with Codium.

I'll manage without them. What's especially annoying is that this restriction is completely artificial.

Having said that, MS did a great job with VsCode and I applaud them for that. I guess nothing is perfect, and I bet these decisions were made by suits against engineer wishes.

neoberg · 2024-09-22T14:04:03 1727013843

> But I don't want it. I want my software to work for me, not against me.

How is said software working "against" you by collecint non-personal telemetry while purpose of that telemetry usually is making the software better for most users?

noman-land · 2024-10-04T19:28:33 1728070113

You just need to swap out some nouns and the offense will become more obvious.

"How is that chair working 'against' you by collecting 'non-personal' sitting patterns tagged with timestamps and information about the chair and house that it's in while the purpose of that data collection 'usually' is making the chair better for other people?"

When I use a product, I'm not implicitly inviting the makers of that product to perpetually monitor my usage of the product so that they can make more money based on my data. In any other part of life other than software, this would be an obscene assumption for a product maker to make. But in software, people give it a pass.

No.

This type of data collection is obscene when informed consent is not clearly and authoritatively acquired in advance.

akimbostrawman · 2024-09-24T10:10:23 1727172623

>usually is making the software better for most users?

that usually hasn't been the case since at least a decade. it's truly bewildering that someone especially on hackernews would voluntarily give big tech there finger and not expect to get bitten.

tripzilch · 2024-09-25T09:17:11 1727255831

Software did absolutely not get any better after corporations started adding telemetry to their software.

Point in case: Software actually got worse.

Second point in case: Great software and editors have been built without telemetry for decades.

spl757 · 2024-09-22T20:03:25 1727035405

Why did you use quotataion marks around that particular word?

qwezxcrty · 2024-09-21T20:35:58 1726950958

From the documentation (https://code.visualstudio.com/docs/getstarted/telemetry ) it seems there is a supported way to completely turn off telemetry. Is there something else in VSCode that doesn't respect this setting?

lr1970 · 2024-09-22T10:38:21 1727001501

From the documentation you linked above:

> extensions may be collecting their own usage data and are not controlled by the telemetry.telemetryLevel setting. Consult the specific extension's documentation to learn about its telemetry reporting and whether it can be disabled.

ENGNR · 2024-09-22T11:27:23 1727004443

That's new. Previously there was a setting, but they removed it, and it would even throw a warning in settings.json that the property no longer existed.

They must have reintroduced the telemetry setting. I can't remember if I deleted the old one, but my setting on that new value was set to "all" by default.

metadat · 2024-09-21T23:04:59 1726959899

Not allowing end-users to disable telemetry is actually awful. The gold standard is that IP addresses are considered personally identifiable information.

jaggederest · 2024-09-22T00:30:33 1726965033

> However, the telemetry of VSCode is non-personal metrics,

We know from the body of work in deobfuscation that there's no such thing as "strictly anonymous metrics".

_kidlike · 2024-09-21T17:16:23 1726938983

Or https://ollama.com/

fortyseven · 2024-09-21T21:52:06 1726955526

This has been my go-to for all of my local LLM interaction: it easy to get going, manages all of the models easily. Nice clean API for projects. Updated regularly; works across Windows, Mac, Linux. It's a wrapper around LlamaCpp, but it's a damned good one.

brewtide · 2024-09-21T23:31:19 1726961479

Same here, however minimal. I've also installed openwebui so the instance has a local web interface, and then use tailscale to access my at home LAN when put and about on the cellphone. (Goes16 weather data, ollama, a speed cam setup, and esphome temp sensors around the home / property).

It's been pretty flawless, and honestly pretty darn useful here and there. The big guns go faster and do more, but I'd prefer not having every interaction logged etc.

6core 8th gen i7 I think, with a 1050ti. Old stuff. And it's quick enough on the smaller 7/8b models for sure.

kaoD · 2024-09-21T18:51:28 1726944688

Well this was my experience...

    User: Hey, how are you?
    Llama: [object Object]

It's funny but I don't think I did anything wrong?

AlienRobot · 2024-09-21T19:03:21 1726945401

2000: Javascript is webpages.

2010: Javascript is webservers.

2020: Javascript is desktop applications.

2024: Javascript is AI.

evbogue · 2024-09-21T19:10:34 1726945834

From this data we must conclude that within our lifetimes all matter in the universe will eventually be reprogrammed in JavaScript.

mnky9800n · 2024-09-21T19:43:51 1726947831

I'm not sure I want to live in that reality.

mortenjorck · 2024-09-21T20:08:00 1726949280

If the simulation hypothesis is real, perhaps it would follow that all the dark matter and dark energy in the universe is really just extra cycles being burned on layers of interpreters and JIT compilation of a loosely-typed scripting language.

AlienRobot · 2024-09-21T20:20:38 1726950038

It's fine, it will be Typescript.

heresie-dabord · 2024-09-23T17:20:24 1727112024

... a reality where everything in software development that was previously established as robust foundation is discarded, only to be re-learned and re-implemented less well while burning VC cash.

anotherjesse · 2024-09-21T20:49:05 1726951745

WAT? https://www.destroyallsoftware.com/talks/wat

Jedd · 2024-09-22T00:51:20 1726966280

Often you'll find there's '-chat-' and '-instruct-' variants of an LLM available.

Trying to chat to an INSTRUCT model will be disappointing, much as you describe.

kaoD · 2024-09-22T01:28:35 1726968515

This was on their example LLaVA 1.5 7b q4 with all default parameters which does not specify chat or instruct... but after the first message it actually worked as expected so I guess it's RLHF'd for chat or chat+instruct.

I don't know if it was some sort of error on the UI or what.

Trying to interrogate it about the first message yielded no results. It just repeated back my question, verbatim, unlike the rest of the chat which was more or less chat-like :shrugh:

HiPHInch · 2024-09-22T07:57:37 1726991857

Thanks for your recommendation! I just ran Llamafile for the first time with a custom prompt on my Windows machine (i5-13600KF, RX6600) and found that it performed extremely slowly and wasn't as smart as ChatGPT. It doesn't seem suitable for productive writing. Did I do something wrong, or is there a way to improve its writing performance?

noman-land · 2024-09-22T16:13:52 1727021632

Local models are definitely not as smart as ChatGPT but you can get pretty close! I'd consider them to be about a year behind in terms of performance compared to hosted models, which is not surprising considering the resource constraints.

I've found that you can get faster performance by choosing a smaller model and/or by using a smaller quantization. You can use other models with llamafile as well. They have some prebuilt ones:

https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file...

You can also search for other llamafiles for other models on HuggingFace by using the llamafile tag.

https://huggingface.co/models?library=llamafile&sort=trendin...

And you can download model weights directly and use them by providing an -m flag to llamafile but that's getting a bit less straightforward.

https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file...

jonnycomputer · 2024-09-23T15:05:19 1727103919

RAM and what GPU you have are big determinants of how fast it will run, and how smart a model you can run. A large amount of RAM and GPU memory is required for larger models without significant slowdown because its much faster if it can keep the entire model in memory. Small models range from 3-8 gigabytes, but a 70B parameter model will be 30-50 gigabytes.

neop1x · 2024-09-23T22:41:27 1727131287

I am running 70B models on M2 Max with 96 GB of RAM and it works very well. As HW evolves, it will become a standard

creata · 2024-09-24T07:21:36 1727162496

Out of curiosity, what degree of quantization are you applying to these 70B models?

neop1x · 2024-09-24T09:35:14 1727170514

Q4_K_S. While not as good as top commercial models like chatgpt, they are still quite capable and I like that there are also uncensored/abliterated models like Dolphin.

xyc · 2024-09-21T23:53:07 1726962787

If anyone is interested in trying local AI, you can give https://recurse.chat/ a spin.

It lets you use local llama.cpp without setup, chat with PDF offline and provides chat history / nested folders chat organization, and can handle thousands of conversations. In addition you can import your ChatGPT history and continue chats with local AI.

giancarlostoro · 2024-09-22T04:53:44 1726980824

I dont see any indication that it runs on Linux, I'll stick to Jan which is free.

ryukoposting · 2024-09-21T20:09:52 1726949392

Not only are they the only future worth living in, incentives are aligned with client-side AI. For governments and government contractors, plumbing confidential information through a network isn't an option, let alone spewing it across the internet. It's a non-starter, regardless of the productivity bumps stuff like Copilot can provide. The only solution is to put AI compute on a cleared individual's work computer.

creata · 2024-09-22T01:28:41 1726968521

Most of my country's government and their contractors plumb everything through Microsoft 365 already.

CooCooCaCha · 2024-09-22T00:04:15 1726963455

> plumbing confidential information through a network isn't an option

So do you think government doesn't use networks?

ryukoposting · 2024-09-23T02:05:40 1727057140

Sneakernets are pervasive in environments that handle classified information. If something like that gets moved through a network, it's rarely leaving one physical room unless there's some seriously exotic hardware involved - "general dynamics MLS" is a great search prompt if you're curious what that looks like.

wkat4242 · 2024-09-21T18:37:30 1726943850

Yeah I set up a local server with a strong GPU but even without that it's ok, just a lot slower.

The biggest benefits for me are the uncensored models. I'm pretty kinky so the regular models tend to shut me out way too much, they all enforce this prudish victorian mentality that seems to be prevalent in the US but not where I live. Censored models are just unusable to me which includes all the hosted models. It's just so annoying. And of course the privacy.

It should really be possible for the user to decide what kind of restrictions they want, not the vendor. I understand they don't want to offer violent stuff but 18+ topics should be squarely up to me.

Lately I've been using grimjim's uncensored llama3.1 which works pretty well.

threecheese · 2024-09-21T20:34:50 1726950890

Any tips you can give for like minded folks? Besides grimjim (checking it out).

wkat4242 · 2024-09-22T01:58:19 1726970299

Well you can use some jailbreak prompts but with cloud models it's a cat and mouse game as they constantly fix known jailbreaks. With local models this isn't a problem of course. But I prefer getting a fine-tune model so I don't have to cascade prompts.

Not all uncensored models are great. Some return very sparse data or don't return the end tags sometimes so they keep hallucinating and never finish.

If you import grimjim's model, make sure you use the complete modelfile from vanilla lama3.1, not just an empty modelfile. Because he doesn't provide one. This really helps setting the correct parameters so the above doesn't happen so much.

But I have seen it happen with some official ollama models like wizard-vicuna and dolphin-llama. They come with modelfiles so they should be correct.

wkat4242 · 2024-09-21T20:40:48 1726951248

@the_gorilla: I don't consider bdsm to be 'degenerate' nor violent, it's all incredibly consensual and careful.

It's just that the LLMs trigger immediately on minor words and shut down completely.

a-dub · 2024-09-22T04:11:09 1726978269

idk, i have a pretty powerful laptop with 16GB VRAM and a 3080ti. when i've played with quantized llama2 and llama3 (with llamacpp), it was kinda underwhelming. inference was slow, the laptop would heat up and the results weren't as good. (is llama3.1 better?)

this was with 4bit quantization and offload of as many layers as possible to the gpu.

mafuy · 2024-09-22T04:32:15 1726979535

(A brief note: While not weak, the laptop version of a 3080 Ti is far surpassed by even just a desktop 4060 Ti, which is sold for less than 400$. So it's possible to setup a stronger system relatively cheaply. What's good enough depends on the expectations.)

wiether · 2024-09-22T06:02:40 1726984960

Unless you have special needs like very high usage, privacy or other ones depicted in the article, buying another computer many hundred dollars for the unique purpose of running local models is a hard sell.

If you use their API instead of their sub-based offers, the most popular models are cheap to use and with BYOK tools, switching model is as easy as entering another string in a form.

For instance I put $15 on my OpenAI account in August 2023, since then I used Dall-E weekly and I still got more than $5 credit left!

a-dub · 2024-09-22T04:48:53 1726980533

it seemed to me that the bottleneck mostly revolved around the layers that were in system ram and that a lack of vram was really the gating factor in terms of reasonable inference performance. (although i would imagine that there's probably some more optimization that could be done to make best use of a split vram/sysram setup.)

in any event it was fun to try out, but still didn't seem anywhere near how well the hosted models work. a heavy duty workstation with a bunch of gpus/vram would probably be a different story though.

froggit · 2024-09-24T04:02:13 1727150533

> it seemed to me that the bottleneck mostly revolved around the layers that were in system ram and that a lack of vram was really the gating factor in terms of reasonable inference performance. (although i would imagine that there's probably some more optimization that could be done to make best use of a split vram/sysram setup.)

You could try a model that fits entirely into in VRAM. It"s a trade of precision for a decent bit of performance. 16GB is plenty to work with as i've seen acceptable enough results with 7B models on my 8GB GPU.

zelphirkalt · 2024-09-21T18:17:56 1726942676

Many setup rely on Nvidia GPUs, Intel stuff, Windows or other stuff, that I would rather not use, or are not very clear about how to set things up.

What are some recommendations for running models locally, on decent CPUs and getting good valuable output from them? Is that llama stuff portable across CPUs and hardware vendors? And what do people use it for?

threecheese · 2024-09-21T18:28:41 1726943321

Have you tried a Llamafile? Not sure what platform you are using. From their readme:

  > … by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.

Low cost to experiment IMO. I am personally using MacOS with an M1 chip and 64gb memory and it works perfectly, but the idea behind this project is to democratize access to generative AI and so it is at least possible that you will be able to use it.

narrator · 2024-09-21T18:49:30 1726944570

With 64GB can you run the 70B size llama models well?

threecheese · 2024-09-21T20:37:15 1726951035

I should have qualified the meaning of “works perfectly” :) No 70b for me, but I am able to experiment with many quantized models (and I am using a Llama successfully, latency isn’t terrible)

credit_guy · 2024-09-21T19:37:37 1726947457

No, you can't. I have 128 GB and a 70B llamafile is unusable.

noman-land · 2024-09-21T18:27:07 1726943227

llamafile will run on all architectures because it is compiled by cosmopolitan.

https://github.com/jart/cosmopolitan

"Cosmopolitan Libc makes C a build-once run-anywhere language, like Java, except it doesn't need an interpreter or virtual machine. Instead, it reconfigures stock GCC and Clang to output a POSIX-approved polyglot format that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + NetBSD + BIOS with the best possible performance and the tiniest footprint imaginable."

I use it just fine on a Mac M1. The only bottleneck is how much RAM you have.

I use whisper for podcast transcription. I use llama for code complete and general q&a and code assistance. You can use the llava models to ingest images and describe them.

RevEng · 2024-09-22T21:18:43 1727039923

What do you want to use it on?

Ollama works on anything: Windows, Linux, Mac and Nvidia or AMD. I don't know if other cards like Arc are supported by anything yet, bit of it supports the open Vulkan API (like AMD) then it should work.

Every inference server out there supports running from CPU, but realize that it's much slower than running on a GPU - that's why this revolution didn't begin until GPUs became powerful and affordable.

As far as being clear to setup, Ollama is trivial: it's a single command line that only asks what model you want and they provide you with a list on their website. They even have a Docker container if you don't want to worry about installing any dependencies. I don't know what could be easier than that.

Most other tools like LM Studio or Jan are just a fancy UI running llama.cpp as their server and using HuggingFace to download the models. They don't even offer anything beyond simple inference, such as RAG or agents.

I've yet to see anything more than a simple RAG that's available to use out of the box for local use. The only full service tools are online services like Microsoft Copilot or ChatGPT. Anyone else who wants to do that more advanced kind of system ends up writing their own code. It's not hard if you know Python - there are lots of libraries available like HuggingFace, LangChain, and Llama-Index, as well as millions of tutorials (every blog has one).

Maybe that's a sign that there's room for an open source platform for this kind of thing, but given that it's a young field and everyone is rushing to become the next big online service or toolkit, there might not be as much interest from developers to build an open source version of a high quality online service.

distances · 2024-09-21T19:40:45 1726947645

I'm using Ollama with an AMD GPU (7800, 16GB) on Linux. Works out of the box. Another question is then if I get much value out of these local models.

wkat4242 · 2024-09-21T18:49:32 1726944572

Not really. I run ollama on an AMD Radeon Pro and it works great.

For tooling to train models it's a bit more difficult but inference works great on AMD.

My CPU is an AMD Ryzen and the OS Linux. No problem.

I use OpenWebUI as frontend and it's great. I use it for everything that people use GPT for.

senkora · 2024-09-23T04:43:10 1727066590

> For anyone who hasn't tried local models because they think it's too complicated or their computer can't handle it

I have now learned that my laptop is capable of a whopping 0.37 tokens per second.

11th Gen Intel® Core™ i7-1185G7 @ 3.00GHz × 8

hmottestad · 2024-09-23T05:43:45 1727070225

Probably need to try a smaller model :P

When the article says that researchers are using their laptops those researchers are either using very small models on a gaming laptop or they have a fairly modern MacBook with a lot of ram.

There are also options for running open LLMs in the cloud. Groq (not to be confused with Grok) runs Llama, Mixtral and Gemma models really cheaply: https://groq.com/pricing/

senkora · 2024-09-23T13:03:35 1727096615

I'll play around with it some more later. I was running llava-v1.5-7b-q4.llamafile which is the example that they recommend trying first at https://github.com/Mozilla-Ocho/llamafile

Groq looks interesting and might be a better option for me. Thank you.

senkora · 2024-09-23T17:31:31 1727112691

I got better performance of 20.18 tokens per second using tinyllama-1.1b-chat-v1.0.Q8_0.llamafile from https://huggingface.co/Bojun-Feng/TinyLlama-1.1B-Chat-v1.0-l...

If anyone is reading this and had trouble with a larger model, that might be the one to try next.

AustinDev · 2024-09-21T22:19:44 1726957184

https://old.reddit.com/r/LocalLLaMA/ is a great community for this sort of thing as well.

akimbostrawman · 2024-09-24T10:42:20 1727174540

GPT4All for an even easier gui

https://github.com/nomic-ai/gpt4all

amelius · 2024-09-21T23:35:44 1726961744

How do we rate whether the smaller models are any good? How many questions do we need to ask it to know that it can be trusted and we didn't waste our time on it?

mkl · 2024-09-22T00:08:56 1726963736

You should never completely trust any LLM. They all get things wrong, make things up, and have blind spots. They're any good if they help you for some of your particular uses (but may still fail badly for other uses).

amelius · 2024-09-22T01:52:50 1726969970

I think you didn't understand my question and maybe I phrased it poorly. The problem is not whether we should trust any deep learning model (the answer is indeed no). But the question is how we can find out if a model is any good before investing our time into that model. Each bad reply we get has a price, because it wastes our time. So, how can we compare models objectively without having to try them out ourselves first?

abdullahkhalids · 2024-09-22T02:46:08 1726973168

There are leaderboards [1] that can provide a rough estimate of the relative capabilities of different models.

[1] https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...

soco · 2024-09-23T13:50:40 1727099440

Fully agree with you, it should, but after trying a few times different llamas I think they're very far from "try it out in just a few moments". Unless all you want is to see one running, for anything beyond that you'll be in dependency hell...

jonnycomputer · 2024-09-23T14:59:56 1727103596

If you just want to chat, download https://lmstudio.ai/, then download their recommended LLM files, and you're good to go. Really that simple these days.

thebiss · 2024-09-23T20:09:32 1727122172

lmstudio prohibits commercial use [1]:

> Subject to the Agreement, Company grants you a limited license to reproduce portions of Company Properties for the sole purpose of using the Services for your personal, non-commercial purposes.

[1] https://lmstudio.ai/terms

noman-land · 2024-09-23T14:39:28 1727102368

There is no dependency hell. It's just a single file. If you want to get into trying different models and various settings, you can use LM Studio, and still no need to worry about dependencies.

m3affan · 2024-09-22T22:15:41 1727043341

But how good are these models compared to gpt4o? My last experience with llama2-8b was not great at all. Are there really that good models that would fit on an average consumer hardware (mine has already 32GB ram and 16GB vram)?

noman-land · 2024-09-22T23:38:30 1727048310

The post you're replying to couldn't have made it any easier to answer these questions yourself. No, it won't be as good as the state of the art with massive cloud infrastructure behind an http api.

privacyis1mp · 2024-09-21T18:00:54 1726941654

I built Fluid app exactly with that in mind. You can run local AI on mac without really knowing what an LLM/ollama is. Plug&Play.

Sorry for the blatant ad, though I do hope it's useful for some ppl reading this thread: https://getfluid.app

halostatue · 2024-09-30T19:41:20 1727725280

Probably not the best choice of names: https://fluidapp.com. I don't know that it's been updated in a while, but it still works nicely.

twh270 · 2024-09-21T18:54:29 1726944869

I'm interested, but I can't find any documentation for it. Can I give it local content (documents, spreadsheets, code, etc.) and ask questions?

privacyis1mp · 2024-09-21T19:05:48 1726945548

> Can I give it local content (documents, spreadsheets, code, etc.) It's coming roughly in December (may be sooner).

Roadmap is following:

- October - private remote AI (when you need smarter AI than your machine can handle, but don't want your data to be logged or stored anywhere)

- November - Web search capabilities (so the AI will be capable of doing websearch out of the box)

- December - PDF, docs, code embedding. 2025 - tighter MacOS integration with context awareness.

twh270 · 2024-09-21T19:56:29 1726948589

Oh awesome, thank you! I will check back in December.

upcoming-sesame · 2024-09-21T22:54:19 1726959259

I just tried now. Super easy indeed but slow to the point it's not usable on my PC

chaostheory · 2024-09-22T00:51:13 1726966273

You need an RTX 4090 if you want enough speed

leansensei · 2024-09-23T05:38:18 1727069898

Please do not turn your own equipment into unqualified advice. Ollama runs great with a 4070 Super, and very likely also with a 4060.

heyoni · 2024-09-21T17:23:00 1726939380

Isn’t there also some Firefox AI integration that’s being tested by one dev out there? I forgot the name and wonder if it got any traction.

ComputerGuru · 2024-09-21T21:18:32 1726953512

Do you know if whisperfile is akin to whisper or the much better whisperx? Does it do diarization?

noman-land · 2024-09-21T22:06:12 1726956372

Last I checked it was basically just whisper.cpp so not whisperx and no diarization by default but it moves pretty quickly so you may want to ask on the Mozilla AI Discord.

https://discord.com/invite/yTPd7GVG3H

christkv · 2024-09-21T23:12:49 1726960369

I recommend llmstudio for this usually