> however, buying brand new, they are the best VRAM per dollar in the NVIDIA world as far as I could see.
The 3060 12GB is cheaper upfront and a viable alternative. A used 3090 Ti is also cheaper per dollar of VRAM, although it's a power hog.
The 4060 Ti 16GB is a nice product, just not for gaming. I would wait for price drops: Nvidia just released the 4070 Super, which should drive down the cost of the 4060 Ti 16GB. I also think the 4070 Ti Super 16GB is nice for hybrid gaming/LLM usage.
- Motherboards and CPUs have a limited number of PCIe lanes available. I went with a second-hand Threadripper 2920X so I can fit 4 GPUs in the future; since you can only fit so many GPUs, your total VRAM and upgrade headroom are ultimately capped. These choices limit me to PCIe gen 3 x8 (the motherboard only supports PCIe gen 3, and the 4060 Ti only exposes 8 lanes), but I found that's still quite workable: during regular inference, Mixtral 8x7B at a 4-bit GPTQ quant running on vLLM outputs text faster than I can read (maybe that says something about my reading speed rather than the inference speed, though). I average ~17 tokens/second. (There's a minimal vLLM sketch after this list.)
- Power consumption is a big deal when you are self-hosting, not only when the power bill arrives but also for safety: you need to make sure you don't trip the breaker (or worse!) during inference. The 4060 Ti draws 180W at max load. 3090s are also notorious for (briefly) drawing well over their rated wattage, which scared me away. (See the power-monitoring sketch below.)
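For reference, here is a minimal sketch of the kind of vLLM setup I mean. It assumes a GPTQ quant of Mixtral 8x7B from Hugging Face (the repo id below is just an example) and two 16GB cards; adjust `tensor_parallel_size` to however many GPUs you actually have.

```python
# Minimal vLLM sketch for a 4-bit GPTQ Mixtral 8x7B split across two GPUs.
# The model repo id is an example; swap in whatever quant you use.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ",  # example GPTQ repo
    quantization="gptq",
    tensor_parallel_size=2,  # split weights + KV cache across 2 cards
    dtype="half",
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain PCIe lane allocation in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

Even over gen 3 x8 links this works fine for single-user inference, since the weights stay resident in VRAM and relatively little crosses the PCIe bus per token.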
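And a rough sketch of how I'd sanity-check power draw during a run: poll `nvidia-smi` while inference is going and log the peak. Note this only catches sampled average draw, not the millisecond-scale transient spikes the 3090 is known for, and the polling interval is arbitrary.

```python
# Poll nvidia-smi during an inference run and report peak combined GPU power draw.
import subprocess
import time

def gpu_power_draw_watts():
    """Current power draw per GPU in watts, as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        text=True,
    )
    return [float(line) for line in out.strip().splitlines()]

peak = 0.0
try:
    while True:  # run alongside your inference workload, Ctrl+C to stop
        peak = max(peak, sum(gpu_power_draw_watts()))
        time.sleep(0.5)
except KeyboardInterrupt:
    print(f"peak combined GPU draw: {peak:.0f} W")
```

If you want extra headroom on a breaker, `nvidia-smi -pl <watts>` can cap a card's power limit (within the range the driver allows), at some cost in throughput.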